Excessive GPU temperature rise. ETH

What kernel are you on? Latest?

HiveOS was on an old version when the temperature increased. I updated it to the latest version, set the NVIDIA driver to 510.60.02, and upgraded T-Rex to the latest version.
As soon as I did this, the temperature dropped, but it still runs 5-10 degrees higher than it did before 9:10 am.
It happened again at 23:30, lasted 10 minutes, and dropped by itself. HiveOS live support gave the answer above. The report has been submitted to the developer team.

What kernel are you on? Can you post a screenshot of your worker overview screen? If it’s not 110, flash the latest stable image

It is on 5.4.0-hiveos #140.

On the 3090 rig, only “5.4.80-hiveos” is written.
On the 3080 rig, it says “5.4.0-hiveos #140”.

So those are both about a year old. The first rule of troubleshooting any kind of issue is to make sure everything is up to date, as there’s a good chance any issue you’re having was already fixed in the last year.
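One caveat when comparing the versions above: kernel releases sort numerically by component, so 5.10.0 is newer than 5.4.80 even though a plain string comparison says the opposite, and the #140 / #110 build numbers do not appear to be comparable across different kernel releases. A minimal Python sketch of that comparison follows; `kernel_tuple` and `is_older` are hypothetical helper names, and 5.10.0-hiveos is the release mentioned later in this thread.

```python
import re

def kernel_tuple(release):
    """Turn a release string such as '5.4.80-hiveos' into (5, 4, 80)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())

def is_older(release, target):
    """True if `release` sorts before `target`, compared number by number."""
    return kernel_tuple(release) < kernel_tuple(target)
```

On a rig, the running release could be fed in with `platform.release()` from the standard library, for example `is_older(platform.release(), "5.10.0-hiveos")`.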

Which version should I install? I have now updated to 5.10.0-hiveos #110,
but the temperature still has not fallen back to what it was before 9:10 yesterday morning. At 9:10 yesterday morning, on two different rigs (the 3080 rig and the 3090 rig), a card of the same brand and model reacted at the same time and went to 80+. It did it again at 23:30 at night, but that time it lasted 10 minutes. I upgraded to the kernel version I specified, the stable one in the list.

#110 is what you want. are you running autofan? can you post a screenshot of your current worker overview screen?

The system still runs a little hotter than before 9:10 yesterday. If it doesn’t go above 80 for no reason during the day, then the problem has improved a bit.
The card I specifically mentioned is GPU 3

The rig had no such problem until 9:10 yesterday morning; it was running very comfortably. Even after I updated everything following the unexplained 80+ spike, the temperatures did not return to normal.

I am using manual fan setting.

looks like you have a high ambient temp, and almost all your cards need their thermal pads upgraded as well.

if you had 2 separate rigs have temp issues at the same time, I’m assuming they’re in the same room, and your exhaust/ventilation isn’t keeping up and/or a fan is shutting off, etc.? this isn’t a software issue causing them to heat up, it’s a heat issue.

Before I did any of the updates, everything was fine until 9:10 am yesterday.
The air conditioners are working and the fans are working. I was monitoring the ambient temperature yesterday and the day before. The two rigs are in separate rooms, and the same brand of card in each went to 80+ in the same minute.
Only one card in each rig went to 80+.
I made the updates at 16:30, except for the kernel, and the temperatures on those two rigs dropped by 20-25 degrees, but at 23:30 they went to 80+ again. The ambient temperature, air conditioners and fans have been steady without interruption; there is no problem at all. Now I have also done the kernel update, so I hope it does not hit 80+ again.

post the power draw graphs and fan speed graphs at the time of the higher temps. if the power draw doesn’t dramatically increase, or the fan speed decrease, it’s a local temp issue.
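If the dashboard graphs are not enough, the same three readings can be logged locally. The following is a rough Python sketch, not anything HiveOS ships: it assumes `nvidia-smi` is on the PATH, uses its `--query-gpu` CSV output, and applies the rule of thumb above. The helper names and the 10 W / 5 % margins are assumptions chosen for illustration.

```python
import csv
import io
import subprocess

FIELDS = "timestamp,index,temperature.gpu,power.draw,fan.speed"

def read_nvidia_smi():
    """Return one CSV snapshot of all GPUs (requires the NVIDIA driver)."""
    cmd = ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader,nounits"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

def _num(field):
    """Convert a numeric field; unsupported values such as [N/A] become None."""
    try:
        return float(field)
    except ValueError:
        return None

def parse_samples(text):
    """Parse nvidia-smi CSV text into one dict per GPU reading."""
    samples = []
    for rec in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(rec) != 5:
            continue  # skip blank or malformed lines
        ts, idx, temp, power, fan = (f.strip() for f in rec)
        samples.append({"time": ts, "gpu": int(idx), "temp_c": _num(temp),
                        "power_w": _num(power), "fan_pct": _num(fan)})
    return samples

def classify(baseline, hot, power_margin_w=10.0, fan_margin_pct=5.0):
    """Compare a normal reading with a hot one (margins are assumed values)."""
    if None not in (baseline["power_w"], hot["power_w"]):
        if hot["power_w"] > baseline["power_w"] + power_margin_w:
            return "power draw increased"
    if None not in (baseline["fan_pct"], hot["fan_pct"]):
        if hot["fan_pct"] < baseline["fan_pct"] - fan_margin_pct:
            return "fan speed decreased"
    return "likely local temperature"
```

Calling `parse_samples(read_nvidia_smi())` once a minute and keeping the rows gives a baseline to compare against the next time a card spikes.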

Does the local temperature affect only 1 card in each of two different rigs? I had no problems with this model of card until yesterday, no one lives in the world. At 23:30, both cards were at 80+. Did the local temperature rise and fall back in 10 minutes? Or did it go up at 9:10 just after I did the updates? I guess the update is affecting the local temperature (:

This card was running at 50 degrees before 9:10 on June 16. Then this problem occurred. After 9:10 it ran at 80+; after I made the updates, it started running at 60+.

try turning off the ac or exhaust vent or whatever you have set up in that room and I bet that graph will look identical again. whichever card is getting the most heat from the others will heat up most.

power draw decreased because of thermal throttling. I’m gonna vote this isn’t a software issue and is still a local temp issue.

what’s your ambient temp? 60C on the core for my 3090s means 90F or so ambient. is your room that hot with the ac on?

There are 3 air conditioners and 4 industrial fans. I can constantly check whether the ambient temperature has changed since the update: the ambient temperature is the same, the air conditioners are working, and the fans are working. There is no problem. What is the logic of it going over 80 and then dropping after updating?

the only thing that makes sense is that the local temp rose. you can’t create more heat with less power and the same amount of fan/cfm.
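That argument can be made concrete with the usual steady-state approximation: core temperature ≈ intake air temperature + power × thermal resistance. Below is a rough Python sketch under that simplified model. The function names are hypothetical, and the 25 C intake and the 300 W / 280 W power figures are made-up illustration values; only the 50 and 80 degree core readings come from this thread.

```python
def thermal_resistance(core_temp_c, intake_temp_c, power_w):
    """Calibrate cooler resistance (C per watt) from one known-good reading."""
    return (core_temp_c - intake_temp_c) / power_w

def implied_intake_temp(core_temp_c, power_w, r_theta):
    """Air temperature at the card that would explain a later reading,
    assuming the fan speed (and so r_theta) has not changed."""
    return core_temp_c - power_w * r_theta

# Known-good day: 50 C core, assumed 25 C intake air, assumed 300 W draw.
r_theta = thermal_resistance(50.0, 25.0, 300.0)

# Hot event: 80 C core at a slightly lower (throttled) assumed draw of 280 W.
intake_during_spike = implied_intake_temp(80.0, 280.0, r_theta)
```

With the fan unchanged and power no higher, the model can only explain the hotter core with hotter air at the card. The same reading is also consistent with the resistance itself having gone up (a slowed fan or degraded pads), which this sketch cannot tell apart from warmer intake air.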

It’s the same again. My air conditioners are working and the fans are working, but this is now the 3080 rig, and this brand of card is running at 80+. The ambient temperature is the same and everything else is the same; conditions on the two rigs are exactly the same, yet it sits at 80+ for no reason.


When it heats up, the other cards heat up too.
I’ve never seen this card go over 55 in more than a year.

looks like a fan or ac isn’t working to me. the temps wouldn’t all be higher with the same or lower power draw. have you looked at the rigs in person?

maybe add a box fan to the rig? or add more exhaust to the room. those cards are cooking. you really should be ~10-15c lower on the core and mem temps than you are in all your screenshots