
GPU Driver Error, No Temp

Hey guys!

I keep getting the “GPU Driver Error, No Temp” error. Sometimes the rig runs fine for about 30 hours, then the error comes up and the GPU goes offline.

Swapped risers and connectors with another GPU; still getting the error.
Reflashed the USB drive and tried a different USB drive.
Changed miners; so far NBMiner is the one causing the fewest problems and giving the best hashrate.
Set PCIe to Gen2 and then Gen1 in the BIOS; no improvement.
Above 4G Decoding is enabled (it used to be disabled); no change.

I’m using 6 RX6600 (non XT).
MOBO Asus Rog Strix B450-F Gaming II
Ryzen 5 3600
RAM Adata 8GB
2 × 750W Gold PSUs. The cards aren’t drawing more than 60W each, so I have plenty of headroom on the power supplies.
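
For anyone who wants to sanity-check the power claim, here is a rough back-of-the-envelope sketch in Python. The GPU and PSU numbers come from the post; the 100W system overhead and the 80% load ceiling are assumptions of mine, not measured values, so adjust them for your own rig.

```python
# Rough power-budget check using the figures quoted in the post.
GPU_COUNT = 6
GPU_WATTS_EACH = 60          # upper bound reported in the post
PSU_COUNT = 2
PSU_WATTS_EACH = 750
SYSTEM_OVERHEAD_WATTS = 100  # ASSUMPTION: CPU, board, RAM, fans, riser losses
SAFE_LOAD_FRACTION = 0.8     # ASSUMPTION: rule-of-thumb ceiling for sustained load


def power_budget(gpu_count, gpu_watts, psu_count, psu_watts,
                 overhead_watts, safe_fraction):
    """Return estimated draw, total capacity, and headroom in watts."""
    draw = gpu_count * gpu_watts + overhead_watts
    capacity = psu_count * psu_watts
    safe_capacity = capacity * safe_fraction
    return {
        "draw_w": draw,
        "capacity_w": capacity,
        "safe_capacity_w": safe_capacity,
        "headroom_w": safe_capacity - draw,
        "ok": draw <= safe_capacity,
    }


budget = power_budget(GPU_COUNT, GPU_WATTS_EACH, PSU_COUNT, PSU_WATTS_EACH,
                      SYSTEM_OVERHEAD_WATTS, SAFE_LOAD_FRACTION)
print(budget)
```

This only checks total capacity. It says nothing about how the load is split between the two PSUs or across individual cables.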

Any idea? I’m going crazy! This is all brand new equipment, just got my rig a month ago.

GPU 3 is the one causing the error.

I’d wager your overclock settings are a touch high for at least one GPU.

Have a bunch of RX6600 series myself, some will NOT run max settings you see shared.


It would happen even without any OC

No overclock means default clocks, which is actually not an optimal setting for mining.

Which kernel are you running?

I’m currently on 5.10.0-hiveos #83

Increase voltages if the cards aren’t stable. What miner are you using?

I found out that core voltage and VDDCI do almost nothing in terms of watts when you compare 620/630 with, let’s say, 650/650. Check it out: you’ll see the same watts.
Go with core 940 | 670/670/1100 and 940 mem, and if it runs fine for 24h, try 650/650. If it still runs well, bump mem up to 950 one by one, and keep memory at/below 60 degrees to keep the cards happy.
I have about 20 of those cards, all brands and “colors”. If you run into trouble: #1, it’s your overclock. #2, disconnect the cards one by one to see whether one of them is causing the issues and needs to be sent back.
I had to do that with a Gigabyte Eagle that would not go above 27 MH/s no matter what.
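
The staged approach above can be sketched as a small decision helper. This is only an illustration: reading “670/670/1100” as core voltage/VDDCI/MVDD and “650/650” as core voltage/VDDCI is my assumption, and the `Stage` class and `next_stage_index` function are hypothetical names, not part of Hive OS or any miner.

```python
from dataclasses import dataclass

SOAK_HOURS = 24      # from the post: run each step for 24h before moving on
MAX_MEM_TEMP_C = 60  # from the post: keep memory at/below 60 degrees


@dataclass(frozen=True)
class Stage:
    label: str
    core: int
    vdd: int    # ASSUMPTION: first value of "670/670/1100"
    vddci: int  # ASSUMPTION: second value
    mvdd: int   # ASSUMPTION: third value
    mem: int


STAGES = [
    Stage("baseline", core=940, vdd=670, vddci=670, mvdd=1100, mem=940),
    Stage("lower voltages", core=940, vdd=650, vddci=650, mvdd=1100, mem=940),
    Stage("raise memory", core=940, vdd=650, vddci=650, mvdd=1100, mem=950),
]


def next_stage_index(current, hours_stable, crashed, mem_temp_c):
    """Pick the stage to run next, given how the current one behaved."""
    if crashed or mem_temp_c > MAX_MEM_TEMP_C:
        # Unstable or too hot: step back to the previous, safer stage.
        return max(current - 1, 0)
    if hours_stable >= SOAK_HOURS:
        # Survived the soak period: try the next, more aggressive stage.
        return min(current + 1, len(STAGES) - 1)
    # Still soaking: stay where we are.
    return current


idx = next_stage_index(current=0, hours_stable=24, crashed=False, mem_temp_c=55)
print(STAGES[idx])
```

The point of the design is that a setting is only promoted after a full soak, and any crash or temperature overshoot rolls it back one step rather than starting over.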

I’ll try, but can’t do MVDD 1100 because it crashes. Micron works fine from 1150 and up.

NBMiner is the one.
Phoenix never worked, it simply restarts.
TRM worked fine for some hours then crashed with GPU Dead.
GMiner and lolMiner same error, and lower MH than NB

I thought about increasing the voltage on that GPU, but since it’s the warmest one, that would raise its temperature even more. And since the problem occurred even without any custom voltage or clock settings, I don’t think that would help.

I would switch back to TRM; whenever you get the “GPU detected dead” error, you’ll know which card to adjust the OCs on.

I don’t think switching to TRM is smart. It gives me a lower Mh and hits the same error more often than NB.

Don’t you want to narrow down your problem GPU so you can fix the OCs and restore stability? TeamRedMiner will show you which card is bringing your rig down.

Also, NBMiner is known to fluff the local hashrate, so just because it looks better doesn’t mean it is better at the pool.
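
One way to check this for yourself is to estimate the pool-side (effective) hashrate from accepted shares and compare it with what the miner reports locally. A minimal sketch, assuming a fixed share difficulty expressed in hashes; the share count, difficulty, and reported hashrate below are made-up illustration values, not figures from this rig, and `effective_hashrate` is a hypothetical helper name.

```python
def effective_hashrate(accepted_shares, share_difficulty, seconds):
    """Estimate hashes per second from shares accepted by the pool.

    On average, each accepted share represents `share_difficulty` hashes.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return accepted_shares * share_difficulty / seconds


def percent_gap(reported, effective):
    """Signed gap of effective vs. reported hashrate, in percent."""
    return (effective - reported) / reported * 100.0


# ASSUMPTION: illustrative numbers only.
reported_hs = 180e6  # what the miner shows locally, in H/s
effective_hs = effective_hashrate(accepted_shares=155,
                                  share_difficulty=4e9,
                                  seconds=3600)
print(f"effective: {effective_hs / 1e6:.2f} MH/s")
print(f"gap vs reported: {percent_gap(reported_hs, effective_hs):.1f}%")
```

Share counts are noisy over short windows, so compare over many hours before concluding that one miner is better than another.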

I know which card is causing the problem; I added it to the post.

Increase voltages on that card until it is stable. The goal is the lowest voltage that is stable; if the card isn’t stable, your voltage is likely too low.
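
As a rough sketch of that idea: start low, step the voltage up, and stop at the first value that holds. The `is_stable` callback is a stand-in for a real soak test (running the card for many hours), and the 25 mV step, the 1250 mV cap, and the 1200 mV threshold in the example are assumptions for illustration only.

```python
from typing import Callable, Optional


def lowest_stable_voltage(start_mv: int, max_mv: int, step_mv: int,
                          is_stable: Callable[[int], bool]) -> Optional[int]:
    """Step the voltage up from start_mv and return the first stable value.

    Returns None if nothing up to max_mv (inclusive) is stable.
    """
    if step_mv <= 0:
        raise ValueError("step_mv must be positive")
    mv = start_mv
    while mv <= max_mv:
        if is_stable(mv):
            return mv
        mv += step_mv
    return None


# Stand-in stability check: pretend this card holds from 1200 mV upward.
result = lowest_stable_voltage(start_mv=1150, max_mv=1250, step_mv=25,
                               is_stable=lambda mv: mv >= 1200)
print(result)
```

Going upward from a low starting point finds the lowest stable value directly, which is what you want for power draw and temperature.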


Okay, here are some updates:

  • Used the command “gpu-fans-find” to identify which physical card was GPU 3.
  • As this one was running hotter than the others, I swapped it with the card in the coolest spot on the rig, keeping the risers in place so I could tell whether the problem was the GPU or the riser.
  • At first it ran well, no problems. Then on a second attempt the problem came back (the hashrate on that GPU starts to drop and then the card suddenly goes offline).
  • I pushed the MVDD from 1150 to 1200 while the hashrate was going down, and after the OC update, it came back up. So I think the MVDD was causing it to crash, since this GPU for some reason needs a bit more voltage than the rest.

I will keep you posted after a couple of days.

It depends heavily on your mobo+CPU+BIOS setup. I just tried a G540+China mobo and one of the Z390-P boards on the same rig. There were a lot of differences in power consumption, stability, etc. We all like to blame OC, risers, PSU, etc., but now I know that some CPU+mobo combinations actually cause the issues.

TRM error samples.



Increase core voltage.