
GPU driver errors and GPUs lost, forcing reboots

Thanks for this, dude. This issue has been hounding me for a week. I had a few unexpected power interruptions, and EVERY time after that I got all these weird NVIDIA driver issues and my OCs wouldn't apply.
The only way I could get it working was to reload the flash drive, but this saved me a lot of time compared with doing that every time. So +1 from me.
PS: I used Shell In A Box, only removed my nvidia-oc.conf and autofan.conf, and restarted.

cd /hive-config
rm nvidia-oc.conf autofan.conf

Please use the above commands at your own risk, and make sure you understand what they do.
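A more cautious variant of the same idea, as a sketch: it moves the two files into a backup folder instead of deleting them, so they can be copied back if needed. The `/hive-config` path and the file names come from the post above; the function name and the backup folder name are made up for illustration.

```shell
#!/bin/sh
# Move the OC and autofan configs aside instead of deleting them.
# Usage: backup_oc_configs [config_dir]   (defaults to /hive-config)
backup_oc_configs() {
    config_dir="${1:-/hive-config}"
    # Hypothetical backup location; any writable directory would do.
    backup_dir="$config_dir/oc-backup-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$backup_dir" || return 1
    for f in nvidia-oc.conf autofan.conf; do
        # Skip files that are not present rather than failing.
        if [ -f "$config_dir/$f" ]; then
            mv "$config_dir/$f" "$backup_dir/" || return 1
        fi
    done
    # Print where the files went so they can be restored later.
    echo "$backup_dir"
}
```

Running `backup_oc_configs` and then restarting would have the same effect as the `rm` above, with the old settings kept in the printed folder.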


I had the same problem (ERGO, 2miners, T-Rex; rig: 3060 Ti, 3070 Ti, 3080).
I solved it by:

  1. Downgrading Hive OS.
  2. Downgrading the NVIDIA drivers to a stable version.
  3. Trying many times to find a good OC.

Now it's working fine.

Same issue, four 3080s. So frustrating. If this works I will praise NEoKhajitt and evandrop to my grandchildren. Will post again. So far stable.

@KryptoMc can you share what stable versions you are using please?

It's N 460.91.03


What version of Hive OS? Thanks.

I updated Hive OS to the latest version and it's still stable :slight_smile:

The latest version of Hive OS, 0.6-210@210920, with NVIDIA drivers 460.91.03, right?

yes bruh
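
To confirm which driver a rig is actually running, `nvidia-smi` can report it directly. A small sketch: the helper names are hypothetical, and 460.91.03 is simply the version reported stable in this thread, not an official recommendation.

```shell
#!/bin/sh
# Print the installed NVIDIA driver version (taken from the first GPU listed).
installed_driver() {
    nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}

# Succeed only if the given version string is the one reported stable here.
is_thread_stable_driver() {
    [ "$1" = "460.91.03" ]
}

# Example use on a rig:
#   v=$(installed_driver)
#   is_thread_stable_driver "$v" || echo "running $v, not 460.91.03"
```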


Thanks for this, works fine !!! :ok_hand:


Yes, the 90.03 is good; I've had zero errors for 2 weeks.

Yeah, works for me, too! Big double thumb up :+1: :+1:


Hello

I had the same problem with four 3070s. What worked for me, as one comment suggests, was removing the OC, then checking that the USB cables were not the problem (as another comment suggests) by powering on one card at a time, then two (I have two cards per PSU), and finally all four. One card still kept showing fan 0 and an OC error, so I moved the riser's connector (the one that goes to the motherboard) to a different slot until the OC was applied, and it is mining again like before.

I hope this helps.
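
When testing cards one at a time like this, a quick shell check can show which card reports fan 0. This is only a sketch and assumes `nvidia-smi` still responds on the rig; the function name is hypothetical, and a value of `[N/A]` is deliberately not treated as zero.

```shell
#!/bin/sh
# Read lines of "index, name, fan%" on stdin and print the index of every GPU
# whose reported fan speed is exactly 0.
zero_fan_gpus() {
    awk -F', ' '$NF == 0 { print $1 }'
}

# Example use on a rig:
#   nvidia-smi --query-gpu=index,name,fan.speed \
#       --format=csv,noheader,nounits | zero_fan_gpus
```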


Hi @eduardogt21, thanks for your comment. Could you tell me or show me how to remove the OC? Have you updated your Hive OS and NVIDIA drivers to the current versions? Thanks for your help. I've had this problem for months, and last night alone my rig rebooted 7 times…

Hey @NEoKhajitt, can you explain how you used those commands? Thanks.

This issue is related to the overclock settings. It happened to me on an RTX 3060 with LHR. After isolating all the GPUs, I found that this one was not stable with overclock settings of 1600 on core and 2600 on memory. After I changed the settings to 1450 and 2400, the GPU ran for two days without issues. Then I saw the issue again, but it recovered after a reboot. I may need to change the OC again to find the correct settings.

At the beginning, before this issue, the settings were 0 on core and 1600 on memory, with power around 120 W and temperatures around 60 degrees, and that ran for one month without any issues. So don't worry: you only need to find the correct OC for the GPU. You may have to sacrifice some hashrate to get the GPU stable.
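
One way to narrow down which GPU is the unstable one is to count the NVIDIA driver's Xid error lines in the kernel log per PCI address. A rough sketch, assuming the usual `NVRM: Xid (PCI:...)` log format; the function name is hypothetical.

```shell
#!/bin/sh
# Read kernel log text on stdin and print "PCI-address count" for each GPU
# that logged at least one NVRM Xid error.
xid_by_gpu() {
    sed -n 's/.*NVRM: Xid (\(PCI:[^)]*\)).*/\1/p' \
        | sort | uniq -c | awk '{ print $2, $1 }'
}

# Example use on a rig:
#   dmesg | xid_by_gpu
```

The card with the most entries is a reasonable first candidate for a lower OC.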





Sir, would you mind explaining step by step how you used those commands?

Okay, let me go through this one by one, starting with fixes that don't cost any money.

First, it says GPUs are lost because you enabled the watchdog (reboot if a GPU is offline). Try disabling that.

Second, try reflashing the USB drive.

Lastly, check all GPU connections and risers. It can also happen if a GPU is old or used.

That's all from me :slight_smile:
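
For the connection and riser check, comparing how many NVIDIA cards the PCI bus sees with how many the driver sees can help tell a dead riser from a driver problem. A minimal sketch; the function names are hypothetical and both helpers only count lines fed to them on stdin.

```shell
#!/bin/sh
# Count NVIDIA GPUs visible on the PCI bus. Reads `lspci` output on stdin.
# Audio functions of the same cards are ignored.
count_pci_gpus() {
    grep -Eic '(VGA compatible controller|3D controller).*NVIDIA'
}

# Count GPUs the driver can talk to. Reads `nvidia-smi -L` output on stdin.
count_driver_gpus() {
    grep -c '^GPU [0-9]'
}

# Example use on a rig:
#   pci=$(lspci | count_pci_gpus)
#   drv=$(nvidia-smi -L | count_driver_gpus)
#   [ "$pci" -eq "$drv" ] || echo "driver sees $drv of $pci GPUs"
```

If a card is missing from `lspci` as well, the riser, cable, or slot is the more likely suspect than the OC.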

Solved it?

This topic was automatically closed 416 days after the last reply. New replies are no longer allowed.