RTX 3090 Suddenly Crashing, Then Crashing the Miner

Hi. I need some pointers here. I have been dual mining Flux and Zil on a 12-GPU rig for a long time. All of a sudden the other day my 3090 stopped mining. If I restart the rig, the card shows up with temps and data, but the second the miner starts mining, it crashes; it still shows up, but with no hashrate or temps. Then a minute or two later, the whole miner crashes. I tried replacing the riser, checked that everything is still getting power, created a new flight sheet, wiped the SSD and installed a fresh version of HiveOS, updated the NVIDIA drivers, and cleaned all the cards, but nothing seems to be working. Did my 3090 just go bad? The only thing I haven't done is take it out and install it by itself in another rig. Also, on startup the kernel seems to say "failed" when trying to load, but HiveOS still ends up loading and starting. Maybe this is part of my issue? Let me know your thoughts. I will put the miner log from when it crashes below. Thanks!

Oct 30 10:29:24 UB_Used_Rig kernel: [ 167.818077][ C0] NVRM: GPU at PCI:0000:11:00: GPU-7d6c5065-1af3-03ed-7b11-d005e4d73308
Oct 30 10:29:24 UB_Used_Rig kernel: [ 167.818081][ C0] NVRM: GPU Board Serial Number: 1564521013729
Oct 30 10:29:24 UB_Used_Rig kernel: [ 167.818082][ C0] NVRM: Xid (PCI:0000:11:00): 79, pid=’’, name=, GPU has fallen off the bus.
Oct 30 10:29:24 UB_Used_Rig kernel: [ 167.818084][ C0] NVRM: GPU 0000:11:00.0: GPU has fallen off the bus.
Oct 30 10:29:24 UB_Used_Rig kernel: [ 167.818085][ C0] NVRM: GPU 0000:11:00.0: GPU serial number is 1564521013729.
Oct 30 10:29:24 UB_Used_Rig kernel: [ 167.818091][ C0] NVRM: A GPU crash dump has been created. If possible, please run
Oct 30 10:29:24 UB_Used_Rig kernel: [ 167.818091][ C0] NVRM: nvidia-bug-report.sh as root to collect this data before
Oct 30 10:29:24 UB_Used_Rig kernel: [ 167.818091][ C0] NVRM: the NVIDIA kernel module is unloaded.

That error is pretty generic; it could be a bunch of things. See "XID Errors" in NVIDIA's GPU Deployment and Management Documentation.
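If it helps to confirm that it is always the same card dropping out, here is a rough Python sketch that pulls the PCI address and Xid code out of kernel log lines like the ones posted above. The regex is only based on the line format shown in this thread, and the function name `find_xid_events` is made up for illustration, so treat it as a starting point rather than a proper tool.

```python
import re
from collections import Counter

# Pattern based on the NVRM line format seen in the log above, e.g.
#   NVRM: Xid (PCI:0000:11:00): 79, ... GPU has fallen off the bus.
# Assumption: other driver versions may format this line differently.
XID_RE = re.compile(r"NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): (\d+)")


def find_xid_events(lines):
    """Return a list of (pci_address, xid_code) tuples found in the given log lines."""
    events = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            events.append((match.group(1), int(match.group(2))))
    return events


# Two lines copied from the log in this thread; only the first is an Xid line.
sample_log = [
    "Oct 30 10:29:24 UB_Used_Rig kernel: [ 167.818082][ C0] NVRM: Xid (PCI:0000:11:00): 79, GPU has fallen off the bus.",
    "Oct 30 10:29:24 UB_Used_Rig kernel: [ 167.818084][ C0] NVRM: GPU 0000:11:00.0: GPU has fallen off the bus.",
]

# Count how often each (GPU, Xid code) pair appears.
for (pci, code), count in Counter(find_xid_events(sample_log)).items():
    print(f"GPU at PCI {pci}: Xid {code} seen {count} time(s)")
```

To run it against a real log, pass the lines of the kernel log (for example, the saved output of `dmesg`) to `find_xid_events` instead of `sample_log`. If every event points at the same PCI address, that narrows it down to that slot's card, riser, or power feed.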

Isolate all external variables. If possible, try the problem GPU on its own, directly on the motherboard, with known-good power cables (ones that currently work on other GPUs).

My guess is a burnt power cable or splitter.

Yeah, I am guessing the same. I haven't changed the power cables yet. I took the card off and put it on my test bench, and it works just fine. I did find a chunk missing from one of the chips on the riser and thought that would have been the issue, but no luck. I'll replace the power cables when I have a chance. Thanks for the response!