Hi, I need some pointers here. I have been dual mining Flux and Zil on a 12-GPU rig for a long time. All of a sudden, the other day, my 3090 stopped mining. If I restart the rig, the card shows up with temps and data. But the second the miner starts mining, the card crashes. It still shows up, but with no hashrate or temps. A minute or two later, the whole miner crashes.

Here's what I've tried so far: I replaced the riser, checked that everything is still getting power, created a new flight sheet, wiped the SSD and installed a fresh copy of HiveOS, updated the NVIDIA drivers, and cleaned all the cards. Nothing seems to work. Did my 3090 just go bad? The only thing I haven't done is take it out and test it by itself in another rig.

Also, on startup the kernel seems to report "failed" while loading, but HiveOS still ends up loading and starting. Could this be part of my issue? Let me know your thoughts. I've put the log from when it crashes below. Thanks!
Oct 30 10:29:24 UB_Used_Rig kernel: [ 167.818077][ C0] NVRM: GPU at PCI:0000:11:00: GPU-7d6c5065-1af3-03ed-7b11-d005e4d73308
Oct 30 10:29:24 UB_Used_Rig kernel: [ 167.818081][ C0] NVRM: GPU Board Serial Number: 1564521013729
Oct 30 10:29:24 UB_Used_Rig kernel: [ 167.818082][ C0] NVRM: Xid (PCI:0000:11:00): 79, pid='', name=, GPU has fallen off the bus.
Oct 30 10:29:24 UB_Used_Rig kernel: [ 167.818084][ C0] NVRM: GPU 0000:11:00.0: GPU has fallen off the bus.
Oct 30 10:29:24 UB_Used_Rig kernel: [ 167.818085][ C0] NVRM: GPU 0000:11:00.0: GPU serial number is 1564521013729.
Oct 30 10:29:24 UB_Used_Rig kernel: [ 167.818091][ C0] NVRM: A GPU crash dump has been created. If possible, please run
Oct 30 10:29:24 UB_Used_Rig kernel: [ 167.818091][ C0] NVRM: nvidia-bug-report.sh as root to collect this data before
Oct 30 10:29:24 UB_Used_Rig kernel: [ 167.818091][ C0] NVRM: the NVIDIA kernel module is unloaded.
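If anyone wants to check whether the same card keeps throwing this error across reboots, here is a minimal sketch that counts the NVRM Xid lines per PCI address. It assumes the line layout shown in the log above ("NVRM: Xid (PCI:...): <code>, ..."). The function name `count_xid_events` is just made up for this example and is not part of HiveOS or the NVIDIA tools.

```python
import re
from collections import Counter

# Matches NVRM Xid lines in the layout shown in the log above, e.g.
# "NVRM: Xid (PCI:0000:11:00): 79, ...". Assumes this exact format.
XID_RE = re.compile(r"NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): (\d+)")


def count_xid_events(lines):
    """Return a Counter keyed by (pci_address, xid_code) for each Xid line found."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    # Two lines copied from the crash log; only the first one is an Xid line.
    sample = [
        "Oct 30 10:29:24 UB_Used_Rig kernel: [ 167.818082][ C0] NVRM: Xid "
        "(PCI:0000:11:00): 79, pid='', name=, GPU has fallen off the bus.",
        "Oct 30 10:29:24 UB_Used_Rig kernel: [ 167.818084][ C0] NVRM: GPU "
        "0000:11:00.0: GPU has fallen off the bus.",
    ]
    for (pci, xid), n in count_xid_events(sample).items():
        print(f"PCI {pci}: Xid {xid} seen {n} time(s)")
```

You could feed it the lines from a saved copy of the kernel log instead of the sample. If the Xid 79 always points at the same PCI address even after swapping risers, that would at least narrow things down to that slot or card.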