Hive OS crashing, not rebooting

Sethtrosity · November 19, 2021, 12:10am

Hello, I’m currently on OS 0.6-211@211117 and have been having the same issue since 0.6-211@211112.
After some time mining, usually 45min+, my rig will crash and not restart.
There are no GPU errors, and watchdog is not restarting the rig.

Here is the error log:

Nov 18 11:49:32 bloodgulch xinit[2692]: 18/11/2021 11:49:32 rfbListenOnTCP6Port: error in bind IPv6 socket: Address family not supported by protocol
Nov 18 16:54:50 bloodgulch ntfs-3g[556]: Cmdline options: rw,noatime,errors=remount-ro,fmask=0133,dmask=0022,remove_hiberfile
Nov 18 16:54:50 bloodgulch ntfs-3g[556]: Mount options: rw,errors=remount-ro,allow_other,nonempty,noatime,default_permissions,fsname=/dev/sda1,blkdev,blksize=4096
Nov 18 16:54:50 bloodgulch kernel: [ 3.663971][ T374] EXT4-fs (sda4): re-mounted. Opts: errors=remount-ro,commit=120
Nov 18 16:54:53 bloodgulch ntfs-3g[747]: Cmdline options: rw,noatime,errors=remount-ro,fmask=0133,dmask=0022,remove_hiberfile
Nov 18 16:54:53 bloodgulch ntfs-3g[747]: Mount options: rw,errors=remount-ro,allow_other,nonempty,noatime,default_permissions,fsname=/dev/sda1,blkdev,blksize=4096
Nov 18 16:55:25 bloodgulch xinit[2699]: #011(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
Nov 18 16:55:27 bloodgulch xinit[2699]: 18/11/2021 16:55:27 errors, etc) it may be disabled:
Nov 18 16:55:27 bloodgulch xinit[2699]: 18/11/2021 16:55:27 errors, etc) it may be disabled via: ‘-noscr’
Nov 18 16:55:27 bloodgulch xinit[2699]: 18/11/2021 16:55:27 rfbListenOnTCP6Port: error in bind IPv6 socket: Address family not supported by protocol

If I downgrade below 0.6-211@211112 the rig no longer crashes. I can’t stay on an older version, though, because I just installed a 6700XT and the older versions do not detect it. Also, this was happening before installing the 6700XT, but I was forced to upgrade.

Anyone have any ideas? I have tried flashing a brand new SSD and still get the same error.

jhawk2002 · November 19, 2021, 2:02am

I’m getting the same crashes and can’t reboot. I thought it was maybe T-Rex miner. I tried downgrading it and haven’t run into the problem again yet. Mine usually happens about every 4-6 hours. If I get the same issue I will have to downgrade Hive OS. It sucks because it’s almost night-night time and I won’t be able to catch it if I’m sleeping.

Sethtrosity · November 19, 2021, 2:05am

I think it is something in the latest hive builds, as I can downgrade to 0.6-211102 and run the same version of T-Rex (0.24.7) with no issues at all.

Hopefully they can fix it, or let us know what the issue is.

jhawk2002 · November 19, 2021, 2:23am

So in Hive shell, do I use this command to downgrade? apt-get install -y --allow-downgrades hive=0.6-211102

jhawk2002 · November 19, 2021, 2:25am

The reason I thought it was the miner is that I was getting random reboots on my Windows rig as well, so I downgraded both miners to 0.24.6.

Sethtrosity · November 19, 2021, 2:25am

I just used ‘hive-replace --list’ and chose 0.6-211102

jhawk2002 · November 19, 2021, 2:41am

Thanks, went ahead and downgraded Hive as well. Hopefully my Windows rig stops crashing too.

Sethtrosity · November 19, 2021, 9:15pm

Unfortunately, the issue still persists on 0.6-211102. I have updated to the latest version and switched to nbminer. I will see if my rig continues to crash with the same errors.

Sethtrosity · November 27, 2021, 10:34pm

So after days of troubleshooting my rig I finally figured out what the issue was.

kernel: [72580.057791][ T1171] NVRM: GPU at PCI:0000:04:00: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
kernel: [72580.057793][ T1171] NVRM: Xid (PCI:0000:04:00): 62, pid=1171, 0000(0000) 00000000 00000000
kernel: [72580.093233][ T1171] NVRM: Xid (PCI:0000:04:00): 45, pid=4560, Ch 00000010
kernel: [72581.503222][   C10] sched: RT throttling activated
kernel: [72585.094636][ T1171] NVRM: Xid (PCI:0000:04:00): 45, pid=4560, Ch 00000011
kernel: [72585.098915][ T1171] NVRM: Xid (PCI:0000:04:00): 45, pid=4560, Ch 00000012
kernel: [72585.103196][ T1171] NVRM: Xid (PCI:0000:04:00): 45, pid=4560, Ch 00000013
kernel: [72585.107478][ T1171] NVRM: Xid (PCI:0000:04:00): 45, pid=4560, Ch 00000014
kernel: [72585.111761][ T1171] NVRM: Xid (PCI:0000:04:00): 45, pid=4560, Ch 00000015
kernel: [72585.116045][ T1171] NVRM: Xid (PCI:0000:04:00): 45, pid=4560, Ch 00000016
kernel: [72585.120329][ T1171] NVRM: Xid (PCI:0000:04:00): 45, pid=4560, Ch 00000017
realloc(): invalid pointer
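
For anyone checking their own logs for the same pattern, here is a rough sketch of how NVRM Xid lines like the ones above could be tallied per PCI address. The function name `tally_xids` and the regex are made up for this example and only assume the line format shown in this log; other driver versions may print Xid lines differently.

```python
import re
from collections import Counter

# Matches lines shaped like the ones in this thread, e.g.
#   NVRM: Xid (PCI:0000:04:00): 45, pid=4560, Ch 00000010
# Assumption: the format is the same as in the log pasted above.
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+),")


def tally_xids(lines):
    """Count (pci_address, xid_code) pairs found in kernel log lines."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# A few lines copied from the log above.
sample = [
    "kernel: [72580.057793][ T1171] NVRM: Xid (PCI:0000:04:00): 62, pid=1171, 0000(0000) 00000000 00000000",
    "kernel: [72580.093233][ T1171] NVRM: Xid (PCI:0000:04:00): 45, pid=4560, Ch 00000010",
    "kernel: [72581.503222][   C10] sched: RT throttling activated",
    "kernel: [72585.094636][ T1171] NVRM: Xid (PCI:0000:04:00): 45, pid=4560, Ch 00000011",
]

for (pci, xid), n in sorted(tally_xids(sample).items()):
    print(f"{pci} Xid {xid}: {n}")
```

Seeing which PCI address the Xid codes cluster on is what points at a specific card (here the one at 0000:04:00).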

My rig was freezing after my 3080 started to thermal throttle. The watchdog didn’t catch it and eventually the rig would just freeze. I solved the issue by setting the correct MH/s value in watchdog for my miner.

My NVIDIA cards do a combined total of 446 MH/s and I had watchdog set to 400 MH/s.
I changed the value to 440 MH/s for the miner I am running, and it hasn’t frozen since.
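
To illustrate why the threshold made a difference, here is a simplified sketch. It assumes the watchdog simply compares the miner's total hashrate against the configured minimum, which may not match how Hive's watchdog actually works internally. The function names and the 30 MH/s throttling drop are hypothetical; only the 446, 400 and 440 MH/s figures come from the post above.

```python
def watchdog_would_trigger(total_mhs, min_mhs):
    """Simplified model: the watchdog fires when total hashrate is below the minimum."""
    return total_mhs < min_mhs


def largest_unnoticed_drop(normal_total_mhs, min_mhs):
    """How much hashrate can be lost before the total reaches the minimum."""
    return normal_total_mhs - min_mhs


NORMAL_TOTAL = 446   # MH/s, combined total from the post
OLD_MIN = 400        # MH/s, original watchdog setting
NEW_MIN = 440        # MH/s, corrected watchdog setting

# Hypothetical: a throttling card costs the rig 30 MH/s.
throttled_total = NORMAL_TOTAL - 30

print("old setting triggers:", watchdog_would_trigger(throttled_total, OLD_MIN))
print("new setting triggers:", watchdog_would_trigger(throttled_total, NEW_MIN))
print("headroom at old setting:", largest_unnoticed_drop(NORMAL_TOTAL, OLD_MIN), "MH/s")
print("headroom at new setting:", largest_unnoticed_drop(NORMAL_TOTAL, NEW_MIN), "MH/s")
```

Under this model the old setting leaves 46 MH/s of slack, so a throttling card can lose a lot of hashrate without the watchdog reacting, while the new setting leaves only 6 MH/s.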
