I have a strange issue. I run a couple of AMD- and Nvidia-based rigs. The AMD ones both have RX 580 cards; all cards are overclocked with (almost) identical parameters and produce a steady ~31.9 MH/s each.
One of the rigs, though, shows odd behaviour: in the past it kept getting stuck every 1-2 days, requiring a hard reboot. I changed some OC settings, lowering the hashrate on some cards. Well, the problem did not go away, but it changed. Now it does not freeze anymore, it… crashes the OS (but mining still works, as reported by the pool). The machine also responds to ping and I can still log in via VNC. However, no command in RoxTerm works (it says ‘no such command’). When I click on icons on the taskbar, they disappear (see attachments). If I right-click on the desktop, the VNC server shuts down (gracefully). However, mining still works (!!!). The cards run slightly warmer (because AutoFan is not working anymore), but I still get hashes sent to the pool (!!!)… for hours!
The USB watchdog does not reboot the machine, as it still gets pinged (!). I have to go there and manually pull the plug. Sometimes this error appears after a couple of hours. I went into the BIOS and disabled Intel SpeedStep; after that, the error showed up after 40 minutes. And, what’s even funnier, the crash usually happens when I log into HiveOS.farm and the website probes the worker. If I do not log on, it seems to work fine for hours (judging from the pool’s hashrate). Perhaps it crashes a lot earlier than I imagine and I only get to see it when I log into HiveOS, who knows?
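To illustrate why the ping check is not enough here: a ping only proves that the network stack is alive, while my symptom is that the OS can no longer start new commands. Below is a rough sketch of a stricter liveness check, assuming a small Python script that is already running on the rig. The names `os_can_spawn` and `should_feed_watchdog` are made up for illustration, and how the result would be wired to the USB watchdog is left open, since that depends on the watchdog hardware and software.

```python
import subprocess


def os_can_spawn(cmd=("/bin/true",), timeout=5.0):
    """Return True if the OS can still start a new process that exits with 0.

    The default command /bin/true is an assumption; any small binary
    that normally exits with status 0 would do.
    """
    try:
        result = subprocess.run(
            list(cmd),
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers "file not found" and similar launch failures;
        # TimeoutExpired covers a process that starts but never returns.
        return False
    return result.returncode == 0


def should_feed_watchdog(checks):
    """Feed the watchdog only if every liveness check passes."""
    return all(check() for check in checks)
```

The idea would be to call `should_feed_watchdog([os_can_spawn])` periodically and stop resetting the watchdog as soon as it returns False, so the rig gets rebooted even though it still answers pings. Whether such a check would actually catch this particular crash is untested.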
Has anybody else had such an experience? The motherboard is an ASUS B250, running 13 GPUs on three 750 W power supplies. Almost all parts are new and the GPUs run quite cool, at ~47-51 degrees… fans stay at ~45%.