Rigs Going Offline & Staying Offline

I’ve been running two rigs for about seven months now. Each one goes completely offline at random, sometimes several times a day, sometimes days apart. I’ve tried everything the “Watchdog” settings have to offer, but the rigs keep dropping offline.

I’ve tried to trick HiveOS into rebooting the rigs as soon as it detects a low hash rate, instead of trying to restart the miner first. I did this in the “Watchdog” by setting “Miner restart after” to, e.g., 5 minutes and “Reboot rig after” to, e.g., 1 minute. This works whenever the hash rate drops for any reason other than the rigs going offline. When the rigs go offline for some mysterious reason, though, these settings have no effect: the rigs stay powered on, wasting hours of electricity and hash rate until I get a chance to reboot them manually.

Finally, one would think the most obvious “Watchdog” setting for this issue is “Don’t reboot if the internet is lost”. I keep this setting turned OFF (as we DO want the rigs to reboot if they go offline), but it still has no effect. This setting by itself should do the job, yet the rigs go OFFLINE, stay stuck showing an OFFLINE status, and nothing happens until I intervene with a manual reboot.
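For anyone who wants a fallback that doesn't depend on the built-in Watchdog, here is a minimal sketch of a local connectivity watchdog. It is not how HiveOS implements its own watchdog. The function names, the probe target (1.1.1.1 on TCP port 53), the thresholds, and the use of the plain `reboot` command (which needs root) are all assumptions for illustration. It also only helps if the OS is still running and has merely lost its connection; a rig that is hard-frozen won't run any script.

```python
import socket
import subprocess
import time


def is_online(host="1.1.1.1", port=53, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (assumed probe target)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def count_trailing_failures(results):
    """Count how many of the most recent checks failed in a row."""
    failures = 0
    for ok in reversed(results):
        if ok:
            break
        failures += 1
    return failures


def should_reboot(results, max_failures=5):
    """Reboot only after max_failures consecutive failed checks."""
    return count_trailing_failures(results) >= max_failures


def watch(check=is_online, reboot=None, interval=60, max_failures=5,
          sleep=time.sleep, max_checks=None):
    """Poll connectivity; call reboot() once the failure threshold is reached.

    Returns True if a reboot was triggered, False if max_checks ran out first.
    """
    if reboot is None:
        # Assumption: the script runs as root and `reboot` is on the PATH.
        reboot = lambda: subprocess.run(["reboot"], check=False)
    results = []
    checks = 0
    while max_checks is None or checks < max_checks:
        results.append(check())
        results = results[-max_failures:]  # keep only the recent history
        checks += 1
        if should_reboot(results, max_failures):
            reboot()
            return True
        sleep(interval)
    return False
```

Calling `watch()` with the defaults would reboot after roughly five minutes without a connection. Requiring several failures in a row avoids rebooting on a single dropped probe.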

This has been a big problem for me over the last seven months, as I’m not always near the rigs to reboot them manually. I’ve had situations where the rigs stayed offline for 7 days until I got back to restart them. What a complete waste of electricity and hash rate!!

The income loss is twofold:

  1. Electricity costs
  2. Mining time lost

Can anyone please advise how this can be resolved? I would greatly appreciate any advice or fix that lets my rigs reboot themselves instead of sitting offline until I reboot them manually.

Thank you.

Can you post a screenshot of your worker overview tab, showing all cards, OCs, driver version, miner version, kernel, etc.? Typically instability is due to a bad OC or failing hardware (risers, cables, etc.).

Sure. Here’s a screenshot below:

Use the worker overview screen instead of the cards screen; it shows all the relevant info in one place.

Some recommendations: use TeamRedMiner for the AMD cards. NBMiner is known to inflate the local hashrate and isn’t as well maintained as other miners.

Update your Hive image to get the latest kernel and AMD drivers (run hive-replace -s inside the shell, follow the prompt, and wait for it to reboot).

Use T-Rex for the NVIDIA card (select add miner in the flight sheet config screen).

TeamRedMiner will pinpoint problem cards pretty well and notify you. Adjust the OC on any problem cards until everything is stable.

If you don’t have immediate access to the rigs, use a smart socket to power-cycle them remotely.
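If you want to automate the smart socket instead of toggling it by hand, the logic could look roughly like the sketch below. Everything here is hypothetical: `PowerCycler`, `switch_off`, and `switch_on` are placeholder names, not any real socket's API, so you would wire them to whatever your socket actually exposes. The timings are assumptions too. The rig's BIOS would also need to be set to power on when AC power returns.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PowerCycler:
    switch_off: Callable[[], None]   # hypothetical hook: cut power at the socket
    switch_on: Callable[[], None]    # hypothetical hook: restore power
    offline_limit: float = 600.0     # seconds offline before power-cycling
    off_time: float = 15.0           # seconds to leave the power off
    cooldown: float = 1800.0         # minimum seconds between two cycles
    offline_since: Optional[float] = None
    last_cycle: Optional[float] = None

    def update(self, rig_online: bool, now: float, sleep=time.sleep) -> bool:
        """Feed one status reading; return True if a power cycle was performed."""
        if rig_online:
            self.offline_since = None
            return False
        if self.offline_since is None:
            self.offline_since = now  # first reading of this outage
        offline_for = now - self.offline_since
        in_cooldown = (self.last_cycle is not None
                       and now - self.last_cycle < self.cooldown)
        if offline_for < self.offline_limit or in_cooldown:
            return False
        self.switch_off()
        sleep(self.off_time)
        self.switch_on()
        self.last_cycle = now
        self.offline_since = None     # give the rig time to boot again
        return True
```

The cooldown is there so a rig that can't come back (dead PSU, no internet at the site) doesn't get power-cycled over and over.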

