I ran into a similar issue, but for me it was the OC (overclock).
Certain versions of Hive OS would not let me OC certain cards past a certain point.
For example, 0.6-208-beta@210818 would not let me OC my PowerColor RX 6600 XT past 30.2 MH/s, although it was stable at that level. When I OC'd past that, it would run for a while, sometimes hours, and then one of the cards would lose hashrate and the LA (load average) would go high. Ultimately I would lose all the GPUs. Only a hard reset fixed the issue. After changing the OS version I was able to OC to 32.7 MH/s with the same cards.
I also noticed that when I OC'd some RX 6600 XT cards to 32 MH/s, they would mine for hours and then lose hashrate, and then I would lose the whole rig. In those cases I had to play with the OC a bit until I found a setting that was not too much for the card, even though the OC values were within acceptable limits for the card.
My latest find is that sometimes the Hive GUI or a notification will tell me that a rig is down when it is not. In the past, I would use my remote system to hard reset the rig and get it up again. But I noticed that when the notification claims the rig is down, as long as I am on the same network, I can use Shell In A Box to remote in and have a look. About 90% of the time the rig was up and mining. The issue was just a false reading from the GUI or a communication problem with the server.
Hope this helps. If your rig goes down, and assuming you do not have any hardware-related issues:
Step 1 - If you are on the same network, check with Shell In A Box to see if the rig is still mining. This works as well if you have a monitor connected to the rig.
Step 2 - Check your OC settings and make sure they are not too much for the card. Even if they are within acceptable limits, lower them a bit to get a more stable run.
Step 3 - Upgrade or downgrade the Hive OS version. Test both stable and beta, as beta sometimes works better.
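
If you want to automate the check in Step 1 before reaching for a hard reset, here is a minimal sketch in Python. It only tests whether something on the rig still accepts a TCP connection from the local network; it does not confirm that the miner itself is running. The function name, the example IP address, and the port are my assumptions: 4200 is the usual Shell In A Box default, so adjust it if your rig is set up differently.

```python
import socket


def rig_is_reachable(host: str, port: int = 4200, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 4200 is assumed here because it is the usual Shell In A Box
    default; change it if your rig listens on a different port.
    """
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when nothing answers on that address and port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example use (hypothetical rig address, replace with your own):
#     if rig_is_reachable("192.168.1.50"):
#         print("Rig answers on the LAN, so the 'down' alert may be a false reading.")
#     else:
#         print("Rig does not answer, so a hard reset may really be needed.")
```

If this returns True while the Hive GUI says the rig is down, log in and look at the miner before resetting anything.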