HiveOS freezes and goes offline suddenly at random times

Hi all,

I have a problem with all 3 of my rigs. Previously they all ran smoothly, but over the last few days they have been going offline suddenly, without showing any error and without rebooting. When I check physically, the rig is still running (all GPU fans and the CPU fan are spinning), and when I plug in a monitor it shows that the system is frozen; I can't do anything, even with the keyboard. Only a hard reset with the power button reboots it, and after some time it freezes again.
All 3 rigs have the same issue: they freeze at random times, close to each other, with a difference of only about 30 minutes (example: rig 1 goes offline at 5:00 AM, rig 2 at 5:30 AM, and rig 3 at 5:45 AM).

Yesterday I found that some people solved this issue by updating to the latest HiveOS version. I did the same, and it still freezes.

Rig 1 spec:
GPU: 6 × 5700 XT (mixed brands)
PSU: Enermax 1700W
Motherboard: EX-B250-V7 ASUSTeK COMPUTER INC. (0811 09/26/2017)
CPU: 2 × Intel® Celeron® CPU G3900 @ 2.80GHz AES
HDD: SSD ATA Corsair Force GT 90.0GB
RAM: 8GB
HiveOS version: now 0.6-206@210723 (still freezes), previously 0.6-206@210712 (froze)

Rig 2 spec:
GPU: 3 × Vega 56 (mixed brands)
PSU: Enermax 1000W
Motherboard: B365 Phantom Gaming 4 ASRock
CPU: 2 × Intel® Celeron® CPU G3900 @ 2.80GHz AES
HDD: ATA MidasForce SSD 120GB
RAM: 8GB
HiveOS version: now 0.6-206@210723 (still freezes), previously 0.6-206@210604 (froze)

Rig 3 spec:
GPU: 3 × Vega 56 (mixed brands)
PSU: Enermax 1000W
Motherboard: Z370 TOMAHAWK (MS-7B47)
CPU: 2 × Intel® Celeron® G4900 CPU @ 3.10GHz
HDD: SSD ATA Techbyte 128GB
RAM: 8GB
HiveOS version: now 0.6-206@210723 (still freezes), previously 0.6-206@210604 (froze)

Can anyone help? Or does anyone at least have the same problem?
I saw some people post about the same issue, but the replies say it's because they were all using USB drives. I have been using SSDs from the very beginning, and it's still the same.
I'd really appreciate any input. Thank you all.

I highly suggest downgrading to 0.6-203. There are so many issues reported on this forum by users on the newer versions. I’ve given the new version many tries on my various rigs, and they all exhibit functionality problems. Have a read through this forum; many others seem to have the same issue.

I see. I will try downgrading to the lowest available HiveOS version that has a miner able to mine my coin (ERG).

After downgrading, it still occurred, so I googled more broadly (previously I only searched for "hive os freeze"). It seems this happens on Windows and other systems too. I found some topics discussing bad or overheating risers, or hardware failure, so I will try that. If you have the same problem and want to change the risers or the way they are powered, don't do it randomly; do it in sequence, and for all GPUs. I will do this now on one of my rigs (the one that freezes most often, up to 4-5 times a day). The strangest thing is that at night all the rigs go down one after another at almost the same time (my 2 other rigs freeze at 2:00-3:00 AM), so I will also try a voltage stabilizer for this issue.
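
For anyone who wants to check whether their freezes really cluster in time (which would point at something shared, such as mains power or room temperature, rather than one bad part), here is a minimal Python sketch. The times are the example ones from the first post; the 60-minute window and the names `freeze_times`, `spread_minutes`, and `WINDOW_MINUTES` are made up for illustration, and the times have to be typed in by hand.

```python
from datetime import datetime

# Example freeze times from the first post (rig name -> time of day).
freeze_times = {"rig 1": "05:00", "rig 2": "05:30", "rig 3": "05:45"}

# Hypothetical cutoff: treat freezes within one hour as "clustered".
WINDOW_MINUTES = 60


def spread_minutes(times):
    """Return the minutes between the earliest and latest freeze.

    Assumes all times fall on the same day (no wrap past midnight).
    """
    parsed = [datetime.strptime(t, "%H:%M") for t in times.values()]
    return int((max(parsed) - min(parsed)).total_seconds() // 60)


spread = spread_minutes(freeze_times)
clustered = spread <= WINDOW_MINUTES
print(f"spread: {spread} min, clustered: {clustered}")
```

If the spread stays small night after night, a cause shared by all rigs becomes more plausible than three independent hardware faults.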

Have you had any luck with stabilizing your rigs? I have just one 6-GPU rig and it’s doing exactly what you’re describing. The majority of the time it’s frozen and needs a hard power reset (twice a day, often early morning like yours); other times the internet is logged out but the command prompt is still functional. I’m trying to stabilize this thing: it used to run 4 days before crashing, now it’s twice a day and I can’t remote reboot it.

Edit: going to downgrade to 0.6-203 as suggested above and see how that goes

Hi, sad to hear more people are having the same problem. I installed a 3000VA stabilizer for each of my rigs. It reduced the number of freezes, but they still occurred often (from a freeze every hour down to one every 2 hours).

Unsatisfied with that, I decided to check one of my rigs. I unplugged all the hardware and checked every cable, riser, and socket. I found one GPU power cable that looked a little different in color (the pins inside the connector should be shiny, but on this cable they were blackish and matte, as if burnt or overloaded), so I changed the cable. That helped a lot, down to a freeze every 4 hours, but it still froze again and again. Sadly, after a full, tiring day of checking everything, the next day the freezes were back to every hour.

Then I tried unplugging the RAM from the motherboard and cleaning the socket. After that I couldn't turn the rig back on, because of the RAM socket. I tried moving the RAM to another slot, but it still wouldn't start. It was totally annoying: the rig showed nothing on the monitor, as if it hung and couldn't read the RAM. But it made me think that perhaps THIS is the disease. Many people suspect the riser (that can be right, but if it's the riser, the symptom should be that one of your GPUs sometimes goes missing or undetected), and most people say the overclock is wrong (this group is totally newbie); a wrong overclock can't take a rig from running healthily to frozen without any symptoms or errors.

Yes, in my case it was the RAM! It seems the RAM gets too hot and can't cope, so it freezes the rig completely. On one of my rigs the RAM slot even burned out until the RAM was totally broken (this is the one that froze every 5 minutes; even the cable change only helped for one day). So check your RAM. The RAM in my other 2 rigs seemed fine but actually had the same symptoms, so I replaced it with new RAM (the previous sticks were used) and chose modules with high heat resistance. Now almost none of them freeze, but it may happen again later, depending on your room temperature; above all, don't forget to cool the motherboard and RAM. Even though the HiveOS dashboard shows RAM usage as very low, it seems the RAM is working itself to death. If you move the RAM to another slot and the rig won't turn on, beware: the motherboard is already damaged. If you buy new RAM and the motherboard can't read it, say goodbye to the MB.

Downgrading didn’t help. Is it possible that one of my GPUs running hot is causing the system to crash? My founders 6700xt will have its memory get up to 96C with the core around 66C. I’ve heard of 3080s thermal throttling when their memory gets over 100C, so I figured I’d see something along those lines before a system crash with the 6700xt running hot. The weird thing is it’s not crashing during the hot parts of the day. Probably gonna move that card to the edge of the rig so it’s not drawing in air from the other GPUs’ exhaust and see if that helps.

Glad you figured out your issue, you think it’s the ram on all 3 rigs? Why would they all start failing at once?

Yup, as I thought, downgrading won't help. Judging by the HiveOS upgrade log, nothing in there helps much.

Today my rig was freezing like mad even with the new RAM. After I unplugged the RAM and put the old one back, 2 of the 4 GPUs went missing (I didn't even touch the GPUs!). I thought it was a riser problem, so I replaced 2 risers, and it was still the same: undetected with the old RAM. Then I tried the new RAM again, and now 3 were detected and 1 was still missing! Even seating the riser more firmly in the PCIe slot had no effect. So finally I tried switching PCIe slots (the board has 6 PCIe slots and I only use 4), and voila, everything is running now and better (no freeze even after 4 hours). Hopefully this holds until the new MB arrives.

My conclusion: temperature is the most important thing to keep under control! Keep your GPU memory at no more than 90C and your GPU core at no more than 65C, or the card will soon die completely, which is even sadder than a system crash. The system crash (freezing) in my case was caused by the motherboard. At first I thought the most important thing was GPU core and memory temperature, but for overall stability, the motherboard and RAM can freeze your rig suddenly if they aren't well taken care of. I bought my motherboards and RAM used, and my room has no air conditioner, so most likely after running nonstop day after day they reached their heat limit and then started freezing at about the same time (perhaps accumulated hot air, even though I have 3 exhaust fans and 3 high-speed fans). At least try moving your RAM to another slot. It won't do any harm.

Hopefully the new motherboard fixes your issue, is it just one rig that needs a swap? Kinda crazy all 3 started dying at the same time, maybe you had a really hot day?

After swapping the gpu to the outside and placing the rig higher into the airflow of my window fan I’m getting 76C memory temp and 48C core, plus everything else is running cooler. Hoping that solves my issue. If it crashes again I’ll swap the ram slot. Thinking my rig would crash anytime that GPU got hotter than 96C, which I suppose is a good thing
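
To put numbers on the temperature advice in this thread, here is a minimal Python sketch that compares hand-entered readings against the limits suggested above (GPU memory at most 90C, core at most 65C). The readings are the 6700xt figures posted before and after the move; the function and variable names are hypothetical, and nothing here reads sensors from HiveOS.

```python
# Limits suggested earlier in the thread, in degrees C.
MAX_MEM_C = 90
MAX_CORE_C = 65


def check_gpu(name, core_c, mem_c):
    """Return a list of warning strings for one GPU reading."""
    warnings = []
    if mem_c > MAX_MEM_C:
        warnings.append(f"{name}: memory {mem_c}C is above {MAX_MEM_C}C")
    if core_c > MAX_CORE_C:
        warnings.append(f"{name}: core {core_c}C is above {MAX_CORE_C}C")
    return warnings


# (label, core temp, memory temp) as reported in the posts above.
readings = [
    ("6700xt before move", 66, 96),
    ("6700xt after move", 48, 76),
]

for label, core_c, mem_c in readings:
    problems = check_gpu(label, core_c, mem_c)
    if problems:
        for line in problems:
            print(line)
    else:
        print(f"{label}: within suggested limits")
```

The limits are one poster's rule of thumb, not vendor specifications, so adjust the two constants to whatever your cards are rated for.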

