Error with GPU, unable to get GPU temperature, GPU is lost

2021.03.10:13:17:09.466: eths Eth: New job #56f3fecd from; diff: 2500MH
2021.03.10:13:17:09.813: hwmc GPU4: unable to get temperature - GPU is lost (15)
2021.03.10:13:17:09.813: hwmc GPU4: unable to get fan speed - GPU is lost (15)
2021.03.10:13:17:09.955: main Eth speed: 190.948 MH/s, shares: 178/0/0, time: 0:41
2021.03.10:13:17:09.955: main GPUs: 1: 21.217 MH/s (32) 2: 21.213 MH/s (18) 3: 21.204 MH/s (16) 4: 0.000 MH/s (17) 5: 21.210 MH/s (13) 6: 21.208 MH/s (16) 7: 21.227 MH/s (19) 8: 21.231 MH/s (20) 9: 21.225 MH/s (11) 10: 21.213 MH/s (16)
2021.03.10:13:17:09.955: main GPU1: 69C 47% 81W, GPU2: 69C 45% 80W, GPU3: 69C 49% 81W, GPU4: N/A, GPU5: 70C 63% 79W, GPU6: 68C 38% 81W, GPU7: 66C 38% 79W, GPU8: 69C 40% 80W, GPU9: 68C 38% 79W, GPU10: 71C 65% 80W
GPUs power: 718.0 W
2021.03.10:13:17:10.536: eths Eth: Send: {"id":4,"jsonrpc":"2.0","method":"eth_submitWork","params":["0xdeb66e637021760a","0x56f3fecd6f6af9baad5671b640c37fb6515ebcab70f0fdce8bb16144a4199fee","0xa0e442c7440e3446ae3f2ae4fc2c0b2e11719ea7c091fea9cf2fea30ee8769f6"]}

2021.03.10:13:17:10.536: eths Eth: Share actual difficulty: 5648 MH
2021.03.10:13:17:10.536: GPU8 Eth: GPU8: ETH share found!
2021.03.10:13:17:10.580: eths Eth: Received: {"id":4,"jsonrpc":"2.0","result":true}
2021.03.10:13:17:10.580: eths Eth: Share accepted in 44 ms
2021.03.10:13:17:10.910: eths Eth: Send: {"id":5,"jsonrpc":"2.0","method":"eth_getWork","params":[]}

2021.03.10:13:17:10.944: eths Eth: Received: {"id":5,"jsonrpc":"2.0","result":["0x56f3fecd6f6af9baad5671b640c37fb6515ebcab70f0fdce8bb16144a4199fee","0x312b6007e9a1bb08b78dad2afdce476501adb039bc29ea6c27ede523fe87d3d4","0x01b7cdfd9d7bdbab7d6ae6881cb5109a365f7e0df99d2255b971b0845d"]}
2021.03.10:13:17:12.914: eths Eth: Received: {"id":0,"jsonrpc":"2.0","result":["0x549b73d83b39452946febb386221a590ecb0e9174b1bc3aa2b7fbd4a0d3b01b1","0x312b6007e9a1bb08b78dad2afdce476501adb039bc29ea6c27ede523fe87d3d4","0x01b7cdfd9d7bdbab7d6ae6881cb5109a365f7e0df99d2255b971b0845d"]}
2021.03.10:13:17:12.914: eths Eth: New job #549b73d8 from; diff: 2500MH
2021.03.10:13:17:14.966: main Eth speed: 190.943 MH/s, shares: 179/0/0, time: 0:41
2021.03.10:13:17:14.966: main GPUs: 1: 21.217 MH/s (32) 2: 21.214 MH/s (18) 3: 21.205 MH/s (16) 4: 0.000 MH/s (17) 5: 21.209 MH/s (13) 6: 21.209 MH/s (16) 7: 21.227 MH/s (19) 8: 21.225 MH/s (21) 9: 21.226 MH/s (11) 10: 21.213 MH/s (16)
2021.03.10:13:17:15.352: eths Eth: Received: {"id":0,"jsonrpc":"2.0","result":["0x3aaee806535a1d79440006b358ba395626e7f24991269bedd4d69bf42f5bbdb3","0x312b6007e9a1bb08b78dad2afdce476501adb039bc29ea6c27ede523fe87d3d4","0x01b7cdfd9d7bdbab7d6ae6881cb5109a365f7e0df99d2255b971b0845d"]}
2021.03.10:13:17:15.353: eths Eth: New job #3aaee806 from; diff: 2500MH
2021.03.10:13:17:15.807: wdog GPU4 not responding
2021.03.10:13:17:15.808: wdog Thread(s) not responding. Restarting.

Has anyone run into this issue? What could be causing it? It worked fine for almost an hour, and then this happened.
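For anyone comparing logs, here is a minimal Python sketch that scans a miner log for the two kinds of lines shown above ("GPU is lost" from hwmc and "GPUn not responding" from wdog) and groups them by GPU number. The patterns are based only on the log lines quoted in this post, so other miner versions may format them differently, and the function name `find_lost_gpus` is just an illustrative choice.

```python
import re
from collections import defaultdict

# Patterns are assumptions derived from the log excerpt in this thread only.
LOST_RE = re.compile(r"^(\S+): hwmc GPU(\d+): unable to get .* - GPU is lost")
WDOG_RE = re.compile(r"^(\S+): wdog GPU(\d+) not responding")


def find_lost_gpus(lines):
    """Return {gpu_number: [(timestamp, kind), ...]} for failure lines."""
    events = defaultdict(list)
    for line in lines:
        for kind, pattern in (("lost", LOST_RE), ("watchdog", WDOG_RE)):
            match = pattern.match(line)
            if match:
                timestamp, gpu = match.group(1), int(match.group(2))
                events[gpu].append((timestamp, kind))
    return dict(events)


# Sample lines copied from the log above.
sample = [
    "2021.03.10:13:17:09.813: hwmc GPU4: unable to get temperature - GPU is lost (15)",
    "2021.03.10:13:17:09.813: hwmc GPU4: unable to get fan speed - GPU is lost (15)",
    "2021.03.10:13:17:15.807: wdog GPU4 not responding",
    "2021.03.10:13:17:15.808: wdog Thread(s) not responding. Restarting.",
]

for gpu, hits in find_lost_gpus(sample).items():
    print(f"GPU{gpu}: {len(hits)} failure line(s), first at {hits[0][0]}")
```

Running it over several days of logs should show whether the failures always land on the same GPU number, which helps separate a bad card, riser, or slot from a rig-wide problem.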

Same here. I was able to run my rig for 9 hours, then GPU 5 ran into this. I’ve since had to reboot 3 times before just deciding to remove it. :worried:

Yep, I’ve been dealing with this for about a month. Sometimes the rig runs for days and sometimes only minutes before the error comes up. I have changed risers and still get the same issue, mainly on the same card slot, though a second slot shows it sometimes as well.
The last thing I have left to try is an Ethernet cable instead of WiFi, but if that is not the solution I will have to blame Hive.
I switched from PhoenixMiner to T-Rex and that helped for about an hour, but then it threw an identical error message, which from what I read on other forums points back to Hive. It has been a nightmare, cutting my mining by nearly 50% because the rig goes down overnight and won’t recover on its own.

I’ve got the same one. I start my rig and everything works fine, then this error appears, the whole computer freezes, and I have to power everything off. When I turn it back on it starts mining normally again. The problem is that I don’t know how to solve it, because my rig can mine for a few minutes, a few hours, or days before the error comes back. Sometimes it happens while I’m sleeping and my rig stops mining for the whole night. Has anyone discovered how to solve it? I’m using a motherboard with 4 PCIe slots plus 2 PCIe expanders, for a total of 10 GPUs. I’m using PhoenixMiner. Hope someone can help with this issue, please.

I had the same issue. Tried everything. It turned out to be the GPU riser. I have had a plethora of different issues, and 99% of the time changing the GPU riser fixes them. It’s weird because I buy the best risers I can, but goddamn those things are so fragile. Don’t even look at them or they won’t work right. Change your GPU riser and it should fix the problem.

I just built a new rig that has been up and running for a week with PhoenixMiner. No issues until yesterday, when GPU 3 (a 2070) threw the “not responding” error. I’ve tried resetting the OC to various settings including defaults, changed miners, changed pools, power cycled multiple times, etc., but no luck. Out of curiosity, I swapped its motherboard slot (no riser) with GPU 2, and it’s been working so far. I don’t think this is an issue with the motherboard or the riser.

I believe this issue to be power related or power cable related. For months I was pushing a 12 GPU system on a 1200 watt plus a 750 watt supply (both server power supplies), splitting the power since I’m on residential service, seeing roughly 1500 watts in Hive and thinking, oh, that’s okay.
I swapped cards, and the card with the issue worked fine in another spot with another riser. So I switched out the riser from the white GPURiser one to one of their new white 8-capacitor ones, and things started to work, but then slowly didn’t. Also, when I came out of the bathroom I smelled burning plastic, or something burning that I’m not used to smelling.
I haven’t fixed this yet. I’m going to try replacing the power cables to the server breakout board, and if that does not work I will remove a 3080 from this rig, put it on a separate 3080 rig I was building, and see if the problems go away. Then I can pretty much bank on it being a power issue (though a power issue could still mean a number of things).
I was running fine on PhoenixMiner for months, then switched to T-Rex, which ran great for a week, and now this happened. T-Rex seemed to get higher speeds and lower power usage on all cards (all 3070s, a couple of 3060 Tis and one 3080, 12 GPUs in total). I’ll update in a little while.
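To put rough numbers on the power theory, here is a small Python sketch of the headroom arithmetic using the figures from the post above (1200 W + 750 W supplies, about 1500 W reported). The 80% continuous-load figure is only a common rule of thumb and an assumption here, not something from this thread, and the function name `psu_headroom` is hypothetical. The reported wattage may also be lower than the real draw at the supply, since risers, the motherboard, and conversion losses may not be included.

```python
def psu_headroom(psu_watts, reported_load_watts, safe_fraction=0.8):
    """Compare a reported load with the combined rating of the supplies.

    safe_fraction is an assumed rule-of-thumb limit for continuous load.
    """
    capacity = sum(psu_watts)
    budget = capacity * safe_fraction
    return {
        "capacity_w": capacity,
        "budget_w": budget,
        "load_fraction": reported_load_watts / capacity,
        "margin_w": budget - reported_load_watts,
    }


# Figures taken from the post above: two server supplies, ~1500 W reported.
result = psu_headroom([1200, 750], 1500)
print(f"Combined rating: {result['capacity_w']} W")
print(f"Load vs rating:  {result['load_fraction']:.0%}")
print(f"Margin under the assumed 80% budget: {result['margin_w']:.0f} W")
```

With these inputs the rig sits at roughly three quarters of the combined rating with only a small margin under the assumed budget, and this says nothing about how the load is split between the two supplies or across individual cables and risers.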

Can you tell me how you got the log? (Do you log in to HiveOS itself and use journalctl?)

In my experience, most of the time it is related to the power supply or the OC settings.

