My rig consists of four RX 5700 XT cards.
It keeps crashing after anywhere from 20 mins to 18 hours. I’ve never got it to run for over 18 hours.
Average speed (10s): 0.00mh/s | 48.23mh/s | 0.00mh/s | 0.00mh/s Total: 48.23 mh/s
New job received: 0xd01ff5 Epoch: 387 Target: 000000006df37f67
Stuck device detected, invoking emergency script
The real problem: OS logs (repeated over and over)
Jan 10 12:25:23 hive5700XT kernel: [58483.988705] amdgpu: Failed to export SMU metrics table!
Jan 10 12:25:28 hive5700XT kernel: [58488.988954] amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
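In case it helps anyone compare notes, here is a rough Python sketch for counting how often each amdgpu message shows up and at what kernel uptime the first one appeared. It only assumes the log lines look like the two quoted above; the function name and the sample list are my own, not part of Hive OS or any miner.

```python
import re
from collections import Counter

# Matches syslog-style kernel lines in the same shape as the two quoted above:
# "<Mon> <day> <HH:MM:SS> <host> kernel: [<uptime>] amdgpu: <message>"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+amdgpu:\s+(?P<msg>.*)$"
)


def summarize_amdgpu_errors(lines):
    """Return (Counter of amdgpu messages, earliest kernel uptime in seconds or None)."""
    counts = Counter()
    first_uptime = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not an amdgpu kernel line, skip it
        counts[match.group("msg")] += 1
        uptime = float(match.group("uptime"))
        if first_uptime is None or uptime < first_uptime:
            first_uptime = uptime
    return counts, first_uptime


# The two lines from my log, used here as sample input.
SAMPLE = [
    "Jan 10 12:25:23 hive5700XT kernel: [58483.988705] amdgpu: Failed to export SMU metrics table!",
    "Jan 10 12:25:28 hive5700XT kernel: [58488.988954] amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!",
]

if __name__ == "__main__":
    counts, first_uptime = summarize_amdgpu_errors(SAMPLE)
    for message, count in counts.most_common():
        print(f"{count:5d}  {message}")
    if first_uptime is not None:
        print(f"First amdgpu message at {first_uptime / 3600:.1f} h of uptime")
```

To run it against a real log, pass the function an open file instead of SAMPLE. The exact log path depends on the system, so check where your kernel messages are written first.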
I’ve tried numerous OC settings; these ones are pretty conservative and give good temps.
What I’ve tried so far:
- Updated the B250 motherboard with this guide
- Changed risers, splitters and power cables
- Ran a single card at stock; it crashed after ~20 hours (same OS error log)
- Updated Hive OS to 0.6-191@210109
- Switched miners from PhoenixMiner to lolMiner (same OS error log)
Other info:
- Kernel 5.0.21-201105-hiveos
- AMD Driver OpenCL 20.30
I can’t find much information about these error messages. Most of the reports I found online relate to monitor issues, which don’t apply to me.
Has anybody else encountered this issue?