8-GPU rig: 2 GPUs not working


I am using an ONDA 1800 D8P-D3 V1.00 rig (Onda Technology Corporation, 5.6.5 01/16/2018) with 8 Radeon RX 580 8192M (AMD/ATI) cards.
It runs kernel 5.0.21-hiveos.

Problem: GPU0 and GPU1 are not working. Here are the logs:

[ 18.408128] [drm] amdgpu: 8192M of VRAM memory ready
[ 18.408133] [drm] amdgpu: 5901M of GTT memory ready.
[ 18.408205] [drm] GART: num cpu pages 65536, num gpu pages 65536
[ 18.408427] amdgpu 0000:02:00.0: (-22) kernel bo map failed
[ 18.408605] [drm:amdgpu_device_init [amdgpu]] ERROR amdgpu_vram_scratch_init failed -22
[ 18.408615] amdgpu 0000:02:00.0: amdgpu_device_ip_init failed
[ 18.408622] amdgpu 0000:02:00.0: Fatal error during GPU init
[ 18.408629] [drm] amdgpu: finishing device.

After booting back into Windows 10, all GPUs work.

Any help to solve this issue would be appreciated.

Thank you.

Did you manage to get this card working?

Any luck? I’m having a similar issue. All I did was add a 5th GPU to an existing 4-GPU rig. The reason I think it’s Hive/BIOS/software is that the 4th GPU, which was previously working, is now getting similar errors. No change to the riser, PCIe port, or anything else on that GPU. But only 4 out of 5 GPUs are functional. Errors below.

[ 17.016849] amdgpu: ATOM BIOS: 113-1E366CU-S52
[ 17.016868] [drm] UVD is enabled in VM mode
[ 17.016868] [drm] UVD ENC is enabled in VM mode
[ 17.016871] [drm] VCE enabled in VM mode
[ 17.016885] [drm] GPU posting now…
[ 17.148586] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[ 17.148603] amdgpu 0000:05:00.0: BAR 2: releasing [mem 0x40000000-0x401fffff 64bit pref]
[ 17.148604] amdgpu 0000:05:00.0: BAR 0: releasing [??? 0x00000000 flags 0x0]
[ 17.148692] [drm:amdgpu_device_resize_fb_bar [amdgpu]] ERROR Problem resizing BAR0 (-16).

[ 17.148696] amdgpu 0000:05:00.0: BAR 2: assigned [mem 0x40000000-0x401fffff 64bit pref]
[ 17.149424] amdgpu 0000:05:00.0: amdgpu: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[ 17.149426] amdgpu 0000:05:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[ 17.149430] ------------[ cut here ]------------
[ 17.149431] reserve_memtype failed: [mem 0x00000000-0xffffffffffffffff], req write-combining
[ 17.149854] amdgpu 0000:05:00.0: amdgpu: (-22) kernel bo map failed
[ 17.150611] [drm:amdgpu_device_init [amdgpu]] ERROR amdgpu_vram_scratch_init failed -22
[ 17.151306] amdgpu 0000:05:00.0: amdgpu: amdgpu_device_ip_init failed
[ 17.152020] amdgpu 0000:05:00.0: amdgpu: Fatal error during GPU init
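
For anyone trying to work out which physical card is failing, here is a minimal Python sketch that pulls the PCI addresses out of log lines like the ones posted in this thread. The function name `failed_gpus` and the `SAMPLE` text are just illustrative, and the pattern only assumes the two line formats seen above (with and without the extra `amdgpu:` prefix); other kernel versions may word the message differently.

```python
import re

# Matches kernel log lines such as:
#   [ 18.408622] amdgpu 0000:02:00.0: Fatal error during GPU init
#   [ 17.152020] amdgpu 0000:05:00.0: amdgpu: Fatal error during GPU init
# The optional "amdgpu:" prefix covers both formats seen in this thread.
FATAL_RE = re.compile(
    r"amdgpu (?P<addr>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]):"
    r"(?: amdgpu:)? Fatal error during GPU init"
)


def failed_gpus(log_text):
    """Return the sorted, de-duplicated PCI addresses that hit a fatal init error."""
    return sorted({m.group("addr") for m in FATAL_RE.finditer(log_text)})


# Sample lines copied from the logs in this thread (illustrative input only).
SAMPLE = """\
[ 18.408615] amdgpu 0000:02:00.0: amdgpu_device_ip_init failed
[ 18.408622] amdgpu 0000:02:00.0: Fatal error during GPU init
[ 17.152020] amdgpu 0000:05:00.0: amdgpu: Fatal error during GPU init
"""

print(failed_gpus(SAMPLE))
```

On a live rig you could feed it the real kernel log instead of `SAMPLE`, for example the text returned by `subprocess.run(["dmesg"], capture_output=True, text=True).stdout` (this may need root), and then compare the addresses against `lspci` output to see which slot or riser each one corresponds to.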

Unfortunately, I have found no solution yet.
After rebooting into Windows 10, all GPUs work (but this is not what I want, since it does not stay stable for long).

I am having the same issue. I am new to HiveOS. I started with 1 RX570 8GB and it runs without issue. I just got 2 more RX580 8GB cards, and both of them load with the same error that [c3h5o9n] is getting. Does anyone have any idea what this means and how to resolve it?

Same error here on an XFX RX580 after swapping in a sixth card on the rig. The card was working great before the reboot. I was about to test in Windows 10… no solutions to this yet?

When I remove the suspect card, the same problem starts happening to the next card down the line. It shows up in HiveOS and the fans spin on POST, but once HiveOS inits the card, I get the error and there’s no feedback in the web GUI for temp/fan speed/etc… Yo @hiveos, can you give your two cents? It seems everyone in this thread has confirmed these are working cards.

Are there any solutions? I have this on 2 rigs, with RX 580 and RTX 3080 iChill cards. I found that it does not affect all types of cards. I think it is an error from Hive, but for now we need to add more RAM or move those rigs to Windows, since the developers have not fixed it. It is a very frustrating issue.

I have had this error on an ASRock H510 Pro BTC+ motherboard. Only one GPU mined; the others were detected but weren’t initialized. The solution was to update the BIOS to version 1.4. Hope this helps someone.

