
Vega 56 weird crashes

I’m getting crashes on some Vega 56 cards that don’t really make any sense, and the logs just make me more confused. What could it be? They were stable for 20 hours.

That’s a power delivery error. Check the cables and make sure the card is fully seated in the slot/riser. 9 times out of 10 this is from a loose or burnt PCIe power cable.

Thanks for the response, I will have a look at the PCIe cables. However, I put an Nvidia card in the same slot with the same PCIe cables and it worked fine, so the problem moved?

It can be a faulty card as well. You have to troubleshoot and narrow down the issue

I have managed to force the error to happen. The rig is made up of 2x 1080s, 2x 1080 Tis and 4x Vega 56s, which are powered by two PSUs. Each PSU has 1x 1080, 1x 1080 Ti and 2x Vega 56. When the whole system is running on Ergo it pulls around 80% of max power. If I don’t run the 2x 1080s the error doesn’t happen, but the moment I add them to the mix, one of the 2x Vegas crashes before long (always on the side powered by the secondary PSU). Also, when the rig crashes and restarts, the secondary PSU is not giving any power; the whole rig has to be physically turned off and on again to bring both online. Is the over-voltage protection related to this?

How much power are you drawing through what size psu? (At the wall, not software)

I don’t have anything to measure that, I only know the total that the software is showing

What size psu? What’s the software power draw showing?

990 W before prebuild and with the 1080 disabled. One 850 W and one 750 W, so there is a bit of headroom, which is why a power delivery error makes little sense to me.
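For reference, the headroom implied by those numbers can be worked out roughly as below. This is only a sketch: it treats the 990 W software figure as if it were the whole load and looks at the two PSUs as one combined pool, neither of which the thread confirms. The function name `load_fraction` is made up for illustration.

```python
def load_fraction(draw_w: float, psu_ratings_w: list[float]) -> float:
    """Return the fraction of combined PSU capacity used by a given draw."""
    return draw_w / sum(psu_ratings_w)

software_draw_w = 990      # software-reported total from the post above
psus_w = [850, 750]        # the two PSUs in the rig

frac = load_fraction(software_draw_w, psus_w)
print(f"{frac:.0%} of combined rated capacity")
```

Software readings usually understate what the PSUs actually deliver, so the real load is likely higher than this figure suggests.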

Try replacing the problem PSU with a higher-rated one.

The problem has now moved to a GPU powered by the other PSU… On another note, I’m using Molex to power the risers and heard that might be problematic?

Have you solved it yet?
Question: do you use a "1 to 4 PCI-Express 16X Slots Riser Card"?
If so, make sure you don’t mix AMD and Nvidia on it.
I had weird crashes when I mixed them there.
What OC settings do you have on the Vega cards, and what are the software watt consumption and MH/s of each one? HBM temps?
Which PSUs do you have?
Also, lowering the OC on the Vegas would be wise (at least to test for stability), since a single overclocked card can produce power spikes of over 500 W on its own when gaming.
Still, for 24/7 running those PSUs don’t sound like enough to me. Personally I have 5 Vegas and 2 3080s on a 1600 W Super Flower with HEAVILY undervolted settings (1000 W from the wall… software reports 700 W)… When I pushed them I had 2 PSUs connected (+2000 W Super Flower).
Buying a wall meter for each PSU would be very helpful.
Pictures showing how you have connected everything would be useful as well.
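To put the software-versus-wall gap into numbers, here is a rough sketch. It assumes the 700 W software / 1000 W wall ratio from the rig described above carries over to the 990 W software reading on the other rig. That is a loose assumption, since the two rigs have different cards and settings, so treat the result as a ballpark until a wall meter gives real figures. The function name `estimate_wall_draw` is hypothetical.

```python
def estimate_wall_draw(software_w: float,
                       ref_software_w: float = 700,
                       ref_wall_w: float = 1000) -> float:
    """Scale a software reading by a reference wall/software ratio.

    The default ratio comes from a different rig, so this is an
    assumption, not a measurement.
    """
    return software_w * ref_wall_w / ref_software_w

combined_psu_w = 850 + 750                 # both PSUs from the original rig
est_wall_w = estimate_wall_draw(990)       # 990 W software-reported draw
share = est_wall_w / combined_psu_w

print(f"estimated wall draw: {est_wall_w:.0f} W")
print(f"share of combined rated capacity: {share:.0%}")
```

If the ratio holds even approximately, the rig is far closer to the limit than the software total suggests, which is why measuring at the wall matters.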

I guess I unfortunately have Nvidia and AMD mixed on a PCIe riser, so that might be the problem. I’m also switching from Molex to six-pin. In addition, I will be replacing the 750 W PSU with an 850 W one, which will at least give a tiny bit more headroom. I will try six-pin first and then try to sort out the PCIe splitter. Thank you for your input. I will update when I get things running again.


So, I fixed the PCIe 1-to-4 splitter so that there is no mix of AMD and Nvidia. In addition, I removed all the Molex cables and changed them to 6-pin PCIe. This is driving me nuts, these AMD cards really seem to hate me…

Like I mentioned above, it’s a power delivery issue. It could be the PSU, cables, riser/connection or the card itself. Rule out variables one by one.

I’m trying that, going to swap out the PSU soon. Now I am testing whether it’s a specific GPU or a slot.

50h uptime now with no restarts, think it might be stable. Lowered overclocks and swapped the cards to different slots

Error seems to occur when the VDD gets to around 900

Dude, you already need efficiency… instead of OC, maybe try to UV…
Here are my settings (7th Sep): Mining ERGO with Vega 56 & 64 - #630 by Benthebrewman
Core 1190-1260, VDDC 800-831, Soc 899
You can see everything there. Copy-paste them, start low, and shoot for efficiency and stability.

Could a high OC be causing the weird 511C temperature error? Also, I don’t care that much about efficiency, I just want the best hashrate I can get stable.