
Vega 56 weird crashes

So, I fixed the PCIe 1-to-4 splitter so that there is no mix of AMD and Nvidia cards on it. In addition, I removed all Molex cables and switched to 6-pin PCIe. This is driving me nuts; these AMD cards really seem to hate me…

Like I mentioned above, it's a power delivery issue. It could be the PSU, the cables, the riser/connection, or the card itself. Rule out the variables one by one.

I'm trying that; I'm going to swap out the PSU soon. Right now I'm testing whether it's a specific GPU or a specific slot.

50 h of uptime now with no restarts, so I think it might be stable. I lowered the overclocks and moved the cards to different slots.

The error seems to occur when VDD gets to around 900 mV.

Dude, you already need efficiency… instead of overclocking, maybe try undervolting…
Here are my settings (7th Sep): Mining ERGO with Vega 56 & 64 - #630 by Benthebrewman
Core 1190-1260, VDDC 800-831, SoC 899
You can see everything there. Copy-paste them, start low, and aim for efficiency and stability.

Could a high OC be causing the weird 511 °C temperature error? Also, I don't care that much about efficiency; I just want the best hashrate I can get while staying stable.

No, it’s a power delivery issue

Could a lack of headroom be the issue, since I have eliminated most of the other variables? If so, would a good check be, for example, to run just the AMD cards (330 W less drawn from each PSU when the Nvidia cards are offline)?

If you're overloading the PSU and causing it to trip or fail, that could make it stop delivering power.

It doesn't exactly trip, because the rig is able to restart. However, when it happens it does take at least one other card on the same PSU offline. The PSU has tripped a couple of times before, so that could factor into the problem. Still waiting for a higher-rated PSU…

Is that the same PSU that's powering the motherboard?

No, it's not the motherboard PSU. However, on another occasion, when the power limit obviously tripped the overcurrent protection, the secondary PSU went offline on reboot along with all the GPUs connected to it. Now, when the rig reboots, that PSU is still powering at least 2 GPUs (2 are offline after the reboot).

So I tried disabling all the Nvidia GPUs, leaving a software-reported draw of under 800 W total, which gives plenty of headroom. Still, the rig keeps crashing even though nothing should make it crash; basically everything has been checked. When Nvidia cards are placed in the AMD cards' slots, there is no problem at all…
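A rough way to sanity-check the headroom argument is to add up the per-card readings for one PSU, pad them for power the software readout may not capture (memory, VRM losses, fans, risers), and compare the result with the PSU's rating. The sketch below is hypothetical: the 1.15 overhead factor, the 80% sustained-load rule of thumb, and the example wattages are assumptions for illustration, not measurements from this rig.

```python
from typing import Sequence

# Assumed rule of thumb, not a spec: keep sustained load at or below ~80% of the rating.
SUSTAINED_LOAD_LIMIT = 0.80


def estimate_psu_load(reported_gpu_watts: Sequence[float],
                      psu_rating_watts: float,
                      overhead_factor: float = 1.15) -> tuple[float, float]:
    """Return (estimated DC load in watts, load as a fraction of the PSU rating).

    overhead_factor is an assumed multiplier for power the software readout
    may not include (memory, VRM losses, fans, risers).
    """
    if psu_rating_watts <= 0:
        raise ValueError("PSU rating must be positive")
    estimated = sum(reported_gpu_watts) * overhead_factor
    return estimated, estimated / psu_rating_watts


def has_headroom(reported_gpu_watts: Sequence[float],
                 psu_rating_watts: float,
                 overhead_factor: float = 1.15) -> bool:
    """True if the padded estimate stays within the assumed sustained-load limit."""
    _, fraction = estimate_psu_load(reported_gpu_watts, psu_rating_watts, overhead_factor)
    return fraction <= SUSTAINED_LOAD_LIMIT


if __name__ == "__main__":
    # Hypothetical readings for the cards hanging off one 750 W PSU.
    readings = [150.0, 150.0, 150.0]
    watts, fraction = estimate_psu_load(readings, 750.0)
    print(f"Estimated load: {watts:.0f} W ({fraction:.0%} of rating)")
    print("Within assumed limit:", has_headroom(readings, 750.0))
```

Run it per PSU, not for the whole rig: a low total across both units can still hide one PSU that is close to its limit, and transient spikes can trip overcurrent protection even when the average looks fine.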

Also, I don't get how it can run for 24-48 h and then suddenly crash. If there is a power delivery issue, shouldn't it happen much sooner?

So swap the PSUs and see if the issue persists.

The error seems to be OC-related: I lowered the OC and now there is no issue (so far, 24 h+).

It's definitely a power delivery issue. My guess is that your OC is overloading something and causing it to lose power. But the only way to figure out exactly what's wrong is to test and rule out each component.

Unfortunately, I'm still waiting for the new PSU to replace the 750 W unit. Right now, all I can test is the OC, which in turn will most likely rule out issues with the PCIe cables and the risers themselves. At this point I think it is my OC drawing too much, or voltages that are wrong relative to the other OC settings and the PSU headroom.

You have 2 PSUs, correct?

Swap them and see if the issue follows.