So I have three 5700s (all three MSI, two flashed to the XT BIOS) and one 5700 XT. I’ve had this issue for a while; at first a GPU was reported dead every hour. Through troubleshooting, I’ve gotten the rig to run for days without reporting a dead GPU. I don’t think there’s a single cause. I have found that certain risers work better with certain cards. For instance, my 580s do better with SATA to 6-pin connectors, while some of my other cards do better with a Molex-powered PCIe riser. Sometimes switching how the riser is powered fixes it. That said, 6-pin to 6-pin will always be the preferred way to power a riser.
It’s possible your OCs are too aggressive. I couldn’t verify this one, though: I would drop from 900 to 895 and the problem would seem fixed, but then I’d go back up to 900 and it would still run as if it were fixed, so I can’t say the OC was the cause.
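If you want something firmer than "it seems fixed", one option is to write down when each dead-GPU event happens under each setting and compare the average gap between them. Here is a minimal sketch of that idea in Python. The function names, the labels, and the timestamps are all made up for illustration; none of them come from my rig or from any mining software.

```python
from statistics import mean


def hours_between_failures(failure_times_h):
    """Return the gaps (in hours) between consecutive dead-GPU events."""
    ordered = sorted(failure_times_h)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def summarize(runs):
    """Map each setting label to (number of gaps, mean gap in hours).

    Settings with fewer than two recorded failures are skipped,
    because there is no gap to measure yet.
    """
    summary = {}
    for label, times in runs.items():
        gaps = hours_between_failures(times)
        if gaps:
            summary[label] = (len(gaps), mean(gaps))
    return summary


# Hypothetical log: hours since the start of each test at which a GPU died.
# These numbers are invented purely to show the shape of the data.
example_runs = {
    "900": [0, 1, 2, 3],
    "895": [0, 30, 60],
}

if __name__ == "__main__":
    for label, (count, avg) in summarize(example_runs).items():
        print(f"{label}: {count} gaps, mean {avg} h between dead GPUs")
```

A handful of events per setting won't prove much, especially if you change risers or cooling at the same time, so only change one thing per test run.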
THE BIGGEST success I had was actually with heat. I replaced the thermal pads, added plastic washers to the back of the heatsink to apply more pressure, and put the rig in front of a big box fan in front of a window (it’s pretty cold where I live). Now it runs for days on end before a GPU goes dead.
ELI5: It’s a combo of the exterior/interior temperature of the memory modules, how the PCIe riser is powered, and OC values.