Hey all, I am trying to stabilize my rig, but almost daily it gets a GPU error. I am trying to understand 1. how I can resolve it and 2. whether there is a way to get my rig to reboot itself and start back up when the error occurs. I've played with some settings but nothing has worked.
My rig details are below:
hiveOS version: .06-616 (linux)
nvidia drivers: N 510.73.05
I was getting errors with the auto fan, so I turned it off and set the fans on my 3060 and 3070 Ti to 70%, and on my 3080 to 99% because it runs hotter. I usually never see a temp over 98 degrees, so I can't see why it's shutting down. It always shows the GPU error on the same card. I tried turning down the OC too, but I'm not sure what else to do. Any help with #1 or #2 is appreciated.