(In reply to Andy Furniss from comment #19)
> (In reply to Andy Furniss from comment #18)
> > So after last update I ran valley again briefly and saw a few vmfaults, then
> > a longer run and got thousands again.
> > 
> > Without touching anything else, I did echo mem >/sys/power/state and then
> > woke it up.
> > 
> > 10 minute run of valley has produced zero faults.
> 
> Further test from power off, with nothing else running apart from X/fluxbox:
> a short run of valley gave no faults. Reran valley for a bit longer and got
> thousands. Did a mem sleep, ran valley with no faults, but after about 10
> minutes it hung.

Do you get those GPU faults in the log even when there's no hang? I haven't
checked dmesg while running valley myself, but I do know they always appear
when a hang has happened (I'm using ssh to grab dmesg while it's hung).

Dmesg is sometimes completely filled with GPU faults; other times there are
just a few. I ran it a few minutes ago and only got this:

[ 1737.984328] amdgpu 0000:01:00.0: GPU fault detected: 146 0x08804804
[ 1737.984338] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00100110
[ 1737.984343] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A048004
[ 1737.984348] VM fault (0x04, vmid 5) at page 1048848, read from 'TC6' (0x54433600) (72)
[ 1737.984355] amdgpu 0000:01:00.0: GPU fault detected: 146 0x08804004
[ 1737.984359] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[ 1737.984363] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[ 1737.984366] VM fault (0x00, vmid 0) at page 0, read from '' (0x00000000) (0)
[ 1737.984374] amdgpu 0000:01:00.0: GPU fault detected: 146 0x08800804
[ 1737.984378] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[ 1737.984381] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[ 1737.984384] VM fault (0x00, vmid 0) at page 0, read from '' (0x00000000) (0)
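
To make "a few" versus "thousands" easier to compare between runs, something like the following could be used to count the faults in a saved dmesg dump. This is only a rough sketch: the function name summarize_vm_faults is made up for illustration, and the regular expression assumes the "VM fault" line looks exactly like the ones pasted above, with each message on a single unwrapped line.

```python
import re
from collections import Counter

# Pattern modelled on the "VM fault (...)" lines in the log above.
# Assumption: other kernel versions may word this line differently.
VM_FAULT_RE = re.compile(
    r"VM fault \((0x[0-9a-fA-F]+), vmid (\d+)\) at page (\d+), "
    r"(\w+) from '([^']*)'"
)


def summarize_vm_faults(dmesg_text):
    """Count GPU fault reports and group the VM fault lines by vmid."""
    detected = dmesg_text.count("GPU fault detected")
    by_vmid = Counter()
    pages = []
    for match in VM_FAULT_RE.finditer(dmesg_text):
        _status, vmid, page, _access, _client = match.groups()
        by_vmid[int(vmid)] += 1
        pages.append(int(page))
    return {"detected": detected, "by_vmid": dict(by_vmid), "pages": pages}
```

It could be fed the text captured over ssh while the machine is hung, for example by saving the dmesg output to a file and passing the file contents to summarize_vm_faults().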