* [PATCH 0/4] Page table fence
@ 2023-06-01 19:31 Philip Yang
  2023-06-01 19:31 ` [PATCH 1/4] drm/amdgpu: Implement page table BO fence Philip Yang
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Philip Yang @ 2023-06-01 19:31 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling, christian.koenig

This patch series fixes random no-retry GPU faults generated on APUs
with XNACK on.

If a GPU page table update switches to using PDE0 as PTE, for example
after unmapping a 2MB-aligned virtual address and then mapping the same
address with a transparent 2MB huge page, we free the PTE BO first and
only then flush the TLB.

With XNACK on, the H/W may access the freed old PTE page before the TLB
is flushed. On an APU, the freed PTE BO's system memory page may be
reused and its content changed, which causes the H/W to generate an
unexpected no-retry fault.

The fix is to add a fence to the freed page table BOs and signal that
fence only after the TLB is flushed, so the page table BO pages are not
actually freed and reused until then.

Philip Yang (4):
  drm/amdgpu: Implement page table BO fence
  drm/amdkfd: Signal page table fence after KFD flush tlb
  drm/amdgpu: Signal page table fence after gfx vm flush
  drm/amdgpu: Add fence to the freed page table BOs

 drivers/gpu/drm/amd/amdgpu/amdgpu.h       |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 45 +++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c    |  7 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    |  4 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h    |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 33 +++++++++++------
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  5 +++
 7 files changed, 86 insertions(+), 11 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/4] drm/amdgpu: Implement page table BO fence
  2023-06-01 19:31 [PATCH 0/4] Page table fence Philip Yang
@ 2023-06-01 19:31 ` Philip Yang
  2023-06-01 20:12   ` Felix Kuehling
  2023-06-02 11:54   ` Christian König
  2023-06-01 19:31 ` [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb Philip Yang
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 16+ messages in thread
From: Philip Yang @ 2023-06-01 19:31 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling, christian.koenig

Add pt_fence to amdgpu vm structure and implement helper functions. This
fence will be shared by all page table BOs of the same amdgpu vm.

Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h       |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 45 +++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    |  4 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h    |  1 +
 4 files changed, 52 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 5af954abd5ba..09c116dfda31 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1499,6 +1499,8 @@ int amdgpu_device_set_cg_state(struct amdgpu_device *adev,
 int amdgpu_device_set_pg_state(struct amdgpu_device *adev,
 			       enum amd_powergating_state state);
 
+struct dma_fence *amdgpu_pt_fence_create(void);
+
 static inline bool amdgpu_device_has_timeouts_enabled(struct amdgpu_device *adev)
 {
 	return amdgpu_gpu_recovery != 0 &&
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index c694b41f6461..d9bfb0af3a09 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -57,6 +57,16 @@ struct amdgpu_fence {
 	ktime_t				start_timestamp;
 };
 
+/*
+ * Page table BO fence
+ * Fence used to ensure page table BOs are freed after TLB is flushed, to avoid
+ * the H/W accessing corrupted data if the freed BO page is reused.
+ */
+struct amdgpu_pt_fence {
+	struct dma_fence base;
+	spinlock_t lock;
+};
+
 static struct kmem_cache *amdgpu_fence_slab;
 
 int amdgpu_fence_slab_init(void)
@@ -79,6 +89,7 @@ void amdgpu_fence_slab_fini(void)
  */
 static const struct dma_fence_ops amdgpu_fence_ops;
 static const struct dma_fence_ops amdgpu_job_fence_ops;
+static const struct dma_fence_ops amdgpu_pt_fence_ops;
 static inline struct amdgpu_fence *to_amdgpu_fence(struct dma_fence *f)
 {
 	struct amdgpu_fence *__f = container_of(f, struct amdgpu_fence, base);
@@ -852,6 +863,40 @@ static const struct dma_fence_ops amdgpu_job_fence_ops = {
 	.release = amdgpu_job_fence_release,
 };
 
+static atomic_t pt_fence_seq = ATOMIC_INIT(0);
+
+struct dma_fence *amdgpu_pt_fence_create(void)
+{
+	struct amdgpu_pt_fence *fence;
+
+	fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+	if (!fence)
+		return NULL;
+
+	spin_lock_init(&fence->lock);
+	dma_fence_init(&fence->base, &amdgpu_pt_fence_ops, &fence->lock,
+		       dma_fence_context_alloc(1), atomic_inc_return(&pt_fence_seq));
+
+	dma_fence_get(&fence->base);
+	return &fence->base;
+}
+
+static const char *amdgpu_pt_fence_get_timeline_name(struct dma_fence *f)
+{
+	return "pt_fence_timeline";
+}
+
+static void amdgpu_pt_fence_release(struct dma_fence *f)
+{
+	kfree_rcu(f, rcu);
+}
+
+static const struct dma_fence_ops amdgpu_pt_fence_ops = {
+	.get_driver_name = amdgpu_fence_get_driver_name,
+	.get_timeline_name = amdgpu_pt_fence_get_timeline_name,
+	.release = amdgpu_pt_fence_release,
+};
+
 /*
  * Fence debugfs
  */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 37b9d8a8dbec..0219398e625c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2147,6 +2147,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 		vm->update_funcs = &amdgpu_vm_sdma_funcs;
 
 	vm->last_update = dma_fence_get_stub();
+	vm->pt_fence = amdgpu_pt_fence_create();
 	vm->last_unlocked = dma_fence_get_stub();
 	vm->last_tlb_flush = dma_fence_get_stub();
 	vm->generation = 0;
@@ -2270,6 +2271,8 @@ int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 
 	dma_fence_put(vm->last_update);
 	vm->last_update = dma_fence_get_stub();
+	dma_fence_put(vm->pt_fence);
+	vm->pt_fence = amdgpu_pt_fence_create();
 	vm->is_compute_context = true;
 
 	/* Free the shadow bo for compute VM */
@@ -2358,6 +2361,7 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 	}
 
 	dma_fence_put(vm->last_update);
+	dma_fence_put(vm->pt_fence);
 	for (i = 0; i < AMDGPU_MAX_VMHUBS; i++)
 		amdgpu_vmid_free_reserved(adev, vm, i);
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index d551fca1780e..9cc729358612 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -286,6 +286,7 @@ struct amdgpu_vm {
 	/* contains the page directory */
 	struct amdgpu_vm_bo_base     root;
 	struct dma_fence	*last_update;
+	struct dma_fence	*pt_fence;
 
 	/* Scheduler entities for page table updates */
 	struct drm_sched_entity	immediate;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb
  2023-06-01 19:31 [PATCH 0/4] Page table fence Philip Yang
  2023-06-01 19:31 ` [PATCH 1/4] drm/amdgpu: Implement page table BO fence Philip Yang
@ 2023-06-01 19:31 ` Philip Yang
  2023-06-01 20:34   ` Felix Kuehling
  2023-06-02 11:57   ` Christian König
  2023-06-01 19:31 ` [PATCH 3/4] drm/amdgpu: Signal page table fence after gfx vm flush Philip Yang
  2023-06-01 19:31 ` [PATCH 4/4] drm/amdgpu: Add fence to the freed page table BOs Philip Yang
  3 siblings, 2 replies; 16+ messages in thread
From: Philip Yang @ 2023-06-01 19:31 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling, christian.koenig

Signal the page table fence so that page table BOs freed while updating
the page table, for example PTE BOs when PDE0 is used as PTE, can
actually be released.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index af0a4b5257cc..0ff007a74d03 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -2101,6 +2101,11 @@ void kfd_flush_tlb(struct kfd_process_device *pdd, enum TLB_FLUSH_TYPE type)
 			amdgpu_amdkfd_flush_gpu_tlb_pasid(
 				dev->adev, pdd->process->pasid, type, xcc);
 	}
+
+	/* Signal page table fence to free page table BOs */
+	dma_fence_signal(vm->pt_fence);
+	dma_fence_put(vm->pt_fence);
+	vm->pt_fence = amdgpu_pt_fence_create();
 }
 
 struct kfd_process_device *kfd_process_device_data_by_id(struct kfd_process *p, uint32_t gpu_id)
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/4] drm/amdgpu: Signal page table fence after gfx vm flush
  2023-06-01 19:31 [PATCH 0/4] Page table fence Philip Yang
  2023-06-01 19:31 ` [PATCH 1/4] drm/amdgpu: Implement page table BO fence Philip Yang
  2023-06-01 19:31 ` [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb Philip Yang
@ 2023-06-01 19:31 ` Philip Yang
  2023-06-01 20:37   ` Felix Kuehling
  2023-06-01 19:31 ` [PATCH 4/4] drm/amdgpu: Add fence to the freed page table BOs Philip Yang
  3 siblings, 1 reply; 16+ messages in thread
From: Philip Yang @ 2023-06-01 19:31 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling, christian.koenig

Signal the page table fence so that page table BOs which are fenced and
freed while updating the page table can actually be released.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index e0d3e3aa2e31..10d63256d26b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -219,6 +219,13 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 			amdgpu_ring_undo(ring);
 			return r;
 		}
+
+		if (vm) {
+			/* Signal fence to free page table BO */
+			dma_fence_signal(vm->pt_fence);
+			dma_fence_put(vm->pt_fence);
+			vm->pt_fence = amdgpu_pt_fence_create();
+		}
 	}
 
 	amdgpu_ring_ib_begin(ring);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/4] drm/amdgpu: Add fence to the freed page table BOs
  2023-06-01 19:31 [PATCH 0/4] Page table fence Philip Yang
                   ` (2 preceding siblings ...)
  2023-06-01 19:31 ` [PATCH 3/4] drm/amdgpu: Signal page table fence after gfx vm flush Philip Yang
@ 2023-06-01 19:31 ` Philip Yang
  3 siblings, 0 replies; 16+ messages in thread
From: Philip Yang @ 2023-06-01 19:31 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling, christian.koenig

If updating the page table frees page table BOs, add a fence to those
BOs to ensure they are not freed and reused before the TLB is flushed.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 33 +++++++++++++++--------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index dea1a64be44d..16eb9472d469 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -633,8 +633,10 @@ static int amdgpu_vm_pt_alloc(struct amdgpu_device *adev,
  * amdgpu_vm_pt_free - free one PD/PT
  *
  * @entry: PDE to free
+ * @fence: fence added to the freed page table BO
  */
-static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry)
+static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry,
+			      struct dma_fence *fence)
 {
 	struct amdgpu_bo *shadow;
 
@@ -643,6 +645,9 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry)
 	shadow = amdgpu_bo_shadowed(entry->bo);
 	if (shadow) {
 		ttm_bo_set_bulk_move(&shadow->tbo, NULL);
+		if (fence && !dma_resv_reserve_fences(shadow->tbo.base.resv, 1))
+			dma_resv_add_fence(shadow->tbo.base.resv, fence,
+					   DMA_RESV_USAGE_BOOKKEEP);
 		amdgpu_bo_unref(&shadow);
 	}
 	ttm_bo_set_bulk_move(&entry->bo->tbo, NULL);
@@ -651,6 +656,9 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry)
 	spin_lock(&entry->vm->status_lock);
 	list_del(&entry->vm_status);
 	spin_unlock(&entry->vm->status_lock);
+	if (fence && !dma_resv_reserve_fences(entry->bo->tbo.base.resv, 1))
+		dma_resv_add_fence(entry->bo->tbo.base.resv, fence,
+				   DMA_RESV_USAGE_BOOKKEEP);
 	amdgpu_bo_unref(&entry->bo);
 }
 
@@ -670,7 +678,7 @@ void amdgpu_vm_pt_free_work(struct work_struct *work)
 	amdgpu_bo_reserve(vm->root.bo, true);
 
 	list_for_each_entry_safe(entry, next, &pt_freed, vm_status)
-		amdgpu_vm_pt_free(entry);
+		amdgpu_vm_pt_free(entry, NULL);
 
 	amdgpu_bo_unreserve(vm->root.bo);
 }
@@ -682,13 +690,15 @@ void amdgpu_vm_pt_free_work(struct work_struct *work)
  * @vm: amdgpu vm structure
  * @start: optional cursor where to start freeing PDs/PTs
  * @unlocked: vm resv unlock status
+ * @fence: page table fence added to the freed BOs
  *
  * Free the page directory or page table level and all sub levels.
  */
 static void amdgpu_vm_pt_free_dfs(struct amdgpu_device *adev,
 				  struct amdgpu_vm *vm,
 				  struct amdgpu_vm_pt_cursor *start,
-				  bool unlocked)
+				  bool unlocked,
+				  struct dma_fence *fence)
 {
 	struct amdgpu_vm_pt_cursor cursor;
 	struct amdgpu_vm_bo_base *entry;
@@ -706,10 +716,10 @@ static void amdgpu_vm_pt_free_dfs(struct amdgpu_device *adev,
 	}
 
 	for_each_amdgpu_vm_pt_dfs_safe(adev, vm, start, cursor, entry)
-		amdgpu_vm_pt_free(entry);
+		amdgpu_vm_pt_free(entry, fence);
 
 	if (start)
-		amdgpu_vm_pt_free(start->entry);
+		amdgpu_vm_pt_free(start->entry, fence);
 }
 
 /**
@@ -721,7 +731,7 @@ static void amdgpu_vm_pt_free_dfs(struct amdgpu_device *adev,
  */
 void amdgpu_vm_pt_free_root(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 {
-	amdgpu_vm_pt_free_dfs(adev, vm, NULL, false);
+	amdgpu_vm_pt_free_dfs(adev, vm, NULL, false, NULL);
 }
 
 /**
@@ -905,6 +915,7 @@ int amdgpu_vm_ptes_update(struct amdgpu_vm_update_params *params,
 	struct amdgpu_device *adev = params->adev;
 	struct amdgpu_vm_pt_cursor cursor;
 	uint64_t frag_start = start, frag_end;
+	struct amdgpu_vm *vm = params->vm;
 	unsigned int frag;
 	int r;
 
@@ -913,7 +924,7 @@ int amdgpu_vm_ptes_update(struct amdgpu_vm_update_params *params,
 			       &frag_end);
 
 	/* walk over the address space and update the PTs */
-	amdgpu_vm_pt_start(adev, params->vm, start, &cursor);
+	amdgpu_vm_pt_start(adev, vm, start, &cursor);
 	while (cursor.pfn < end) {
 		unsigned int shift, parent_shift, mask;
 		uint64_t incr, entry_end, pe_start;
@@ -923,7 +934,7 @@ int amdgpu_vm_ptes_update(struct amdgpu_vm_update_params *params,
 			/* make sure that the page tables covering the
 			 * address range are actually allocated
 			 */
-			r = amdgpu_vm_pt_alloc(params->adev, params->vm,
+			r = amdgpu_vm_pt_alloc(params->adev, vm,
 					       &cursor, params->immediate);
 			if (r)
 				return r;
@@ -986,7 +997,6 @@ int amdgpu_vm_ptes_update(struct amdgpu_vm_update_params *params,
 		entry_end = min(entry_end, end);
 
 		do {
-			struct amdgpu_vm *vm = params->vm;
 			uint64_t upd_end = min(entry_end, frag_end);
 			unsigned int nptes = (upd_end - frag_start) >> shift;
 			uint64_t upd_flags = flags | AMDGPU_PTE_FRAG(frag);
@@ -1029,9 +1039,10 @@ int amdgpu_vm_ptes_update(struct amdgpu_vm_update_params *params,
 				/* Make sure previous mapping is freed */
 				if (cursor.entry->bo) {
 					params->table_freed = true;
-					amdgpu_vm_pt_free_dfs(adev, params->vm,
+					amdgpu_vm_pt_free_dfs(adev, vm,
 							      &cursor,
-							      params->unlocked);
+							      params->unlocked,
+							      vm->pt_fence);
 				}
 				amdgpu_vm_pt_next(adev, &cursor);
 			}
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] drm/amdgpu: Implement page table BO fence
  2023-06-01 19:31 ` [PATCH 1/4] drm/amdgpu: Implement page table BO fence Philip Yang
@ 2023-06-01 20:12   ` Felix Kuehling
  2023-06-02 11:54   ` Christian König
  1 sibling, 0 replies; 16+ messages in thread
From: Felix Kuehling @ 2023-06-01 20:12 UTC (permalink / raw)
  To: Philip Yang, amd-gfx; +Cc: christian.koenig

On 2023-06-01 15:31, Philip Yang wrote:
> Add pt_fence to amdgpu vm structure and implement helper functions. This
> fence will be shared by all page table BOs of the same amdgpu vm.
>
> Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h       |  2 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 45 +++++++++++++++++++++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    |  4 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h    |  1 +
>   4 files changed, 52 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 5af954abd5ba..09c116dfda31 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1499,6 +1499,8 @@ int amdgpu_device_set_cg_state(struct amdgpu_device *adev,
>   int amdgpu_device_set_pg_state(struct amdgpu_device *adev,
>   			       enum amd_powergating_state state);
>   
> +struct dma_fence *amdgpu_pt_fence_create(void);
> +
>   static inline bool amdgpu_device_has_timeouts_enabled(struct amdgpu_device *adev)
>   {
>   	return amdgpu_gpu_recovery != 0 &&
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index c694b41f6461..d9bfb0af3a09 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -57,6 +57,16 @@ struct amdgpu_fence {
>   	ktime_t				start_timestamp;
>   };
>   
> +/*
> + * Page table BO fence
> + * Fence used to ensure page table BOs are freed after TLB is flushed, to avoid
> + * H/W access corrupted data if the freed BO page is reused.
> + */
> +struct amdgpu_pt_fence {
> +	struct dma_fence base;
> +	spinlock_t lock;
> +};
> +
>   static struct kmem_cache *amdgpu_fence_slab;
>   
>   int amdgpu_fence_slab_init(void)
> @@ -79,6 +89,7 @@ void amdgpu_fence_slab_fini(void)
>    */
>   static const struct dma_fence_ops amdgpu_fence_ops;
>   static const struct dma_fence_ops amdgpu_job_fence_ops;
> +static const struct dma_fence_ops amdgpu_pt_fence_ops;
>   static inline struct amdgpu_fence *to_amdgpu_fence(struct dma_fence *f)
>   {
>   	struct amdgpu_fence *__f = container_of(f, struct amdgpu_fence, base);
> @@ -852,6 +863,40 @@ static const struct dma_fence_ops amdgpu_job_fence_ops = {
>   	.release = amdgpu_job_fence_release,
>   };
>   
> +static atomic_t pt_fence_seq = ATOMIC_INIT(0);
> +
> +struct dma_fence *amdgpu_pt_fence_create(void)
> +{
> +	struct amdgpu_pt_fence *fence;
> +
> +	fence = kzalloc(sizeof(*fence), GFP_KERNEL);
> +	if (!fence)
> +		return NULL;
> +
> +	spin_lock_init(&fence->lock);
> +	dma_fence_init(&fence->base, &amdgpu_pt_fence_ops, &fence->lock,
> +		       dma_fence_context_alloc(1), atomic_inc_return(&pt_fence_seq));

Creating a new fence context per fence is probably wrong. I think we 
only need one PT fence context per GPU or per spatial partition, or 
maybe per VM.

Regards,
   Felix


> +
> +	dma_fence_get(&fence->base);
> +	return &fence->base;
> +}
> +
> +static const char *amdgpu_pt_fence_get_timeline_name(struct dma_fence *f)
> +{
> +	return "pt_fence_timeline";
> +}
> +
> +static void amdgpu_pt_fence_release(struct dma_fence *f)
> +{
> +	kfree_rcu(f, rcu);
> +}
> +
> +static const struct dma_fence_ops amdgpu_pt_fence_ops = {
> +	.get_driver_name = amdgpu_fence_get_driver_name,
> +	.get_timeline_name = amdgpu_pt_fence_get_timeline_name,
> +	.release = amdgpu_pt_fence_release,
> +};
> +
>   /*
>    * Fence debugfs
>    */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 37b9d8a8dbec..0219398e625c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -2147,6 +2147,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>   		vm->update_funcs = &amdgpu_vm_sdma_funcs;
>   
>   	vm->last_update = dma_fence_get_stub();
> +	vm->pt_fence = amdgpu_pt_fence_create();
>   	vm->last_unlocked = dma_fence_get_stub();
>   	vm->last_tlb_flush = dma_fence_get_stub();
>   	vm->generation = 0;
> @@ -2270,6 +2271,8 @@ int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>   
>   	dma_fence_put(vm->last_update);
>   	vm->last_update = dma_fence_get_stub();
> +	dma_fence_put(vm->pt_fence);
> +	vm->pt_fence = amdgpu_pt_fence_create();
>   	vm->is_compute_context = true;
>   
>   	/* Free the shadow bo for compute VM */
> @@ -2358,6 +2361,7 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>   	}
>   
>   	dma_fence_put(vm->last_update);
> +	dma_fence_put(vm->pt_fence);
>   	for (i = 0; i < AMDGPU_MAX_VMHUBS; i++)
>   		amdgpu_vmid_free_reserved(adev, vm, i);
>   }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index d551fca1780e..9cc729358612 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -286,6 +286,7 @@ struct amdgpu_vm {
>   	/* contains the page directory */
>   	struct amdgpu_vm_bo_base     root;
>   	struct dma_fence	*last_update;
> +	struct dma_fence	*pt_fence;
>   
>   	/* Scheduler entities for page table updates */
>   	struct drm_sched_entity	immediate;

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb
  2023-06-01 19:31 ` [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb Philip Yang
@ 2023-06-01 20:34   ` Felix Kuehling
  2023-06-02 11:57   ` Christian König
  1 sibling, 0 replies; 16+ messages in thread
From: Felix Kuehling @ 2023-06-01 20:34 UTC (permalink / raw)
  To: Philip Yang, amd-gfx; +Cc: christian.koenig


On 2023-06-01 15:31, Philip Yang wrote:
> To free page table BOs which are freed when updating page table, for
> example PTE BOs when PDE0 used as PTE.
>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 +++++
>   1 file changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index af0a4b5257cc..0ff007a74d03 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -2101,6 +2101,11 @@ void kfd_flush_tlb(struct kfd_process_device *pdd, enum TLB_FLUSH_TYPE type)
>   			amdgpu_amdkfd_flush_gpu_tlb_pasid(
>   				dev->adev, pdd->process->pasid, type, xcc);
>   	}
> +
> +	/* Signal page table fence to free page table BOs */
> +	dma_fence_signal(vm->pt_fence);
> +	dma_fence_put(vm->pt_fence);
> +	vm->pt_fence = amdgpu_pt_fence_create();

This is a bit weird to create a fence here when we have no idea when it 
will be signaled or whether it will be signaled at all. But I think it's 
OK in this case. You add this fence to PT BOs before they get freed, and 
at that point those BOs must wait for the next TLB flush before the BO 
can be freed.

The only important thing is that fences in the same context get signaled 
in the same order they are created. I think if you allocate a fence 
context per VM, that should be guaranteed by the VM root reservation 
lock that serializes all page table operations in the same VM.

You may need some locking here to prevent concurrent access to the fence 
while you're signaling and replacing it.

Regards,
   Felix


>   }
>   
>   struct kfd_process_device *kfd_process_device_data_by_id(struct kfd_process *p, uint32_t gpu_id)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4] drm/amdgpu: Signal page table fence after gfx vm flush
  2023-06-01 19:31 ` [PATCH 3/4] drm/amdgpu: Signal page table fence after gfx vm flush Philip Yang
@ 2023-06-01 20:37   ` Felix Kuehling
  0 siblings, 0 replies; 16+ messages in thread
From: Felix Kuehling @ 2023-06-01 20:37 UTC (permalink / raw)
  To: Philip Yang, amd-gfx; +Cc: christian.koenig


On 2023-06-01 15:31, Philip Yang wrote:
> To free page table BOs which are fenced and freed when updating page
> table.
>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 7 +++++++
>   1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> index e0d3e3aa2e31..10d63256d26b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> @@ -219,6 +219,13 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>   			amdgpu_ring_undo(ring);
>   			return r;
>   		}
> +
> +		if (vm) {
> +			/* Signal fence to free page table BO */
> +			dma_fence_signal(vm->pt_fence);
> +			dma_fence_put(vm->pt_fence);
> +			vm->pt_fence = amdgpu_pt_fence_create();
> +		}

I think this is too early. The TLB flush is not done at this point, it's 
only been emitted to the ring but not executed yet. You probably need to 
signal the PT fence in a fence callback from the fence "f" that signals 
when the IB completes.

Regards,
   Felix


>   	}
>   
>   	amdgpu_ring_ib_begin(ring);

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] drm/amdgpu: Implement page table BO fence
  2023-06-01 19:31 ` [PATCH 1/4] drm/amdgpu: Implement page table BO fence Philip Yang
  2023-06-01 20:12   ` Felix Kuehling
@ 2023-06-02 11:54   ` Christian König
  1 sibling, 0 replies; 16+ messages in thread
From: Christian König @ 2023-06-02 11:54 UTC (permalink / raw)
  To: Philip Yang, amd-gfx; +Cc: Felix.Kuehling

Am 01.06.23 um 21:31 schrieb Philip Yang:
> Add pt_fence to amdgpu vm structure and implement helper functions. This
> fence will be shared by all page table BOs of the same amdgpu vm.

Well first of all please don't add anything to amdgpu.h or 
amdgpu_fence.* This should probably all go into a new file.

Then that approach here won't work like this, but I'm going to comment 
on the other patches.

Christian.

>
> Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h       |  2 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 45 +++++++++++++++++++++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    |  4 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h    |  1 +
>   4 files changed, 52 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 5af954abd5ba..09c116dfda31 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1499,6 +1499,8 @@ int amdgpu_device_set_cg_state(struct amdgpu_device *adev,
>   int amdgpu_device_set_pg_state(struct amdgpu_device *adev,
>   			       enum amd_powergating_state state);
>   
> +struct dma_fence *amdgpu_pt_fence_create(void);
> +
>   static inline bool amdgpu_device_has_timeouts_enabled(struct amdgpu_device *adev)
>   {
>   	return amdgpu_gpu_recovery != 0 &&
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index c694b41f6461..d9bfb0af3a09 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -57,6 +57,16 @@ struct amdgpu_fence {
>   	ktime_t				start_timestamp;
>   };
>   
> +/*
> + * Page table BO fence
> + * Fence used to ensure page table BOs are freed after TLB is flushed, to avoid
> + * H/W access corrupted data if the freed BO page is reused.
> + */
> +struct amdgpu_pt_fence {
> +	struct dma_fence base;
> +	spinlock_t lock;
> +};
> +
>   static struct kmem_cache *amdgpu_fence_slab;
>   
>   int amdgpu_fence_slab_init(void)
> @@ -79,6 +89,7 @@ void amdgpu_fence_slab_fini(void)
>    */
>   static const struct dma_fence_ops amdgpu_fence_ops;
>   static const struct dma_fence_ops amdgpu_job_fence_ops;
> +static const struct dma_fence_ops amdgpu_pt_fence_ops;
>   static inline struct amdgpu_fence *to_amdgpu_fence(struct dma_fence *f)
>   {
>   	struct amdgpu_fence *__f = container_of(f, struct amdgpu_fence, base);
> @@ -852,6 +863,40 @@ static const struct dma_fence_ops amdgpu_job_fence_ops = {
>   	.release = amdgpu_job_fence_release,
>   };
>   
> +static atomic_t pt_fence_seq = ATOMIC_INIT(0);
> +
> +struct dma_fence *amdgpu_pt_fence_create(void)
> +{
> +	struct amdgpu_pt_fence *fence;
> +
> +	fence = kzalloc(sizeof(*fence), GFP_KERNEL);
> +	if (!fence)
> +		return NULL;
> +
> +	spin_lock_init(&fence->lock);
> +	dma_fence_init(&fence->base, &amdgpu_pt_fence_ops, &fence->lock,
> +		       dma_fence_context_alloc(1), atomic_inc_return(&pt_fence_seq));
> +
> +	dma_fence_get(&fence->base);
> +	return &fence->base;
> +}
> +
> +static const char *amdgpu_pt_fence_get_timeline_name(struct dma_fence *f)
> +{
> +	return "pt_fence_timeline";
> +}
> +
> +static void amdgpu_pt_fence_release(struct dma_fence *f)
> +{
> +	kfree_rcu(f, rcu);
> +}
> +
> +static const struct dma_fence_ops amdgpu_pt_fence_ops = {
> +	.get_driver_name = amdgpu_fence_get_driver_name,
> +	.get_timeline_name = amdgpu_pt_fence_get_timeline_name,
> +	.release = amdgpu_pt_fence_release,
> +};
> +
>   /*
>    * Fence debugfs
>    */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 37b9d8a8dbec..0219398e625c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -2147,6 +2147,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>   		vm->update_funcs = &amdgpu_vm_sdma_funcs;
>   
>   	vm->last_update = dma_fence_get_stub();
> +	vm->pt_fence = amdgpu_pt_fence_create();
>   	vm->last_unlocked = dma_fence_get_stub();
>   	vm->last_tlb_flush = dma_fence_get_stub();
>   	vm->generation = 0;
> @@ -2270,6 +2271,8 @@ int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>   
>   	dma_fence_put(vm->last_update);
>   	vm->last_update = dma_fence_get_stub();
> +	dma_fence_put(vm->pt_fence);
> +	vm->pt_fence = amdgpu_pt_fence_create();
>   	vm->is_compute_context = true;
>   
>   	/* Free the shadow bo for compute VM */
> @@ -2358,6 +2361,7 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>   	}
>   
>   	dma_fence_put(vm->last_update);
> +	dma_fence_put(vm->pt_fence);
>   	for (i = 0; i < AMDGPU_MAX_VMHUBS; i++)
>   		amdgpu_vmid_free_reserved(adev, vm, i);
>   }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index d551fca1780e..9cc729358612 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -286,6 +286,7 @@ struct amdgpu_vm {
>   	/* contains the page directory */
>   	struct amdgpu_vm_bo_base     root;
>   	struct dma_fence	*last_update;
> +	struct dma_fence	*pt_fence;
>   
>   	/* Scheduler entities for page table updates */
>   	struct drm_sched_entity	immediate;


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb
  2023-06-01 19:31 ` [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb Philip Yang
  2023-06-01 20:34   ` Felix Kuehling
@ 2023-06-02 11:57   ` Christian König
  2023-06-02 14:54     ` Felix Kuehling
  1 sibling, 1 reply; 16+ messages in thread
From: Christian König @ 2023-06-02 11:57 UTC (permalink / raw)
  To: Philip Yang, amd-gfx, Somalapuram, Amaranath, Sharma, Shashank
  Cc: Felix.Kuehling

Am 01.06.23 um 21:31 schrieb Philip Yang:
> To free page table BOs which are freed when updating page table, for
> example PTE BOs when PDE0 used as PTE.
>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 +++++
>   1 file changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index af0a4b5257cc..0ff007a74d03 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -2101,6 +2101,11 @@ void kfd_flush_tlb(struct kfd_process_device *pdd, enum TLB_FLUSH_TYPE type)
>   			amdgpu_amdkfd_flush_gpu_tlb_pasid(
>   				dev->adev, pdd->process->pasid, type, xcc);
>   	}
> +
> +	/* Signal page table fence to free page table BOs */
> +	dma_fence_signal(vm->pt_fence);

That's not something you can do here.

Signaling a fence can never depend on anything except for hardware work. 
In other words dma_fence_signal() is supposed to be called only from 
interrupt context!

What we can do is put the TLB flushing into an irq worker or work
item and let the signaling happen from there.

Amar and Shashank are already working on this, I strongly suggest to 
sync up with them.

Regards,
Christian.

> +	dma_fence_put(vm->pt_fence);
> +	vm->pt_fence = amdgpu_pt_fence_create();
>   }
>   
>   struct kfd_process_device *kfd_process_device_data_by_id(struct kfd_process *p, uint32_t gpu_id)


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb
  2023-06-02 11:57   ` Christian König
@ 2023-06-02 14:54     ` Felix Kuehling
  2023-06-05 15:13       ` Shashank Sharma
  0 siblings, 1 reply; 16+ messages in thread
From: Felix Kuehling @ 2023-06-02 14:54 UTC (permalink / raw)
  To: Christian König, Philip Yang, amd-gfx, Somalapuram,
	Amaranath, Sharma, Shashank

Am 2023-06-02 um 07:57 schrieb Christian König:
> Am 01.06.23 um 21:31 schrieb Philip Yang:
>> To free page table BOs which are freed when updating page table, for
>> example PTE BOs when PDE0 used as PTE.
>>
>> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> index af0a4b5257cc..0ff007a74d03 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> @@ -2101,6 +2101,11 @@ void kfd_flush_tlb(struct kfd_process_device 
>> *pdd, enum TLB_FLUSH_TYPE type)
>>               amdgpu_amdkfd_flush_gpu_tlb_pasid(
>>                   dev->adev, pdd->process->pasid, type, xcc);
>>       }
>> +
>> +    /* Signal page table fence to free page table BOs */
>> +    dma_fence_signal(vm->pt_fence);
>
> That's not something you can do here.
>
> Signaling a fence can never depend on anything except for hardware 
> work. In other words dma_fence_signal() is supposed to be called only 
> from interrupt context!

We are signaling eviction fences from normal user context, too. There is 
no practical way to put this into an interrupt handler when the TLB 
flush is being done synchronously on a user thread. We have to do this 
in such a context for user mode queues.

Regards,
   Felix


>
> What we can do is put the TLB flushing into an irq worker or work
> item and let the signaling happen from there.
>
> Amar and Shashank are already working on this, I strongly suggest to 
> sync up with them.
>
> Regards,
> Christian.
>
>> +    dma_fence_put(vm->pt_fence);
>> +    vm->pt_fence = amdgpu_pt_fence_create();
>>   }
>>     struct kfd_process_device *kfd_process_device_data_by_id(struct 
>> kfd_process *p, uint32_t gpu_id)
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb
  2023-06-02 14:54     ` Felix Kuehling
@ 2023-06-05 15:13       ` Shashank Sharma
  2023-06-05 15:18         ` Christian König
  0 siblings, 1 reply; 16+ messages in thread
From: Shashank Sharma @ 2023-06-05 15:13 UTC (permalink / raw)
  To: Felix Kuehling, Christian König, Philip Yang, amd-gfx,
	Somalapuram, Amaranath


On 02/06/2023 16:54, Felix Kuehling wrote:
> Am 2023-06-02 um 07:57 schrieb Christian König:
>> Am 01.06.23 um 21:31 schrieb Philip Yang:
>>> To free page table BOs which are freed when updating page table, for
>>> example PTE BOs when PDE0 used as PTE.
>>>
>>> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 +++++
>>>   1 file changed, 5 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> index af0a4b5257cc..0ff007a74d03 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> @@ -2101,6 +2101,11 @@ void kfd_flush_tlb(struct kfd_process_device 
>>> *pdd, enum TLB_FLUSH_TYPE type)
>>>               amdgpu_amdkfd_flush_gpu_tlb_pasid(
>>>                   dev->adev, pdd->process->pasid, type, xcc);
>>>       }
>>> +
>>> +    /* Signal page table fence to free page table BOs */
>>> +    dma_fence_signal(vm->pt_fence);
>>
>> That's not something you can do here.
>>
>> Signaling a fence can never depend on anything except for hardware 
>> work. In other words dma_fence_signal() is supposed to be called only 
>> from interrupt context!
>
> We are signaling eviction fences from normal user context, too. There 
> is no practical way to put this into an interrupt handler when the TLB 
> flush is being done synchronously on a user thread. We have to do this 
> in such a context for user mode queues.


We are currently working on adding a high-level kernel API which can be
called directly to perform a TLB flush. Internally this API will queue
deferred work that uses the SDMA engine to perform the GPU TLB flush
(to compensate for a HW bug in Navi chips). If my understanding is
right, by interrupt context Christian means to perform this flush and
signal from that deferred work, is that so @Christian?

- Shashank

>
> Regards,
>   Felix
>
>
>>
>> What we can do is put the TLB flushing into an irq worker or work
>> item and let the signaling happen from there.
>>
>> Amar and Shashank are already working on this, I strongly suggest to 
>> sync up with them.
>>
>> Regards,
>> Christian.
>>
>>> +    dma_fence_put(vm->pt_fence);
>>> +    vm->pt_fence = amdgpu_pt_fence_create();
>>>   }
>>>     struct kfd_process_device *kfd_process_device_data_by_id(struct 
>>> kfd_process *p, uint32_t gpu_id)
>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb
  2023-06-05 15:13       ` Shashank Sharma
@ 2023-06-05 15:18         ` Christian König
  2023-06-05 15:40           ` Shashank Sharma
  0 siblings, 1 reply; 16+ messages in thread
From: Christian König @ 2023-06-05 15:18 UTC (permalink / raw)
  To: Shashank Sharma, Felix Kuehling, Philip Yang, amd-gfx,
	Somalapuram, Amaranath

Am 05.06.23 um 17:13 schrieb Shashank Sharma:
>
> On 02/06/2023 16:54, Felix Kuehling wrote:
>> Am 2023-06-02 um 07:57 schrieb Christian König:
>>> Am 01.06.23 um 21:31 schrieb Philip Yang:
>>>> To free page table BOs which are freed when updating page table, for
>>>> example PTE BOs when PDE0 used as PTE.
>>>>
>>>> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 +++++
>>>>   1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> index af0a4b5257cc..0ff007a74d03 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> @@ -2101,6 +2101,11 @@ void kfd_flush_tlb(struct kfd_process_device 
>>>> *pdd, enum TLB_FLUSH_TYPE type)
>>>>               amdgpu_amdkfd_flush_gpu_tlb_pasid(
>>>>                   dev->adev, pdd->process->pasid, type, xcc);
>>>>       }
>>>> +
>>>> +    /* Signal page table fence to free page table BOs */
>>>> +    dma_fence_signal(vm->pt_fence);
>>>
>>> That's not something you can do here.
>>>
>>> Signaling a fence can never depend on anything except for hardware 
>>> work. In other words dma_fence_signal() is supposed to be called 
>>> only from interrupt context!
>>
>> We are signaling eviction fences from normal user context, too. There 
>> is no practical way to put this into an interrupt handler when the 
>> TLB flush is being done synchronously on a user thread. We have to do 
>> this in such a context for user mode queues.
>
>
> We are currently working on adding a high-level kernel API which can
> be called directly to perform a TLB flush. Internally this API will
> queue deferred work that uses the SDMA engine to perform the GPU TLB
> flush (to compensate for a HW bug in Navi chips). If my understanding
> is right, by interrupt context Christian means to perform this flush
> and signal from that deferred work, is that so @Christian?

Well more or less. Ideally you put the TLB flush in a work item (or use 
the SDMA for the hw bug workaround on Navi 1x).

The point is that you shouldn't have it in the same execution thread as 
the VM page table updates, because any memory allocation or grabbing a 
lock could potentially depend on the TLB flush as soon as you have 
published the dma_fence (by adding it to the VM page table BOs for example).

Christian.

>
> - Shashank
>
>>
>> Regards,
>>   Felix
>>
>>
>>>
>>> What we can do is put the TLB flushing into an irq worker or work
>>> item and let the signaling happen from there.
>>>
>>> Amar and Shashank are already working on this, I strongly suggest to 
>>> sync up with them.
>>>
>>> Regards,
>>> Christian.
>>>
>>>> +    dma_fence_put(vm->pt_fence);
>>>> +    vm->pt_fence = amdgpu_pt_fence_create();
>>>>   }
>>>>     struct kfd_process_device *kfd_process_device_data_by_id(struct 
>>>> kfd_process *p, uint32_t gpu_id)
>>>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb
  2023-06-05 15:18         ` Christian König
@ 2023-06-05 15:40           ` Shashank Sharma
  2023-06-05 15:47             ` Christian König
  0 siblings, 1 reply; 16+ messages in thread
From: Shashank Sharma @ 2023-06-05 15:40 UTC (permalink / raw)
  To: Christian König, Felix Kuehling, Philip Yang, amd-gfx,
	Somalapuram, Amaranath


On 05/06/2023 17:18, Christian König wrote:
> Am 05.06.23 um 17:13 schrieb Shashank Sharma:
>>
>> On 02/06/2023 16:54, Felix Kuehling wrote:
>>> Am 2023-06-02 um 07:57 schrieb Christian König:
>>>> Am 01.06.23 um 21:31 schrieb Philip Yang:
>>>>> To free page table BOs which are freed when updating page table, for
>>>>> example PTE BOs when PDE0 used as PTE.
>>>>>
>>>>> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>>>>> ---
>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 +++++
>>>>>   1 file changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
>>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>> index af0a4b5257cc..0ff007a74d03 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>> @@ -2101,6 +2101,11 @@ void kfd_flush_tlb(struct 
>>>>> kfd_process_device *pdd, enum TLB_FLUSH_TYPE type)
>>>>>               amdgpu_amdkfd_flush_gpu_tlb_pasid(
>>>>>                   dev->adev, pdd->process->pasid, type, xcc);
>>>>>       }
>>>>> +
>>>>> +    /* Signal page table fence to free page table BOs */
>>>>> +    dma_fence_signal(vm->pt_fence);
>>>>
>>>> That's not something you can do here.
>>>>
>>>> Signaling a fence can never depend on anything except for hardware 
>>>> work. In other words dma_fence_signal() is supposed to be called 
>>>> only from interrupt context!
>>>
>>> We are signaling eviction fences from normal user context, too. 
>>> There is no practical way to put this into an interrupt handler when 
>>> the TLB flush is being done synchronously on a user thread. We have 
>>> to do this in such a context for user mode queues.
>>
>>
>> We are currently working on adding a high-level kernel API which can
>> be called directly to perform a TLB flush. Internally this API will
>> queue deferred work that uses the SDMA engine to perform the GPU TLB
>> flush (to compensate for a HW bug in Navi chips). If my understanding
>> is right, by interrupt context Christian means to perform this flush
>> and signal from that deferred work, is that so @Christian?
>
> Well more or less. Ideally you put the TLB flush in a work item (or 
> use the SDMA for the hw bug workaround on Navi 1x).
>
> The point is that you shouldn't have it in the same execution thread 
> as the VM page table updates, because any memory allocation or 
> grabbing a lock could potentially depend on the TLB flush as soon as 
> you have published the dma_fence (by adding it to the VM page table 
> BOs for example).
>
Would it work for everyone if we add this generic API (say
amdgpu_flush_tlb_async()) which queues the TLB flush work and also
sends the dma_fence_signal from within? KFD can simply consume it from
wherever it wants; do you see a race condition if we do it like this?

- Shashank

> Christian.
>
>>
>> - Shashank
>>
>>>
>>> Regards,
>>>   Felix
>>>
>>>
>>>>
>>>> What we can do is put the TLB flushing into an irq worker or
>>>> work item and let the signaling happen from there.
>>>>
>>>> Amar and Shashank are already working on this, I strongly suggest 
>>>> to sync up with them.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> +    dma_fence_put(vm->pt_fence);
>>>>> +    vm->pt_fence = amdgpu_pt_fence_create();
>>>>>   }
>>>>>     struct kfd_process_device 
>>>>> *kfd_process_device_data_by_id(struct kfd_process *p, uint32_t 
>>>>> gpu_id)
>>>>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb
  2023-06-05 15:40           ` Shashank Sharma
@ 2023-06-05 15:47             ` Christian König
  2023-06-06 16:16               ` Philip Yang
  0 siblings, 1 reply; 16+ messages in thread
From: Christian König @ 2023-06-05 15:47 UTC (permalink / raw)
  To: Shashank Sharma, Felix Kuehling, Philip Yang, amd-gfx,
	Somalapuram, Amaranath

Am 05.06.23 um 17:40 schrieb Shashank Sharma:
>
> On 05/06/2023 17:18, Christian König wrote:
>> Am 05.06.23 um 17:13 schrieb Shashank Sharma:
>>>
>>> On 02/06/2023 16:54, Felix Kuehling wrote:
>>>> Am 2023-06-02 um 07:57 schrieb Christian König:
>>>>> Am 01.06.23 um 21:31 schrieb Philip Yang:
>>>>>> To free page table BOs which are freed when updating page table, for
>>>>>> example PTE BOs when PDE0 used as PTE.
>>>>>>
>>>>>> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>>>>>> ---
>>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 +++++
>>>>>>   1 file changed, 5 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
>>>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> index af0a4b5257cc..0ff007a74d03 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> @@ -2101,6 +2101,11 @@ void kfd_flush_tlb(struct 
>>>>>> kfd_process_device *pdd, enum TLB_FLUSH_TYPE type)
>>>>>>               amdgpu_amdkfd_flush_gpu_tlb_pasid(
>>>>>>                   dev->adev, pdd->process->pasid, type, xcc);
>>>>>>       }
>>>>>> +
>>>>>> +    /* Signal page table fence to free page table BOs */
>>>>>> +    dma_fence_signal(vm->pt_fence);
>>>>>
>>>>> That's not something you can do here.
>>>>>
>>>>> Signaling a fence can never depend on anything except for hardware 
>>>>> work. In other words dma_fence_signal() is supposed to be called 
>>>>> only from interrupt context!
>>>>
>>>> We are signaling eviction fences from normal user context, too. 
>>>> There is no practical way to put this into an interrupt handler 
>>>> when the TLB flush is being done synchronously on a user thread. We 
>>>> have to do this in such a context for user mode queues.
>>>
>>>
>>> We are currently working on adding a high-level kernel API which
>>> can be called directly to perform a TLB flush. Internally this API
>>> will queue deferred work that uses the SDMA engine to perform the
>>> GPU TLB flush (to compensate for a HW bug in Navi chips). If my
>>> understanding is right, by interrupt context Christian means to
>>> perform this flush and signal from that deferred work, is that so
>>> @Christian?
>>
>> Well more or less. Ideally you put the TLB flush in a work item (or 
>> use the SDMA for the hw bug workaround on Navi 1x).
>>
>> The point is that you shouldn't have it in the same execution thread 
>> as the VM page table updates, because any memory allocation or 
>> grabbing a lock could potentially depend on the TLB flush as soon as 
>> you have published the dma_fence (by adding it to the VM page table 
>> BOs for example).
>>
> Would it work for everyone if we add this generic API (say
> amdgpu_flush_tlb_async()) which queues the TLB flush work and also
> sends the dma_fence_signal from within? KFD can simply consume it
> from wherever it wants; do you see a race condition if we do it like
> this?

Yes, that's pretty much the whole idea. amdgpu_flush_tlb() should just 
return a dma_fence object.

This dma_fence object should either be the SDMA workaround or signaled 
from a work item.

We can then fence the BOs or just wait for the dma_fence object to signal.

Regards,
Christian.


>
> - Shashank
>
>> Christian.
>>
>>>
>>> - Shashank
>>>
>>>>
>>>> Regards,
>>>>   Felix
>>>>
>>>>
>>>>>
>>>>> What we can do is put the TLB flushing into an irq worker or
>>>>> work item and let the signaling happen from there.
>>>>>
>>>>> Amar and Shashank are already working on this, I strongly suggest 
>>>>> to sync up with them.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> + dma_fence_put(vm->pt_fence);
>>>>>> +    vm->pt_fence = amdgpu_pt_fence_create();
>>>>>>   }
>>>>>>     struct kfd_process_device 
>>>>>> *kfd_process_device_data_by_id(struct kfd_process *p, uint32_t 
>>>>>> gpu_id)
>>>>>
>>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb
  2023-06-05 15:47             ` Christian König
@ 2023-06-06 16:16               ` Philip Yang
  0 siblings, 0 replies; 16+ messages in thread
From: Philip Yang @ 2023-06-06 16:16 UTC (permalink / raw)
  To: Christian König, Shashank Sharma, Felix Kuehling,
	Philip Yang, amd-gfx, Somalapuram, Amaranath


On 2023-06-05 11:47, Christian König wrote:
> Am 05.06.23 um 17:40 schrieb Shashank Sharma:
>>
>> On 05/06/2023 17:18, Christian König wrote:
>>> Am 05.06.23 um 17:13 schrieb Shashank Sharma:
>>>>
>>>> On 02/06/2023 16:54, Felix Kuehling wrote:
>>>>> Am 2023-06-02 um 07:57 schrieb Christian König:
>>>>>> Am 01.06.23 um 21:31 schrieb Philip Yang:
>>>>>>> To free page table BOs which are freed when updating page table, 
>>>>>>> for
>>>>>>> example PTE BOs when PDE0 used as PTE.
>>>>>>>
>>>>>>> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>>>>>>> ---
>>>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 +++++
>>>>>>>   1 file changed, 5 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
>>>>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>>> index af0a4b5257cc..0ff007a74d03 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>>> @@ -2101,6 +2101,11 @@ void kfd_flush_tlb(struct 
>>>>>>> kfd_process_device *pdd, enum TLB_FLUSH_TYPE type)
>>>>>>>               amdgpu_amdkfd_flush_gpu_tlb_pasid(
>>>>>>>                   dev->adev, pdd->process->pasid, type, xcc);
>>>>>>>       }
>>>>>>> +
>>>>>>> +    /* Signal page table fence to free page table BOs */
>>>>>>> +    dma_fence_signal(vm->pt_fence);
>>>>>>
>>>>>> That's not something you can do here.
>>>>>>
>>>>>> Signaling a fence can never depend on anything except for 
>>>>>> hardware work. In other words dma_fence_signal() is supposed to 
>>>>>> be called only from interrupt context!
>>>>>
>>>>> We are signaling eviction fences from normal user context, too. 
>>>>> There is no practical way to put this into an interrupt handler 
>>>>> when the TLB flush is being done synchronously on a user thread. 
>>>>> We have to do this in such a context for user mode queues.
>>>>
>>>>
>>>> We are currently working on adding a high-level kernel API which
>>>> can be called directly to perform a TLB flush. Internally this API
>>>> will queue deferred work that uses the SDMA engine to perform the
>>>> GPU TLB flush (to compensate for a HW bug in Navi chips). If my
>>>> understanding is right, by interrupt context Christian means to
>>>> perform this flush and signal from that deferred work, is that so
>>>> @Christian?
>>>
>>> Well more or less. Ideally you put the TLB flush in a work item (or 
>>> use the SDMA for the hw bug workaround on Navi 1x).
>>>
>>> The point is that you shouldn't have it in the same execution thread 
>>> as the VM page table updates, because any memory allocation or 
>>> grabbing a lock could potentially depend on the TLB flush as soon as 
>>> you have published the dma_fence (by adding it to the VM page table 
>>> BOs for example).
>>>
>> Would it work for everyone if we add this generic API (say
>> amdgpu_flush_tlb_async()) which queues the TLB flush work and also
>> sends the dma_fence_signal from within? KFD can simply consume it
>> from wherever it wants; do you see a race condition if we do it like
>> this?
>
> Yes, that's pretty much the whole idea. amdgpu_flush_tlb() should just 
> return a dma_fence object.
>
> This dma_fence object should either be the SDMA workaround or signaled 
> from a work item.
>
> We can then fence the BOs or just wait for the dma_fence object to 
> signal.

In order to fix the original issue (page table BOs being freed and
reused before the TLB is flushed), I will rebase and modify the patch
series on top of the new amdgpu_flush_tlb_sync() API, which works for
both KFD and gfx. Is my understanding correct?

Regards,

Philip

>
> Regards,
> Christian.
>
>
>>
>> - Shashank
>>
>>> Christian.
>>>
>>>>
>>>> - Shashank
>>>>
>>>>>
>>>>> Regards,
>>>>>   Felix
>>>>>
>>>>>
>>>>>>
>>>>>> What we can do is put the TLB flushing into an irq worker or
>>>>>> work item and let the signaling happen from there.
>>>>>>
>>>>>> Amar and Shashank are already working on this, I strongly suggest 
>>>>>> to sync up with them.
>>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>>> + dma_fence_put(vm->pt_fence);
>>>>>>> +    vm->pt_fence = amdgpu_pt_fence_create();
>>>>>>>   }
>>>>>>>     struct kfd_process_device 
>>>>>>> *kfd_process_device_data_by_id(struct kfd_process *p, uint32_t 
>>>>>>> gpu_id)
>>>>>>
>>>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-06-06 16:16 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
2023-06-01 19:31 [PATCH 0/4] Page table fence Philip Yang
2023-06-01 19:31 ` [PATCH 1/4] drm/amdgpu: Implement page table BO fence Philip Yang
2023-06-01 20:12   ` Felix Kuehling
2023-06-02 11:54   ` Christian König
2023-06-01 19:31 ` [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb Philip Yang
2023-06-01 20:34   ` Felix Kuehling
2023-06-02 11:57   ` Christian König
2023-06-02 14:54     ` Felix Kuehling
2023-06-05 15:13       ` Shashank Sharma
2023-06-05 15:18         ` Christian König
2023-06-05 15:40           ` Shashank Sharma
2023-06-05 15:47             ` Christian König
2023-06-06 16:16               ` Philip Yang
2023-06-01 19:31 ` [PATCH 3/4] drm/amdgpu: Signal page table fence after gfx vm flush Philip Yang
2023-06-01 20:37   ` Felix Kuehling
2023-06-01 19:31 ` [PATCH 4/4] drm/amdgpu: Add fence to the freed page table BOs Philip Yang
