* [PATCH 0/4] Page table fence
@ 2023-06-01 19:31 Philip Yang
  2023-06-01 19:31 ` [PATCH 1/4] drm/amdgpu: Implement page table BO fence Philip Yang
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Philip Yang @ 2023-06-01 19:31 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling, christian.koenig

This patch series fixes random no-retry GPU faults generated on APUs
with XNACK on.

If a GPU page table update switches to using PDE0 as PTE, for example
after unmapping a 2MB-aligned virtual address and then mapping the same
address with a transparent 2MB huge page, we free the PTE BO first and
only then flush the TLB.

With XNACK on, the H/W may access the freed old PTE page before the TLB
is flushed. On an APU, the freed PTE BO's system memory page may be
reused and its content changed, which causes the H/W to generate an
unexpected no-retry fault.

The fix is to add a fence to the freed page table BOs and signal that
fence only after the TLB is flushed, so the page table BO pages are not
actually freed and reused until then.

Philip Yang (4):
  drm/amdgpu: Implement page table BO fence
  drm/amdkfd: Signal page table fence after KFD flush tlb
  drm/amdgpu: Signal page table fence after gfx vm flush
  drm/amdgpu: Add fence to the freed page table BOs

 drivers/gpu/drm/amd/amdgpu/amdgpu.h       |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 45 +++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c    |  7 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    |  4 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h    |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 33 +++++++++++------
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  5 +++
 7 files changed, 86 insertions(+), 11 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/4] drm/amdgpu: Implement page table BO fence
  2023-06-01 19:31 [PATCH 0/4] Page table fence Philip Yang
@ 2023-06-01 19:31 ` Philip Yang
  2023-06-01 20:12   ` Felix Kuehling
  2023-06-02 11:54   ` Christian König
  2023-06-01 19:31 ` [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb Philip Yang
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 16+ messages in thread
From: Philip Yang @ 2023-06-01 19:31 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling, christian.koenig

Add pt_fence to amdgpu vm structure and implement helper functions. This
fence will be shared by all page table BOs of the same amdgpu vm.

Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h       |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 45 +++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    |  4 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h    |  1 +
 4 files changed, 52 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 5af954abd5ba..09c116dfda31 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1499,6 +1499,8 @@ int amdgpu_device_set_cg_state(struct amdgpu_device *adev,
 int amdgpu_device_set_pg_state(struct amdgpu_device *adev,
 			       enum amd_powergating_state state);
 
+struct dma_fence *amdgpu_pt_fence_create(void);
+
 static inline bool amdgpu_device_has_timeouts_enabled(struct amdgpu_device *adev)
 {
 	return amdgpu_gpu_recovery != 0 &&
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index c694b41f6461..d9bfb0af3a09 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -57,6 +57,16 @@ struct amdgpu_fence {
 	ktime_t				start_timestamp;
 };
 
+/*
+ * Page table BO fence
+ * Fence used to ensure page table BOs are freed after TLB is flushed, to avoid
+ * the H/W accessing corrupted data if the freed BO page is reused.
+ */
+struct amdgpu_pt_fence {
+	struct dma_fence base;
+	spinlock_t lock;
+};
+
 static struct kmem_cache *amdgpu_fence_slab;
 
 int amdgpu_fence_slab_init(void)
@@ -79,6 +89,7 @@ void amdgpu_fence_slab_fini(void)
  */
 static const struct dma_fence_ops amdgpu_fence_ops;
 static const struct dma_fence_ops amdgpu_job_fence_ops;
+static const struct dma_fence_ops amdgpu_pt_fence_ops;
 static inline struct amdgpu_fence *to_amdgpu_fence(struct dma_fence *f)
 {
 	struct amdgpu_fence *__f = container_of(f, struct amdgpu_fence, base);
@@ -852,6 +863,40 @@ static const struct dma_fence_ops amdgpu_job_fence_ops = {
 	.release = amdgpu_job_fence_release,
 };
 
+static atomic_t pt_fence_seq = ATOMIC_INIT(0);
+
+struct dma_fence *amdgpu_pt_fence_create(void)
+{
+	struct amdgpu_pt_fence *fence;
+
+	fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+	if (!fence)
+		return NULL;
+
+	spin_lock_init(&fence->lock);
+	dma_fence_init(&fence->base, &amdgpu_pt_fence_ops, &fence->lock,
+		       dma_fence_context_alloc(1), atomic_inc_return(&pt_fence_seq));
+
+	dma_fence_get(&fence->base);
+	return &fence->base;
+}
+
+static const char *amdgpu_pt_fence_get_timeline_name(struct dma_fence *f)
+{
+	return "pt_fence_timeline";
+}
+
+static void amdgpu_pt_fence_release(struct dma_fence *f)
+{
+	kfree_rcu(f, rcu);
+}
+
+static const struct dma_fence_ops amdgpu_pt_fence_ops = {
+	.get_driver_name = amdgpu_fence_get_driver_name,
+	.get_timeline_name = amdgpu_pt_fence_get_timeline_name,
+	.release = amdgpu_pt_fence_release,
+};
+
 /*
  * Fence debugfs
  */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 37b9d8a8dbec..0219398e625c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2147,6 +2147,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 		vm->update_funcs = &amdgpu_vm_sdma_funcs;
 
 	vm->last_update = dma_fence_get_stub();
+	vm->pt_fence = amdgpu_pt_fence_create();
 	vm->last_unlocked = dma_fence_get_stub();
 	vm->last_tlb_flush = dma_fence_get_stub();
 	vm->generation = 0;
@@ -2270,6 +2271,8 @@ int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 
 	dma_fence_put(vm->last_update);
 	vm->last_update = dma_fence_get_stub();
+	dma_fence_put(vm->pt_fence);
+	vm->pt_fence = amdgpu_pt_fence_create();
 	vm->is_compute_context = true;
 
 	/* Free the shadow bo for compute VM */
@@ -2358,6 +2361,7 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 	}
 
 	dma_fence_put(vm->last_update);
+	dma_fence_put(vm->pt_fence);
 	for (i = 0; i < AMDGPU_MAX_VMHUBS; i++)
 		amdgpu_vmid_free_reserved(adev, vm, i);
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index d551fca1780e..9cc729358612 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -286,6 +286,7 @@ struct amdgpu_vm {
 	/* contains the page directory */
 	struct amdgpu_vm_bo_base     root;
 	struct dma_fence	*last_update;
+	struct dma_fence	*pt_fence;
 
 	/* Scheduler entities for page table updates */
 	struct drm_sched_entity	immediate;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb
  2023-06-01 19:31 [PATCH 0/4] Page table fence Philip Yang
  2023-06-01 19:31 ` [PATCH 1/4] drm/amdgpu: Implement page table BO fence Philip Yang
@ 2023-06-01 19:31 ` Philip Yang
  2023-06-01 20:34   ` Felix Kuehling
  2023-06-02 11:57   ` Christian König
  2023-06-01 19:31 ` [PATCH 3/4] drm/amdgpu: Signal page table fence after gfx vm flush Philip Yang
  2023-06-01 19:31 ` [PATCH 4/4] drm/amdgpu: Add fence to the freed page table BOs Philip Yang
  3 siblings, 2 replies; 16+ messages in thread
From: Philip Yang @ 2023-06-01 19:31 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling, christian.koenig

Signal the page table fence so that page table BOs freed while updating
the page table, for example PTE BOs when PDE0 is used as PTE, can
actually be released.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index af0a4b5257cc..0ff007a74d03 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -2101,6 +2101,11 @@ void kfd_flush_tlb(struct kfd_process_device *pdd, enum TLB_FLUSH_TYPE type)
 			amdgpu_amdkfd_flush_gpu_tlb_pasid(
 				dev->adev, pdd->process->pasid, type, xcc);
 	}
+
+	/* Signal page table fence to free page table BOs */
+	dma_fence_signal(vm->pt_fence);
+	dma_fence_put(vm->pt_fence);
+	vm->pt_fence = amdgpu_pt_fence_create();
 }
 
 struct kfd_process_device *kfd_process_device_data_by_id(struct kfd_process *p, uint32_t gpu_id)
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/4] drm/amdgpu: Signal page table fence after gfx vm flush
  2023-06-01 19:31 [PATCH 0/4] Page table fence Philip Yang
  2023-06-01 19:31 ` [PATCH 1/4] drm/amdgpu: Implement page table BO fence Philip Yang
  2023-06-01 19:31 ` [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb Philip Yang
@ 2023-06-01 19:31 ` Philip Yang
  2023-06-01 20:37   ` Felix Kuehling
  2023-06-01 19:31 ` [PATCH 4/4] drm/amdgpu: Add fence to the freed page table BOs Philip Yang
  3 siblings, 1 reply; 16+ messages in thread
From: Philip Yang @ 2023-06-01 19:31 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling, christian.koenig

Signal the page table fence so that page table BOs which are fenced and
freed while updating the page table can actually be released.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index e0d3e3aa2e31..10d63256d26b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -219,6 +219,13 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 			amdgpu_ring_undo(ring);
 			return r;
 		}
+
+		if (vm) {
+			/* Signal fence to free page table BO */
+			dma_fence_signal(vm->pt_fence);
+			dma_fence_put(vm->pt_fence);
+			vm->pt_fence = amdgpu_pt_fence_create();
+		}
 	}
 
 	amdgpu_ring_ib_begin(ring);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/4] drm/amdgpu: Add fence to the freed page table BOs
  2023-06-01 19:31 [PATCH 0/4] Page table fence Philip Yang
                   ` (2 preceding siblings ...)
  2023-06-01 19:31 ` [PATCH 3/4] drm/amdgpu: Signal page table fence after gfx vm flush Philip Yang
@ 2023-06-01 19:31 ` Philip Yang
  3 siblings, 0 replies; 16+ messages in thread
From: Philip Yang @ 2023-06-01 19:31 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling, christian.koenig

If updating the page table frees page table BOs, add a fence to those
BOs to ensure they are not freed and reused before the TLB is flushed.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 33 +++++++++++++++--------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index dea1a64be44d..16eb9472d469 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -633,8 +633,10 @@ static int amdgpu_vm_pt_alloc(struct amdgpu_device *adev,
  * amdgpu_vm_pt_free - free one PD/PT
  *
  * @entry: PDE to free
+ * @fence: fence added to the freed page table BO
  */
-static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry)
+static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry,
+			      struct dma_fence *fence)
 {
 	struct amdgpu_bo *shadow;
 
@@ -643,6 +645,9 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry)
 	shadow = amdgpu_bo_shadowed(entry->bo);
 	if (shadow) {
 		ttm_bo_set_bulk_move(&shadow->tbo, NULL);
+		if (fence && !dma_resv_reserve_fences(shadow->tbo.base.resv, 1))
+			dma_resv_add_fence(shadow->tbo.base.resv, fence,
+					   DMA_RESV_USAGE_BOOKKEEP);
 		amdgpu_bo_unref(&shadow);
 	}
 	ttm_bo_set_bulk_move(&entry->bo->tbo, NULL);
@@ -651,6 +656,9 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry)
 	spin_lock(&entry->vm->status_lock);
 	list_del(&entry->vm_status);
 	spin_unlock(&entry->vm->status_lock);
+	if (fence && !dma_resv_reserve_fences(entry->bo->tbo.base.resv, 1))
+		dma_resv_add_fence(entry->bo->tbo.base.resv, fence,
+				   DMA_RESV_USAGE_BOOKKEEP);
 	amdgpu_bo_unref(&entry->bo);
 }
 
@@ -670,7 +678,7 @@ void amdgpu_vm_pt_free_work(struct work_struct *work)
 	amdgpu_bo_reserve(vm->root.bo, true);
 
 	list_for_each_entry_safe(entry, next, &pt_freed, vm_status)
-		amdgpu_vm_pt_free(entry);
+		amdgpu_vm_pt_free(entry, NULL);
 
 	amdgpu_bo_unreserve(vm->root.bo);
 }
@@ -682,13 +690,15 @@ void amdgpu_vm_pt_free_work(struct work_struct *work)
  * @vm: amdgpu vm structure
  * @start: optional cursor where to start freeing PDs/PTs
  * @unlocked: vm resv unlock status
+ * @fence: page table fence added to the freed BOs
  *
  * Free the page directory or page table level and all sub levels.
  */
 static void amdgpu_vm_pt_free_dfs(struct amdgpu_device *adev,
 				  struct amdgpu_vm *vm,
 				  struct amdgpu_vm_pt_cursor *start,
-				  bool unlocked)
+				  bool unlocked,
+				  struct dma_fence *fence)
 {
 	struct amdgpu_vm_pt_cursor cursor;
 	struct amdgpu_vm_bo_base *entry;
@@ -706,10 +716,10 @@ static void amdgpu_vm_pt_free_dfs(struct amdgpu_device *adev,
 	}
 
 	for_each_amdgpu_vm_pt_dfs_safe(adev, vm, start, cursor, entry)
-		amdgpu_vm_pt_free(entry);
+		amdgpu_vm_pt_free(entry, fence);
 
 	if (start)
-		amdgpu_vm_pt_free(start->entry);
+		amdgpu_vm_pt_free(start->entry, fence);
 }
 
 /**
@@ -721,7 +731,7 @@ static void amdgpu_vm_pt_free_dfs(struct amdgpu_device *adev,
  */
 void amdgpu_vm_pt_free_root(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 {
-	amdgpu_vm_pt_free_dfs(adev, vm, NULL, false);
+	amdgpu_vm_pt_free_dfs(adev, vm, NULL, false, NULL);
 }
 
 /**
@@ -905,6 +915,7 @@ int amdgpu_vm_ptes_update(struct amdgpu_vm_update_params *params,
 	struct amdgpu_device *adev = params->adev;
 	struct amdgpu_vm_pt_cursor cursor;
 	uint64_t frag_start = start, frag_end;
+	struct amdgpu_vm *vm = params->vm;
 	unsigned int frag;
 	int r;
 
@@ -913,7 +924,7 @@ int amdgpu_vm_ptes_update(struct amdgpu_vm_update_params *params,
 			       &frag_end);
 
 	/* walk over the address space and update the PTs */
-	amdgpu_vm_pt_start(adev, params->vm, start, &cursor);
+	amdgpu_vm_pt_start(adev, vm, start, &cursor);
 	while (cursor.pfn < end) {
 		unsigned int shift, parent_shift, mask;
 		uint64_t incr, entry_end, pe_start;
@@ -923,7 +934,7 @@ int amdgpu_vm_ptes_update(struct amdgpu_vm_update_params *params,
 			/* make sure that the page tables covering the
 			 * address range are actually allocated
 			 */
-			r = amdgpu_vm_pt_alloc(params->adev, params->vm,
+			r = amdgpu_vm_pt_alloc(params->adev, vm,
 					       &cursor, params->immediate);
 			if (r)
 				return r;
@@ -986,7 +997,6 @@ int amdgpu_vm_ptes_update(struct amdgpu_vm_update_params *params,
 		entry_end = min(entry_end, end);
 
 		do {
-			struct amdgpu_vm *vm = params->vm;
 			uint64_t upd_end = min(entry_end, frag_end);
 			unsigned int nptes = (upd_end - frag_start) >> shift;
 			uint64_t upd_flags = flags | AMDGPU_PTE_FRAG(frag);
@@ -1029,9 +1039,10 @@ int amdgpu_vm_ptes_update(struct amdgpu_vm_update_params *params,
 				/* Make sure previous mapping is freed */
 				if (cursor.entry->bo) {
 					params->table_freed = true;
-					amdgpu_vm_pt_free_dfs(adev, params->vm,
+					amdgpu_vm_pt_free_dfs(adev, vm,
 							      &cursor,
-							      params->unlocked);
+							      params->unlocked,
+							      vm->pt_fence);
 				}
 				amdgpu_vm_pt_next(adev, &cursor);
 			}
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] drm/amdgpu: Implement page table BO fence
  2023-06-01 19:31 ` [PATCH 1/4] drm/amdgpu: Implement page table BO fence Philip Yang
@ 2023-06-01 20:12   ` Felix Kuehling
  2023-06-02 11:54   ` Christian König
  1 sibling, 0 replies; 16+ messages in thread
From: Felix Kuehling @ 2023-06-01 20:12 UTC (permalink / raw)
  To: Philip Yang, amd-gfx; +Cc: christian.koenig

On 2023-06-01 15:31, Philip Yang wrote:
> Add pt_fence to amdgpu vm structure and implement helper functions. This
> fence will be shared by all page table BOs of the same amdgpu vm.
>
> Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h       |  2 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 45 +++++++++++++++++++++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    |  4 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h    |  1 +
>   4 files changed, 52 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 5af954abd5ba..09c116dfda31 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1499,6 +1499,8 @@ int amdgpu_device_set_cg_state(struct amdgpu_device *adev,
>   int amdgpu_device_set_pg_state(struct amdgpu_device *adev,
>   			       enum amd_powergating_state state);
>   
> +struct dma_fence *amdgpu_pt_fence_create(void);
> +
>   static inline bool amdgpu_device_has_timeouts_enabled(struct amdgpu_device *adev)
>   {
>   	return amdgpu_gpu_recovery != 0 &&
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index c694b41f6461..d9bfb0af3a09 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -57,6 +57,16 @@ struct amdgpu_fence {
>   	ktime_t				start_timestamp;
>   };
>   
> +/*
> + * Page table BO fence
> + * Fence used to ensure page table BOs are freed after TLB is flushed, to avoid
> + * H/W access corrupted data if the freed BO page is reused.
> + */
> +struct amdgpu_pt_fence {
> +	struct dma_fence base;
> +	spinlock_t lock;
> +};
> +
>   static struct kmem_cache *amdgpu_fence_slab;
>   
>   int amdgpu_fence_slab_init(void)
> @@ -79,6 +89,7 @@ void amdgpu_fence_slab_fini(void)
>    */
>   static const struct dma_fence_ops amdgpu_fence_ops;
>   static const struct dma_fence_ops amdgpu_job_fence_ops;
> +static const struct dma_fence_ops amdgpu_pt_fence_ops;
>   static inline struct amdgpu_fence *to_amdgpu_fence(struct dma_fence *f)
>   {
>   	struct amdgpu_fence *__f = container_of(f, struct amdgpu_fence, base);
> @@ -852,6 +863,40 @@ static const struct dma_fence_ops amdgpu_job_fence_ops = {
>   	.release = amdgpu_job_fence_release,
>   };
>   
> +static atomic_t pt_fence_seq = ATOMIC_INIT(0);
> +
> +struct dma_fence *amdgpu_pt_fence_create(void)
> +{
> +	struct amdgpu_pt_fence *fence;
> +
> +	fence = kzalloc(sizeof(*fence), GFP_KERNEL);
> +	if (!fence)
> +		return NULL;
> +
> +	spin_lock_init(&fence->lock);
> +	dma_fence_init(&fence->base, &amdgpu_pt_fence_ops, &fence->lock,
> +		       dma_fence_context_alloc(1), atomic_inc_return(&pt_fence_seq));

Creating a new fence context per fence is probably wrong. I think we 
only need one PT fence context per GPU or per spatial partition, or 
maybe per VM.

Regards,
   Felix


> +
> +	dma_fence_get(&fence->base);
> +	return &fence->base;
> +}
> +
> +static const char *amdgpu_pt_fence_get_timeline_name(struct dma_fence *f)
> +{
> +	return "pt_fence_timeline";
> +}
> +
> +static void amdgpu_pt_fence_release(struct dma_fence *f)
> +{
> +	kfree_rcu(f, rcu);
> +}
> +
> +static const struct dma_fence_ops amdgpu_pt_fence_ops = {
> +	.get_driver_name = amdgpu_fence_get_driver_name,
> +	.get_timeline_name = amdgpu_pt_fence_get_timeline_name,
> +	.release = amdgpu_pt_fence_release,
> +};
> +
>   /*
>    * Fence debugfs
>    */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 37b9d8a8dbec..0219398e625c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -2147,6 +2147,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>   		vm->update_funcs = &amdgpu_vm_sdma_funcs;
>   
>   	vm->last_update = dma_fence_get_stub();
> +	vm->pt_fence = amdgpu_pt_fence_create();
>   	vm->last_unlocked = dma_fence_get_stub();
>   	vm->last_tlb_flush = dma_fence_get_stub();
>   	vm->generation = 0;
> @@ -2270,6 +2271,8 @@ int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>   
>   	dma_fence_put(vm->last_update);
>   	vm->last_update = dma_fence_get_stub();
> +	dma_fence_put(vm->pt_fence);
> +	vm->pt_fence = amdgpu_pt_fence_create();
>   	vm->is_compute_context = true;
>   
>   	/* Free the shadow bo for compute VM */
> @@ -2358,6 +2361,7 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>   	}
>   
>   	dma_fence_put(vm->last_update);
> +	dma_fence_put(vm->pt_fence);
>   	for (i = 0; i < AMDGPU_MAX_VMHUBS; i++)
>   		amdgpu_vmid_free_reserved(adev, vm, i);
>   }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index d551fca1780e..9cc729358612 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -286,6 +286,7 @@ struct amdgpu_vm {
>   	/* contains the page directory */
>   	struct amdgpu_vm_bo_base     root;
>   	struct dma_fence	*last_update;
> +	struct dma_fence	*pt_fence;
>   
>   	/* Scheduler entities for page table updates */
>   	struct drm_sched_entity	immediate;

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb
  2023-06-01 19:31 ` [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb Philip Yang
@ 2023-06-01 20:34   ` Felix Kuehling
  2023-06-02 11:57   ` Christian König
  1 sibling, 0 replies; 16+ messages in thread
From: Felix Kuehling @ 2023-06-01 20:34 UTC (permalink / raw)
  To: Philip Yang, amd-gfx; +Cc: christian.koenig


On 2023-06-01 15:31, Philip Yang wrote:
> To free page table BOs which are freed when updating page table, for
> example PTE BOs when PDE0 used as PTE.
>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 +++++
>   1 file changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index af0a4b5257cc..0ff007a74d03 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -2101,6 +2101,11 @@ void kfd_flush_tlb(struct kfd_process_device *pdd, enum TLB_FLUSH_TYPE type)
>   			amdgpu_amdkfd_flush_gpu_tlb_pasid(
>   				dev->adev, pdd->process->pasid, type, xcc);
>   	}
> +
> +	/* Signal page table fence to free page table BOs */
> +	dma_fence_signal(vm->pt_fence);
> +	dma_fence_put(vm->pt_fence);
> +	vm->pt_fence = amdgpu_pt_fence_create();

This is a bit weird to create a fence here when we have no idea when it 
will be signaled or whether it will be signaled at all. But I think it's 
OK in this case. You add this fence to PT BOs before they get freed, and 
at that point those BOs must wait for the next TLB flush before the BO 
can be freed.

The only important thing is that fences in the same context get signaled 
in the same order they are created. I think if you allocate a fence 
context per VM, that should be guaranteed by the VM root reservation 
lock that serializes all page table operations in the same VM.

You may need some locking here to prevent concurrent access to the fence 
while you're signaling and replacing it.

Regards,
   Felix


>   }
>   
>   struct kfd_process_device *kfd_process_device_data_by_id(struct kfd_process *p, uint32_t gpu_id)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4] drm/amdgpu: Signal page table fence after gfx vm flush
  2023-06-01 19:31 ` [PATCH 3/4] drm/amdgpu: Signal page table fence after gfx vm flush Philip Yang
@ 2023-06-01 20:37   ` Felix Kuehling
  0 siblings, 0 replies; 16+ messages in thread
From: Felix Kuehling @ 2023-06-01 20:37 UTC (permalink / raw)
  To: Philip Yang, amd-gfx; +Cc: christian.koenig


On 2023-06-01 15:31, Philip Yang wrote:
> To free page table BOs which are fenced and freed when updating page
> table.
>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 7 +++++++
>   1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> index e0d3e3aa2e31..10d63256d26b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> @@ -219,6 +219,13 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>   			amdgpu_ring_undo(ring);
>   			return r;
>   		}
> +
> +		if (vm) {
> +			/* Signal fence to free page table BO */
> +			dma_fence_signal(vm->pt_fence);
> +			dma_fence_put(vm->pt_fence);
> +			vm->pt_fence = amdgpu_pt_fence_create();
> +		}

I think this is too early. The TLB flush is not done at this point, it's 
only been emitted to the ring but not executed yet. You probably need to 
signal the PT fence in a fence callback from the fence "f" that signals 
when the IB completes.

Regards,
   Felix


>   	}
>   
>   	amdgpu_ring_ib_begin(ring);

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] drm/amdgpu: Implement page table BO fence
  2023-06-01 19:31 ` [PATCH 1/4] drm/amdgpu: Implement page table BO fence Philip Yang
  2023-06-01 20:12   ` Felix Kuehling
@ 2023-06-02 11:54   ` Christian König
  1 sibling, 0 replies; 16+ messages in thread
From: Christian König @ 2023-06-02 11:54 UTC (permalink / raw)
  To: Philip Yang, amd-gfx; +Cc: Felix.Kuehling

Am 01.06.23 um 21:31 schrieb Philip Yang:
> Add pt_fence to amdgpu vm structure and implement helper functions. This
> fence will be shared by all page table BOs of the same amdgpu vm.

Well first of all please don't add anything to amdgpu.h or 
amdgpu_fence.* This should probably all go into a new file.

Then that approach here won't work like this, but I'm going to comment 
on the other patches.

Christian.

>
> Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h       |  2 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 45 +++++++++++++++++++++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    |  4 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h    |  1 +
>   4 files changed, 52 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 5af954abd5ba..09c116dfda31 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1499,6 +1499,8 @@ int amdgpu_device_set_cg_state(struct amdgpu_device *adev,
>   int amdgpu_device_set_pg_state(struct amdgpu_device *adev,
>   			       enum amd_powergating_state state);
>   
> +struct dma_fence *amdgpu_pt_fence_create(void);
> +
>   static inline bool amdgpu_device_has_timeouts_enabled(struct amdgpu_device *adev)
>   {
>   	return amdgpu_gpu_recovery != 0 &&
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index c694b41f6461..d9bfb0af3a09 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -57,6 +57,16 @@ struct amdgpu_fence {
>   	ktime_t				start_timestamp;
>   };
>   
> +/*
> + * Page table BO fence
> + * Fence used to ensure page table BOs are freed after TLB is flushed, to avoid
> + * H/W access corrupted data if the freed BO page is reused.
> + */
> +struct amdgpu_pt_fence {
> +	struct dma_fence base;
> +	spinlock_t lock;
> +};
> +
>   static struct kmem_cache *amdgpu_fence_slab;
>   
>   int amdgpu_fence_slab_init(void)
> @@ -79,6 +89,7 @@ void amdgpu_fence_slab_fini(void)
>    */
>   static const struct dma_fence_ops amdgpu_fence_ops;
>   static const struct dma_fence_ops amdgpu_job_fence_ops;
> +static const struct dma_fence_ops amdgpu_pt_fence_ops;
>   static inline struct amdgpu_fence *to_amdgpu_fence(struct dma_fence *f)
>   {
>   	struct amdgpu_fence *__f = container_of(f, struct amdgpu_fence, base);
> @@ -852,6 +863,40 @@ static const struct dma_fence_ops amdgpu_job_fence_ops = {
>   	.release = amdgpu_job_fence_release,
>   };
>   
> +static atomic_t pt_fence_seq = ATOMIC_INIT(0);
> +
> +struct dma_fence *amdgpu_pt_fence_create(void)
> +{
> +	struct amdgpu_pt_fence *fence;
> +
> +	fence = kzalloc(sizeof(*fence), GFP_KERNEL);
> +	if (!fence)
> +		return NULL;
> +
> +	spin_lock_init(&fence->lock);
> +	dma_fence_init(&fence->base, &amdgpu_pt_fence_ops, &fence->lock,
> +		       dma_fence_context_alloc(1), atomic_inc_return(&pt_fence_seq));
> +
> +	dma_fence_get(&fence->base);
> +	return &fence->base;
> +}
> +
> +static const char *amdgpu_pt_fence_get_timeline_name(struct dma_fence *f)
> +{
> +	return "pt_fence_timeline";
> +}
> +
> +static void amdgpu_pt_fence_release(struct dma_fence *f)
> +{
> +	kfree_rcu(f, rcu);
> +}
> +
> +static const struct dma_fence_ops amdgpu_pt_fence_ops = {
> +	.get_driver_name = amdgpu_fence_get_driver_name,
> +	.get_timeline_name = amdgpu_pt_fence_get_timeline_name,
> +	.release = amdgpu_pt_fence_release,
> +};
> +
>   /*
>    * Fence debugfs
>    */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 37b9d8a8dbec..0219398e625c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -2147,6 +2147,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>   		vm->update_funcs = &amdgpu_vm_sdma_funcs;
>   
>   	vm->last_update = dma_fence_get_stub();
> +	vm->pt_fence = amdgpu_pt_fence_create();
>   	vm->last_unlocked = dma_fence_get_stub();
>   	vm->last_tlb_flush = dma_fence_get_stub();
>   	vm->generation = 0;
> @@ -2270,6 +2271,8 @@ int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>   
>   	dma_fence_put(vm->last_update);
>   	vm->last_update = dma_fence_get_stub();
> +	dma_fence_put(vm->pt_fence);
> +	vm->pt_fence = amdgpu_pt_fence_create();
>   	vm->is_compute_context = true;
>   
>   	/* Free the shadow bo for compute VM */
> @@ -2358,6 +2361,7 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>   	}
>   
>   	dma_fence_put(vm->last_update);
> +	dma_fence_put(vm->pt_fence);
>   	for (i = 0; i < AMDGPU_MAX_VMHUBS; i++)
>   		amdgpu_vmid_free_reserved(adev, vm, i);
>   }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index d551fca1780e..9cc729358612 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -286,6 +286,7 @@ struct amdgpu_vm {
>   	/* contains the page directory */
>   	struct amdgpu_vm_bo_base     root;
>   	struct dma_fence	*last_update;
> +	struct dma_fence	*pt_fence;
>   
>   	/* Scheduler entities for page table updates */
>   	struct drm_sched_entity	immediate;


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb
  2023-06-01 19:31 ` [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb Philip Yang
  2023-06-01 20:34   ` Felix Kuehling
@ 2023-06-02 11:57   ` Christian König
  2023-06-02 14:54     ` Felix Kuehling
  1 sibling, 1 reply; 16+ messages in thread
From: Christian König @ 2023-06-02 11:57 UTC (permalink / raw)
  To: Philip Yang, amd-gfx, Somalapuram, Amaranath, Sharma, Shashank
  Cc: Felix.Kuehling

Am 01.06.23 um 21:31 schrieb Philip Yang:
> To free page table BOs which are freed when updating page table, for
> example PTE BOs when PDE0 used as PTE.
>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 +++++
>   1 file changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index af0a4b5257cc..0ff007a74d03 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -2101,6 +2101,11 @@ void kfd_flush_tlb(struct kfd_process_device *pdd, enum TLB_FLUSH_TYPE type)
>   			amdgpu_amdkfd_flush_gpu_tlb_pasid(
>   				dev->adev, pdd->process->pasid, type, xcc);
>   	}
> +
> +	/* Signal page table fence to free page table BOs */
> +	dma_fence_signal(vm->pt_fence);

That's not something you can do here.

Signaling a fence can never depend on anything except for hardware work. 
In other words dma_fence_signal() is supposed to be called only from 
interrupt context!

What we can do is put the TLB flushing into an irq worker or work
item and let the signaling happen from there.

Amar and Shashank are already working on this, I strongly suggest to 
sync up with them.

Regards,
Christian.

> +	dma_fence_put(vm->pt_fence);
> +	vm->pt_fence = amdgpu_pt_fence_create();
>   }
>   
>   struct kfd_process_device *kfd_process_device_data_by_id(struct kfd_process *p, uint32_t gpu_id)


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb
  2023-06-02 11:57   ` Christian König
@ 2023-06-02 14:54     ` Felix Kuehling
  2023-06-05 15:13       ` Shashank Sharma
  0 siblings, 1 reply; 16+ messages in thread
From: Felix Kuehling @ 2023-06-02 14:54 UTC (permalink / raw)
  To: Christian König, Philip Yang, amd-gfx, Somalapuram,
	Amaranath, Sharma, Shashank

Am 2023-06-02 um 07:57 schrieb Christian König:
> Am 01.06.23 um 21:31 schrieb Philip Yang:
>> To free page table BOs which are freed when updating page table, for
>> example PTE BOs when PDE0 used as PTE.
>>
>> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> index af0a4b5257cc..0ff007a74d03 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> @@ -2101,6 +2101,11 @@ void kfd_flush_tlb(struct kfd_process_device 
>> *pdd, enum TLB_FLUSH_TYPE type)
>>               amdgpu_amdkfd_flush_gpu_tlb_pasid(
>>                   dev->adev, pdd->process->pasid, type, xcc);
>>       }
>> +
>> +    /* Signal page table fence to free page table BOs */
>> +    dma_fence_signal(vm->pt_fence);
>
> That's not something you can do here.
>
> Signaling a fence can never depend on anything except for hardware 
> work. In other words dma_fence_signal() is supposed to be called only 
> from interrupt context!

We are signaling eviction fences from normal user context, too. There is 
no practical way to put this into an interrupt handler when the TLB 
flush is being done synchronously on a user thread. We have to do this 
in such a context for user mode queues.

Regards,
   Felix


>
> What we can do is put the TLB flushing into an irq worker or work
> item and let the signaling happen from there.
>
> Amar and Shashank are already working on this, I strongly suggest to 
> sync up with them.
>
> Regards,
> Christian.
>
>> +    dma_fence_put(vm->pt_fence);
>> +    vm->pt_fence = amdgpu_pt_fence_create();
>>   }
>>     struct kfd_process_device *kfd_process_device_data_by_id(struct 
>> kfd_process *p, uint32_t gpu_id)
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb
  2023-06-02 14:54     ` Felix Kuehling
@ 2023-06-05 15:13       ` Shashank Sharma
  2023-06-05 15:18         ` Christian König
  0 siblings, 1 reply; 16+ messages in thread
From: Shashank Sharma @ 2023-06-05 15:13 UTC (permalink / raw)
  To: Felix Kuehling, Christian König, Philip Yang, amd-gfx,
	Somalapuram, Amaranath


On 02/06/2023 16:54, Felix Kuehling wrote:
> Am 2023-06-02 um 07:57 schrieb Christian König:
>> Am 01.06.23 um 21:31 schrieb Philip Yang:
>>> To free page table BOs which are freed when updating page table, for
>>> example PTE BOs when PDE0 used as PTE.
>>>
>>> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 +++++
>>>   1 file changed, 5 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> index af0a4b5257cc..0ff007a74d03 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> @@ -2101,6 +2101,11 @@ void kfd_flush_tlb(struct kfd_process_device 
>>> *pdd, enum TLB_FLUSH_TYPE type)
>>>               amdgpu_amdkfd_flush_gpu_tlb_pasid(
>>>                   dev->adev, pdd->process->pasid, type, xcc);
>>>       }
>>> +
>>> +    /* Signal page table fence to free page table BOs */
>>> +    dma_fence_signal(vm->pt_fence);
>>
>> That's not something you can do here.
>>
>> Signaling a fence can never depend on anything except for hardware 
>> work. In other words dma_fence_signal() is supposed to be called only 
>> from interrupt context!
>
> We are signaling eviction fences from normal user context, too. There 
> is no practical way to put this into an interrupt handler when the TLB 
> flush is being done synchronously on a user thread. We have to do this 
> in such a context for user mode queues.


We are currently working on adding a high-level kernel API which can be
called directly to perform a TLB flush. Internally this API will queue
deferred work that uses the SDMA engine to perform the GPU TLB flush
(to compensate for a HW bug in Navi chips). If my understanding is
right, by interrupt context Christian means to perform this flush and
signal from that deferred work, is that so @Christian?

- Shashank

>
> Regards,
>   Felix
>
>
>>
>> What we can do is put the TLB flushing into an irq worker or work
>> item and let the signaling happen from there.
>>
>> Amar and Shashank are already working on this, I strongly suggest to 
>> sync up with them.
>>
>> Regards,
>> Christian.
>>
>>> +    dma_fence_put(vm->pt_fence);
>>> +    vm->pt_fence = amdgpu_pt_fence_create();
>>>   }
>>>     struct kfd_process_device *kfd_process_device_data_by_id(struct 
>>> kfd_process *p, uint32_t gpu_id)
>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb
  2023-06-05 15:13       ` Shashank Sharma
@ 2023-06-05 15:18         ` Christian König
  2023-06-05 15:40           ` Shashank Sharma
  0 siblings, 1 reply; 16+ messages in thread
From: Christian König @ 2023-06-05 15:18 UTC (permalink / raw)
  To: Shashank Sharma, Felix Kuehling, Philip Yang, amd-gfx,
	Somalapuram, Amaranath

Am 05.06.23 um 17:13 schrieb Shashank Sharma:
>
> On 02/06/2023 16:54, Felix Kuehling wrote:
>> Am 2023-06-02 um 07:57 schrieb Christian König:
>>> Am 01.06.23 um 21:31 schrieb Philip Yang:
>>>> To free page table BOs which are freed when updating page table, for
>>>> example PTE BOs when PDE0 used as PTE.
>>>>
>>>> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 +++++
>>>>   1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> index af0a4b5257cc..0ff007a74d03 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> @@ -2101,6 +2101,11 @@ void kfd_flush_tlb(struct kfd_process_device 
>>>> *pdd, enum TLB_FLUSH_TYPE type)
>>>>               amdgpu_amdkfd_flush_gpu_tlb_pasid(
>>>>                   dev->adev, pdd->process->pasid, type, xcc);
>>>>       }
>>>> +
>>>> +    /* Signal page table fence to free page table BOs */
>>>> +    dma_fence_signal(vm->pt_fence);
>>>
>>> That's not something you can do here.
>>>
>>> Signaling a fence can never depend on anything except for hardware 
>>> work. In other words dma_fence_signal() is supposed to be called 
>>> only from interrupt context!
>>
>> We are signaling eviction fences from normal user context, too. There 
>> is no practical way to put this into an interrupt handler when the 
>> TLB flush is being done synchronously on a user thread. We have to do 
>> this in such a context for user mode queues.
>
>
> We are currently working on adding a high-level kernel API which can
> be called directly to perform a TLB flush. Internally this API will
> queue deferred work that uses the SDMA engine to perform the GPU TLB
> flush (to compensate for a HW bug in Navi chips). If my understanding
> is right, by interrupt context Christian means to perform this flush
> and signal from that deferred work, is that so @Christian?

Well more or less. Ideally you put the TLB flush in a work item (or use 
the SDMA for the hw bug workaround on Navi 1x).

The point is that you shouldn't have it in the same execution thread as 
the VM page table updates, because any memory allocation or grabbing a 
lock could potentially depend on the TLB flush as soon as you have 
published the dma_fence (by adding it to the VM page table BOs for example).

Christian.

>
> - Shashank
>
>>
>> Regards,
>>   Felix
>>
>>
>>>
>>> What we can do is put the TLB flushing into an irq worker or work
>>> item and let the signaling happen from there.
>>>
>>> Amar and Shashank are already working on this, I strongly suggest to 
>>> sync up with them.
>>>
>>> Regards,
>>> Christian.
>>>
>>>> +    dma_fence_put(vm->pt_fence);
>>>> +    vm->pt_fence = amdgpu_pt_fence_create();
>>>>   }
>>>>     struct kfd_process_device *kfd_process_device_data_by_id(struct 
>>>> kfd_process *p, uint32_t gpu_id)
>>>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb
  2023-06-05 15:18         ` Christian König
@ 2023-06-05 15:40           ` Shashank Sharma
  2023-06-05 15:47             ` Christian König
  0 siblings, 1 reply; 16+ messages in thread
From: Shashank Sharma @ 2023-06-05 15:40 UTC (permalink / raw)
  To: Christian König, Felix Kuehling, Philip Yang, amd-gfx,
	Somalapuram, Amaranath


On 05/06/2023 17:18, Christian König wrote:
> Am 05.06.23 um 17:13 schrieb Shashank Sharma:
>>
>> On 02/06/2023 16:54, Felix Kuehling wrote:
>>> Am 2023-06-02 um 07:57 schrieb Christian König:
>>>> Am 01.06.23 um 21:31 schrieb Philip Yang:
>>>>> To free page table BOs which are freed when updating page table, for
>>>>> example PTE BOs when PDE0 used as PTE.
>>>>>
>>>>> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>>>>> ---
>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 +++++
>>>>>   1 file changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
>>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>> index af0a4b5257cc..0ff007a74d03 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>> @@ -2101,6 +2101,11 @@ void kfd_flush_tlb(struct 
>>>>> kfd_process_device *pdd, enum TLB_FLUSH_TYPE type)
>>>>>               amdgpu_amdkfd_flush_gpu_tlb_pasid(
>>>>>                   dev->adev, pdd->process->pasid, type, xcc);
>>>>>       }
>>>>> +
>>>>> +    /* Signal page table fence to free page table BOs */
>>>>> +    dma_fence_signal(vm->pt_fence);
>>>>
>>>> That's not something you can do here.
>>>>
>>>> Signaling a fence can never depend on anything except for hardware 
>>>> work. In other words dma_fence_signal() is supposed to be called 
>>>> only from interrupt context!
>>>
>>> We are signaling eviction fences from normal user context, too. 
>>> There is no practical way to put this into an interrupt handler when 
>>> the TLB flush is being done synchronously on a user thread. We have 
>>> to do this in such a context for user mode queues.
>>
>>
>> We are currently working on adding a high-level kernel API which can
>> be called directly to perform a TLB flush. Internally this API will
>> queue deferred work that uses the SDMA engine to perform the GPU TLB
>> flush (to compensate for a HW bug in Navi chips). If my understanding
>> is right, by interrupt context Christian means to perform this flush
>> and signal from that deferred work, is that so @Christian?
>
> Well more or less. Ideally you put the TLB flush in a work item (or 
> use the SDMA for the hw bug workaround on Navi 1x).
>
> The point is that you shouldn't have it in the same execution thread 
> as the VM page table updates, because any memory allocation or 
> grabbing a lock could potentially depend on the TLB flush as soon as 
> you have published the dma_fence (by adding it to the VM page table 
> BOs for example).
>
Would it work for everyone if we add this generic API (say
amdgpu_flush_tlb_async()) which queues the TLB flush work and also
sends the dma_fence_signal from within? KFD can simply consume it from
wherever it wants; do you see a race condition if we do it like this?

- Shashank

> Christian.
>
>>
>> - Shashank
>>
>>>
>>> Regards,
>>>   Felix
>>>
>>>
>>>>
>>>> What we can do is put the TLB flushing into an irq worker or
>>>> work item and let the signaling happen from there.
>>>>
>>>> Amar and Shashank are already working on this, I strongly suggest 
>>>> to sync up with them.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> +    dma_fence_put(vm->pt_fence);
>>>>> +    vm->pt_fence = amdgpu_pt_fence_create();
>>>>>   }
>>>>>     struct kfd_process_device 
>>>>> *kfd_process_device_data_by_id(struct kfd_process *p, uint32_t 
>>>>> gpu_id)
>>>>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb
  2023-06-05 15:40           ` Shashank Sharma
@ 2023-06-05 15:47             ` Christian König
  2023-06-06 16:16               ` Philip Yang
  0 siblings, 1 reply; 16+ messages in thread
From: Christian König @ 2023-06-05 15:47 UTC (permalink / raw)
  To: Shashank Sharma, Felix Kuehling, Philip Yang, amd-gfx,
	Somalapuram, Amaranath

Am 05.06.23 um 17:40 schrieb Shashank Sharma:
>
> On 05/06/2023 17:18, Christian König wrote:
>> Am 05.06.23 um 17:13 schrieb Shashank Sharma:
>>>
>>> On 02/06/2023 16:54, Felix Kuehling wrote:
>>>> Am 2023-06-02 um 07:57 schrieb Christian König:
>>>>> Am 01.06.23 um 21:31 schrieb Philip Yang:
>>>>>> To free page table BOs which are freed when updating page table, for
>>>>>> example PTE BOs when PDE0 used as PTE.
>>>>>>
>>>>>> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>>>>>> ---
>>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 +++++
>>>>>>   1 file changed, 5 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
>>>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> index af0a4b5257cc..0ff007a74d03 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> @@ -2101,6 +2101,11 @@ void kfd_flush_tlb(struct 
>>>>>> kfd_process_device *pdd, enum TLB_FLUSH_TYPE type)
>>>>>>               amdgpu_amdkfd_flush_gpu_tlb_pasid(
>>>>>>                   dev->adev, pdd->process->pasid, type, xcc);
>>>>>>       }
>>>>>> +
>>>>>> +    /* Signal page table fence to free page table BOs */
>>>>>> +    dma_fence_signal(vm->pt_fence);
>>>>>
>>>>> That's not something you can do here.
>>>>>
>>>>> Signaling a fence can never depend on anything except for hardware 
>>>>> work. In other words dma_fence_signal() is supposed to be called 
>>>>> only from interrupt context!
>>>>
>>>> We are signaling eviction fences from normal user context, too. 
>>>> There is no practical way to put this into an interrupt handler 
>>>> when the TLB flush is being done synchronously on a user thread. We 
>>>> have to do this in such a context for user mode queues.
>>>
>>>
>>> We are currently working on adding a high-level kernel API which
>>> can be called directly to perform a TLB flush. Internally this API
>>> will queue deferred work that uses the SDMA engine to perform the
>>> GPU TLB flush (to compensate for a HW bug in Navi chips). If my
>>> understanding is right, by interrupt context Christian means to
>>> perform this flush and signal from that deferred work, is that so
>>> @Christian?
>>
>> Well more or less. Ideally you put the TLB flush in a work item (or 
>> use the SDMA for the hw bug workaround on Navi 1x).
>>
>> The point is that you shouldn't have it in the same execution thread 
>> as the VM page table updates, because any memory allocation or 
>> grabbing a lock could potentially depend on the TLB flush as soon as 
>> you have published the dma_fence (by adding it to the VM page table 
>> BOs for example).
>>
> Would it work for everyone if we add this generic API (say
> amdgpu_flush_tlb_async()) which queues the TLB flush work and also
> sends the dma_fence_signal from within? KFD can simply consume it
> from wherever it wants; do you see a race condition if we do it like
> this?

Yes, that's pretty much the whole idea. amdgpu_flush_tlb() should just 
return a dma_fence object.

This dma_fence object should either be the SDMA workaround or signaled 
from a work item.

We can then fence the BOs or just wait for the dma_fence object to signal.

Regards,
Christian.


>
> - Shashank
>
>> Christian.
>>
>>>
>>> - Shashank
>>>
>>>>
>>>> Regards,
>>>>   Felix
>>>>
>>>>
>>>>>
>>>>> What we can do is put the TLB flushing into an irq worker or
>>>>> work item and let the signaling happen from there.
>>>>>
>>>>> Amar and Shashank are already working on this, I strongly suggest 
>>>>> to sync up with them.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> + dma_fence_put(vm->pt_fence);
>>>>>> +    vm->pt_fence = amdgpu_pt_fence_create();
>>>>>>   }
>>>>>>     struct kfd_process_device 
>>>>>> *kfd_process_device_data_by_id(struct kfd_process *p, uint32_t 
>>>>>> gpu_id)
>>>>>
>>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb
  2023-06-05 15:47             ` Christian König
@ 2023-06-06 16:16               ` Philip Yang
  0 siblings, 0 replies; 16+ messages in thread
From: Philip Yang @ 2023-06-06 16:16 UTC (permalink / raw)
  To: Christian König, Shashank Sharma, Felix Kuehling,
	Philip Yang, amd-gfx, Somalapuram, Amaranath


On 2023-06-05 11:47, Christian König wrote:
> Am 05.06.23 um 17:40 schrieb Shashank Sharma:
>>
>> On 05/06/2023 17:18, Christian König wrote:
>>> Am 05.06.23 um 17:13 schrieb Shashank Sharma:
>>>>
>>>> On 02/06/2023 16:54, Felix Kuehling wrote:
>>>>> Am 2023-06-02 um 07:57 schrieb Christian König:
>>>>>> Am 01.06.23 um 21:31 schrieb Philip Yang:
>>>>>>> To free page table BOs which are freed when updating page table, 
>>>>>>> for
>>>>>>> example PTE BOs when PDE0 used as PTE.
>>>>>>>
>>>>>>> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>>>>>>> ---
>>>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 +++++
>>>>>>>   1 file changed, 5 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
>>>>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>>> index af0a4b5257cc..0ff007a74d03 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>>> @@ -2101,6 +2101,11 @@ void kfd_flush_tlb(struct 
>>>>>>> kfd_process_device *pdd, enum TLB_FLUSH_TYPE type)
>>>>>>>               amdgpu_amdkfd_flush_gpu_tlb_pasid(
>>>>>>>                   dev->adev, pdd->process->pasid, type, xcc);
>>>>>>>       }
>>>>>>> +
>>>>>>> +    /* Signal page table fence to free page table BOs */
>>>>>>> +    dma_fence_signal(vm->pt_fence);
>>>>>>
>>>>>> That's not something you can do here.
>>>>>>
>>>>>> Signaling a fence can never depend on anything except for 
>>>>>> hardware work. In other words dma_fence_signal() is supposed to 
>>>>>> be called only from interrupt context!
>>>>>
>>>>> We are signaling eviction fences from normal user context, too. 
>>>>> There is no practical way to put this into an interrupt handler 
>>>>> when the TLB flush is being done synchronously on a user thread. 
>>>>> We have to do this in such a context for user mode queues.
>>>>
>>>>
>>>> We are currently working on adding a high-level kernel API which
>>>> can be called directly to perform a TLB flush. Internally this API
>>>> will queue deferred work that uses the SDMA engine to perform the
>>>> GPU TLB flush (to compensate for a HW bug in Navi chips). If my
>>>> understanding is right, by interrupt context Christian means to
>>>> perform this flush and signal from that deferred work, is that so
>>>> @Christian?
>>>
>>> Well more or less. Ideally you put the TLB flush in a work item (or 
>>> use the SDMA for the hw bug workaround on Navi 1x).
>>>
>>> The point is that you shouldn't have it in the same execution thread 
>>> as the VM page table updates, because any memory allocation or 
>>> grabbing a lock could potentially depend on the TLB flush as soon as 
>>> you have published the dma_fence (by adding it to the VM page table 
>>> BOs for example).
>>>
>> Would it work for everyone if we add this generic API (say
>> amdgpu_flush_tlb_async()) which queues the TLB flush work and also
>> sends the dma_fence_signal from within? KFD can simply consume it
>> from wherever it wants; do you see a race condition if we do it like
>> this?
>
> Yes, that's pretty much the whole idea. amdgpu_flush_tlb() should just 
> return a dma_fence object.
>
> This dma_fence object should either be the SDMA workaround or signaled 
> from a work item.
>
> We can then fence the BOs or just wait for the dma_fence object to 
> signal.

In order to fix the original issue (page table BOs being freed and
reused before the TLB is flushed), I will rebase and modify the patch
series on top of the new amdgpu_flush_tlb_sync() API, which works for
both KFD and gfx. Is my understanding correct?

Regards,

Philip

>
> Regards,
> Christian.
>
>
>>
>> - Shashank
>>
>>> Christian.
>>>
>>>>
>>>> - Shashank
>>>>
>>>>>
>>>>> Regards,
>>>>>   Felix
>>>>>
>>>>>
>>>>>>
>>>>>> What we can do is put the TLB flushing into an irq worker or
>>>>>> work item and let the signaling happen from there.
>>>>>>
>>>>>> Amar and Shashank are already working on this, I strongly suggest 
>>>>>> to sync up with them.
>>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>>> + dma_fence_put(vm->pt_fence);
>>>>>>> +    vm->pt_fence = amdgpu_pt_fence_create();
>>>>>>>   }
>>>>>>>     struct kfd_process_device 
>>>>>>> *kfd_process_device_data_by_id(struct kfd_process *p, uint32_t 
>>>>>>> gpu_id)
>>>>>>
>>>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-06-06 16:16 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
2023-06-01 19:31 [PATCH 0/4] Page table fence Philip Yang
2023-06-01 19:31 ` [PATCH 1/4] drm/amdgpu: Implement page table BO fence Philip Yang
2023-06-01 20:12   ` Felix Kuehling
2023-06-02 11:54   ` Christian König
2023-06-01 19:31 ` [PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb Philip Yang
2023-06-01 20:34   ` Felix Kuehling
2023-06-02 11:57   ` Christian König
2023-06-02 14:54     ` Felix Kuehling
2023-06-05 15:13       ` Shashank Sharma
2023-06-05 15:18         ` Christian König
2023-06-05 15:40           ` Shashank Sharma
2023-06-05 15:47             ` Christian König
2023-06-06 16:16               ` Philip Yang
2023-06-01 19:31 ` [PATCH 3/4] drm/amdgpu: Signal page table fence after gfx vm flush Philip Yang
2023-06-01 20:37   ` Felix Kuehling
2023-06-01 19:31 ` [PATCH 4/4] drm/amdgpu: Add fence to the freed page table BOs Philip Yang
