* Graceful page fault handling for Vega/Navi
@ 2019-09-04 15:02 Christian König
       [not found] ` <20190904150230.13885-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Christian König @ 2019-09-04 15:02 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi everyone,

this series is the next puzzle piece for recoverable page fault handling on Vega and Navi.

It adds a new direct scheduler entity for VM updates which is then used to update page tables during a fault.

In other words: previously, an application doing an invalid memory access would just hang and/or repeat the invalid access over and over again. Now the handling is modified so that the invalid memory access is redirected to the dummy page.
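
To make that concrete, here is a condensed, non-authoritative sketch of the
retry-fault path this series ends up with (names are taken from patch 9;
locking, reference counting and error handling are elided, so treat it as an
illustration rather than the literal implementation):

	/* sketch only: simplified from patch 9 */
	bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
				    uint64_t addr)
	{
		struct amdgpu_vm *vm;
		uint64_t value, flags;

		vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
		if (!vm)
			return false;

		/* redirect the faulting page to the dummy page */
		value = adev->dummy_page_addr;
		flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED | AMDGPU_PTE_SYSTEM |
			AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
			AMDGPU_PTE_WRITEABLE;

		addr /= AMDGPU_GPU_PAGE_SIZE;
		amdgpu_vm_bo_update_mapping(adev, vm, true /* direct */, NULL,
					    addr, addr + 1, flags, value,
					    NULL, NULL);
		amdgpu_vm_update_pdes(adev, vm, true /* direct */);

		/* the fault is silenced; nothing is reported as handled yet */
		return false;
	}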

This needs the following prerequisites:
a) The firmware must be new enough to allow re-routing of page faults.
b) Fault retry must be enabled using the amdgpu.noretry=0 parameter.
c) Enough free VRAM to allocate page tables to point to the dummy page.

The re-routing of page faults currently only works on Vega10, so Vega20 and Navi will still need some more time.

Please review and/or comment,
Christian.



* [PATCH 1/9] drm/ttm: return -EBUSY on pipelining with no_gpu_wait
       [not found] ` <20190904150230.13885-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
@ 2019-09-04 15:02   ` Christian König
  2019-09-04 15:02   ` [PATCH 2/9] drm/amdgpu: split the VM entity into direct and delayed Christian König
                     ` (9 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Christian König @ 2019-09-04 15:02 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Setting the no_gpu_wait flag means that the allocated BO must be available
immediately and we can't wait for any GPU operation to finish.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 43 +++++++++++++++++++++---------------
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 2070e8a57ed8..2899702139fb 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -924,7 +924,8 @@ EXPORT_SYMBOL(ttm_bo_mem_put);
  */
 static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
 				 struct ttm_mem_type_manager *man,
-				 struct ttm_mem_reg *mem)
+				 struct ttm_mem_reg *mem,
+				 bool no_wait_gpu)
 {
 	struct dma_fence *fence;
 	int ret;
@@ -933,19 +934,22 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
 	fence = dma_fence_get(man->move);
 	spin_unlock(&man->move_lock);
 
-	if (fence) {
-		reservation_object_add_shared_fence(bo->resv, fence);
+	if (!fence)
+		return 0;
 
-		ret = reservation_object_reserve_shared(bo->resv, 1);
-		if (unlikely(ret)) {
-			dma_fence_put(fence);
-			return ret;
-		}
+	if (no_wait_gpu)
+		return -EBUSY;
+
+	reservation_object_add_shared_fence(bo->resv, fence);
 
-		dma_fence_put(bo->moving);
-		bo->moving = fence;
+	ret = reservation_object_reserve_shared(bo->resv, 1);
+	if (unlikely(ret)) {
+		dma_fence_put(fence);
+		return ret;
 	}
 
+	dma_fence_put(bo->moving);
+	bo->moving = fence;
 	return 0;
 }
 
@@ -974,7 +978,7 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
 			return ret;
 	} while (1);
 
-	return ttm_bo_add_move_fence(bo, man, mem);
+	return ttm_bo_add_move_fence(bo, man, mem, ctx->no_wait_gpu);
 }
 
 static uint32_t ttm_bo_select_caching(struct ttm_mem_type_manager *man,
@@ -1116,13 +1120,16 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
 		if (unlikely(ret))
 			goto error;
 
-		if (mem->mm_node) {
-			ret = ttm_bo_add_move_fence(bo, man, mem);
-			if (unlikely(ret)) {
-				(*man->func->put_node)(man, mem);
-				goto error;
-			}
-			return 0;
+		if (!mem->mm_node)
+			continue;
+
+		ret = ttm_bo_add_move_fence(bo, man, mem, ctx->no_wait_gpu);
+		if (unlikely(ret)) {
+			(*man->func->put_node)(man, mem);
+			if (ret == -EBUSY)
+				continue;
+
+			goto error;
 		}
 	}
 
-- 
2.17.1
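
A caller-side illustration of the new behaviour (a rough sketch, not part of
the patch; it assumes the caller already has a valid struct ttm_placement set
up):

	struct ttm_operation_ctx ctx = {
		.interruptible = false,
		.no_wait_gpu = true,	/* fail fast instead of pipelining behind a move */
	};
	int r;

	r = ttm_bo_validate(bo, &placement, &ctx);
	if (r) {
		/* with no_wait_gpu set, the validation now fails quickly (e.g.
		 * -EBUSY or -ENOMEM depending on the path) rather than waiting
		 * for a GPU move to finish; the caller can retry later or fall
		 * back to another strategy */
	}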


* [PATCH 2/9] drm/amdgpu: split the VM entity into direct and delayed
       [not found] ` <20190904150230.13885-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
  2019-09-04 15:02   ` [PATCH 1/9] drm/ttm: return -EBUSY on pipelining with no_gpu_wait Christian König
@ 2019-09-04 15:02   ` Christian König
  2019-09-04 15:02   ` [PATCH 3/9] drm/amdgpu: allow direct submission in the VM backends v2 Christian König
                     ` (8 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Christian König @ 2019-09-04 15:02 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

For page fault handling we need to use a direct update which can't be
blocked by ongoing user CS.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c     |  6 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c      | 21 +++++++++++++++------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h      |  5 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c |  5 +++--
 4 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
index cd15540c5622..dfe155566571 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
@@ -282,7 +282,7 @@ static int amdgpu_vmid_grab_reserved(struct amdgpu_vm *vm,
 	    !dma_fence_is_later(updates, (*id)->flushed_updates))
 	    updates = NULL;
 
-	if ((*id)->owner != vm->entity.fence_context ||
+	if ((*id)->owner != vm->direct.fence_context ||
 	    job->vm_pd_addr != (*id)->pd_gpu_addr ||
 	    updates || !(*id)->last_flush ||
 	    ((*id)->last_flush->context != fence_context &&
@@ -349,7 +349,7 @@ static int amdgpu_vmid_grab_used(struct amdgpu_vm *vm,
 		struct dma_fence *flushed;
 
 		/* Check all the prerequisites to using this VMID */
-		if ((*id)->owner != vm->entity.fence_context)
+		if ((*id)->owner != vm->direct.fence_context)
 			continue;
 
 		if ((*id)->pd_gpu_addr != job->vm_pd_addr)
@@ -449,7 +449,7 @@ int amdgpu_vmid_grab(struct amdgpu_vm *vm, struct amdgpu_ring *ring,
 	}
 
 	id->pd_gpu_addr = job->vm_pd_addr;
-	id->owner = vm->entity.fence_context;
+	id->owner = vm->direct.fence_context;
 
 	if (job->vm_needs_flush) {
 		dma_fence_put(id->last_flush);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 501e13420786..fc103a9f20c5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2687,12 +2687,17 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	spin_lock_init(&vm->invalidated_lock);
 	INIT_LIST_HEAD(&vm->freed);
 
-	/* create scheduler entity for page table updates */
-	r = drm_sched_entity_init(&vm->entity, adev->vm_manager.vm_pte_rqs,
+	/* create scheduler entities for page table updates */
+	r = drm_sched_entity_init(&vm->direct, adev->vm_manager.vm_pte_rqs,
 				  adev->vm_manager.vm_pte_num_rqs, NULL);
 	if (r)
 		return r;
 
+	r = drm_sched_entity_init(&vm->delayed, adev->vm_manager.vm_pte_rqs,
+				  adev->vm_manager.vm_pte_num_rqs, NULL);
+	if (r)
+		goto error_free_direct;
+
 	vm->pte_support_ats = false;
 
 	if (vm_context == AMDGPU_VM_CONTEXT_COMPUTE) {
@@ -2721,7 +2726,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 		bp.flags &= ~AMDGPU_GEM_CREATE_SHADOW;
 	r = amdgpu_bo_create(adev, &bp, &root);
 	if (r)
-		goto error_free_sched_entity;
+		goto error_free_delayed;
 
 	r = amdgpu_bo_reserve(root, true);
 	if (r)
@@ -2764,8 +2769,11 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	amdgpu_bo_unref(&vm->root.base.bo);
 	vm->root.base.bo = NULL;
 
-error_free_sched_entity:
-	drm_sched_entity_destroy(&vm->entity);
+error_free_delayed:
+	drm_sched_entity_destroy(&vm->delayed);
+
+error_free_direct:
+	drm_sched_entity_destroy(&vm->direct);
 
 	return r;
 }
@@ -2954,7 +2962,8 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 		spin_unlock_irqrestore(&adev->vm_manager.pasid_lock, flags);
 	}
 
-	drm_sched_entity_destroy(&vm->entity);
+	drm_sched_entity_destroy(&vm->direct);
+	drm_sched_entity_destroy(&vm->delayed);
 
 	if (!RB_EMPTY_ROOT(&vm->va.rb_root)) {
 		dev_err(adev->dev, "still active bo inside vm\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 3352a87b822e..7138722ee55f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -257,8 +257,9 @@ struct amdgpu_vm {
 	struct amdgpu_vm_pt     root;
 	struct dma_fence	*last_update;
 
-	/* Scheduler entity for page table updates */
-	struct drm_sched_entity	entity;
+	/* Scheduler entities for page table updates */
+	struct drm_sched_entity	direct;
+	struct drm_sched_entity	delayed;
 
 	unsigned int		pasid;
 	/* dedicated to vm */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
index ddd181f5ed37..d087d6650d79 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
@@ -99,12 +99,13 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
 	struct dma_fence *f;
 	int r;
 
-	ring = container_of(p->vm->entity.rq->sched, struct amdgpu_ring, sched);
+	ring = container_of(p->vm->delayed.rq->sched, struct amdgpu_ring,
+			    sched);
 
 	WARN_ON(ib->length_dw == 0);
 	amdgpu_ring_pad_ib(ring, ib);
 	WARN_ON(ib->length_dw > p->num_dw_left);
-	r = amdgpu_job_submit(p->job, &p->vm->entity,
+	r = amdgpu_job_submit(p->job, &p->vm->delayed,
 			      AMDGPU_FENCE_OWNER_VM, &f);
 	if (r)
 		goto error;
-- 
2.17.1


* [PATCH 3/9] drm/amdgpu: allow direct submission in the VM backends v2
       [not found] ` <20190904150230.13885-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
  2019-09-04 15:02   ` [PATCH 1/9] drm/ttm: return -EBUSY on pipelining with no_gpu_wait Christian König
  2019-09-04 15:02   ` [PATCH 2/9] drm/amdgpu: split the VM entity into direct and delayed Christian König
@ 2019-09-04 15:02   ` Christian König
  2019-09-04 15:02   ` [PATCH 4/9] drm/amdgpu: allow direct submission of PDE updates Christian König
                     ` (7 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Christian König @ 2019-09-04 15:02 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

This allows us to update page tables directly while in a page fault.

v2: use direct/delayed entities and still wait for moves

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h      |  5 +++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c  | 16 ++++++-------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 25 +++++++++++----------
 3 files changed, 26 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 7138722ee55f..54dcd0bcce1a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -201,6 +201,11 @@ struct amdgpu_vm_update_params {
 	 */
 	struct amdgpu_vm *vm;
 
+	/**
+	 * @direct: if changes should be made directly
+	 */
+	bool direct;
+
 	/**
 	 * @pages_addr:
 	 *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
index 5222d165abfc..a2daeadd770f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
@@ -49,13 +49,6 @@ static int amdgpu_vm_cpu_prepare(struct amdgpu_vm_update_params *p, void *owner,
 {
 	int r;
 
-	/* Wait for PT BOs to be idle. PTs share the same resv. object
-	 * as the root PD BO
-	 */
-	r = amdgpu_bo_sync_wait(p->vm->root.base.bo, owner, true);
-	if (unlikely(r))
-		return r;
-
 	/* Wait for any BO move to be completed */
 	if (exclusive) {
 		r = dma_fence_wait(exclusive, true);
@@ -63,7 +56,14 @@ static int amdgpu_vm_cpu_prepare(struct amdgpu_vm_update_params *p, void *owner,
 			return r;
 	}
 
-	return 0;
+	/* Don't wait for submissions during page fault */
+	if (p->direct)
+		return 0;
+
+	/* Wait for PT BOs to be idle. PTs share the same resv. object
+	 * as the root PD BO
+	 */
+	return amdgpu_bo_sync_wait(p->vm->root.base.bo, owner, true);
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
index d087d6650d79..38c966cedc26 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
@@ -68,17 +68,19 @@ static int amdgpu_vm_sdma_prepare(struct amdgpu_vm_update_params *p,
 	if (r)
 		return r;
 
+	p->num_dw_left = ndw;
+
+	/* Wait for moves to be completed */
 	r = amdgpu_sync_fence(p->adev, &p->job->sync, exclusive, false);
 	if (r)
 		return r;
 
-	r = amdgpu_sync_resv(p->adev, &p->job->sync, root->tbo.resv,
-			     owner, false);
-	if (r)
-		return r;
+	/* Don't wait for any submissions during page fault handling */
+	if (p->direct)
+		return 0;
 
-	p->num_dw_left = ndw;
-	return 0;
+	return amdgpu_sync_resv(p->adev, &p->job->sync, root->tbo.resv,
+				owner, false);
 }
 
 /**
@@ -95,23 +97,23 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
 {
 	struct amdgpu_bo *root = p->vm->root.base.bo;
 	struct amdgpu_ib *ib = p->job->ibs;
+	struct drm_sched_entity *entity;
 	struct amdgpu_ring *ring;
 	struct dma_fence *f;
 	int r;
 
-	ring = container_of(p->vm->delayed.rq->sched, struct amdgpu_ring,
-			    sched);
+	entity = p->direct ? &p->vm->direct : &p->vm->delayed;
+	ring = container_of(entity->rq->sched, struct amdgpu_ring, sched);
 
 	WARN_ON(ib->length_dw == 0);
 	amdgpu_ring_pad_ib(ring, ib);
 	WARN_ON(ib->length_dw > p->num_dw_left);
-	r = amdgpu_job_submit(p->job, &p->vm->delayed,
-			      AMDGPU_FENCE_OWNER_VM, &f);
+	r = amdgpu_job_submit(p->job, entity, AMDGPU_FENCE_OWNER_VM, &f);
 	if (r)
 		goto error;
 
 	amdgpu_bo_fence(root, f, true);
-	if (fence)
+	if (fence && !p->direct)
 		swap(*fence, f);
 	dma_fence_put(f);
 	return 0;
@@ -121,7 +123,6 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
 	return r;
 }
 
-
 /**
  * amdgpu_vm_sdma_copy_ptes - copy the PTEs from mapping
  *
-- 
2.17.1


* [PATCH 4/9] drm/amdgpu: allow direct submission of PDE updates.
       [not found] ` <20190904150230.13885-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
                     ` (2 preceding siblings ...)
  2019-09-04 15:02   ` [PATCH 3/9] drm/amdgpu: allow direct submission in the VM backends v2 Christian König
@ 2019-09-04 15:02   ` Christian König
       [not found]     ` <20190904150230.13885-5-christian.koenig-5C7GfCeVMHo@public.gmane.org>
  2019-09-04 15:02   ` [PATCH 5/9] drm/amdgpu: allow direct submission of PTE updates Christian König
                     ` (6 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Christian König @ 2019-09-04 15:02 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

For handling PDE updates directly in the fault handler.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c           | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c          | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c           | 8 +++++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h           | 4 ++--
 5 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index b0f0e060ded6..d3942d9306c4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -343,7 +343,7 @@ static int vm_update_pds(struct amdgpu_vm *vm, struct amdgpu_sync *sync)
 	struct amdgpu_device *adev = amdgpu_ttm_adev(pd->tbo.bdev);
 	int ret;
 
-	ret = amdgpu_vm_update_directories(adev, vm);
+	ret = amdgpu_vm_update_pdes(adev, vm, false);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 51f3db08b8eb..bd6b88827447 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -845,7 +845,7 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p)
 	if (r)
 		return r;
 
-	r = amdgpu_vm_update_directories(adev, vm);
+	r = amdgpu_vm_update_pdes(adev, vm, false);
 	if (r)
 		return r;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index e7af35c7080d..a621e629d876 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -521,7 +521,7 @@ static void amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
 			goto error;
 	}
 
-	r = amdgpu_vm_update_directories(adev, vm);
+	r = amdgpu_vm_update_pdes(adev, vm, false);
 
 error:
 	if (r && r != -ERESTARTSYS)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index fc103a9f20c5..b6c89ba9281c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1221,18 +1221,19 @@ static void amdgpu_vm_invalidate_pds(struct amdgpu_device *adev,
 }
 
 /*
- * amdgpu_vm_update_directories - make sure that all directories are valid
+ * amdgpu_vm_update_ - make sure that all directories are valid
  *
  * @adev: amdgpu_device pointer
  * @vm: requested vm
+ * @direct: submit directly to the paging queue
  *
  * Makes sure all directories are up to date.
  *
  * Returns:
  * 0 for success, error for failure.
  */
-int amdgpu_vm_update_directories(struct amdgpu_device *adev,
-				 struct amdgpu_vm *vm)
+int amdgpu_vm_update_pdes(struct amdgpu_device *adev,
+			  struct amdgpu_vm *vm, bool direct)
 {
 	struct amdgpu_vm_update_params params;
 	int r;
@@ -1243,6 +1244,7 @@ int amdgpu_vm_update_directories(struct amdgpu_device *adev,
 	memset(&params, 0, sizeof(params));
 	params.adev = adev;
 	params.vm = vm;
+	params.direct = direct;
 
 	r = vm->update_funcs->prepare(&params, AMDGPU_FENCE_OWNER_VM, NULL);
 	if (r)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 54dcd0bcce1a..0a97dc839f3b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -366,8 +366,8 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 			      int (*callback)(void *p, struct amdgpu_bo *bo),
 			      void *param);
 int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job, bool need_pipe_sync);
-int amdgpu_vm_update_directories(struct amdgpu_device *adev,
-				 struct amdgpu_vm *vm);
+int amdgpu_vm_update_pdes(struct amdgpu_device *adev,
+			  struct amdgpu_vm *vm, bool direct);
 int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
 			  struct amdgpu_vm *vm,
 			  struct dma_fence **fence);
-- 
2.17.1
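
For illustration, the intended split between the two submission modes looks
roughly like this (a sketch assembled from the call sites in this patch and in
patch 9, not new code):

	/* CS/ioctl paths: may be ordered behind user command submissions */
	r = amdgpu_vm_update_pdes(adev, vm, false);

	/* page fault path (patch 9): goes through the direct entity and must
	 * not be blocked by ongoing user CS */
	r = amdgpu_vm_update_pdes(adev, vm, true);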


* [PATCH 5/9] drm/amdgpu: allow direct submission of PTE updates
       [not found] ` <20190904150230.13885-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
                     ` (3 preceding siblings ...)
  2019-09-04 15:02   ` [PATCH 4/9] drm/amdgpu: allow direct submission of PDE updates Christian König
@ 2019-09-04 15:02   ` Christian König
  2019-09-04 15:02   ` [PATCH 6/9] drm/amdgpu: allow direct submission of clears Christian König
                     ` (5 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Christian König @ 2019-09-04 15:02 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

For handling PTE updates directly in the fault handler.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index b6c89ba9281c..c096903370aa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1486,13 +1486,14 @@ static int amdgpu_vm_update_ptes(struct amdgpu_vm_update_params *params,
  * amdgpu_vm_bo_update_mapping - update a mapping in the vm page table
  *
  * @adev: amdgpu_device pointer
- * @exclusive: fence we need to sync to
- * @pages_addr: DMA addresses to use for mapping
  * @vm: requested vm
+ * @direct: direct submission in a page fault
+ * @exclusive: fence we need to sync to
  * @start: start of mapped range
  * @last: last mapped entry
  * @flags: flags for the entries
  * @addr: addr to set the area to
+ * @pages_addr: DMA addresses to use for mapping
  * @fence: optional resulting fence
  *
  * Fill in the page table entries between @start and @last.
@@ -1501,11 +1502,11 @@ static int amdgpu_vm_update_ptes(struct amdgpu_vm_update_params *params,
  * 0 for success, -EINVAL for failure.
  */
 static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
+				       struct amdgpu_vm *vm, bool direct,
 				       struct dma_fence *exclusive,
-				       dma_addr_t *pages_addr,
-				       struct amdgpu_vm *vm,
 				       uint64_t start, uint64_t last,
 				       uint64_t flags, uint64_t addr,
+				       dma_addr_t *pages_addr,
 				       struct dma_fence **fence)
 {
 	struct amdgpu_vm_update_params params;
@@ -1515,6 +1516,7 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
 	memset(&params, 0, sizeof(params));
 	params.adev = adev;
 	params.vm = vm;
+	params.direct = direct;
 	params.pages_addr = pages_addr;
 
 	/* sync to everything except eviction fences on unmapping */
@@ -1650,9 +1652,9 @@ static int amdgpu_vm_bo_split_mapping(struct amdgpu_device *adev,
 		}
 
 		last = min((uint64_t)mapping->last, start + max_entries - 1);
-		r = amdgpu_vm_bo_update_mapping(adev, exclusive, dma_addr, vm,
+		r = amdgpu_vm_bo_update_mapping(adev, vm, false, exclusive,
 						start, last, flags, addr,
-						fence);
+						dma_addr, fence);
 		if (r)
 			return r;
 
@@ -1946,9 +1948,9 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
 		    mapping->start < AMDGPU_GMC_HOLE_START)
 			init_pte_value = AMDGPU_PTE_DEFAULT_ATC;
 
-		r = amdgpu_vm_bo_update_mapping(adev, NULL, NULL, vm,
+		r = amdgpu_vm_bo_update_mapping(adev, vm, false, NULL,
 						mapping->start, mapping->last,
-						init_pte_value, 0, &f);
+						init_pte_value, 0, NULL, &f);
 		amdgpu_vm_free_mapping(adev, vm, mapping, f);
 		if (r) {
 			dma_fence_put(f);
-- 
2.17.1


* [PATCH 6/9] drm/amdgpu: allow direct submission of clears
       [not found] ` <20190904150230.13885-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
                     ` (4 preceding siblings ...)
  2019-09-04 15:02   ` [PATCH 5/9] drm/amdgpu: allow direct submission of PTE updates Christian König
@ 2019-09-04 15:02   ` Christian König
  2019-09-04 15:02   ` [PATCH 7/9] drm/amdgpu: allocate PDs/PTs with no_gpu_wait in a page fault Christian König
                     ` (4 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Christian König @ 2019-09-04 15:02 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

For handling PD/PT clears directly in the fault handler.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index c096903370aa..e3c11bd1ccee 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -695,6 +695,7 @@ bool amdgpu_vm_ready(struct amdgpu_vm *vm)
  * @adev: amdgpu_device pointer
  * @vm: VM to clear BO from
  * @bo: BO to clear
+ * @direct: use a direct update
  *
  * Root PD needs to be reserved when calling this.
  *
@@ -703,7 +704,8 @@ bool amdgpu_vm_ready(struct amdgpu_vm *vm)
  */
 static int amdgpu_vm_clear_bo(struct amdgpu_device *adev,
 			      struct amdgpu_vm *vm,
-			      struct amdgpu_bo *bo)
+			      struct amdgpu_bo *bo,
+			      bool direct)
 {
 	struct ttm_operation_ctx ctx = { true, false };
 	unsigned level = adev->vm_manager.root_level;
@@ -762,6 +764,7 @@ static int amdgpu_vm_clear_bo(struct amdgpu_device *adev,
 	memset(&params, 0, sizeof(params));
 	params.adev = adev;
 	params.vm = vm;
+	params.direct = direct;
 
 	r = vm->update_funcs->prepare(&params, AMDGPU_FENCE_OWNER_KFD, NULL);
 	if (r)
@@ -852,7 +855,8 @@ static void amdgpu_vm_bo_param(struct amdgpu_device *adev, struct amdgpu_vm *vm,
  */
 static int amdgpu_vm_alloc_pts(struct amdgpu_device *adev,
 			       struct amdgpu_vm *vm,
-			       struct amdgpu_vm_pt_cursor *cursor)
+			       struct amdgpu_vm_pt_cursor *cursor,
+			       bool direct)
 {
 	struct amdgpu_vm_pt *entry = cursor->entry;
 	struct amdgpu_bo_param bp;
@@ -885,7 +889,7 @@ static int amdgpu_vm_alloc_pts(struct amdgpu_device *adev,
 	pt->parent = amdgpu_bo_ref(cursor->parent->base.bo);
 	amdgpu_vm_bo_base_init(&entry->base, vm, pt);
 
-	r = amdgpu_vm_clear_bo(adev, vm, pt);
+	r = amdgpu_vm_clear_bo(adev, vm, pt, direct);
 	if (r)
 		goto error_free_pt;
 
@@ -1395,7 +1399,8 @@ static int amdgpu_vm_update_ptes(struct amdgpu_vm_update_params *params,
 		uint64_t incr, entry_end, pe_start;
 		struct amdgpu_bo *pt;
 
-		r = amdgpu_vm_alloc_pts(params->adev, params->vm, &cursor);
+		r = amdgpu_vm_alloc_pts(params->adev, params->vm, &cursor,
+					params->direct);
 		if (r)
 			return r;
 
@@ -2742,7 +2747,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 
 	amdgpu_vm_bo_base_init(&vm->root.base, vm, root);
 
-	r = amdgpu_vm_clear_bo(adev, vm, root);
+	r = amdgpu_vm_clear_bo(adev, vm, root, false);
 	if (r)
 		goto error_unreserve;
 
@@ -2865,7 +2870,7 @@ int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm, uns
 	 */
 	if (pte_support_ats != vm->pte_support_ats) {
 		vm->pte_support_ats = pte_support_ats;
-		r = amdgpu_vm_clear_bo(adev, vm, vm->root.base.bo);
+		r = amdgpu_vm_clear_bo(adev, vm, vm->root.base.bo, false);
 		if (r)
 			goto free_idr;
 	}
-- 
2.17.1


* [PATCH 7/9] drm/amdgpu: allocate PDs/PTs with no_gpu_wait in a page fault
       [not found] ` <20190904150230.13885-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
                     ` (5 preceding siblings ...)
  2019-09-04 15:02   ` [PATCH 6/9] drm/amdgpu: allow direct submission of clears Christian König
@ 2019-09-04 15:02   ` Christian König
  2019-09-04 15:02   ` [PATCH 8/9] drm/amdgpu: reserve the root PD while freeing PASIDs Christian König
                     ` (3 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Christian König @ 2019-09-04 15:02 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

While handling a page fault we can't wait for other ongoing GPU
operations or we can potentially run into deadlocks.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c     | 8 +++++---
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 04767905f004..510d04fd6e5f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -451,7 +451,7 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
 {
 	struct ttm_operation_ctx ctx = {
 		.interruptible = (bp->type != ttm_bo_type_kernel),
-		.no_wait_gpu = false,
+		.no_wait_gpu = bp->no_wait_gpu,
 		.resv = bp->resv,
 		.flags = TTM_OPT_FLAG_ALLOW_RES_EVICT
 	};
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index 5a3c1779e200..e6ddd048a984 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -41,6 +41,7 @@ struct amdgpu_bo_param {
 	u32				preferred_domain;
 	u64				flags;
 	enum ttm_bo_type		type;
+	bool				no_wait_gpu;
 	struct reservation_object	*resv;
 };
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index e3c11bd1ccee..4c4c348b2752 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -821,7 +821,8 @@ static int amdgpu_vm_clear_bo(struct amdgpu_device *adev,
  * @bp: resulting BO allocation parameters
  */
 static void amdgpu_vm_bo_param(struct amdgpu_device *adev, struct amdgpu_vm *vm,
-			       int level, struct amdgpu_bo_param *bp)
+			       int level, bool direct,
+			       struct amdgpu_bo_param *bp)
 {
 	memset(bp, 0, sizeof(*bp));
 
@@ -836,6 +837,7 @@ static void amdgpu_vm_bo_param(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	else if (!vm->root.base.bo || vm->root.base.bo->shadow)
 		bp->flags |= AMDGPU_GEM_CREATE_SHADOW;
 	bp->type = ttm_bo_type_kernel;
+	bp->no_wait_gpu = direct;
 	if (vm->root.base.bo)
 		bp->resv = vm->root.base.bo->tbo.resv;
 }
@@ -877,7 +879,7 @@ static int amdgpu_vm_alloc_pts(struct amdgpu_device *adev,
 	if (entry->base.bo)
 		return 0;
 
-	amdgpu_vm_bo_param(adev, vm, cursor->level, &bp);
+	amdgpu_vm_bo_param(adev, vm, cursor->level, direct, &bp);
 
 	r = amdgpu_bo_create(adev, &bp, &pt);
 	if (r)
@@ -2730,7 +2732,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 		vm->update_funcs = &amdgpu_vm_sdma_funcs;
 	vm->last_update = NULL;
 
-	amdgpu_vm_bo_param(adev, vm, adev->vm_manager.root_level, &bp);
+	amdgpu_vm_bo_param(adev, vm, adev->vm_manager.root_level, false, &bp);
 	if (vm_context == AMDGPU_VM_CONTEXT_COMPUTE)
 		bp.flags &= ~AMDGPU_GEM_CREATE_SHADOW;
 	r = amdgpu_bo_create(adev, &bp, &root);
-- 
2.17.1


* [PATCH 8/9] drm/amdgpu: reserve the root PD while freeing PASIDs
       [not found] ` <20190904150230.13885-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
                     ` (6 preceding siblings ...)
  2019-09-04 15:02   ` [PATCH 7/9] drm/amdgpu: allocate PDs/PTs with no_gpu_wait in a page fault Christian König
@ 2019-09-04 15:02   ` Christian König
  2019-09-04 15:02   ` [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2 Christian König
                     ` (2 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Christian König @ 2019-09-04 15:02 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Free the pasid only while the root PD is reserved. This prevents use after
free in the page fault handling.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 4c4c348b2752..951608fc1925 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2961,18 +2961,26 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 	struct amdgpu_bo_va_mapping *mapping, *tmp;
 	bool prt_fini_needed = !!adev->gmc.gmc_funcs->set_prt;
 	struct amdgpu_bo *root;
-	int i, r;
+	int i;
 
 	amdgpu_amdkfd_gpuvm_destroy_cb(adev, vm);
 
+	root = amdgpu_bo_ref(vm->root.base.bo);
+	amdgpu_bo_reserve(root, true);
 	if (vm->pasid) {
 		unsigned long flags;
 
 		spin_lock_irqsave(&adev->vm_manager.pasid_lock, flags);
 		idr_remove(&adev->vm_manager.pasid_idr, vm->pasid);
 		spin_unlock_irqrestore(&adev->vm_manager.pasid_lock, flags);
+		vm->pasid = 0;
 	}
 
+	amdgpu_vm_free_pts(adev, vm, NULL);
+	amdgpu_bo_unreserve(root);
+	amdgpu_bo_unref(&root);
+	WARN_ON(vm->root.base.bo);
+
 	drm_sched_entity_destroy(&vm->direct);
 	drm_sched_entity_destroy(&vm->delayed);
 
@@ -2997,16 +3005,6 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 		amdgpu_vm_free_mapping(adev, vm, mapping, NULL);
 	}
 
-	root = amdgpu_bo_ref(vm->root.base.bo);
-	r = amdgpu_bo_reserve(root, true);
-	if (r) {
-		dev_err(adev->dev, "Leaking page tables because BO reservation failed\n");
-	} else {
-		amdgpu_vm_free_pts(adev, vm, NULL);
-		amdgpu_bo_unreserve(root);
-	}
-	amdgpu_bo_unref(&root);
-	WARN_ON(vm->root.base.bo);
 	dma_fence_put(vm->last_update);
 	for (i = 0; i < AMDGPU_MAX_VMHUBS; i++)
 		amdgpu_vmid_free_reserved(adev, vm, i);
-- 
2.17.1


* [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2
       [not found] ` <20190904150230.13885-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
                     ` (7 preceding siblings ...)
  2019-09-04 15:02   ` [PATCH 8/9] drm/amdgpu: reserve the root PD while freeing PASIDs Christian König
@ 2019-09-04 15:02   ` Christian König
       [not found]     ` <20190904150230.13885-10-christian.koenig-5C7GfCeVMHo@public.gmane.org>
  2019-09-04 22:52   ` Graceful page fault handling for Vega/Navi Kuehling, Felix
  2019-09-04 23:03   ` Huang, Ray
  10 siblings, 1 reply; 25+ messages in thread
From: Christian König @ 2019-09-04 15:02 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Next step towards HMM support. For now just silence the retry fault and
optionally redirect the request to the dummy page.

v2: make sure the VM is not destroyed while we handle the fault.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 74 ++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  2 +
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  4 ++
 3 files changed, 80 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 951608fc1925..410d89966a66 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -3142,3 +3142,77 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
 		}
 	}
 }
+
+/**
+ * amdgpu_vm_handle_fault - graceful handling of VM faults.
+ * @adev: amdgpu device pointer
+ * @pasid: PASID of the VM
+ * @addr: Address of the fault
+ *
+ * Try to gracefully handle a VM fault. Return true if the fault was handled and
+ * shouldn't be reported any more.
+ */
+bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
+			    uint64_t addr)
+{
+	struct amdgpu_ring *ring = &adev->sdma.instance[0].page;
+	struct amdgpu_bo *root;
+	uint64_t value, flags;
+	struct amdgpu_vm *vm;
+	long r;
+
+	if (!ring->sched.ready)
+		return false;
+
+	spin_lock(&adev->vm_manager.pasid_lock);
+	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
+	if (vm)
+		root = amdgpu_bo_ref(vm->root.base.bo);
+	else
+		root = NULL;
+	spin_unlock(&adev->vm_manager.pasid_lock);
+
+	if (!root)
+		return false;
+
+	r = amdgpu_bo_reserve(root, true);
+	if (r)
+		goto error_unref;
+
+	spin_lock(&adev->vm_manager.pasid_lock);
+	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
+	spin_unlock(&adev->vm_manager.pasid_lock);
+
+	if (!vm || vm->root.base.bo != root)
+		goto error_unlock;
+
+	addr /= AMDGPU_GPU_PAGE_SIZE;
+	flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
+		AMDGPU_PTE_SYSTEM;
+
+	if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
+		/* Redirect the access to the dummy page */
+		value = adev->dummy_page_addr;
+		flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
+			AMDGPU_PTE_WRITEABLE;
+	} else {
+		value = 0;
+	}
+
+	r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr + 1,
+					flags, value, NULL, NULL);
+	if (r)
+		goto error_unlock;
+
+	r = amdgpu_vm_update_pdes(adev, vm, true);
+
+error_unlock:
+	amdgpu_bo_unreserve(root);
+	if (r < 0)
+		DRM_ERROR("Can't handle page fault (%ld)\n", r);
+
+error_unref:
+	amdgpu_bo_unref(&root);
+
+	return false;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 0a97dc839f3b..4dbbe1b6b413 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -413,6 +413,8 @@ void amdgpu_vm_check_compute_bug(struct amdgpu_device *adev);
 
 void amdgpu_vm_get_task_info(struct amdgpu_device *adev, unsigned int pasid,
 			     struct amdgpu_task_info *task_info);
+bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
+			    uint64_t addr);
 
 void amdgpu_vm_set_task_info(struct amdgpu_vm *vm);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 9d15679df6e0..15a1ce51befa 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -353,6 +353,10 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev,
 	}
 
 	/* If it's the first fault for this address, process it normally */
+	if (retry_fault && !in_interrupt() &&
+	    amdgpu_vm_handle_fault(adev, entry->pasid, addr))
+		return 1; /* This also prevents sending it to KFD */
+
 	if (!amdgpu_sriov_vf(adev)) {
 		/*
 		 * Issue a dummy read to wait for the status register to
-- 
2.17.1


* Re: [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2
       [not found]     ` <20190904150230.13885-10-christian.koenig-5C7GfCeVMHo@public.gmane.org>
@ 2019-09-04 20:12       ` Yang, Philip
       [not found]         ` <844c2c90-8be4-df05-9df9-c9c9cde9b186-5C7GfCeVMHo@public.gmane.org>
  2019-09-04 22:47       ` Kuehling, Felix
  1 sibling, 1 reply; 25+ messages in thread
From: Yang, Philip @ 2019-09-04 20:12 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

This series looks nice and clear to me; two questions are embedded below.

Are we going to use a dedicated sdma page queue for the direct VM update path 
during a fault?

Thanks,
Philip

On 2019-09-04 11:02 a.m., Christian König wrote:
> Next step towards HMM support. For now just silence the retry fault and
> optionally redirect the request to the dummy page.
> 
> v2: make sure the VM is not destroyed while we handle the fault.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 74 ++++++++++++++++++++++++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  2 +
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  4 ++
>   3 files changed, 80 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 951608fc1925..410d89966a66 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -3142,3 +3142,77 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
>   		}
>   	}
>   }
> +
> +/**
> + * amdgpu_vm_handle_fault - graceful handling of VM faults.
> + * @adev: amdgpu device pointer
> + * @pasid: PASID of the VM
> + * @addr: Address of the fault
> + *
> + * Try to gracefully handle a VM fault. Return true if the fault was handled and
> + * shouldn't be reported any more.
> + */
> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
> +			    uint64_t addr)
> +{
> +	struct amdgpu_ring *ring = &adev->sdma.instance[0].page;
> +	struct amdgpu_bo *root;
> +	uint64_t value, flags;
> +	struct amdgpu_vm *vm;
> +	long r;
> +
> +	if (!ring->sched.ready)
> +		return false;
> +
> +	spin_lock(&adev->vm_manager.pasid_lock);
> +	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
> +	if (vm)
> +		root = amdgpu_bo_ref(vm->root.base.bo);
> +	else
> +		root = NULL;
> +	spin_unlock(&adev->vm_manager.pasid_lock);
> +
> +	if (!root)
> +		return false;
> +
> +	r = amdgpu_bo_reserve(root, true);
> +	if (r)
> +		goto error_unref;
> +
> +	spin_lock(&adev->vm_manager.pasid_lock);
> +	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
> +	spin_unlock(&adev->vm_manager.pasid_lock);
> +
Here we get the vm from the pasid a second time and check whether the PD bo 
has changed. Is this to handle a vm fault racing with vm destroy?

> +	if (!vm || vm->root.base.bo != root)
> +		goto error_unlock;
> +
> +	addr /= AMDGPU_GPU_PAGE_SIZE;
> +	flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
> +		AMDGPU_PTE_SYSTEM;
> +
> +	if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
> +		/* Redirect the access to the dummy page */
> +		value = adev->dummy_page_addr;
> +		flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
> +			AMDGPU_PTE_WRITEABLE;
> +	} else {
> +		value = 0;
> +	}
> +
> +	r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr + 1,
> +					flags, value, NULL, NULL);
> +	if (r)
> +		goto error_unlock;
> +
After the faulting address is redirected to the dummy page, will the fault 
recover and the retried access continue to execute? Is it dangerous to update 
the PTE to point at system memory address 0?

> +	r = amdgpu_vm_update_pdes(adev, vm, true);
> +
> +error_unlock:
> +	amdgpu_bo_unreserve(root);
> +	if (r < 0)
> +		DRM_ERROR("Can't handle page fault (%ld)\n", r);
> +
> +error_unref:
> +	amdgpu_bo_unref(&root);
> +
> +	return false;
> +}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index 0a97dc839f3b..4dbbe1b6b413 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -413,6 +413,8 @@ void amdgpu_vm_check_compute_bug(struct amdgpu_device *adev);
>   
>   void amdgpu_vm_get_task_info(struct amdgpu_device *adev, unsigned int pasid,
>   			     struct amdgpu_task_info *task_info);
> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
> +			    uint64_t addr);
>   
>   void amdgpu_vm_set_task_info(struct amdgpu_vm *vm);
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index 9d15679df6e0..15a1ce51befa 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -353,6 +353,10 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev,
>   	}
>   
>   	/* If it's the first fault for this address, process it normally */
> +	if (retry_fault && !in_interrupt() &&
> +	    amdgpu_vm_handle_fault(adev, entry->pasid, addr))
> +		return 1; /* This also prevents sending it to KFD */
> +
>   	if (!amdgpu_sriov_vf(adev)) {
>   		/*
>   		 * Issue a dummy read to wait for the status register to
> 

* Re: [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2
       [not found]     ` <20190904150230.13885-10-christian.koenig-5C7GfCeVMHo@public.gmane.org>
  2019-09-04 20:12       ` Yang, Philip
@ 2019-09-04 22:47       ` Kuehling, Felix
       [not found]         ` <4b5365d3-cc8c-1c2a-2675-f74baa4b9e8b-5C7GfCeVMHo@public.gmane.org>
  1 sibling, 1 reply; 25+ messages in thread
From: Kuehling, Felix @ 2019-09-04 22:47 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 2019-09-04 11:02 a.m., Christian König wrote:
> Next step towards HMM support. For now just silence the retry fault and
> optionally redirect the request to the dummy page.
>
> v2: make sure the VM is not destroyed while we handle the fault.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 74 ++++++++++++++++++++++++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  2 +
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  4 ++
>   3 files changed, 80 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 951608fc1925..410d89966a66 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -3142,3 +3142,77 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
>   		}
>   	}
>   }
> +
> +/**
> + * amdgpu_vm_handle_fault - graceful handling of VM faults.
> + * @adev: amdgpu device pointer
> + * @pasid: PASID of the VM
> + * @addr: Address of the fault
> + *
> + * Try to gracefully handle a VM fault. Return true if the fault was handled and
> + * shouldn't be reported any more.
> + */
> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
> +			    uint64_t addr)
> +{
> +	struct amdgpu_ring *ring = &adev->sdma.instance[0].page;
> +	struct amdgpu_bo *root;
> +	uint64_t value, flags;
> +	struct amdgpu_vm *vm;
> +	long r;
> +
> +	if (!ring->sched.ready)
> +		return false;
> +
> +	spin_lock(&adev->vm_manager.pasid_lock);
> +	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
> +	if (vm)
> +		root = amdgpu_bo_ref(vm->root.base.bo);
> +	else
> +		root = NULL;
> +	spin_unlock(&adev->vm_manager.pasid_lock);
> +
> +	if (!root)
> +		return false;
> +
> +	r = amdgpu_bo_reserve(root, true);
> +	if (r)
> +		goto error_unref;
> +
> +	spin_lock(&adev->vm_manager.pasid_lock);
> +	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
> +	spin_unlock(&adev->vm_manager.pasid_lock);

I think this deserves a comment. If I understand it correctly, you're 
looking up the vm twice so that you have the VM root reservation to 
protect against use-after-free. Otherwise the vm pointer is only valid 
as long as you're holding the spin-lock.


> +
> +	if (!vm || vm->root.base.bo != root)

The check of vm->root.base.bo should probably still be under the 
spin_lock. Because you're not sure yet it's the right VM, you can't rely 
on the reservation here to prevent use-after-free.
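
A minimal sketch of that suggestion (keeping the re-check under the lock):

	spin_lock(&adev->vm_manager.pasid_lock);
	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
	if (vm && vm->root.base.bo != root)
		vm = NULL;
	spin_unlock(&adev->vm_manager.pasid_lock);

	if (!vm)
		goto error_unlock;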


> +		goto error_unlock;
> +
> +	addr /= AMDGPU_GPU_PAGE_SIZE;
> +	flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
> +		AMDGPU_PTE_SYSTEM;
> +
> +	if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
> +		/* Redirect the access to the dummy page */
> +		value = adev->dummy_page_addr;
> +		flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
> +			AMDGPU_PTE_WRITEABLE;
> +	} else {
> +		value = 0;
> +	}
> +
> +	r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr + 1,
> +					flags, value, NULL, NULL);
> +	if (r)
> +		goto error_unlock;
> +
> +	r = amdgpu_vm_update_pdes(adev, vm, true);
> +
> +error_unlock:
> +	amdgpu_bo_unreserve(root);
> +	if (r < 0)
> +		DRM_ERROR("Can't handle page fault (%ld)\n", r);
> +
> +error_unref:
> +	amdgpu_bo_unref(&root);
> +
> +	return false;
> +}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index 0a97dc839f3b..4dbbe1b6b413 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -413,6 +413,8 @@ void amdgpu_vm_check_compute_bug(struct amdgpu_device *adev);
>   
>   void amdgpu_vm_get_task_info(struct amdgpu_device *adev, unsigned int pasid,
>   			     struct amdgpu_task_info *task_info);
> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
> +			    uint64_t addr);
>   
>   void amdgpu_vm_set_task_info(struct amdgpu_vm *vm);
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index 9d15679df6e0..15a1ce51befa 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -353,6 +353,10 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev,
>   	}
>   
>   	/* If it's the first fault for this address, process it normally */
> +	if (retry_fault && !in_interrupt() &&
> +	    amdgpu_vm_handle_fault(adev, entry->pasid, addr))
> +		return 1; /* This also prevents sending it to KFD */

The !in_interrupt() is meant to only do this on the rerouted interrupt 
ring that's handled by a worker function?

Looks like amdgpu_vm_handle_fault never returns true for now. So we'll 
never get to the "return 1" here.

Regards,
   Felix


> +
>   	if (!amdgpu_sriov_vf(adev)) {
>   		/*
>   		 * Issue a dummy read to wait for the status register to

* Re: [PATCH 4/9] drm/amdgpu: allow direct submission of PDE updates.
       [not found]     ` <20190904150230.13885-5-christian.koenig-5C7GfCeVMHo@public.gmane.org>
@ 2019-09-04 22:48       ` Kuehling, Felix
  0 siblings, 0 replies; 25+ messages in thread
From: Kuehling, Felix @ 2019-09-04 22:48 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 2019-09-04 11:02 a.m., Christian König wrote:
> For handling PDE updates directly in the fault handler.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c           | 2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c          | 2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c           | 8 +++++---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h           | 4 ++--
>   5 files changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index b0f0e060ded6..d3942d9306c4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -343,7 +343,7 @@ static int vm_update_pds(struct amdgpu_vm *vm, struct amdgpu_sync *sync)
>   	struct amdgpu_device *adev = amdgpu_ttm_adev(pd->tbo.bdev);
>   	int ret;
>   
> -	ret = amdgpu_vm_update_directories(adev, vm);
> +	ret = amdgpu_vm_update_pdes(adev, vm, false);
>   	if (ret)
>   		return ret;
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 51f3db08b8eb..bd6b88827447 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -845,7 +845,7 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p)
>   	if (r)
>   		return r;
>   
> -	r = amdgpu_vm_update_directories(adev, vm);
> +	r = amdgpu_vm_update_pdes(adev, vm, false);
>   	if (r)
>   		return r;
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index e7af35c7080d..a621e629d876 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -521,7 +521,7 @@ static void amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
>   			goto error;
>   	}
>   
> -	r = amdgpu_vm_update_directories(adev, vm);
> +	r = amdgpu_vm_update_pdes(adev, vm, false);
>   
>   error:
>   	if (r && r != -ERESTARTSYS)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index fc103a9f20c5..b6c89ba9281c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -1221,18 +1221,19 @@ static void amdgpu_vm_invalidate_pds(struct amdgpu_device *adev,
>   }
>   
>   /*
> - * amdgpu_vm_update_directories - make sure that all directories are valid
> + * amdgpu_vm_update_ - make sure that all directories are valid

Typo.

Regards,
   Felix


>    *
>    * @adev: amdgpu_device pointer
>    * @vm: requested vm
> + * @direct: submit directly to the paging queue
>    *
>    * Makes sure all directories are up to date.
>    *
>    * Returns:
>    * 0 for success, error for failure.
>    */
> -int amdgpu_vm_update_directories(struct amdgpu_device *adev,
> -				 struct amdgpu_vm *vm)
> +int amdgpu_vm_update_pdes(struct amdgpu_device *adev,
> +			  struct amdgpu_vm *vm, bool direct)
>   {
>   	struct amdgpu_vm_update_params params;
>   	int r;
> @@ -1243,6 +1244,7 @@ int amdgpu_vm_update_directories(struct amdgpu_device *adev,
>   	memset(&params, 0, sizeof(params));
>   	params.adev = adev;
>   	params.vm = vm;
> +	params.direct = direct;
>   
>   	r = vm->update_funcs->prepare(&params, AMDGPU_FENCE_OWNER_VM, NULL);
>   	if (r)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index 54dcd0bcce1a..0a97dc839f3b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -366,8 +366,8 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>   			      int (*callback)(void *p, struct amdgpu_bo *bo),
>   			      void *param);
>   int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job, bool need_pipe_sync);
> -int amdgpu_vm_update_directories(struct amdgpu_device *adev,
> -				 struct amdgpu_vm *vm);
> +int amdgpu_vm_update_pdes(struct amdgpu_device *adev,
> +			  struct amdgpu_vm *vm, bool direct);
>   int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
>   			  struct amdgpu_vm *vm,
>   			  struct dma_fence **fence);

* Re: Graceful page fault handling for Vega/Navi
       [not found] ` <20190904150230.13885-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
                     ` (8 preceding siblings ...)
  2019-09-04 15:02   ` [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2 Christian König
@ 2019-09-04 22:52   ` Kuehling, Felix
       [not found]     ` <03e2274c-a0dd-41f4-c5e0-26e371d01d23-5C7GfCeVMHo@public.gmane.org>
  2019-09-04 23:03   ` Huang, Ray
  10 siblings, 1 reply; 25+ messages in thread
From: Kuehling, Felix @ 2019-09-04 22:52 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW


On 2019-09-04 11:02 a.m., Christian König wrote:
> Hi everyone,
>
> this series is the next puzzle piece for recoverable page fault handling on Vega and Navi.
>
> It adds a new direct scheduler entity for VM updates which is then used to update page tables during a fault.
>
> In other words previously an application doing an invalid memory access would just hang and/or repeat the invalid access over and over again. Now the handling is modified so that the invalid memory access is redirected to the dummy page.
>
> This needs the following prerequisites:
> a) The firmware must be new enough so allow re-routing of page faults.
> b) Fault retry must be enabled using the amdgpu.noretry=0 parameter.
> c) Enough free VRAM to allocate page tables to point to the dummy page.
>
> The re-routing of page faults current only works on Vega10, so Vega20 and Navi will still need some more time.

Wait, we don't do the page fault rerouting on Vega20 yet? So we're 
getting the full brunt of the fault storm on the main interrupt ring? In 
that case, we should probably change the default setting to 
amdgpu.noretry=1 at least until that's done.

Other than that the patch series looks reasonable to me. I commented on 
patches 4 and 9 separately.

Patch 1 is Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>

With the issues addressed that I pointed out, the rest is

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>

Regards,
   Felix


> Please review and/or comment,
> Christian.
>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Graceful page fault handling for Vega/Navi
       [not found] ` <20190904150230.13885-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
                     ` (9 preceding siblings ...)
  2019-09-04 22:52   ` Graceful page fault handling for Vega/Navi Kuehling, Felix
@ 2019-09-04 23:03   ` Huang, Ray
       [not found]     ` <MN2PR12MB3309C53565E6D4E2E9FE23C7ECB80-rweVpJHSKTpWdvXm18W95QdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  10 siblings, 1 reply; 25+ messages in thread
From: Huang, Ray @ 2019-09-04 23:03 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Wed, Sep 04, 2019 at 05:02:21PM +0200, Christian König wrote:
> Hi everyone,
> 
> this series is the next puzzle piece for recoverable page fault handling on Vega and Navi.
> 
> It adds a new direct scheduler entity for VM updates which is then used to update page tables during a fault.
> 
> In other words previously an application doing an invalid memory access would just hang and/or repeat the invalid access over and over again. Now the handling is modified so that the invalid memory access is redirected to the dummy page.
> 
> This needs the following prerequisites:
> a) The firmware must be new enough so allow re-routing of page faults.
> b) Fault retry must be enabled using the amdgpu.noretry=0 parameter.

On my side, I found the "noretry" parameter does not work for vmid 0 vm faults.
If you see the same on your side, I'd like to take a closer look.

Thanks,
Ray


> c) Enough free VRAM to allocate page tables to point to the dummy page.
> 
> The re-routing of page faults current only works on Vega10, so Vega20 and Navi will still need some more time.
> 
> Please review and/or comment,
> Christian.
> 
> 
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Graceful page fault handling for Vega/Navi
       [not found]     ` <MN2PR12MB3309C53565E6D4E2E9FE23C7ECB80-rweVpJHSKTpWdvXm18W95QdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2019-09-04 23:28       ` Kuehling, Felix
  0 siblings, 0 replies; 25+ messages in thread
From: Kuehling, Felix @ 2019-09-04 23:28 UTC (permalink / raw)
  To: Huang, Ray, Christian König; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 2019-09-04 7:03 p.m., Huang, Ray wrote:
> On Wed, Sep 04, 2019 at 05:02:21PM +0200, Christian König wrote:
>> Hi everyone,
>>
>> this series is the next puzzle piece for recoverable page fault handling on Vega and Navi.
>>
>> It adds a new direct scheduler entity for VM updates which is then used to update page tables during a fault.
>>
>> In other words previously an application doing an invalid memory access would just hang and/or repeat the invalid access over and over again. Now the handling is modified so that the invalid memory access is redirected to the dummy page.
>>
>> This needs the following prerequisites:
>> a) The firmware must be new enough so allow re-routing of page faults.
>> b) Fault retry must be enabled using the amdgpu.noretry=0 parameter.
> On my side, I found the "noretry" parameter does not work for vmid 0 vm faults.
> If you see the same on your side, I'd like to take a closer look.

I think the noretry parameter is not meant to affect VMID0. I just find 
it surprising that retry faults are happening at all on VMID0. It 
doesn't make a lot of sense. I can't think of any good reason to retry 
any page faults in VMID0.

I see that the HW default for the RETRY_PERMISSION_OR_INVALID_PAGE_FAULT 
bit is 1 in VM_CONTEXT0_CNTL. I don't see us changing that value in the 
driver. We probably should. I'll send out a patch for that.
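
Roughly something like this in the hub setup (sketch only; the register
access helpers exist in the driver, but the exact field name for context 0
is an assumption based on the bit described above):

/* Hedged sketch: turn off retry for permission/invalid-page faults on
 * VMID0 so such faults get reported instead of retried. */
u32 tmp = RREG32_SOC15(GC, 0, mmVM_CONTEXT0_CNTL);

tmp = REG_SET_FIELD(tmp, VM_CONTEXT0_CNTL,
		    RETRY_PERMISSION_OR_INVALID_PAGE_FAULT_DEFAULT, 0);
WREG32_SOC15(GC, 0, mmVM_CONTEXT0_CNTL, tmp);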

Regards,
   Felix


>
> Thanks,
> Ray
>
>
>> c) Enough free VRAM to allocate page tables to point to the dummy page.
>>
>> The re-routing of page faults current only works on Vega10, so Vega20 and Navi will still need some more time.
>>
>> Please review and/or comment,
>> Christian.
>>
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2
       [not found]         ` <844c2c90-8be4-df05-9df9-c9c9cde9b186-5C7GfCeVMHo@public.gmane.org>
@ 2019-09-09 12:03           ` Christian König
       [not found]             ` <db73b357-b7e4-f818-e57c-234a45f7c5cb-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Christian König @ 2019-09-09 12:03 UTC (permalink / raw)
  To: Yang, Philip, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 04.09.19 um 22:12 schrieb Yang, Philip:
> This series looks nice and clear to me; two questions are embedded below.
>
> Are we going to use a dedicated sdma page queue for the direct VM update
> path during a fault?
>
> Thanks,
> Philip
>
> On 2019-09-04 11:02 a.m., Christian König wrote:
>> Next step towards HMM support. For now just silence the retry fault and
>> optionally redirect the request to the dummy page.
>>
>> v2: make sure the VM is not destroyed while we handle the fault.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 74 ++++++++++++++++++++++++++
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  2 +
>>    drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  4 ++
>>    3 files changed, 80 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 951608fc1925..410d89966a66 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -3142,3 +3142,77 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
>>    		}
>>    	}
>>    }
>> +
>> +/**
>> + * amdgpu_vm_handle_fault - graceful handling of VM faults.
>> + * @adev: amdgpu device pointer
>> + * @pasid: PASID of the VM
>> + * @addr: Address of the fault
>> + *
>> + * Try to gracefully handle a VM fault. Return true if the fault was handled and
>> + * shouldn't be reported any more.
>> + */
>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>> +			    uint64_t addr)
>> +{
>> +	struct amdgpu_ring *ring = &adev->sdma.instance[0].page;
>> +	struct amdgpu_bo *root;
>> +	uint64_t value, flags;
>> +	struct amdgpu_vm *vm;
>> +	long r;
>> +
>> +	if (!ring->sched.ready)
>> +		return false;
>> +
>> +	spin_lock(&adev->vm_manager.pasid_lock);
>> +	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>> +	if (vm)
>> +		root = amdgpu_bo_ref(vm->root.base.bo);
>> +	else
>> +		root = NULL;
>> +	spin_unlock(&adev->vm_manager.pasid_lock);
>> +
>> +	if (!root)
>> +		return false;
>> +
>> +	r = amdgpu_bo_reserve(root, true);
>> +	if (r)
>> +		goto error_unref;
>> +
>> +	spin_lock(&adev->vm_manager.pasid_lock);
>> +	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>> +	spin_unlock(&adev->vm_manager.pasid_lock);
>> +
> Here we get the vm from the pasid a second time and check whether the PD
> bo has changed; is this to handle a vm fault racing with vm destroy?

Yes, exactly.

>
>> +	if (!vm || vm->root.base.bo != root)
>> +		goto error_unlock;
>> +
>> +	addr /= AMDGPU_GPU_PAGE_SIZE;
>> +	flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
>> +		AMDGPU_PTE_SYSTEM;
>> +
>> +	if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
>> +		/* Redirect the access to the dummy page */
>> +		value = adev->dummy_page_addr;
>> +		flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
>> +			AMDGPU_PTE_WRITEABLE;
>> +	} else {
>> +		value = 0;
>> +	}
>> +
>> +	r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr + 1,
>> +					flags, value, NULL, NULL);
>> +	if (r)
>> +		goto error_unlock;
>> +
> After the fault address is redirected to the dummy page, will the fault
> recover and the retry continue to execute?

Yes, the read/write operation will just retry and use the value from the 
dummy page instead.

> Is it dangerous to update the PTE to use system
> memory address 0?

What are you talking about? The dummy page is a page allocated by TTM 
to which we redirect faulty accesses.

Regards,
Christian.

>
>> +	r = amdgpu_vm_update_pdes(adev, vm, true);
>> +
>> +error_unlock:
>> +	amdgpu_bo_unreserve(root);
>> +	if (r < 0)
>> +		DRM_ERROR("Can't handle page fault (%ld)\n", r);
>> +
>> +error_unref:
>> +	amdgpu_bo_unref(&root);
>> +
>> +	return false;
>> +}
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> index 0a97dc839f3b..4dbbe1b6b413 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> @@ -413,6 +413,8 @@ void amdgpu_vm_check_compute_bug(struct amdgpu_device *adev);
>>    
>>    void amdgpu_vm_get_task_info(struct amdgpu_device *adev, unsigned int pasid,
>>    			     struct amdgpu_task_info *task_info);
>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>> +			    uint64_t addr);
>>    
>>    void amdgpu_vm_set_task_info(struct amdgpu_vm *vm);
>>    
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> index 9d15679df6e0..15a1ce51befa 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> @@ -353,6 +353,10 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev,
>>    	}
>>    
>>    	/* If it's the first fault for this address, process it normally */
>> +	if (retry_fault && !in_interrupt() &&
>> +	    amdgpu_vm_handle_fault(adev, entry->pasid, addr))
>> +		return 1; /* This also prevents sending it to KFD */
>> +
>>    	if (!amdgpu_sriov_vf(adev)) {
>>    		/*
>>    		 * Issue a dummy read to wait for the status register to
>>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2
       [not found]         ` <4b5365d3-cc8c-1c2a-2675-f74baa4b9e8b-5C7GfCeVMHo@public.gmane.org>
@ 2019-09-09 12:07           ` Christian König
       [not found]             ` <3051c690-fabb-953a-3ae6-2cccfe8a502c-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Christian König @ 2019-09-09 12:07 UTC (permalink / raw)
  To: Kuehling, Felix, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 05.09.19 um 00:47 schrieb Kuehling, Felix:
> On 2019-09-04 11:02 a.m., Christian König wrote:
>> Next step towards HMM support. For now just silence the retry fault and
>> optionally redirect the request to the dummy page.
>>
>> v2: make sure the VM is not destroyed while we handle the fault.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 74 ++++++++++++++++++++++++++
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  2 +
>>    drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  4 ++
>>    3 files changed, 80 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 951608fc1925..410d89966a66 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -3142,3 +3142,77 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
>>    		}
>>    	}
>>    }
>> +
>> +/**
>> + * amdgpu_vm_handle_fault - graceful handling of VM faults.
>> + * @adev: amdgpu device pointer
>> + * @pasid: PASID of the VM
>> + * @addr: Address of the fault
>> + *
>> + * Try to gracefully handle a VM fault. Return true if the fault was handled and
>> + * shouldn't be reported any more.
>> + */
>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>> +			    uint64_t addr)
>> +{
>> +	struct amdgpu_ring *ring = &adev->sdma.instance[0].page;
>> +	struct amdgpu_bo *root;
>> +	uint64_t value, flags;
>> +	struct amdgpu_vm *vm;
>> +	long r;
>> +
>> +	if (!ring->sched.ready)
>> +		return false;
>> +
>> +	spin_lock(&adev->vm_manager.pasid_lock);
>> +	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>> +	if (vm)
>> +		root = amdgpu_bo_ref(vm->root.base.bo);
>> +	else
>> +		root = NULL;
>> +	spin_unlock(&adev->vm_manager.pasid_lock);
>> +
>> +	if (!root)
>> +		return false;
>> +
>> +	r = amdgpu_bo_reserve(root, true);
>> +	if (r)
>> +		goto error_unref;
>> +
>> +	spin_lock(&adev->vm_manager.pasid_lock);
>> +	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>> +	spin_unlock(&adev->vm_manager.pasid_lock);
> I think this deserves a comment. If I understand it correctly, you're
> looking up the vm twice so that you have the VM root reservation to
> protect against use-after-free. Otherwise the vm pointer is only valid
> as long as you're holding the spin-lock.
>
>
>> +
>> +	if (!vm || vm->root.base.bo != root)
> The check of vm->root.base.bo should probably still be under the
> spin_lock. Because you're not sure yet it's the right VM, you can't rely
> on the reservation here to prevent use-after-free.

Good point, going to fix that.
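
Something along these lines, i.e. keep the comparison inside the critical
section (sketch only; all names are taken from the hunk quoted above):

	spin_lock(&adev->vm_manager.pasid_lock);
	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
	/* Re-validate under the lock: the reservation only protects us
	 * once we know this is still the VM whose root we reserved. */
	if (!vm || vm->root.base.bo != root) {
		spin_unlock(&adev->vm_manager.pasid_lock);
		goto error_unlock;
	}
	spin_unlock(&adev->vm_manager.pasid_lock);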

>
>
>> +		goto error_unlock;
>> +
>> +	addr /= AMDGPU_GPU_PAGE_SIZE;
>> +	flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
>> +		AMDGPU_PTE_SYSTEM;
>> +
>> +	if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
>> +		/* Redirect the access to the dummy page */
>> +		value = adev->dummy_page_addr;
>> +		flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
>> +			AMDGPU_PTE_WRITEABLE;
>> +	} else {
>> +		value = 0;
>> +	}
>> +
>> +	r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr + 1,
>> +					flags, value, NULL, NULL);
>> +	if (r)
>> +		goto error_unlock;
>> +
>> +	r = amdgpu_vm_update_pdes(adev, vm, true);
>> +
>> +error_unlock:
>> +	amdgpu_bo_unreserve(root);
>> +	if (r < 0)
>> +		DRM_ERROR("Can't handle page fault (%ld)\n", r);
>> +
>> +error_unref:
>> +	amdgpu_bo_unref(&root);
>> +
>> +	return false;
>> +}
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> index 0a97dc839f3b..4dbbe1b6b413 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> @@ -413,6 +413,8 @@ void amdgpu_vm_check_compute_bug(struct amdgpu_device *adev);
>>    
>>    void amdgpu_vm_get_task_info(struct amdgpu_device *adev, unsigned int pasid,
>>    			     struct amdgpu_task_info *task_info);
>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>> +			    uint64_t addr);
>>    
>>    void amdgpu_vm_set_task_info(struct amdgpu_vm *vm);
>>    
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> index 9d15679df6e0..15a1ce51befa 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> @@ -353,6 +353,10 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev,
>>    	}
>>    
>>    	/* If it's the first fault for this address, process it normally */
>> +	if (retry_fault && !in_interrupt() &&
>> +	    amdgpu_vm_handle_fault(adev, entry->pasid, addr))
>> +		return 1; /* This also prevents sending it to KFD */
> The !in_interrupt() is meant to only do this on the rerouted interrupt
> ring that's handled by a worker function?

Yes, exactly. But I plan to add a workaround where the CPU redirects the 
fault to the other ring buffer for firmware versions which don't have 
that re-routing.

That adds quite a bit of overhead on Vega10 because of the fault storm, 
but it should work fine on Vega20.

The key point is that we already released firmware without the redirection, 
but it's still better to have the workaround than to run into the storm.
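
Very roughly, the idea is something like this in gmc_v9_0_process_interrupt()
(purely illustrative; the reroute helper does not exist yet):

	/* Hypothetical sketch: if the fault still arrives on the primary
	 * ring because the firmware can't re-route it, hand the IV entry
	 * over to the delayed IH processing instead of handling it here. */
	if (retry_fault && in_interrupt()) {
		amdgpu_fault_reroute(adev, entry);
		return 1;
	}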

> Looks like amdgpu_vm_handle_fault never returns true for now. So we'll
> never get to the "return 1" here.

Oh, yes that actually belongs into a follow up patch.

Thanks,
Christian.

>
> Regards,
>     Felix
>
>
>> +
>>    	if (!amdgpu_sriov_vf(adev)) {
>>    		/*
>>    		 * Issue a dummy read to wait for the status register to

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Graceful page fault handling for Vega/Navi
       [not found]     ` <03e2274c-a0dd-41f4-c5e0-26e371d01d23-5C7GfCeVMHo@public.gmane.org>
@ 2019-09-09 12:09       ` Christian König
  0 siblings, 0 replies; 25+ messages in thread
From: Christian König @ 2019-09-09 12:09 UTC (permalink / raw)
  To: Kuehling, Felix, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 05.09.19 um 00:52 schrieb Kuehling, Felix:
> On 2019-09-04 11:02 a.m., Christian König wrote:
>> Hi everyone,
>>
>> this series is the next puzzle piece for recoverable page fault handling on Vega and Navi.
>>
>> It adds a new direct scheduler entity for VM updates which is then used to update page tables during a fault.
>>
>> In other words previously an application doing an invalid memory access would just hang and/or repeat the invalid access over and over again. Now the handling is modified so that the invalid memory access is redirected to the dummy page.
>>
>> This needs the following prerequisites:
>> a) The firmware must be new enough so allow re-routing of page faults.
>> b) Fault retry must be enabled using the amdgpu.noretry=0 parameter.
>> c) Enough free VRAM to allocate page tables to point to the dummy page.
>>
>> The re-routing of page faults current only works on Vega10, so Vega20 and Navi will still need some more time.
> Wait, we don't do the page fault rerouting on Vega20 yet? So we're
> getting the full brunt of the fault storm on the main interrupt ring?

It's implemented, but the Vega20 firmware fails to enable the 
re-routing for some reason.

I haven't had time yet to talk to the firmware guys why that happens.

> In that case, we should probably change the default setting of
> amdgpu.noretry=1 at least until that's done.
>
> Other than that the patch series looks reasonable to me. I commented on
> patches 4 and 9 separately.
>
> Patch 1 is Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
>
> With the issues addressed that I pointed out, the rest is
>
> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>

Thanks,
Christian.

>
> Regards,
>     Felix
>
>
>> Please review and/or comment,
>> Christian.
>>
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2
       [not found]             ` <db73b357-b7e4-f818-e57c-234a45f7c5cb-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2019-09-09 13:58               ` Yang, Philip
       [not found]                 ` <4ebc27be-a403-0a0b-e9ff-4e5e18c5417c-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Yang, Philip @ 2019-09-09 13:58 UTC (permalink / raw)
  To: Koenig, Christian, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW



On 2019-09-09 8:03 a.m., Christian König wrote:
> Am 04.09.19 um 22:12 schrieb Yang, Philip:
>> This series looks nice and clear for me, two questions embedded below.
>>
>> Are we going to use dedicated sdma page queue for direct VM update path
>> during a fault?
>>
>> Thanks,
>> Philip
>>
>> On 2019-09-04 11:02 a.m., Christian König wrote:
>>> Next step towards HMM support. For now just silence the retry fault and
>>> optionally redirect the request to the dummy page.
>>>
>>> v2: make sure the VM is not destroyed while we handle the fault.
>>>
>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 74 
>>> ++++++++++++++++++++++++++
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  2 +
>>>    drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  4 ++
>>>    3 files changed, 80 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> index 951608fc1925..410d89966a66 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> @@ -3142,3 +3142,77 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm 
>>> *vm)
>>>            }
>>>        }
>>>    }
>>> +
>>> +/**
>>> + * amdgpu_vm_handle_fault - graceful handling of VM faults.
>>> + * @adev: amdgpu device pointer
>>> + * @pasid: PASID of the VM
>>> + * @addr: Address of the fault
>>> + *
>>> + * Try to gracefully handle a VM fault. Return true if the fault was 
>>> handled and
>>> + * shouldn't be reported any more.
>>> + */
>>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int 
>>> pasid,
>>> +                uint64_t addr)
>>> +{
>>> +    struct amdgpu_ring *ring = &adev->sdma.instance[0].page;
>>> +    struct amdgpu_bo *root;
>>> +    uint64_t value, flags;
>>> +    struct amdgpu_vm *vm;
>>> +    long r;
>>> +
>>> +    if (!ring->sched.ready)
>>> +        return false;
>>> +
>>> +    spin_lock(&adev->vm_manager.pasid_lock);
>>> +    vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>>> +    if (vm)
>>> +        root = amdgpu_bo_ref(vm->root.base.bo);
>>> +    else
>>> +        root = NULL;
>>> +    spin_unlock(&adev->vm_manager.pasid_lock);
>>> +
>>> +    if (!root)
>>> +        return false;
>>> +
>>> +    r = amdgpu_bo_reserve(root, true);
>>> +    if (r)
>>> +        goto error_unref;
>>> +
>>> +    spin_lock(&adev->vm_manager.pasid_lock);
>>> +    vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>>> +    spin_unlock(&adev->vm_manager.pasid_lock);
>>> +
>> Here get vm from pasid second time, and check if PD bo is changed, is
>> this to handle vm fault race with vm destory?
> 
> Yes, exactly.
> 
>>
>>> +    if (!vm || vm->root.base.bo != root)
>>> +        goto error_unlock;
>>> +
>>> +    addr /= AMDGPU_GPU_PAGE_SIZE;
>>> +    flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
>>> +        AMDGPU_PTE_SYSTEM;
>>> +
>>> +    if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
>>> +        /* Redirect the access to the dummy page */
>>> +        value = adev->dummy_page_addr;
>>> +        flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
>>> +            AMDGPU_PTE_WRITEABLE;
>>> +    } else {
>>> +        value = 0;
>>> +    }
>>> +
>>> +    r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr 
>>> + 1,
>>> +                    flags, value, NULL, NULL);
>>> +    if (r)
>>> +        goto error_unlock;
>>> +
>> After fault address redirect to dummy page, will the fault recover and
>> retry continue to execute?
> 
> Yes, the read/write operation will just retry and use the value from the 
> dummy page instead.
> 
>> Is this dangerous to update PTE to use system
>> memory address 0?
> 
> What are you talking about? The dummy page is a page allocate by TTM 
> where we redirect faulty accesses to.
> 
For the AMDGPU_VM_FAULT_STOP_FIRST/ALWAYS case of amdgpu_vm_fault_stop, 
the value is 0, so this would redirect to system memory address 0. Maybe 
the redirect is only needed for AMDGPU_VM_FAULT_STOP_NEVER?

Regards,
Philip

> Regards,
> Christian.
> 
>>
>>> +    r = amdgpu_vm_update_pdes(adev, vm, true);
>>> +
>>> +error_unlock:
>>> +    amdgpu_bo_unreserve(root);
>>> +    if (r < 0)
>>> +        DRM_ERROR("Can't handle page fault (%ld)\n", r);
>>> +
>>> +error_unref:
>>> +    amdgpu_bo_unref(&root);
>>> +
>>> +    return false;
>>> +}
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> index 0a97dc839f3b..4dbbe1b6b413 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> @@ -413,6 +413,8 @@ void amdgpu_vm_check_compute_bug(struct 
>>> amdgpu_device *adev);
>>>    void amdgpu_vm_get_task_info(struct amdgpu_device *adev, unsigned 
>>> int pasid,
>>>                     struct amdgpu_task_info *task_info);
>>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int 
>>> pasid,
>>> +                uint64_t addr);
>>>    void amdgpu_vm_set_task_info(struct amdgpu_vm *vm);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> index 9d15679df6e0..15a1ce51befa 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> @@ -353,6 +353,10 @@ static int gmc_v9_0_process_interrupt(struct 
>>> amdgpu_device *adev,
>>>        }
>>>        /* If it's the first fault for this address, process it 
>>> normally */
>>> +    if (retry_fault && !in_interrupt() &&
>>> +        amdgpu_vm_handle_fault(adev, entry->pasid, addr))
>>> +        return 1; /* This also prevents sending it to KFD */
>>> +
>>>        if (!amdgpu_sriov_vf(adev)) {
>>>            /*
>>>             * Issue a dummy read to wait for the status register to
>>>
> 
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2
       [not found]             ` <3051c690-fabb-953a-3ae6-2cccfe8a502c-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2019-09-09 14:08               ` Zeng, Oak
       [not found]                 ` <BL0PR12MB2580535133C5BF8E771CE30680B70-b4cIHhjg/p/XzH18dTCKOgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Zeng, Oak @ 2019-09-09 14:08 UTC (permalink / raw)
  To: Koenig, Christian, Kuehling, Felix,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Is looking up the vm twice necessary? I think we are in interrupt context; is it possible for the user space application to be switched in in between? My understanding is that if the user space application can't kick in during interrupt handling, it shouldn't have a chance to exit (and therefore its vm can't be destroyed).

Regards,
Oak

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Christian König
Sent: Monday, September 9, 2019 8:08 AM
To: Kuehling, Felix <Felix.Kuehling@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2

Am 05.09.19 um 00:47 schrieb Kuehling, Felix:
> On 2019-09-04 11:02 a.m., Christian König wrote:
>> Next step towards HMM support. For now just silence the retry fault 
>> and optionally redirect the request to the dummy page.
>>
>> v2: make sure the VM is not destroyed while we handle the fault.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 74 ++++++++++++++++++++++++++
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  2 +
>>    drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  4 ++
>>    3 files changed, 80 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 951608fc1925..410d89966a66 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -3142,3 +3142,77 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
>>    		}
>>    	}
>>    }
>> +
>> +/**
>> + * amdgpu_vm_handle_fault - graceful handling of VM faults.
>> + * @adev: amdgpu device pointer
>> + * @pasid: PASID of the VM
>> + * @addr: Address of the fault
>> + *
>> + * Try to gracefully handle a VM fault. Return true if the fault was 
>> +handled and
>> + * shouldn't be reported any more.
>> + */
>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>> +			    uint64_t addr)
>> +{
>> +	struct amdgpu_ring *ring = &adev->sdma.instance[0].page;
>> +	struct amdgpu_bo *root;
>> +	uint64_t value, flags;
>> +	struct amdgpu_vm *vm;
>> +	long r;
>> +
>> +	if (!ring->sched.ready)
>> +		return false;
>> +
>> +	spin_lock(&adev->vm_manager.pasid_lock);
>> +	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>> +	if (vm)
>> +		root = amdgpu_bo_ref(vm->root.base.bo);
>> +	else
>> +		root = NULL;
>> +	spin_unlock(&adev->vm_manager.pasid_lock);
>> +
>> +	if (!root)
>> +		return false;
>> +
>> +	r = amdgpu_bo_reserve(root, true);
>> +	if (r)
>> +		goto error_unref;
>> +
>> +	spin_lock(&adev->vm_manager.pasid_lock);
>> +	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>> +	spin_unlock(&adev->vm_manager.pasid_lock);
> I think this deserves a comment. If I understand it correctly, you're 
> looking up the vm twice so that you have the VM root reservation to 
> protect against user-after-free. Otherwise the vm pointer is only 
> valid as long as you're holding the spin-lock.
>
>
>> +
>> +	if (!vm || vm->root.base.bo != root)
> The check of vm->root.base.bo should probably still be under the 
> spin_lock. Because you're not sure yet it's the right VM, you can't 
> rely on the reservation here to prevent use-after-free.

Good point, going to fix that.

>
>
>> +		goto error_unlock;
>> +
>> +	addr /= AMDGPU_GPU_PAGE_SIZE;
>> +	flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
>> +		AMDGPU_PTE_SYSTEM;
>> +
>> +	if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
>> +		/* Redirect the access to the dummy page */
>> +		value = adev->dummy_page_addr;
>> +		flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
>> +			AMDGPU_PTE_WRITEABLE;
>> +	} else {
>> +		value = 0;
>> +	}
>> +
>> +	r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr + 1,
>> +					flags, value, NULL, NULL);
>> +	if (r)
>> +		goto error_unlock;
>> +
>> +	r = amdgpu_vm_update_pdes(adev, vm, true);
>> +
>> +error_unlock:
>> +	amdgpu_bo_unreserve(root);
>> +	if (r < 0)
>> +		DRM_ERROR("Can't handle page fault (%ld)\n", r);
>> +
>> +error_unref:
>> +	amdgpu_bo_unref(&root);
>> +
>> +	return false;
>> +}
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> index 0a97dc839f3b..4dbbe1b6b413 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> @@ -413,6 +413,8 @@ void amdgpu_vm_check_compute_bug(struct 
>> amdgpu_device *adev);
>>    
>>    void amdgpu_vm_get_task_info(struct amdgpu_device *adev, unsigned int pasid,
>>    			     struct amdgpu_task_info *task_info);
>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>> +			    uint64_t addr);
>>    
>>    void amdgpu_vm_set_task_info(struct amdgpu_vm *vm);
>>    
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
>> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> index 9d15679df6e0..15a1ce51befa 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> @@ -353,6 +353,10 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev,
>>    	}
>>    
>>    	/* If it's the first fault for this address, process it normally 
>> */
>> +	if (retry_fault && !in_interrupt() &&
>> +	    amdgpu_vm_handle_fault(adev, entry->pasid, addr))
>> +		return 1; /* This also prevents sending it to KFD */
> The !in_interrupt() is meant to only do this on the rerouted interrupt 
> ring that's handled by a worker function?

Yes, exactly. But I plan to add a workaround where the CPU redirects the fault to the other ring buffer for firmware versions which doesn't have that.

Adds quite a bunch of overhead on Vega10, because of the fault storm but should work fine on Vega20.

Key point is that we already released firmware without the redirection, but it's still better to have that than to run into the storm.

> Looks like amdgpu_vm_handle_fault never returns true for now. So we'll 
> never get to the "return 1" here.

Oh, yes that actually belongs into a follow up patch.

Thanks,
Christian.

>
> Regards,
>     Felix
>
>
>> +
>>    	if (!amdgpu_sriov_vf(adev)) {
>>    		/*
>>    		 * Issue a dummy read to wait for the status register to

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2
       [not found]                 ` <BL0PR12MB2580535133C5BF8E771CE30680B70-b4cIHhjg/p/XzH18dTCKOgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2019-09-09 17:14                   ` Koenig, Christian
       [not found]                     ` <8d28a385-6ad2-20fe-12df-72b3fbf034e5-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Koenig, Christian @ 2019-09-09 17:14 UTC (permalink / raw)
  To: Zeng, Oak, Kuehling, Felix, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Well, first of all we are not in interrupt context here; this is handled 
by a work item, otherwise we couldn't do all the locking.

But even in interrupt context another CPU can easily destroy the VM while 
we are just handling a stale fault, or because the process was killed.

So this extra double checking is strictly necessary.

Regards,
Christian.

Am 09.09.19 um 16:08 schrieb Zeng, Oak:
> Is looking up vm twice necessary? I think we are in interrupt context, is it possible that the user space application can be switched in between? My understanding is, if user space application is can't kick in during interrupt handling, application shouldn't have chance to exit (then their vm being destroyed).
>
> Regards,
> Oak
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Christian König
> Sent: Monday, September 9, 2019 8:08 AM
> To: Kuehling, Felix <Felix.Kuehling@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2
>
> Am 05.09.19 um 00:47 schrieb Kuehling, Felix:
>> On 2019-09-04 11:02 a.m., Christian König wrote:
>>> Next step towards HMM support. For now just silence the retry fault
>>> and optionally redirect the request to the dummy page.
>>>
>>> v2: make sure the VM is not destroyed while we handle the fault.
>>>
>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>> ---
>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 74 ++++++++++++++++++++++++++
>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  2 +
>>>     drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  4 ++
>>>     3 files changed, 80 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> index 951608fc1925..410d89966a66 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> @@ -3142,3 +3142,77 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
>>>     		}
>>>     	}
>>>     }
>>> +
>>> +/**
>>> + * amdgpu_vm_handle_fault - graceful handling of VM faults.
>>> + * @adev: amdgpu device pointer
>>> + * @pasid: PASID of the VM
>>> + * @addr: Address of the fault
>>> + *
>>> + * Try to gracefully handle a VM fault. Return true if the fault was
>>> +handled and
>>> + * shouldn't be reported any more.
>>> + */
>>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>>> +			    uint64_t addr)
>>> +{
>>> +	struct amdgpu_ring *ring = &adev->sdma.instance[0].page;
>>> +	struct amdgpu_bo *root;
>>> +	uint64_t value, flags;
>>> +	struct amdgpu_vm *vm;
>>> +	long r;
>>> +
>>> +	if (!ring->sched.ready)
>>> +		return false;
>>> +
>>> +	spin_lock(&adev->vm_manager.pasid_lock);
>>> +	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>>> +	if (vm)
>>> +		root = amdgpu_bo_ref(vm->root.base.bo);
>>> +	else
>>> +		root = NULL;
>>> +	spin_unlock(&adev->vm_manager.pasid_lock);
>>> +
>>> +	if (!root)
>>> +		return false;
>>> +
>>> +	r = amdgpu_bo_reserve(root, true);
>>> +	if (r)
>>> +		goto error_unref;
>>> +
>>> +	spin_lock(&adev->vm_manager.pasid_lock);
>>> +	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>>> +	spin_unlock(&adev->vm_manager.pasid_lock);
>> I think this deserves a comment. If I understand it correctly, you're
>> looking up the vm twice so that you have the VM root reservation to
>> protect against user-after-free. Otherwise the vm pointer is only
>> valid as long as you're holding the spin-lock.
>>
>>
>>> +
>>> +	if (!vm || vm->root.base.bo != root)
>> The check of vm->root.base.bo should probably still be under the
>> spin_lock. Because you're not sure yet it's the right VM, you can't
>> rely on the reservation here to prevent use-after-free.
> Good point, going to fix that.
>
>>
>>> +		goto error_unlock;
>>> +
>>> +	addr /= AMDGPU_GPU_PAGE_SIZE;
>>> +	flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
>>> +		AMDGPU_PTE_SYSTEM;
>>> +
>>> +	if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
>>> +		/* Redirect the access to the dummy page */
>>> +		value = adev->dummy_page_addr;
>>> +		flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
>>> +			AMDGPU_PTE_WRITEABLE;
>>> +	} else {
>>> +		value = 0;
>>> +	}
>>> +
>>> +	r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr + 1,
>>> +					flags, value, NULL, NULL);
>>> +	if (r)
>>> +		goto error_unlock;
>>> +
>>> +	r = amdgpu_vm_update_pdes(adev, vm, true);
>>> +
>>> +error_unlock:
>>> +	amdgpu_bo_unreserve(root);
>>> +	if (r < 0)
>>> +		DRM_ERROR("Can't handle page fault (%ld)\n", r);
>>> +
>>> +error_unref:
>>> +	amdgpu_bo_unref(&root);
>>> +
>>> +	return false;
>>> +}
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> index 0a97dc839f3b..4dbbe1b6b413 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> @@ -413,6 +413,8 @@ void amdgpu_vm_check_compute_bug(struct
>>> amdgpu_device *adev);
>>>     
>>>     void amdgpu_vm_get_task_info(struct amdgpu_device *adev, unsigned int pasid,
>>>     			     struct amdgpu_task_info *task_info);
>>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>>> +			    uint64_t addr);
>>>     
>>>     void amdgpu_vm_set_task_info(struct amdgpu_vm *vm);
>>>     
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> index 9d15679df6e0..15a1ce51befa 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> @@ -353,6 +353,10 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev,
>>>     	}
>>>     
>>>     	/* If it's the first fault for this address, process it normally
>>> */
>>> +	if (retry_fault && !in_interrupt() &&
>>> +	    amdgpu_vm_handle_fault(adev, entry->pasid, addr))
>>> +		return 1; /* This also prevents sending it to KFD */
>> The !in_interrupt() is meant to only do this on the rerouted interrupt
>> ring that's handled by a worker function?
> Yes, exactly. But I plan to add a workaround where the CPU redirects the fault to the other ring buffer for firmware versions which doesn't have that.
>
> Adds quite a bunch of overhead on Vega10, because of the fault storm but should work fine on Vega20.
>
> Key point is that we already released firmware without the redirection, but it's still better to have that than to run into the storm.
>
>> Looks like amdgpu_vm_handle_fault never returns true for now. So we'll
>> never get to the "return 1" here.
> Oh, yes that actually belongs into a follow up patch.
>
> Thanks,
> Christian.
>
>> Regards,
>>      Felix
>>
>>
>>> +
>>>     	if (!amdgpu_sriov_vf(adev)) {
>>>     		/*
>>>     		 * Issue a dummy read to wait for the status register to
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2
       [not found]                 ` <4ebc27be-a403-0a0b-e9ff-4e5e18c5417c-5C7GfCeVMHo@public.gmane.org>
@ 2019-09-09 17:15                   ` Koenig, Christian
  0 siblings, 0 replies; 25+ messages in thread
From: Koenig, Christian @ 2019-09-09 17:15 UTC (permalink / raw)
  To: Yang, Philip, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 09.09.19 um 15:58 schrieb Yang, Philip:
>
> On 2019-09-09 8:03 a.m., Christian König wrote:
>> Am 04.09.19 um 22:12 schrieb Yang, Philip:
>>> This series looks nice and clear for me, two questions embedded below.
>>>
>>> Are we going to use dedicated sdma page queue for direct VM update path
>>> during a fault?
>>>
>>> Thanks,
>>> Philip
>>>
>>> On 2019-09-04 11:02 a.m., Christian König wrote:
>>>> Next step towards HMM support. For now just silence the retry fault and
>>>> optionally redirect the request to the dummy page.
>>>>
>>>> v2: make sure the VM is not destroyed while we handle the fault.
>>>>
>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>> ---
>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 74
>>>> ++++++++++++++++++++++++++
>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  2 +
>>>>     drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  4 ++
>>>>     3 files changed, 80 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> index 951608fc1925..410d89966a66 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> @@ -3142,3 +3142,77 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm
>>>> *vm)
>>>>             }
>>>>         }
>>>>     }
>>>> +
>>>> +/**
>>>> + * amdgpu_vm_handle_fault - graceful handling of VM faults.
>>>> + * @adev: amdgpu device pointer
>>>> + * @pasid: PASID of the VM
>>>> + * @addr: Address of the fault
>>>> + *
>>>> + * Try to gracefully handle a VM fault. Return true if the fault was
>>>> handled and
>>>> + * shouldn't be reported any more.
>>>> + */
>>>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int
>>>> pasid,
>>>> +                uint64_t addr)
>>>> +{
>>>> +    struct amdgpu_ring *ring = &adev->sdma.instance[0].page;
>>>> +    struct amdgpu_bo *root;
>>>> +    uint64_t value, flags;
>>>> +    struct amdgpu_vm *vm;
>>>> +    long r;
>>>> +
>>>> +    if (!ring->sched.ready)
>>>> +        return false;
>>>> +
>>>> +    spin_lock(&adev->vm_manager.pasid_lock);
>>>> +    vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>>>> +    if (vm)
>>>> +        root = amdgpu_bo_ref(vm->root.base.bo);
>>>> +    else
>>>> +        root = NULL;
>>>> +    spin_unlock(&adev->vm_manager.pasid_lock);
>>>> +
>>>> +    if (!root)
>>>> +        return false;
>>>> +
>>>> +    r = amdgpu_bo_reserve(root, true);
>>>> +    if (r)
>>>> +        goto error_unref;
>>>> +
>>>> +    spin_lock(&adev->vm_manager.pasid_lock);
>>>> +    vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>>>> +    spin_unlock(&adev->vm_manager.pasid_lock);
>>>> +
>>> Here get vm from pasid second time, and check if PD bo is changed, is
>>> this to handle vm fault race with vm destory?
>> Yes, exactly.
>>
>>>> +    if (!vm || vm->root.base.bo != root)
>>>> +        goto error_unlock;
>>>> +
>>>> +    addr /= AMDGPU_GPU_PAGE_SIZE;
>>>> +    flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
>>>> +        AMDGPU_PTE_SYSTEM;
>>>> +
>>>> +    if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
>>>> +        /* Redirect the access to the dummy page */
>>>> +        value = adev->dummy_page_addr;
>>>> +        flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
>>>> +            AMDGPU_PTE_WRITEABLE;
>>>> +    } else {
>>>> +        value = 0;
>>>> +    }
>>>> +
>>>> +    r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr
>>>> + 1,
>>>> +                    flags, value, NULL, NULL);
>>>> +    if (r)
>>>> +        goto error_unlock;
>>>> +
>>> After fault address redirect to dummy page, will the fault recover and
>>> retry continue to execute?
>> Yes, the read/write operation will just retry and use the value from the
>> dummy page instead.
>>
>>> Is this dangerous to update PTE to use system
>>> memory address 0?
>> What are you talking about? The dummy page is a page allocate by TTM
>> where we redirect faulty accesses to.
>>
> For amdgpu_vm_fault_stop equals to AMDGPU_VM_FAULT_STOP_FIRST/ALWAYS
> case, value is 0, this will redirect to system memory 0. Maybe redirect
> is only needed for AMDGPU_VM_FAULT_STOP_NEVER?

The value 0 doesn't redirect to system memory; it results in a silent 
retry when neither the R nor the W bit is set in a PTE.

Regards,
Christian.

>
> Regards,
> Philip
>
>> Regards,
>> Christian.
>>
>>>> +    r = amdgpu_vm_update_pdes(adev, vm, true);
>>>> +
>>>> +error_unlock:
>>>> +    amdgpu_bo_unreserve(root);
>>>> +    if (r < 0)
>>>> +        DRM_ERROR("Can't handle page fault (%ld)\n", r);
>>>> +
>>>> +error_unref:
>>>> +    amdgpu_bo_unref(&root);
>>>> +
>>>> +    return false;
>>>> +}
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> index 0a97dc839f3b..4dbbe1b6b413 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> @@ -413,6 +413,8 @@ void amdgpu_vm_check_compute_bug(struct
>>>> amdgpu_device *adev);
>>>>     void amdgpu_vm_get_task_info(struct amdgpu_device *adev, unsigned
>>>> int pasid,
>>>>                      struct amdgpu_task_info *task_info);
>>>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int
>>>> pasid,
>>>> +                uint64_t addr);
>>>>     void amdgpu_vm_set_task_info(struct amdgpu_vm *vm);
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>>> index 9d15679df6e0..15a1ce51befa 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>>> @@ -353,6 +353,10 @@ static int gmc_v9_0_process_interrupt(struct
>>>> amdgpu_device *adev,
>>>>         }
>>>>         /* If it's the first fault for this address, process it
>>>> normally */
>>>> +    if (retry_fault && !in_interrupt() &&
>>>> +        amdgpu_vm_handle_fault(adev, entry->pasid, addr))
>>>> +        return 1; /* This also prevents sending it to KFD */
>>>> +
>>>>         if (!amdgpu_sriov_vf(adev)) {
>>>>             /*
>>>>              * Issue a dummy read to wait for the status register to
>>>>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2
       [not found]                     ` <8d28a385-6ad2-20fe-12df-72b3fbf034e5-5C7GfCeVMHo@public.gmane.org>
@ 2019-09-10 13:30                       ` Zeng, Oak
       [not found]                         ` <BL0PR12MB258025DD4004AB34CB02A1AB80B60-b4cIHhjg/p/XzH18dTCKOgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Zeng, Oak @ 2019-09-10 13:30 UTC (permalink / raw)
  To: Koenig, Christian, Kuehling, Felix,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW



Regards,
Oak

-----Original Message-----
From: Koenig, Christian <Christian.Koenig@amd.com> 
Sent: Monday, September 9, 2019 1:14 PM
To: Zeng, Oak <Oak.Zeng@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2

> Well first of all we are not in interrupt context here, this is handled by a work item or otherwise we couldn't do all the locking.

This is called from amdgpu_irq_handler, so I think this is interrupt context. This is also the reason why we use a spin lock instead of a sleepable lock like a semaphore.

> But even in interrupt context another CPU can easily destroy the VM when we just handle a stale fault or the process was killed.

Agree with this point.

So this extra double checking is strictly necessary.

Regards,
Christian.

Am 09.09.19 um 16:08 schrieb Zeng, Oak:
> Is looking up vm twice necessary? I think we are in interrupt context, is it possible that the user space application can be switched in between? My understanding is, if user space application is can't kick in during interrupt handling, application shouldn't have chance to exit (then their vm being destroyed).
>
> Regards,
> Oak
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of 
> Christian König
> Sent: Monday, September 9, 2019 8:08 AM
> To: Kuehling, Felix <Felix.Kuehling@amd.com>; 
> amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2
>
> Am 05.09.19 um 00:47 schrieb Kuehling, Felix:
>> On 2019-09-04 11:02 a.m., Christian König wrote:
>>> Next step towards HMM support. For now just silence the retry fault 
>>> and optionally redirect the request to the dummy page.
>>>
>>> v2: make sure the VM is not destroyed while we handle the fault.
>>>
>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>> ---
>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 74 ++++++++++++++++++++++++++
>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  2 +
>>>     drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  4 ++
>>>     3 files changed, 80 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> index 951608fc1925..410d89966a66 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> @@ -3142,3 +3142,77 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
>>>     		}
>>>     	}
>>>     }
>>> +
>>> +/**
>>> + * amdgpu_vm_handle_fault - graceful handling of VM faults.
>>> + * @adev: amdgpu device pointer
>>> + * @pasid: PASID of the VM
>>> + * @addr: Address of the fault
>>> + *
>>> + * Try to gracefully handle a VM fault. Return true if the fault 
>>> +was handled and
>>> + * shouldn't be reported any more.
>>> + */
>>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>>> +			    uint64_t addr)
>>> +{
>>> +	struct amdgpu_ring *ring = &adev->sdma.instance[0].page;
>>> +	struct amdgpu_bo *root;
>>> +	uint64_t value, flags;
>>> +	struct amdgpu_vm *vm;
>>> +	long r;
>>> +
>>> +	if (!ring->sched.ready)
>>> +		return false;
>>> +
>>> +	spin_lock(&adev->vm_manager.pasid_lock);
>>> +	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>>> +	if (vm)
>>> +		root = amdgpu_bo_ref(vm->root.base.bo);
>>> +	else
>>> +		root = NULL;
>>> +	spin_unlock(&adev->vm_manager.pasid_lock);
>>> +
>>> +	if (!root)
>>> +		return false;
>>> +
>>> +	r = amdgpu_bo_reserve(root, true);
>>> +	if (r)
>>> +		goto error_unref;
>>> +
>>> +	spin_lock(&adev->vm_manager.pasid_lock);
>>> +	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>>> +	spin_unlock(&adev->vm_manager.pasid_lock);
>> I think this deserves a comment. If I understand it correctly, you're 
>> looking up the vm twice so that you have the VM root reservation to 
>> protect against user-after-free. Otherwise the vm pointer is only 
>> valid as long as you're holding the spin-lock.
>>
>>
>>> +
>>> +	if (!vm || vm->root.base.bo != root)
>> The check of vm->root.base.bo should probably still be under the 
>> spin_lock. Because you're not sure yet it's the right VM, you can't 
>> rely on the reservation here to prevent use-after-free.
> Good point, going to fix that.
>
>>
>>> +		goto error_unlock;
>>> +
>>> +	addr /= AMDGPU_GPU_PAGE_SIZE;
>>> +	flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
>>> +		AMDGPU_PTE_SYSTEM;
>>> +
>>> +	if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
>>> +		/* Redirect the access to the dummy page */
>>> +		value = adev->dummy_page_addr;
>>> +		flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
>>> +			AMDGPU_PTE_WRITEABLE;
>>> +	} else {
>>> +		value = 0;
>>> +	}
>>> +
>>> +	r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr + 1,
>>> +					flags, value, NULL, NULL);
>>> +	if (r)
>>> +		goto error_unlock;
>>> +
>>> +	r = amdgpu_vm_update_pdes(adev, vm, true);
>>> +
>>> +error_unlock:
>>> +	amdgpu_bo_unreserve(root);
>>> +	if (r < 0)
>>> +		DRM_ERROR("Can't handle page fault (%ld)\n", r);
>>> +
>>> +error_unref:
>>> +	amdgpu_bo_unref(&root);
>>> +
>>> +	return false;
>>> +}
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> index 0a97dc839f3b..4dbbe1b6b413 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> @@ -413,6 +413,8 @@ void amdgpu_vm_check_compute_bug(struct
>>> amdgpu_device *adev);
>>>     
>>>     void amdgpu_vm_get_task_info(struct amdgpu_device *adev, unsigned int pasid,
>>>     			     struct amdgpu_task_info *task_info);
>>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>>> +			    uint64_t addr);
>>>     
>>>     void amdgpu_vm_set_task_info(struct amdgpu_vm *vm);
>>>     
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> index 9d15679df6e0..15a1ce51befa 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> @@ -353,6 +353,10 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev,
>>>     	}
>>>     
>>>     	/* If it's the first fault for this address, process it 
>>> normally */
>>> +	if (retry_fault && !in_interrupt() &&
>>> +	    amdgpu_vm_handle_fault(adev, entry->pasid, addr))
>>> +		return 1; /* This also prevents sending it to KFD */
>> The !in_interrupt() is meant to only do this on the rerouted 
>> interrupt ring that's handled by a worker function?
> Yes, exactly. But I plan to add a workaround where the CPU redirects the fault to the other ring buffer for firmware versions which doesn't have that.
>
> Adds quite a bunch of overhead on Vega10, because of the fault storm but should work fine on Vega20.
>
> Key point is that we already released firmware without the redirection, but it's still better to have that than to run into the storm.
>
>> Looks like amdgpu_vm_handle_fault never returns true for now. So 
>> we'll never get to the "return 1" here.
> Oh, yes that actually belongs into a follow up patch.
>
> Thanks,
> Christian.
>
>> Regards,
>>      Felix
>>
>>
>>> +
>>>     	if (!amdgpu_sriov_vf(adev)) {
>>>     		/*
>>>     		 * Issue a dummy read to wait for the status register to
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2
       [not found]                         ` <BL0PR12MB258025DD4004AB34CB02A1AB80B60-b4cIHhjg/p/XzH18dTCKOgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2019-09-11  9:31                           ` Koenig, Christian
  0 siblings, 0 replies; 25+ messages in thread
From: Koenig, Christian @ 2019-09-11  9:31 UTC (permalink / raw)
  To: Zeng, Oak, Kuehling, Felix, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 10.09.19 um 15:30 schrieb Zeng, Oak:
>
> Regards,
> Oak
>
> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig@amd.com>
> Sent: Monday, September 9, 2019 1:14 PM
> To: Zeng, Oak <Oak.Zeng@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2
>
>> Well first of all we are not in interrupt context here, this is handled by a work item or otherwise we couldn't do all the locking.
> This is called from amdgpu_irq_handler. I think this is interrupt context.

No, that's incorrect. Only the primary interrupt ring is handled in 
interrupt context. Rings 1 and 2 are handled with a work item.
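
The pattern is roughly the following (a simplified sketch with made-up 
names, not the actual amdgpu code):

#include <linux/interrupt.h>
#include <linux/workqueue.h>

/* One work item per re-routed IH ring; set up once at init time with
 * INIT_WORK(&ih->ring1_work, my_ih_ring1_work). */
struct my_ih_state {
	struct work_struct ring1_work;
};

static void my_ih_ring1_work(struct work_struct *work)
{
	struct my_ih_state *ih = container_of(work, struct my_ih_state,
					      ring1_work);

	/* Drain the re-routed fault ring here.  This runs in process
	 * context, so it may sleep, reserve BOs and submit VM updates. */
	(void)ih;
}

static irqreturn_t my_irq_handler(int irq, void *arg)
{
	struct my_ih_state *ih = arg;

	/* The primary ring is drained right here, in interrupt context,
	 * so no sleeping is allowed on this path. */

	/* For the re-routed rings we only kick the work item. */
	schedule_work(&ih->ring1_work);
	return IRQ_HANDLED;
}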

>   This is also the reason why we use a spin lock instead of a sleepable lock like a semaphore.

Well, we do have a sleepable lock here to make it possible to update 
the page tables and directories.
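
Condensed, the flow in amdgpu_vm_handle_fault() is supposed to look like 
this (a sketch based on the patch quoted below, with error handling 
trimmed and Felix's suggestion of checking the root BO under the lock 
folded in):

	/* Look the VM up under the spin lock and grab a reference to its
	 * root PD so the BO can't go away while we wait. */
	spin_lock(&adev->vm_manager.pasid_lock);
	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
	root = vm ? amdgpu_bo_ref(vm->root.base.bo) : NULL;
	spin_unlock(&adev->vm_manager.pasid_lock);
	if (!root)
		return false;

	/* Sleepable reservation lock, needed for the page table update. */
	r = amdgpu_bo_reserve(root, true);
	if (r)
		goto error_unref;

	/* Look the PASID up again: the VM may have been destroyed or the
	 * PASID reused while we were waiting for the reservation. */
	spin_lock(&adev->vm_manager.pasid_lock);
	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
	if (!vm || vm->root.base.bo != root) {
		spin_unlock(&adev->vm_manager.pasid_lock);
		goto error_unlock;
	}
	spin_unlock(&adev->vm_manager.pasid_lock);

	/* From here on the reservation protects the VM structures and we
	 * can safely update the page tables and directories. */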

That's also the reason why we protect the call with the !in_interrupt() check.

Regards,
Christian.

>
>> But even in interrupt context, another CPU can easily destroy the VM while we are handling a stale fault or after the process was killed.
> Agree with this point.
>
> So this extra double checking is strictly necessary.
>
> Regards,
> Christian.
>
> On 09.09.19 at 16:08, Zeng, Oak wrote:
>> Is looking up the vm twice necessary? I think we are in interrupt context; is it possible that the user space application can be scheduled in between? My understanding is that if the user space application can't kick in during interrupt handling, the application shouldn't have a chance to exit (and thus have its vm destroyed).
>>
>> Regards,
>> Oak
>>
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of
>> Christian König
>> Sent: Monday, September 9, 2019 8:08 AM
>> To: Kuehling, Felix <Felix.Kuehling@amd.com>;
>> amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2
>>
>> On 05.09.19 at 00:47, Kuehling, Felix wrote:
>>> On 2019-09-04 11:02 a.m., Christian König wrote:
>>>> Next step towards HMM support. For now just silence the retry fault
>>>> and optionally redirect the request to the dummy page.
>>>>
>>>> v2: make sure the VM is not destroyed while we handle the fault.
>>>>
>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>> ---
>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 74 ++++++++++++++++++++++++++
>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  2 +
>>>>      drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  4 ++
>>>>      3 files changed, 80 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> index 951608fc1925..410d89966a66 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> @@ -3142,3 +3142,77 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
>>>>      		}
>>>>      	}
>>>>      }
>>>> +
>>>> +/**
>>>> + * amdgpu_vm_handle_fault - graceful handling of VM faults.
>>>> + * @adev: amdgpu device pointer
>>>> + * @pasid: PASID of the VM
>>>> + * @addr: Address of the fault
>>>> + *
>>>> + * Try to gracefully handle a VM fault. Return true if the fault was handled and
>>>> + * shouldn't be reported any more.
>>>> + */
>>>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>>>> +			    uint64_t addr)
>>>> +{
>>>> +	struct amdgpu_ring *ring = &adev->sdma.instance[0].page;
>>>> +	struct amdgpu_bo *root;
>>>> +	uint64_t value, flags;
>>>> +	struct amdgpu_vm *vm;
>>>> +	long r;
>>>> +
>>>> +	if (!ring->sched.ready)
>>>> +		return false;
>>>> +
>>>> +	spin_lock(&adev->vm_manager.pasid_lock);
>>>> +	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>>>> +	if (vm)
>>>> +		root = amdgpu_bo_ref(vm->root.base.bo);
>>>> +	else
>>>> +		root = NULL;
>>>> +	spin_unlock(&adev->vm_manager.pasid_lock);
>>>> +
>>>> +	if (!root)
>>>> +		return false;
>>>> +
>>>> +	r = amdgpu_bo_reserve(root, true);
>>>> +	if (r)
>>>> +		goto error_unref;
>>>> +
>>>> +	spin_lock(&adev->vm_manager.pasid_lock);
>>>> +	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>>>> +	spin_unlock(&adev->vm_manager.pasid_lock);
>>> I think this deserves a comment. If I understand it correctly, you're
>>> looking up the vm twice so that you have the VM root reservation to
>>> protect against use-after-free. Otherwise the vm pointer is only
>>> valid as long as you're holding the spin-lock.
>>>
>>>
>>>> +
>>>> +	if (!vm || vm->root.base.bo != root)
>>> The check of vm->root.base.bo should probably still be under the
>>> spin_lock. Because you're not sure yet it's the right VM, you can't
>>> rely on the reservation here to prevent use-after-free.
>> Good point, going to fix that.
>>
>>>> +		goto error_unlock;
>>>> +
>>>> +	addr /= AMDGPU_GPU_PAGE_SIZE;
>>>> +	flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
>>>> +		AMDGPU_PTE_SYSTEM;
>>>> +
>>>> +	if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
>>>> +		/* Redirect the access to the dummy page */
>>>> +		value = adev->dummy_page_addr;
>>>> +		flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
>>>> +			AMDGPU_PTE_WRITEABLE;
>>>> +	} else {
>>>> +		value = 0;
>>>> +	}
>>>> +
>>>> +	r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr + 1,
>>>> +					flags, value, NULL, NULL);
>>>> +	if (r)
>>>> +		goto error_unlock;
>>>> +
>>>> +	r = amdgpu_vm_update_pdes(adev, vm, true);
>>>> +
>>>> +error_unlock:
>>>> +	amdgpu_bo_unreserve(root);
>>>> +	if (r < 0)
>>>> +		DRM_ERROR("Can't handle page fault (%ld)\n", r);
>>>> +
>>>> +error_unref:
>>>> +	amdgpu_bo_unref(&root);
>>>> +
>>>> +	return false;
>>>> +}
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> index 0a97dc839f3b..4dbbe1b6b413 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> @@ -413,6 +413,8 @@ void amdgpu_vm_check_compute_bug(struct amdgpu_device *adev);
>>>>      
>>>>      void amdgpu_vm_get_task_info(struct amdgpu_device *adev, unsigned int pasid,
>>>>      			     struct amdgpu_task_info *task_info);
>>>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>>>> +			    uint64_t addr);
>>>>      
>>>>      void amdgpu_vm_set_task_info(struct amdgpu_vm *vm);
>>>>      
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>>> index 9d15679df6e0..15a1ce51befa 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>>> @@ -353,6 +353,10 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev,
>>>>      	}
>>>>      
>>>>      	/* If it's the first fault for this address, process it normally */
>>>> +	if (retry_fault && !in_interrupt() &&
>>>> +	    amdgpu_vm_handle_fault(adev, entry->pasid, addr))
>>>> +		return 1; /* This also prevents sending it to KFD */
>>> The !in_interrupt() is meant to only do this on the rerouted
>>> interrupt ring that's handled by a worker function?
>> Yes, exactly. But I plan to add a workaround where the CPU redirects the fault to the other ring buffer for firmware versions which don't have the re-routing.
>>
>> That adds quite a bit of overhead on Vega10 because of the fault storm, but it should work fine on Vega20.
>>
>> The key point is that we have already released firmware without the redirection, but it's still better to have the workaround than to run into the fault storm.
>>
>>> Looks like amdgpu_vm_handle_fault never returns true for now. So
>>> we'll never get to the "return 1" here.
>> Oh, yes, that actually belongs in a follow-up patch.
>>
>> Thanks,
>> Christian.
>>
>>> Regards,
>>>       Felix
>>>
>>>
>>>> +
>>>>      	if (!amdgpu_sriov_vf(adev)) {
>>>>      		/*
>>>>      		 * Issue a dummy read to wait for the status register to

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2019-09-11  9:31 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-04 15:02 Graceful page fault handling for Vega/Navi Christian König
     [not found] ` <20190904150230.13885-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
2019-09-04 15:02   ` [PATCH 1/9] drm/ttm: return -EBUSY on pipelining with no_gpu_wait Christian König
2019-09-04 15:02   ` [PATCH 2/9] drm/amdgpu: split the VM entity into direct and delayed Christian König
2019-09-04 15:02   ` [PATCH 3/9] drm/amdgpu: allow direct submission in the VM backends v2 Christian König
2019-09-04 15:02   ` [PATCH 4/9] drm/amdgpu: allow direct submission of PDE updates Christian König
     [not found]     ` <20190904150230.13885-5-christian.koenig-5C7GfCeVMHo@public.gmane.org>
2019-09-04 22:48       ` Kuehling, Felix
2019-09-04 15:02   ` [PATCH 5/9] drm/amdgpu: allow direct submission of PTE updates Christian König
2019-09-04 15:02   ` [PATCH 6/9] drm/amdgpu: allow direct submission of clears Christian König
2019-09-04 15:02   ` [PATCH 7/9] drm/amdgpu: allocate PDs/PTs with no_gpu_wait in a page fault Christian König
2019-09-04 15:02   ` [PATCH 8/9] drm/amdgpu: reserve the root PD while freeing PASIDs Christian König
2019-09-04 15:02   ` [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2 Christian König
     [not found]     ` <20190904150230.13885-10-christian.koenig-5C7GfCeVMHo@public.gmane.org>
2019-09-04 20:12       ` Yang, Philip
     [not found]         ` <844c2c90-8be4-df05-9df9-c9c9cde9b186-5C7GfCeVMHo@public.gmane.org>
2019-09-09 12:03           ` Christian König
     [not found]             ` <db73b357-b7e4-f818-e57c-234a45f7c5cb-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2019-09-09 13:58               ` Yang, Philip
     [not found]                 ` <4ebc27be-a403-0a0b-e9ff-4e5e18c5417c-5C7GfCeVMHo@public.gmane.org>
2019-09-09 17:15                   ` Koenig, Christian
2019-09-04 22:47       ` Kuehling, Felix
     [not found]         ` <4b5365d3-cc8c-1c2a-2675-f74baa4b9e8b-5C7GfCeVMHo@public.gmane.org>
2019-09-09 12:07           ` Christian König
     [not found]             ` <3051c690-fabb-953a-3ae6-2cccfe8a502c-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2019-09-09 14:08               ` Zeng, Oak
     [not found]                 ` <BL0PR12MB2580535133C5BF8E771CE30680B70-b4cIHhjg/p/XzH18dTCKOgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2019-09-09 17:14                   ` Koenig, Christian
     [not found]                     ` <8d28a385-6ad2-20fe-12df-72b3fbf034e5-5C7GfCeVMHo@public.gmane.org>
2019-09-10 13:30                       ` Zeng, Oak
     [not found]                         ` <BL0PR12MB258025DD4004AB34CB02A1AB80B60-b4cIHhjg/p/XzH18dTCKOgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2019-09-11  9:31                           ` Koenig, Christian
2019-09-04 22:52   ` Graceful page fault handling for Vega/Navi Kuehling, Felix
     [not found]     ` <03e2274c-a0dd-41f4-c5e0-26e371d01d23-5C7GfCeVMHo@public.gmane.org>
2019-09-09 12:09       ` Christian König
2019-09-04 23:03   ` Huang, Ray
     [not found]     ` <MN2PR12MB3309C53565E6D4E2E9FE23C7ECB80-rweVpJHSKTpWdvXm18W95QdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2019-09-04 23:28       ` Kuehling, Felix
