* [PATCH 1/5] drm/amdgpu: handle IH ring1 overflow
@ 2021-11-09 23:04 Philip Yang
  2021-11-09 23:04 ` [PATCH 2/5] drm/amdkfd: check child range to drain retry fault Philip Yang
                   ` (4 more replies)
  0 siblings, 5 replies; 24+ messages in thread
From: Philip Yang @ 2021-11-09 23:04 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling

IH ring1 is used to process GPU retry faults. Overflow is enabled so
that retry faults can be drained before unmapping a range, which means
wptr may pass rptr. amdgpu_ih_process should only exit once rptr
equals the latest wptr; otherwise it keeps recovering outdated retry
faults after the drain is done and generates false GPU VM faults,
because the range has already been unmapped from the CPU.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
index f3d62e196901..d1ef61811169 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
@@ -223,7 +223,7 @@ int amdgpu_ih_wait_on_checkpoint_process(struct amdgpu_device *adev,
  */
 int amdgpu_ih_process(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih)
 {
-	unsigned int count = AMDGPU_IH_MAX_NUM_IVS;
+	unsigned int count;
 	u32 wptr;
 
 	if (!ih->enabled || adev->shutdown)
@@ -232,6 +232,8 @@ int amdgpu_ih_process(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih)
 	wptr = amdgpu_ih_get_wptr(adev, ih);
 
 restart_ih:
+	count = AMDGPU_IH_MAX_NUM_IVS;
+
 	DRM_DEBUG("%s: rptr %d, wptr %d\n", __func__, ih->rptr, wptr);
 
 	/* Order reading of wptr vs. reading of IH ring data */
@@ -240,6 +242,9 @@ int amdgpu_ih_process(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih)
 	while (ih->rptr != wptr && --count) {
 		amdgpu_irq_dispatch(adev, ih);
 		ih->rptr &= ih->ptr_mask;
+
+		if (ih == &adev->irq.ih1)
+			wptr = amdgpu_ih_get_wptr(adev, ih);
 	}
 
 	amdgpu_ih_set_rptr(adev, ih);
-- 
2.17.1



* [PATCH 2/5] drm/amdkfd: check child range to drain retry fault
  2021-11-09 23:04 [PATCH 1/5] drm/amdgpu: handle IH ring1 overflow Philip Yang
@ 2021-11-09 23:04 ` Philip Yang
  2021-11-10  3:26   ` Felix Kuehling
  2021-11-09 23:04 ` [PATCH 3/5] drm/amdkfd: restore pages race with vma remove Philip Yang
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Philip Yang @ 2021-11-09 23:04 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling

When unmapping a partial range, the parent prange's list op is to
update the notifier and the child ranges' list op is to unmap the
range, so the child ranges must be checked in order to set the drain
retry fault flag.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 77239b06b236..64f642935600 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -2049,8 +2049,19 @@ svm_range_add_list_work(struct svm_range_list *svms, struct svm_range *prange,
 	 * before the range is freed to avoid straggler interrupts on
 	 * unmapped memory causing "phantom faults".
 	 */
-	if (op == SVM_OP_UNMAP_RANGE)
+	if (op == SVM_OP_UNMAP_RANGE) {
+		pr_debug("set range drain_pagefaults true\n");
 		svms->drain_pagefaults = true;
+	} else {
+		struct svm_range *pchild;
+
+		list_for_each_entry(pchild, &prange->child_list, child_list)
+			if (pchild->work_item.op == SVM_OP_UNMAP_RANGE) {
+				pr_debug("set child drain_pagefaults true\n");
+				svms->drain_pagefaults = true;
+			}
+	}
+
 	/* if prange is on the deferred list */
 	if (!list_empty(&prange->deferred_list)) {
 		pr_debug("update exist prange 0x%p work op %d\n", prange, op);
-- 
2.17.1



* [PATCH 3/5] drm/amdkfd: restore pages race with vma remove
  2021-11-09 23:04 [PATCH 1/5] drm/amdgpu: handle IH ring1 overflow Philip Yang
  2021-11-09 23:04 ` [PATCH 2/5] drm/amdkfd: check child range to drain retry fault Philip Yang
@ 2021-11-09 23:04 ` Philip Yang
  2021-11-10  3:52   ` Felix Kuehling
  2021-11-09 23:04 ` [PATCH 4/5] drm/amdkfd: restore pages race with process termination Philip Yang
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Philip Yang @ 2021-11-09 23:04 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling

The VMA may be removed before restore pages takes the mmap read or
write lock. Check that the VMA still exists before creating an
unregistered range or verifying range access permission, and return 0
if the VMA was removed, so that restore pages does not return failure
and report a GPU VM fault to the application.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 64 ++++++++++++++++------------
 1 file changed, 37 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 64f642935600..8f77d5746b2c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -2336,20 +2336,13 @@ svm_range_best_restore_location(struct svm_range *prange,
 }
 
 static int
-svm_range_get_range_boundaries(struct kfd_process *p, int64_t addr,
-			       unsigned long *start, unsigned long *last,
-			       bool *is_heap_stack)
+svm_range_get_range_boundaries(struct kfd_process *p, struct vm_area_struct *vma,
+			       int64_t addr, unsigned long *start,
+			       unsigned long *last, bool *is_heap_stack)
 {
-	struct vm_area_struct *vma;
 	struct interval_tree_node *node;
 	unsigned long start_limit, end_limit;
 
-	vma = find_vma(p->mm, addr << PAGE_SHIFT);
-	if (!vma || (addr << PAGE_SHIFT) < vma->vm_start) {
-		pr_debug("VMA does not exist in address [0x%llx]\n", addr);
-		return -EFAULT;
-	}
-
 	*is_heap_stack = (vma->vm_start <= vma->vm_mm->brk &&
 			  vma->vm_end >= vma->vm_mm->start_brk) ||
 			 (vma->vm_start <= vma->vm_mm->start_stack &&
@@ -2444,9 +2437,10 @@ svm_range_check_vm_userptr(struct kfd_process *p, uint64_t start, uint64_t last,
 
 static struct
 svm_range *svm_range_create_unregistered_range(struct amdgpu_device *adev,
-						struct kfd_process *p,
-						struct mm_struct *mm,
-						int64_t addr)
+					       struct kfd_process *p,
+					       struct mm_struct *mm,
+					       struct vm_area_struct *vma,
+					       int64_t addr)
 {
 	struct svm_range *prange = NULL;
 	unsigned long start, last;
@@ -2456,7 +2450,7 @@ svm_range *svm_range_create_unregistered_range(struct amdgpu_device *adev,
 	uint64_t bo_l = 0;
 	int r;
 
-	if (svm_range_get_range_boundaries(p, addr, &start, &last,
+	if (svm_range_get_range_boundaries(p, vma, addr, &start, &last,
 					   &is_heap_stack))
 		return NULL;
 
@@ -2558,21 +2552,22 @@ svm_range_count_fault(struct amdgpu_device *adev, struct kfd_process *p,
 		WRITE_ONCE(pdd->faults, pdd->faults + 1);
 }
 
-static bool
-svm_fault_allowed(struct mm_struct *mm, uint64_t addr, bool write_fault)
+static int
+svm_fault_allowed(struct mm_struct *mm, struct vm_area_struct *vma,
+		  uint64_t addr, bool write_fault)
 {
 	unsigned long requested = VM_READ;
-	struct vm_area_struct *vma;
 
 	if (write_fault)
 		requested |= VM_WRITE;
 
-	vma = find_vma(mm, addr << PAGE_SHIFT);
-	if (!vma || (addr << PAGE_SHIFT) < vma->vm_start) {
-		pr_debug("address 0x%llx VMA is removed\n", addr);
-		return true;
+	if (!vma) {
+		vma = find_vma(mm, addr << PAGE_SHIFT);
+		if (!vma || (addr << PAGE_SHIFT) < vma->vm_start) {
+			pr_debug("address 0x%llx VMA is removed\n", addr);
+			return -EFAULT;
+		}
 	}
-
 	pr_debug("requested 0x%lx, vma permission flags 0x%lx\n", requested,
 		vma->vm_flags);
 	return (vma->vm_flags & requested) == requested;
@@ -2590,6 +2585,7 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
 	int32_t best_loc;
 	int32_t gpuidx = MAX_GPU_INSTANCE;
 	bool write_locked = false;
+	struct vm_area_struct *vma = NULL;
 	int r = 0;
 
 	if (!KFD_IS_SVM_API_SUPPORTED(adev->kfd.dev)) {
@@ -2636,7 +2632,15 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
 			write_locked = true;
 			goto retry_write_locked;
 		}
-		prange = svm_range_create_unregistered_range(adev, p, mm, addr);
+
+		vma = find_vma(p->mm, addr << PAGE_SHIFT);
+		if (!vma || (addr << PAGE_SHIFT) < vma->vm_start) {
+			pr_debug("VMA not found address [0x%llx]\n", addr);
+			mmap_write_downgrade(mm);
+			r = 0;
+			goto out_unlock_svms;
+		}
+		prange = svm_range_create_unregistered_range(adev, p, mm, vma, addr);
 		if (!prange) {
 			pr_debug("failed to create unregistered range svms 0x%p address [0x%llx]\n",
 				 svms, addr);
@@ -2663,10 +2667,16 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
 		goto out_unlock_range;
 	}
 
-	if (!svm_fault_allowed(mm, addr, write_fault)) {
-		pr_debug("fault addr 0x%llx no %s permission\n", addr,
-			write_fault ? "write" : "read");
-		r = -EPERM;
+	r = svm_fault_allowed(mm, vma, addr, write_fault);
+	if (r <= 0) {
+		if (!r) {
+			pr_debug("fault addr 0x%llx no %s permission\n", addr,
+				write_fault ? "write" : "read");
+			r = -EPERM;
+		} else  {
+			pr_debug("fault addr 0x%llx is unmapping\n", addr);
+			r = 0;
+		}
 		goto out_unlock_range;
 	}
 
-- 
2.17.1



* [PATCH 4/5] drm/amdkfd: restore pages race with process termination
  2021-11-09 23:04 [PATCH 1/5] drm/amdgpu: handle IH ring1 overflow Philip Yang
  2021-11-09 23:04 ` [PATCH 2/5] drm/amdkfd: check child range to drain retry fault Philip Yang
  2021-11-09 23:04 ` [PATCH 3/5] drm/amdkfd: restore pages race with vma remove Philip Yang
@ 2021-11-09 23:04 ` Philip Yang
  2021-11-10  4:16   ` Felix Kuehling
  2021-11-09 23:04 ` [PATCH 5/5] drm/amdkfd: svm deferred work pin mm Philip Yang
  2021-11-10 10:15 ` [PATCH 1/5] drm/amdgpu: handle IH ring1 overflow Christian König
  4 siblings, 1 reply; 24+ messages in thread
From: Philip Yang @ 2021-11-09 23:04 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling

The restore pages work cannot find the kfd process or the mm struct
if the process is destroyed before the drain retry fault work is
scheduled to run. This is not a failure, so return 0 to avoid dumping
a GPU VM fault into the kernel log.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 8f77d5746b2c..2083a10b35c2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -2596,7 +2596,7 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
 	p = kfd_lookup_process_by_pasid(pasid);
 	if (!p) {
 		pr_debug("kfd process not founded pasid 0x%x\n", pasid);
-		return -ESRCH;
+		return 0;
 	}
 	if (!p->xnack_enabled) {
 		pr_debug("XNACK not enabled for pasid 0x%x\n", pasid);
@@ -2610,7 +2610,7 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
 	mm = get_task_mm(p->lead_thread);
 	if (!mm) {
 		pr_debug("svms 0x%p failed to get mm\n", svms);
-		r = -ESRCH;
+		r = 0;
 		goto out;
 	}
 
-- 
2.17.1



* [PATCH 5/5] drm/amdkfd: svm deferred work pin mm
  2021-11-09 23:04 [PATCH 1/5] drm/amdgpu: handle IH ring1 overflow Philip Yang
                   ` (2 preceding siblings ...)
  2021-11-09 23:04 ` [PATCH 4/5] drm/amdkfd: restore pages race with process termination Philip Yang
@ 2021-11-09 23:04 ` Philip Yang
  2021-11-10  4:51   ` Felix Kuehling
  2021-11-10 10:15 ` [PATCH 1/5] drm/amdgpu: handle IH ring1 overflow Christian König
  4 siblings, 1 reply; 24+ messages in thread
From: Philip Yang @ 2021-11-09 23:04 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling

Make sure the mm is not removed while the prange deferred work
inserts the mmu range notifier, to avoid the WARNING:

WARNING: CPU: 6 PID: 1787 at mm/mmu_notifier.c:932 __mmu_interval_notifier_insert+0xdd/0xf0
Workqueue: events svm_range_deferred_list_work [amdgpu]
RIP: 0010:__mmu_interval_notifier_insert+0xdd/0xf0
Call Trace:
  svm_range_deferred_list_work+0x156/0x320 [amdgpu]
  process_one_work+0x29f/0x5e0
  worker_thread+0x39/0x3e0

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 2083a10b35c2..fddf0a93d6f1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1992,6 +1992,13 @@ static void svm_range_deferred_list_work(struct work_struct *work)
 			 prange->start, prange->last, prange->work_item.op);
 
 		mm = prange->work_item.mm;
+		if (!mmget_not_zero(mm)) {
+			pr_debug("skip range %p as mm is gone\n", prange);
+			spin_lock(&svms->deferred_list_lock);
+			list_del_init(&prange->deferred_list);
+			continue;
+		}
+
 retry:
 		mmap_write_lock(mm);
 		mutex_lock(&svms->lock);
@@ -2032,6 +2039,7 @@ static void svm_range_deferred_list_work(struct work_struct *work)
 		svm_range_handle_list_op(svms, prange);
 		mutex_unlock(&svms->lock);
 		mmap_write_unlock(mm);
+		mmput(mm);
 
 		spin_lock(&svms->deferred_list_lock);
 	}
-- 
2.17.1



* Re: [PATCH 2/5] drm/amdkfd: check child range to drain retry fault
  2021-11-09 23:04 ` [PATCH 2/5] drm/amdkfd: check child range to drain retry fault Philip Yang
@ 2021-11-10  3:26   ` Felix Kuehling
  2021-11-11 21:55     ` philip yang
  0 siblings, 1 reply; 24+ messages in thread
From: Felix Kuehling @ 2021-11-10  3:26 UTC (permalink / raw)
  To: Philip Yang, amd-gfx


On 2021-11-09 6:04 p.m., Philip Yang wrote:
> When unmapping a partial range, the parent prange's list op is to
> update the notifier and the child ranges' list op is to unmap the
> range, so the child ranges must be checked in order to set the drain
> retry fault flag.
>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>

I think this could be simplified by just setting 
svms->drain_pagefaults in svm_range_unmap_from_cpu. The mmap lock 
ensures that this is serialized with the deferred list worker reading 
and clearing svms->drain_pagefaults. You can also use READ_ONCE and 
WRITE_ONCE to be safe.

Regards,
   Felix




* Re: [PATCH 3/5] drm/amdkfd: restore pages race with vma remove
  2021-11-09 23:04 ` [PATCH 3/5] drm/amdkfd: restore pages race with vma remove Philip Yang
@ 2021-11-10  3:52   ` Felix Kuehling
  0 siblings, 0 replies; 24+ messages in thread
From: Felix Kuehling @ 2021-11-10  3:52 UTC (permalink / raw)
  To: Philip Yang, amd-gfx

On 2021-11-09 6:04 p.m., Philip Yang wrote:
> The VMA may be removed before restore pages takes the mmap read or
> write lock. Check that the VMA still exists before creating an
> unregistered range or verifying range access permission, and return 0
> if the VMA was removed, so that restore pages does not return failure
> and report a GPU VM fault to the application.

Your patch basically means we never return a VM fault to an application 
that accesses invalid addresses. The application will just spin on that 
address forever. This basically breaks the debugger. I think you need to 
refine that.

While draining faults (svms->drain_pagefaults is set), I think we can 
return success if no VMA is found. During that time we are expecting 
"straggler" faults. But once the draining is done, any fault interrupts 
we get should be from bad application accesses and should result in a VM 
fault for the application.

Also, much of your patch seems to be refactoring to avoid a duplicate 
VMA lookup. I'd split that into a separate patch for clarity.

Regards,
   Felix



* Re: [PATCH 4/5] drm/amdkfd: restore pages race with process termination
  2021-11-09 23:04 ` [PATCH 4/5] drm/amdkfd: restore pages race with process termination Philip Yang
@ 2021-11-10  4:16   ` Felix Kuehling
  2021-11-15 20:50     ` philip yang
  0 siblings, 1 reply; 24+ messages in thread
From: Felix Kuehling @ 2021-11-10  4:16 UTC (permalink / raw)
  To: Philip Yang, amd-gfx

On 2021-11-09 6:04 p.m., Philip Yang wrote:
> The restore pages work cannot find the kfd process or the mm struct
> if the process is destroyed before the drain retry fault work is
> scheduled to run. This is not a failure, so return 0 to avoid dumping
> a GPU VM fault into the kernel log.
I wonder if this could also be solved by draining page fault interrupts 
in kfd_process_notifier_release before we remove the process from the 
hash table. Because that function runs while holding the mmap lock, we'd 
need to detect the draining condition for the process in the page fault 
handler before it tries to take the mmap lock. Maybe that's even a 
helpful optimization that speeds up interrupt draining by just ignoring 
all retry faults during that time.

That would also allow draining faults in svm_range_unmap_from_cpu 
instead of the delayed worker. And I believe that would also elegantly 
fix the vma removal race.

Regards,
   Felix




* Re: [PATCH 5/5] drm/amdkfd: svm deferred work pin mm
  2021-11-09 23:04 ` [PATCH 5/5] drm/amdkfd: svm deferred work pin mm Philip Yang
@ 2021-11-10  4:51   ` Felix Kuehling
  0 siblings, 0 replies; 24+ messages in thread
From: Felix Kuehling @ 2021-11-10  4:51 UTC (permalink / raw)
  To: Philip Yang, amd-gfx


On 2021-11-09 6:04 p.m., Philip Yang wrote:
> Make sure the mm is not removed while the prange deferred work
> inserts the mmu range notifier, to avoid the WARNING:
>
> WARNING: CPU: 6 PID: 1787 at mm/mmu_notifier.c:932 __mmu_interval_notifier_insert+0xdd/0xf0
> Workqueue: events svm_range_deferred_list_work [amdgpu]
> RIP: 0010:__mmu_interval_notifier_insert+0xdd/0xf0
> Call Trace:
>    svm_range_deferred_list_work+0x156/0x320 [amdgpu]
>    process_one_work+0x29f/0x5e0
>    worker_thread+0x39/0x3e0
>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index 2083a10b35c2..fddf0a93d6f1 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -1992,6 +1992,13 @@ static void svm_range_deferred_list_work(struct work_struct *work)
>   			 prange->start, prange->last, prange->work_item.op);
>   
>   		mm = prange->work_item.mm;
> +		if (!mmget_not_zero(mm)) {

You can only rely on mmget_not_zero if there is an mmgrab reference 
still active. Otherwise the mm_struct may already be freed and allocated 
for something else, and the mm_users counter may be corrupted. You're safe 
until kfd_process_wq_release calls put_task_struct(p->lead_thread), 
because the task holds a reference to the mm_struct.

One way to guarantee that at least the mm_struct still exists while this 
worker runs is, to add cancel_work_sync(&p->svms.deferred_list_work) in 
kfd_process_notifier_release. We should probably do that anyway.

> +			pr_debug("skip range %p as mm is gone\n", prange);
> +			spin_lock(&svms->deferred_list_lock);
> +			list_del_init(&prange->deferred_list);
> +			continue;

If the mm is gone, you can probably just break here. All the items in 
the list should have the same mm.

That makes me wonder why we need the mm pointer in the work_item at all. 
We could just get an mm reference from get_task_mm(p->lead_thread) 
(where p is the container of svms). And you can do that outside the 
loop. You'll still need a matching mmput call.

Regards,
   Felix


> +		}
> +
>   retry:
>   		mmap_write_lock(mm);
>   		mutex_lock(&svms->lock);
> @@ -2032,6 +2039,7 @@ static void svm_range_deferred_list_work(struct work_struct *work)
>   		svm_range_handle_list_op(svms, prange);
>   		mutex_unlock(&svms->lock);
>   		mmap_write_unlock(mm);
> +		mmput(mm);
>   
>   		spin_lock(&svms->deferred_list_lock);
>   	}


* Re: [PATCH 1/5] drm/amdgpu: handle IH ring1 overflow
  2021-11-09 23:04 [PATCH 1/5] drm/amdgpu: handle IH ring1 overflow Philip Yang
                   ` (3 preceding siblings ...)
  2021-11-09 23:04 ` [PATCH 5/5] drm/amdkfd: svm deferred work pin mm Philip Yang
@ 2021-11-10 10:15 ` Christian König
  2021-11-10 13:59   ` philip yang
  4 siblings, 1 reply; 24+ messages in thread
From: Christian König @ 2021-11-10 10:15 UTC (permalink / raw)
  To: Philip Yang, amd-gfx; +Cc: Felix.Kuehling

Am 10.11.21 um 00:04 schrieb Philip Yang:
> IH ring1 is used to process GPU retry faults. Overflow is enabled so
> that retry faults can be drained before unmapping a range, which means
> wptr may pass rptr. amdgpu_ih_process should only exit once rptr
> equals the latest wptr; otherwise it keeps recovering outdated retry
> faults after the drain is done and generates false GPU VM faults,
> because the range has already been unmapped from the CPU.
>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
> index f3d62e196901..d1ef61811169 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
> @@ -223,7 +223,7 @@ int amdgpu_ih_wait_on_checkpoint_process(struct amdgpu_device *adev,
>    */
>   int amdgpu_ih_process(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih)
>   {
> -	unsigned int count = AMDGPU_IH_MAX_NUM_IVS;
> +	unsigned int count;
>   	u32 wptr;
>   
>   	if (!ih->enabled || adev->shutdown)
> @@ -232,6 +232,8 @@ int amdgpu_ih_process(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih)
>   	wptr = amdgpu_ih_get_wptr(adev, ih);
>   
>   restart_ih:
> +	count = AMDGPU_IH_MAX_NUM_IVS;
> +

This looks like a bugfix to me and should probably be in a separate 
patch with CC: stable.

>   	DRM_DEBUG("%s: rptr %d, wptr %d\n", __func__, ih->rptr, wptr);
>   
>   	/* Order reading of wptr vs. reading of IH ring data */
> @@ -240,6 +242,9 @@ int amdgpu_ih_process(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih)
>   	while (ih->rptr != wptr && --count) {
>   		amdgpu_irq_dispatch(adev, ih);
>   		ih->rptr &= ih->ptr_mask;
> +
> +		if (ih == &adev->irq.ih1)
> +			wptr = amdgpu_ih_get_wptr(adev, ih);

Well that handling does not really make much sense.

The AMDGPU_IH_MAX_NUM_IVS define controls how many IVs we can process 
before checking the wptr again.

We could of course parameterize that so that we check the wptr after 
each IV on IH1, but please not hard coded like this.

Regards,
Christian.

>   	}
>   
>   	amdgpu_ih_set_rptr(adev, ih);



* Re: [PATCH 1/5] drm/amdgpu: handle IH ring1 overflow
  2021-11-10 10:15 ` [PATCH 1/5] drm/amdgpu: handle IH ring1 overflow Christian König
@ 2021-11-10 13:59   ` philip yang
  2021-11-10 14:31     ` Christian König
  0 siblings, 1 reply; 24+ messages in thread
From: philip yang @ 2021-11-10 13:59 UTC (permalink / raw)
  To: Christian König, Philip Yang, amd-gfx; +Cc: Felix.Kuehling

[-- Attachment #1: Type: text/html, Size: 12939 bytes --]


* Re: [PATCH 1/5] drm/amdgpu: handle IH ring1 overflow
  2021-11-10 13:59   ` philip yang
@ 2021-11-10 14:31     ` Christian König
  2021-11-10 14:44       ` philip yang
  2021-11-10 23:36       ` Felix Kuehling
  0 siblings, 2 replies; 24+ messages in thread
From: Christian König @ 2021-11-10 14:31 UTC (permalink / raw)
  To: philip yang, Philip Yang, amd-gfx; +Cc: Felix.Kuehling

Am 10.11.21 um 14:59 schrieb philip yang:
>
> On 2021-11-10 5:15 a.m., Christian König wrote:
>
>> [SNIP]
>
> It is hard to understand; this debug log explains it in more detail, 
> with this debug message patch:
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
> index ed6f8d24280b..8859f2bb11b1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
> @@ -234,10 +235,12 @@ int amdgpu_ih_process(struct amdgpu_device 
> *adev, struct amdgpu_ih_ring *ih)
>                 return IRQ_NONE;
>
>         wptr = amdgpu_ih_get_wptr(adev, ih);
> +       if (ih == &adev->irq.ih1)
> +               pr_debug("entering rptr 0x%x, wptr 0x%x\n", ih->rptr, 
> wptr);
>
>  restart_ih:
> +       if (ih == &adev->irq.ih1)
> +               pr_debug("starting rptr 0x%x, wptr 0x%x\n", ih->rptr, 
> wptr);
>
>         /* Order reading of wptr vs. reading of IH ring data */
>         rmb();
> @@ -245,8 +248,12 @@ int amdgpu_ih_process(struct amdgpu_device *adev, 
> struct amdgpu_ih_ring *ih)
>         while (ih->rptr != wptr && --count) {
>                 amdgpu_irq_dispatch(adev, ih);
>                 ih->rptr &= ih->ptr_mask;
> +               if (ih == &adev->irq.ih1) {
> +                       pr_debug("rptr 0x%x, old wptr 0x%x, new wptr 
> 0x%x\n",
> +                               ih->rptr, wptr,
> +                               amdgpu_ih_get_wptr(adev, ih));
> +               }
>         }
>
>         amdgpu_ih_set_rptr(adev, ih);
> @@ -257,6 +264,8 @@ int amdgpu_ih_process(struct amdgpu_device *adev, 
> struct amdgpu_ih_ring *ih)
>         if (wptr != ih->rptr)
>                 goto restart_ih;
>
> +       if (ih == &adev->irq.ih1)
> +               pr_debug("exiting rptr 0x%x, wptr 0x%x\n", ih->rptr, 
> wptr);
>         return IRQ_HANDLED;
>  }
>
> This is the log. At timestamp 48.807028 the ring1 drain is done, rptr == 
> wptr and ring1 is empty, but the loop continues and handles outdated 
> retry faults.
>

As far as I can see that is perfectly correct and expected behavior.

See, the ring buffer overflowed, and because of that the loop continues. 
But that is correct, because an overflow means that the ring was filled 
with new entries.

So we are processing new entries here, not stale ones.

Regards,
Christian.

> [   48.802231] amdgpu_ih_process:243: amdgpu: starting rptr 0x520, 
> wptr 0xd20
> [   48.802235] amdgpu_ih_process:254: amdgpu: rptr 0x540, old wptr 
> 0xd20, new wptr 0xd20
> [   48.802256] amdgpu_ih_process:254: amdgpu: rptr 0x560, old wptr 
> 0xd20, new wptr 0xd20
> [   48.802260] amdgpu_ih_process:254: amdgpu: rptr 0x580, old wptr 
> 0xd20, new wptr 0xd20
> [   48.802281] amdgpu_ih_process:254: amdgpu: rptr 0x5a0, old wptr 
> 0xd20, new wptr 0xd20
> [   48.802314] amdgpu_ih_process:254: amdgpu: rptr 0x5c0, old wptr 
> 0xd20, new wptr 0xce0
> [   48.802335] amdgpu_ih_process:254: amdgpu: rptr 0x5e0, old wptr 
> 0xd20, new wptr 0xce0
> [   48.802356] amdgpu_ih_process:254: amdgpu: rptr 0x600, old wptr 
> 0xd20, new wptr 0xce0
> [   48.802376] amdgpu_ih_process:254: amdgpu: rptr 0x620, old wptr 
> 0xd20, new wptr 0xce0
> [   48.802396] amdgpu_ih_process:254: amdgpu: rptr 0x640, old wptr 
> 0xd20, new wptr 0xce0
> [   48.802401] amdgpu_ih_process:254: amdgpu: rptr 0x660, old wptr 
> 0xd20, new wptr 0xce0
> [   48.802421] amdgpu_ih_process:254: amdgpu: rptr 0x680, old wptr 
> 0xd20, new wptr 0xce0
> [   48.802442] amdgpu_ih_process:254: amdgpu: rptr 0x6a0, old wptr 
> 0xd20, new wptr 0xce0
> [   48.802463] amdgpu_ih_process:254: amdgpu: rptr 0x6c0, old wptr 
> 0xd20, new wptr 0xce0
> [   48.802483] amdgpu_ih_process:254: amdgpu: rptr 0x6e0, old wptr 
> 0xd20, new wptr 0xce0
> [   48.802503] amdgpu_ih_process:254: amdgpu: rptr 0x700, old wptr 
> 0xd20, new wptr 0xce0
> [   48.802523] amdgpu_ih_process:254: amdgpu: rptr 0x720, old wptr 
> 0xd20, new wptr 0xce0
> [   48.802544] amdgpu_ih_process:254: amdgpu: rptr 0x740, old wptr 
> 0xd20, new wptr 0xce0
> [   48.802565] amdgpu_ih_process:254: amdgpu: rptr 0x760, old wptr 
> 0xd20, new wptr 0xce0
> [   48.802569] amdgpu_ih_process:254: amdgpu: rptr 0x780, old wptr 
> 0xd20, new wptr 0xce0
> [   48.804392] amdgpu_ih_process:254: amdgpu: rptr 0x7a0, old wptr 
> 0xd20, new wptr 0xf00
> [   48.806122] amdgpu_ih_process:254: amdgpu: rptr 0x7c0, old wptr 
> 0xd20, new wptr 0x840
> [   48.806155] amdgpu_ih_process:254: amdgpu: rptr 0x7e0, old wptr 
> 0xd20, new wptr 0x840
> [   48.806965] amdgpu_ih_process:254: amdgpu: rptr 0x800, old wptr 
> 0xd20, new wptr 0x840
> [   48.806995] amdgpu_ih_process:254: amdgpu: rptr 0x820, old wptr 
> 0xd20, new wptr 0x840
> [   48.807028] amdgpu_ih_process:254: amdgpu: rptr 0x840, old wptr 
> 0xd20, new wptr 0x840
> [   48.807063] amdgpu_ih_process:254: amdgpu: rptr 0x860, old wptr 
> 0xd20, new wptr 0x840
> [   48.808421] amdgpu_ih_process:254: amdgpu: rptr 0x880, old wptr 
> 0xd20, new wptr 0x840
>
> This causes the following GPU VM fault dump, because the address is 
> unmapped from the CPU.
>
> [   48.807071] svm_range_restore_pages:2617: amdgpu: restoring svms 
> 0x00000000733bf007 fault address 0x7f8a6991f
>
> [   48.807170] svm_range_restore_pages:2631: amdgpu: failed to find 
> prange svms 0x00000000733bf007 address [0x7f8a6991f]
> [   48.807179] svm_range_get_range_boundaries:2348: amdgpu: VMA does 
> not exist in address [0x7f8a6991f]
> [   48.807185] svm_range_restore_pages:2635: amdgpu: failed to create 
> unregistered range svms 0x00000000733bf007 address [0x7f8a6991f]
>
> [   48.807929] amdgpu 0000:25:00.0: amdgpu: [mmhub0] retry page fault 
> (src_id:0 ring:0 vmid:8 pasid:32770, for process kfdtest pid 3969 
> thread kfdtest pid 3969)
> [   48.808219] amdgpu 0000:25:00.0: amdgpu:   in page starting at 
> address 0x00007f8a6991f000 from IH client 0x12 (VMC)
> [   48.808230] amdgpu 0000:25:00.0: amdgpu: 
> VM_L2_PROTECTION_FAULT_STATUS:0x00800031
>
>> [SNIP]


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/5] drm/amdgpu: handle IH ring1 overflow
  2021-11-10 14:31     ` Christian König
@ 2021-11-10 14:44       ` philip yang
  2021-11-10 14:54         ` Christian König
  2021-11-10 23:36       ` Felix Kuehling
  1 sibling, 1 reply; 24+ messages in thread
From: philip yang @ 2021-11-10 14:44 UTC (permalink / raw)
  To: Christian König, Philip Yang, amd-gfx; +Cc: Felix.Kuehling

[-- Attachment #1: Type: text/html, Size: 11171 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/5] drm/amdgpu: handle IH ring1 overflow
  2021-11-10 14:44       ` philip yang
@ 2021-11-10 14:54         ` Christian König
  2021-11-10 15:45           ` philip yang
  0 siblings, 1 reply; 24+ messages in thread
From: Christian König @ 2021-11-10 14:54 UTC (permalink / raw)
  To: philip yang, Philip Yang, amd-gfx; +Cc: Felix.Kuehling

Am 10.11.21 um 15:44 schrieb philip yang:
>
> On 2021-11-10 9:31 a.m., Christian König wrote:
>
>> Am 10.11.21 um 14:59 schrieb philip yang:
>>>
>>> On 2021-11-10 5:15 a.m., Christian König wrote:
>>>
>>>> [SNIP]
>>>
>>> [SNIP]
>>>
>>> This is log, timing 48.807028, ring1 drain is done, rptr == wptr, 
>>> ring1 is empty, but the loop continues, to handle outdated retry fault.
>>>
>>
>> As far as I can see that is perfectly correct and expected behavior.
>>
>> See the ring buffer overflowed and because of that the loop 
>> continues, but that is correct because an overflow means that the 
>> ring was filled with new entries.
>>
>> So we are processing new entries here, not stale ones.
>
> wptr is 0x840; 0x860.....0xd20 are not new entries, they are stale retry 
> faults. The loop will continue handling stale faults up to 0xd20 and 
> then exit; this is because wptr passed rptr after timestamp 48.806122.
>

Yeah, but 0x840..0xd20 still contain perfectly valid IVs which you drop 
on the ground with your approach. That's not something we can do.

What can happen is that the ring buffer overflows and we process the 
same IV twice, but that is also perfectly expected behavior when an 
overflow happens.

Regards,
Christian.

> Regards.
>
> Philip
>
>> [SNIP]


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/5] drm/amdgpu: handle IH ring1 overflow
  2021-11-10 14:54         ` Christian König
@ 2021-11-10 15:45           ` philip yang
  2021-11-10 15:48             ` Christian König
  0 siblings, 1 reply; 24+ messages in thread
From: philip yang @ 2021-11-10 15:45 UTC (permalink / raw)
  To: Christian König, Philip Yang, amd-gfx; +Cc: Felix.Kuehling

[-- Attachment #1: Type: text/html, Size: 13973 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/5] drm/amdgpu: handle IH ring1 overflow
  2021-11-10 15:45           ` philip yang
@ 2021-11-10 15:48             ` Christian König
  0 siblings, 0 replies; 24+ messages in thread
From: Christian König @ 2021-11-10 15:48 UTC (permalink / raw)
  To: philip yang, Philip Yang, amd-gfx; +Cc: Felix.Kuehling

Am 10.11.21 um 16:45 schrieb philip yang:
>
> On 2021-11-10 9:54 a.m., Christian König wrote:
>
>> Am 10.11.21 um 15:44 schrieb philip yang:
>>>
>>> On 2021-11-10 9:31 a.m., Christian König wrote:
>>>
>>>> Am 10.11.21 um 14:59 schrieb philip yang:
>>>>>
>>>>> On 2021-11-10 5:15 a.m., Christian König wrote:
>>>>>
>>>>>> [SNIP]
>>>>>
>>>>> [SNIP]
>>>>>
>>>>> This is log, timing 48.807028, ring1 drain is done, rptr == wptr, 
>>>>> ring1 is empty, but the loop continues, to handle outdated retry 
>>>>> fault.
>>>>>
>>>>
>>>> As far as I can see that is perfectly correct and expected behavior.
>>>>
>>>> See the ring buffer overflowed and because of that the loop 
>>>> continues, but that is correct because an overflow means that the 
>>>> ring was filled with new entries.
>>>>
>>>> So we are processing new entries here, not stale ones.
>>>
>>> wptr is 0x840, 0x860.....0xd20 is not new entries, it is stale retry 
>>> fault, the loop will continue handle stale fault to 0xd20 and then 
>>> exit loop, it is because wptr pass rptr after timing 48.806122.
>>>
>>
>> Yeah, but 0x840..0xd20 still contain perfectly valid IVs which you 
>> drop on the ground with your approach. That's not something we can do.
>
> We drain retry faults on unmap from the CPU notifier; the drain-finish 
> condition is that the ring checkpoint is processed, or that the ring is 
> empty, rptr == wptr (to fix another issue: IH ring1 does not set the 
> ring wptr overflow flag after wptr exceeds rptr).
>
> After the retry-fault drain returns, we are not expecting retry faults 
> on the range, as the queue should not access the range any more, so we 
> free the range as it is unmapped from the CPU. From this point of view, 
> 0x860..0xd20 are stale retry faults.
>

I think we should probably rethink that approach.

>>
>> What can happen is that the ring buffer overflows and we process the 
>> same IV twice, but that is also perfectly expected behavior when an 
>> overflow happens.
>
> After wptr exceeds rptr, because there is no overflow flag, we just 
> drain the faults until rptr == wptr. Yes, this drops many retry faults, 
> but that is OK, as real retry faults will come in again.
>

No, that is absolutely not OK. For the current hardware we are flooded 
with retry faults because of the HW design, but this here is supposed to 
work independent of the hardware design.

Dropping IVs is completely out of the question. Processing the same IV 
twice is just the lesser evil here.

So from that point of view this patch is clearly a NAK.

Regards,
Christian.

> Regards,
>
> Philip
>
>> [SNIP]


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/5] drm/amdgpu: handle IH ring1 overflow
  2021-11-10 14:31     ` Christian König
  2021-11-10 14:44       ` philip yang
@ 2021-11-10 23:36       ` Felix Kuehling
  2021-11-11  7:00         ` Christian König
  1 sibling, 1 reply; 24+ messages in thread
From: Felix Kuehling @ 2021-11-10 23:36 UTC (permalink / raw)
  To: Christian König, philip yang, Philip Yang, amd-gfx

On 2021-11-10 9:31 a.m., Christian König wrote:
> Am 10.11.21 um 14:59 schrieb philip yang:
>>
>> On 2021-11-10 5:15 a.m., Christian König wrote:
>>
>>> [SNIP]
>>
>> [SNIP]
>>
>> This is log, timing 48.807028, ring1 drain is done, rptr == wptr, 
>> ring1 is empty, but the loop continues, to handle outdated retry fault.
>>
>
> As far as I can see that is perfectly correct and expected behavior.
>
> See the ring buffer overflowed and because of that the loop continues, 
> but that is correct because an overflow means that the ring was filled 
> with new entries.
>
> So we are processing new entries here, not stale ones.

Aren't we processing interrupts out of order in this case? We're 
processing newer ones before older ones. Is that the root of the 
problem, because it confuses our interrupt draining function?

Maybe we need to detect overflows in the interrupt draining function to 
make it wait longer in that case.

Regards,
   Felix


> [SNIP]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/5] drm/amdgpu: handle IH ring1 overflow
  2021-11-10 23:36       ` Felix Kuehling
@ 2021-11-11  7:00         ` Christian König
  2021-11-11 12:13           ` Felix Kuehling
  0 siblings, 1 reply; 24+ messages in thread
From: Christian König @ 2021-11-11  7:00 UTC (permalink / raw)
  To: Felix Kuehling, philip yang, Philip Yang, amd-gfx

Am 11.11.21 um 00:36 schrieb Felix Kuehling:
> On 2021-11-10 9:31 a.m., Christian König wrote:
>> Am 10.11.21 um 14:59 schrieb philip yang:
>>>
>>> On 2021-11-10 5:15 a.m., Christian König wrote:
>>>
>>>> [SNIP]
>>>
>>> [SNIP]
>>>
>>> This is the log. At timestamp 48.807028 the ring1 drain is done, rptr ==
>>> wptr and ring1 is empty, but the loop continues and handles outdated
>>> retry faults.
>>>
>>
>> As far as I can see that is perfectly correct and expected behavior.
>>
>> See, the ring buffer overflowed and because of that the loop
>> continues. But that is correct, because an overflow means that the
>> ring was filled with new entries.
>>
>> So we are processing new entries here, not stale ones.
>
> Aren't we processing interrupts out-of-order in this case? We're 
> processing newer ones before older ones. Is that the root of the 
> problem, because it confuses our interrupt draining function?

Good point.

> Maybe we need to detect overflows in the interrupt draining function 
> to make it wait longer in that case.

Ideally we should use something which is completely separate from all 
those implementation details.

Like for example using the timestamp or a separate indicator/counter 
instead.

Regards,
Christian.

>
> Regards,
>   Felix
>
>
>>
>> Regards,
>> Christian.
>>
>>> [   48.802231] amdgpu_ih_process:243: amdgpu: starting rptr 0x520, 
>>> wptr 0xd20
>>> [   48.802235] amdgpu_ih_process:254: amdgpu: rptr 0x540, old wptr 
>>> 0xd20, new wptr 0xd20
>>> [   48.802256] amdgpu_ih_process:254: amdgpu: rptr 0x560, old wptr 
>>> 0xd20, new wptr 0xd20
>>> [   48.802260] amdgpu_ih_process:254: amdgpu: rptr 0x580, old wptr 
>>> 0xd20, new wptr 0xd20
>>> [   48.802281] amdgpu_ih_process:254: amdgpu: rptr 0x5a0, old wptr 
>>> 0xd20, new wptr 0xd20
>>> [   48.802314] amdgpu_ih_process:254: amdgpu: rptr 0x5c0, old wptr 
>>> 0xd20, new wptr 0xce0
>>> [   48.802335] amdgpu_ih_process:254: amdgpu: rptr 0x5e0, old wptr 
>>> 0xd20, new wptr 0xce0
>>> [   48.802356] amdgpu_ih_process:254: amdgpu: rptr 0x600, old wptr 
>>> 0xd20, new wptr 0xce0
>>> [   48.802376] amdgpu_ih_process:254: amdgpu: rptr 0x620, old wptr 
>>> 0xd20, new wptr 0xce0
>>> [   48.802396] amdgpu_ih_process:254: amdgpu: rptr 0x640, old wptr 
>>> 0xd20, new wptr 0xce0
>>> [   48.802401] amdgpu_ih_process:254: amdgpu: rptr 0x660, old wptr 
>>> 0xd20, new wptr 0xce0
>>> [   48.802421] amdgpu_ih_process:254: amdgpu: rptr 0x680, old wptr 
>>> 0xd20, new wptr 0xce0
>>> [   48.802442] amdgpu_ih_process:254: amdgpu: rptr 0x6a0, old wptr 
>>> 0xd20, new wptr 0xce0
>>> [   48.802463] amdgpu_ih_process:254: amdgpu: rptr 0x6c0, old wptr 
>>> 0xd20, new wptr 0xce0
>>> [   48.802483] amdgpu_ih_process:254: amdgpu: rptr 0x6e0, old wptr 
>>> 0xd20, new wptr 0xce0
>>> [   48.802503] amdgpu_ih_process:254: amdgpu: rptr 0x700, old wptr 
>>> 0xd20, new wptr 0xce0
>>> [   48.802523] amdgpu_ih_process:254: amdgpu: rptr 0x720, old wptr 
>>> 0xd20, new wptr 0xce0
>>> [   48.802544] amdgpu_ih_process:254: amdgpu: rptr 0x740, old wptr 
>>> 0xd20, new wptr 0xce0
>>> [   48.802565] amdgpu_ih_process:254: amdgpu: rptr 0x760, old wptr 
>>> 0xd20, new wptr 0xce0
>>> [   48.802569] amdgpu_ih_process:254: amdgpu: rptr 0x780, old wptr 
>>> 0xd20, new wptr 0xce0
>>> [   48.804392] amdgpu_ih_process:254: amdgpu: rptr 0x7a0, old wptr 
>>> 0xd20, new wptr 0xf00
>>> [   48.806122] amdgpu_ih_process:254: amdgpu: rptr 0x7c0, old wptr 
>>> 0xd20, new wptr 0x840
>>> [   48.806155] amdgpu_ih_process:254: amdgpu: rptr 0x7e0, old wptr 
>>> 0xd20, new wptr 0x840
>>> [   48.806965] amdgpu_ih_process:254: amdgpu: rptr 0x800, old wptr 
>>> 0xd20, new wptr 0x840
>>> [   48.806995] amdgpu_ih_process:254: amdgpu: rptr 0x820, old wptr 
>>> 0xd20, new wptr 0x840
>>> [   48.807028] amdgpu_ih_process:254: amdgpu: rptr 0x840, old wptr 
>>> 0xd20, new wptr 0x840
>>> [   48.807063] amdgpu_ih_process:254: amdgpu: rptr 0x860, old wptr 
>>> 0xd20, new wptr 0x840
>>> [   48.808421] amdgpu_ih_process:254: amdgpu: rptr 0x880, old wptr 
>>> 0xd20, new wptr 0x840
>>>
>>> This causes the GPU VM fault dump below, because the address is
>>> already unmapped on the CPU side.
>>>
>>> [   48.807071] svm_range_restore_pages:2617: amdgpu: restoring svms 
>>> 0x00000000733bf007 fault address 0x7f8a6991f
>>>
>>> [   48.807170] svm_range_restore_pages:2631: amdgpu: failed to find 
>>> prange svms 0x00000000733bf007 address [0x7f8a6991f]
>>> [   48.807179] svm_range_get_range_boundaries:2348: amdgpu: VMA does 
>>> not exist in address [0x7f8a6991f]
>>> [   48.807185] svm_range_restore_pages:2635: amdgpu: failed to 
>>> create unregistered range svms 0x00000000733bf007 address [0x7f8a6991f]
>>>
>>> [   48.807929] amdgpu 0000:25:00.0: amdgpu: [mmhub0] retry page 
>>> fault (src_id:0 ring:0 vmid:8 pasid:32770, for process kfdtest pid 
>>> 3969 thread kfdtest pid 3969)
>>> [   48.808219] amdgpu 0000:25:00.0: amdgpu:   in page starting at 
>>> address 0x00007f8a6991f000 from IH client 0x12 (VMC)
>>> [   48.808230] amdgpu 0000:25:00.0: amdgpu: 
>>> VM_L2_PROTECTION_FAULT_STATUS:0x00800031
>>>
>>>> We could of course parameterize that so that we check the wptr 
>>>> after each IV on IH1, but please not hard coded like this.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>       }
>>>>>         amdgpu_ih_set_rptr(adev, ih);
>>>>
>>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/5] drm/amdgpu: handle IH ring1 overflow
  2021-11-11  7:00         ` Christian König
@ 2021-11-11 12:13           ` Felix Kuehling
  2021-11-11 13:43             ` Christian König
  0 siblings, 1 reply; 24+ messages in thread
From: Felix Kuehling @ 2021-11-11 12:13 UTC (permalink / raw)
  To: Christian König, philip yang, Philip Yang, amd-gfx

Am 2021-11-11 um 2:00 a.m. schrieb Christian König:
> Am 11.11.21 um 00:36 schrieb Felix Kuehling:
>> On 2021-11-10 9:31 a.m., Christian König wrote:
>>> Am 10.11.21 um 14:59 schrieb philip yang:
>>>>
>>>> On 2021-11-10 5:15 a.m., Christian König wrote:
>>>>
>>>>> [SNIP]
>>>>
>>>> It is hard to understand; the debug log below, captured with this
>>>> debug message patch, explains it in more detail:
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
>>>> index ed6f8d24280b..8859f2bb11b1 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
>>>> @@ -234,10 +235,12 @@ int amdgpu_ih_process(struct amdgpu_device
>>>> *adev, struct amdgpu_ih_ring *ih)
>>>>                 return IRQ_NONE;
>>>>
>>>>         wptr = amdgpu_ih_get_wptr(adev, ih);
>>>> +       if (ih == &adev->irq.ih1)
>>>> +               pr_debug("entering rptr 0x%x, wptr 0x%x\n",
>>>> ih->rptr, wptr);
>>>>
>>>>  restart_ih:
>>>> +       if (ih == &adev->irq.ih1)
>>>> +               pr_debug("starting rptr 0x%x, wptr 0x%x\n",
>>>> ih->rptr, wptr);
>>>>
>>>>         /* Order reading of wptr vs. reading of IH ring data */
>>>>         rmb();
>>>> @@ -245,8 +248,12 @@ int amdgpu_ih_process(struct amdgpu_device
>>>> *adev, struct amdgpu_ih_ring *ih)
>>>>         while (ih->rptr != wptr && --count) {
>>>>                 amdgpu_irq_dispatch(adev, ih);
>>>>                 ih->rptr &= ih->ptr_mask;
>>>> +               if (ih == &adev->irq.ih1) {
>>>> +                       pr_debug("rptr 0x%x, old wptr 0x%x, new
>>>> wptr 0x%x\n",
>>>> +                               ih->rptr, wptr,
>>>> +                               amdgpu_ih_get_wptr(adev, ih));
>>>> +               }
>>>>         }
>>>>
>>>>         amdgpu_ih_set_rptr(adev, ih);
>>>> @@ -257,6 +264,8 @@ int amdgpu_ih_process(struct amdgpu_device
>>>> *adev, struct amdgpu_ih_ring *ih)
>>>>         if (wptr != ih->rptr)
>>>>                 goto restart_ih;
>>>>
>>>> +       if (ih == &adev->irq.ih1)
>>>> +               pr_debug("exiting rptr 0x%x, wptr 0x%x\n",
>>>> ih->rptr, wptr);
>>>>         return IRQ_HANDLED;
>>>>  }
>>>>
>>>> This is the log. At timestamp 48.807028 the ring1 drain is done, rptr ==
>>>> wptr and ring1 is empty, but the loop continues and handles outdated
>>>> retry faults.
>>>>
>>>
>>> As far as I can see that is perfectly correct and expected behavior.
>>>
>>> See the ring buffer overflowed and because of that the loop
>>> continues, but that is correct because an overflow means that the
>>> ring was filled with new entries.
>>>
>>> So we are processing new entries here, not stale ones.
>>
>> Aren't we processing interrupts out-of-order in this case. We're
>> processing newer ones before older ones. Is that the root of the
>> problem because it confuses our interrupt draining function?
>
> Good point.
>
>> Maybe we need to detect overflows in the interrupt draining function
>> to make it wait longer in that case.
>
> Ideally we should use something which is completely separate from all
> those implementation details.
>
> Like for example using the timestamp or a separate indicator/counter
> instead.

Even a timestamp will be broken if the interrupt processing function
handles interrupts out of order.

I think we need a different amdgpu_ih_process function for IH ring 1,
given the way we set it up to handle overflows. Because IH is just overwriting
older entries, and we can't read entire IH entries atomically, we have
to use a watermark. If IH WPTR passes the watermark, we have to consider
the ring overflowed and reset our RPTR. We have to set RPTR far enough
"ahead" of the current WPTR to make sure WPTR is under the watermark.
And the watermark needs to be low enough to ensure amdgpu_irq_dispatch
can read out the next IH entry before the WPTR catches up with the RPTR.

Since we don't read the WPTR on every iteration, and our page fault
handling code can take quite a while to process one fault, the watermark
needs to provide a lot of buffer. Maybe we also need to read the WPTR
register more frequently if the last read was more than a jiffy ago.

Whenever an overflow (over the watermark) is detected, we can set a
sticky overflow bit that our page fault handling code can use to clean
up. E.g. once we start using the CAM filter, we'll have to invalidate
all CAM entries when this happens (although I'd expect overflows to
become impossible once we enable the CAM filter).
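
The watermark idea above can be modeled in a few lines of plain C. This is
only a user-space sketch of the concept, with made-up ring size, threshold,
and names (`ih_model`, `ih_check_watermark`); nothing here is the real
amdgpu_ih_ring layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model, not driver code: if the unread window between our RPTR and
 * the hardware WPTR exceeds the watermark, treat the ring as overflowed,
 * move RPTR back under the watermark "ahead" of WPTR, and latch a sticky
 * bit for the page fault handling code (e.g. to invalidate CAM entries).
 */
#define RING_ENTRIES 256u                    /* power of two (assumed) */
#define PTR_MASK     (RING_ENTRIES - 1)
#define WATERMARK    (RING_ENTRIES * 3 / 4)  /* illustrative threshold */

struct ih_model {
	uint32_t rptr;
	uint32_t wptr;          /* stands in for the hardware WPTR */
	bool overflow_sticky;   /* consumed by fault handling code */
};

/* unread entries between rptr and wptr, modulo the ring size */
static uint32_t ih_unread(const struct ih_model *ih)
{
	return (ih->wptr - ih->rptr) & PTR_MASK;
}

/* returns true (and repositions rptr) if the watermark was crossed */
static bool ih_check_watermark(struct ih_model *ih)
{
	if (ih_unread(ih) <= WATERMARK)
		return false;
	ih->rptr = (ih->wptr - WATERMARK) & PTR_MASK;
	ih->overflow_sticky = true;
	return true;
}
```

The sticky bit deliberately stays set until the fault handling code clears
it, matching the clean-up idea described above.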

Thanks,
  Felix


>
> Regards,
> Christian.
>
>>
>> Regards,
>>   Felix
>>
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>> [   48.802231] amdgpu_ih_process:243: amdgpu: starting rptr 0x520,
>>>> wptr 0xd20
>>>> [   48.802235] amdgpu_ih_process:254: amdgpu: rptr 0x540, old wptr
>>>> 0xd20, new wptr 0xd20
>>>> [   48.802256] amdgpu_ih_process:254: amdgpu: rptr 0x560, old wptr
>>>> 0xd20, new wptr 0xd20
>>>> [   48.802260] amdgpu_ih_process:254: amdgpu: rptr 0x580, old wptr
>>>> 0xd20, new wptr 0xd20
>>>> [   48.802281] amdgpu_ih_process:254: amdgpu: rptr 0x5a0, old wptr
>>>> 0xd20, new wptr 0xd20
>>>> [   48.802314] amdgpu_ih_process:254: amdgpu: rptr 0x5c0, old wptr
>>>> 0xd20, new wptr 0xce0
>>>> [   48.802335] amdgpu_ih_process:254: amdgpu: rptr 0x5e0, old wptr
>>>> 0xd20, new wptr 0xce0
>>>> [   48.802356] amdgpu_ih_process:254: amdgpu: rptr 0x600, old wptr
>>>> 0xd20, new wptr 0xce0
>>>> [   48.802376] amdgpu_ih_process:254: amdgpu: rptr 0x620, old wptr
>>>> 0xd20, new wptr 0xce0
>>>> [   48.802396] amdgpu_ih_process:254: amdgpu: rptr 0x640, old wptr
>>>> 0xd20, new wptr 0xce0
>>>> [   48.802401] amdgpu_ih_process:254: amdgpu: rptr 0x660, old wptr
>>>> 0xd20, new wptr 0xce0
>>>> [   48.802421] amdgpu_ih_process:254: amdgpu: rptr 0x680, old wptr
>>>> 0xd20, new wptr 0xce0
>>>> [   48.802442] amdgpu_ih_process:254: amdgpu: rptr 0x6a0, old wptr
>>>> 0xd20, new wptr 0xce0
>>>> [   48.802463] amdgpu_ih_process:254: amdgpu: rptr 0x6c0, old wptr
>>>> 0xd20, new wptr 0xce0
>>>> [   48.802483] amdgpu_ih_process:254: amdgpu: rptr 0x6e0, old wptr
>>>> 0xd20, new wptr 0xce0
>>>> [   48.802503] amdgpu_ih_process:254: amdgpu: rptr 0x700, old wptr
>>>> 0xd20, new wptr 0xce0
>>>> [   48.802523] amdgpu_ih_process:254: amdgpu: rptr 0x720, old wptr
>>>> 0xd20, new wptr 0xce0
>>>> [   48.802544] amdgpu_ih_process:254: amdgpu: rptr 0x740, old wptr
>>>> 0xd20, new wptr 0xce0
>>>> [   48.802565] amdgpu_ih_process:254: amdgpu: rptr 0x760, old wptr
>>>> 0xd20, new wptr 0xce0
>>>> [   48.802569] amdgpu_ih_process:254: amdgpu: rptr 0x780, old wptr
>>>> 0xd20, new wptr 0xce0
>>>> [   48.804392] amdgpu_ih_process:254: amdgpu: rptr 0x7a0, old wptr
>>>> 0xd20, new wptr 0xf00
>>>> [   48.806122] amdgpu_ih_process:254: amdgpu: rptr 0x7c0, old wptr
>>>> 0xd20, new wptr 0x840
>>>> [   48.806155] amdgpu_ih_process:254: amdgpu: rptr 0x7e0, old wptr
>>>> 0xd20, new wptr 0x840
>>>> [   48.806965] amdgpu_ih_process:254: amdgpu: rptr 0x800, old wptr
>>>> 0xd20, new wptr 0x840
>>>> [   48.806995] amdgpu_ih_process:254: amdgpu: rptr 0x820, old wptr
>>>> 0xd20, new wptr 0x840
>>>> [   48.807028] amdgpu_ih_process:254: amdgpu: rptr 0x840, old wptr
>>>> 0xd20, new wptr 0x840
>>>> [   48.807063] amdgpu_ih_process:254: amdgpu: rptr 0x860, old wptr
>>>> 0xd20, new wptr 0x840
>>>> [   48.808421] amdgpu_ih_process:254: amdgpu: rptr 0x880, old wptr
>>>> 0xd20, new wptr 0x840
>>>>
>>>> This causes the GPU VM fault dump below, because the address is
>>>> already unmapped on the CPU side.
>>>>
>>>> [   48.807071] svm_range_restore_pages:2617: amdgpu: restoring svms
>>>> 0x00000000733bf007 fault address 0x7f8a6991f
>>>>
>>>> [   48.807170] svm_range_restore_pages:2631: amdgpu: failed to find
>>>> prange svms 0x00000000733bf007 address [0x7f8a6991f]
>>>> [   48.807179] svm_range_get_range_boundaries:2348: amdgpu: VMA
>>>> does not exist in address [0x7f8a6991f]
>>>> [   48.807185] svm_range_restore_pages:2635: amdgpu: failed to
>>>> create unregistered range svms 0x00000000733bf007 address
>>>> [0x7f8a6991f]
>>>>
>>>> [   48.807929] amdgpu 0000:25:00.0: amdgpu: [mmhub0] retry page
>>>> fault (src_id:0 ring:0 vmid:8 pasid:32770, for process kfdtest pid
>>>> 3969 thread kfdtest pid 3969)
>>>> [   48.808219] amdgpu 0000:25:00.0: amdgpu:   in page starting at
>>>> address 0x00007f8a6991f000 from IH client 0x12 (VMC)
>>>> [   48.808230] amdgpu 0000:25:00.0: amdgpu:
>>>> VM_L2_PROTECTION_FAULT_STATUS:0x00800031
>>>>
>>>>> We could of course parameterize that so that we check the wptr
>>>>> after each IV on IH1, but please not hard coded like this.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>>       }
>>>>>>         amdgpu_ih_set_rptr(adev, ih);
>>>>>
>>>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/5] drm/amdgpu: handle IH ring1 overflow
  2021-11-11 12:13           ` Felix Kuehling
@ 2021-11-11 13:43             ` Christian König
  2021-11-11 13:57               ` Felix Kuehling
  0 siblings, 1 reply; 24+ messages in thread
From: Christian König @ 2021-11-11 13:43 UTC (permalink / raw)
  To: Felix Kuehling, philip yang, Philip Yang, amd-gfx

Am 11.11.21 um 13:13 schrieb Felix Kuehling:
> Am 2021-11-11 um 2:00 a.m. schrieb Christian König:
>> Am 11.11.21 um 00:36 schrieb Felix Kuehling:
>>> On 2021-11-10 9:31 a.m., Christian König wrote:
>>> [SNIP]
>>> Aren't we processing interrupts out-of-order in this case. We're
>>> processing newer ones before older ones. Is that the root of the
>>> problem because it confuses our interrupt draining function?
>> Good point.
>>
>>> Maybe we need to detect overflows in the interrupt draining function
>>> to make it wait longer in that case.
>> Ideally we should use something which is completely separate from all
>> those implementation details.
>>
>> Like for example using the timestamp or a separate indicator/counter
>> instead.
> Even a timestamp will be broken if the interrupt processing function
> handles interrupts out of order.

We can easily detect if the timestamp is going backwards and so filter 
out stale entries.
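
As a rough illustration (plain user-space C, not driver code; the 48-bit
timestamp width and the names `iv_ts_after`/`iv_accept` are assumptions for
the example), detecting a backwards-going timestamp could look like:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of timestamp-based stale-IV filtering; widths are assumed. */
#define IV_TS_BITS 48
#define IV_TS_MASK ((1ULL << IV_TS_BITS) - 1)

/* true if ts is newer than last_ts, tolerating 48-bit wrap-around */
static bool iv_ts_after(uint64_t ts, uint64_t last_ts)
{
	/* shift the wrapped difference into the sign bit for comparison */
	return (int64_t)((ts - last_ts) << (64 - IV_TS_BITS)) > 0;
}

/* accept an IV only if its timestamp moves forward; drop stale re-reads */
static bool iv_accept(uint64_t *last_ts, uint64_t ts)
{
	if (!iv_ts_after(ts, *last_ts))
		return false;
	*last_ts = ts;
	return true;
}
```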

> I think we need a different amdgpu_ih_process function for IH ring 1 the
> way we set it up to handle overflows. Because IH is just overwriting
> older entries, and we can't read entire IH entries atomically, we have
> to use a watermark. If IH WPTR passes the watermark, we have to consider
> the ring overflowed and reset our RPTR. We have to set RPTR far enough
> "ahead" of the current WPTR to make sure WPTR is under the watermark.
> And the watermark needs to be low enough to ensure amdgpu_irq_dispatch
> can read out the next IH entry before the WPTR catches up with the RPTR.
>
> Since we don't read the WPTR on every iteration, and our page fault
> handling code can take quite a while to process one fault, the watermark
> needs to provide a lot of buffer. Maybe we also need to read the WPTR
> register more frequently if the last read was more than a jiffy ago.

I think trying to solve that with the IH code or hardware is the 
completely wrong approach.

As I said before, we need something more robust, and using the 
timestamp sounds like the most logical approach to me.

The only alternative I can see would be to have a hardware assisted flag 
which tells you if you had an overflow or not like we have for IH ring 0.

E.g. something like the following:
1. Copy the next N IV from the RPTR location.
2. Get the current WPTR.
3. If the overflow bit in the WPTR is set, update the RPTR to something 
like WPTR+window, clear the overflow bit and repeat at #1.
4. Process the valid IVs.

The downside is that we are losing a lot of IVs with that. That is 
probably ok for the current approach, but most likely a bad idea if we 
enable the CAM.
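
The four steps could be modeled roughly like this (user-space sketch only;
the overflow bit position, window, batch size and the `ring_model` /
`ring_fetch` names are assumptions, not the real register layout):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_ENTRIES 256u
#define PTR_MASK     (RING_ENTRIES - 1)
#define OVERFLOW_BIT (1u << 31)   /* assumed flag in the WPTR word */
#define COPY_N       8u           /* IVs copied per batch */
#define WINDOW       16u          /* safety gap past WPTR after overflow */

struct ring_model {
	uint32_t iv[RING_ENTRIES];  /* one word per IV in this model */
	uint32_t rptr;
	uint32_t wptr_reg;          /* wptr, possibly with OVERFLOW_BIT set */
};

/* Copy up to COPY_N IVs into buf; returns how many of them are valid. */
static uint32_t ring_fetch(struct ring_model *r, uint32_t *buf)
{
	for (;;) {
		uint32_t i, wptr, avail;

		/* 1. copy the next N IVs from the RPTR location */
		for (i = 0; i < COPY_N; i++)
			buf[i] = r->iv[(r->rptr + i) & PTR_MASK];

		/* 2. get the current WPTR */
		wptr = r->wptr_reg;

		/* 3. overflow: jump to WPTR+window, clear the bit, retry */
		if (wptr & OVERFLOW_BIT) {
			r->wptr_reg = wptr & ~OVERFLOW_BIT;
			r->rptr = ((wptr & PTR_MASK) + WINDOW) & PTR_MASK;
			continue;
		}

		/* 4. the copied IVs up to WPTR are valid to process */
		avail = (wptr - r->rptr) & PTR_MASK;
		if (avail > COPY_N)
			avail = COPY_N;
		r->rptr = (r->rptr + avail) & PTR_MASK;
		return avail;
	}
}
```

Note how the jump to WPTR+window skips the IVs closest behind the writer,
which is exactly where the loss of entries mentioned above comes from.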

Regards,
Christian.

>
> Whenever an overflow (over the watermark) is detected, we can set a
> sticky overflow bit that our page fault handling code can use to clean
> up. E.g. once we start using the CAM filter, we'll have to invalidate
> all CAM entries when this happens (although I'd expect overflows to
> become impossible once we enable the CAM filter).
>
> Thanks,
>    Felix
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/5] drm/amdgpu: handle IH ring1 overflow
  2021-11-11 13:43             ` Christian König
@ 2021-11-11 13:57               ` Felix Kuehling
  2021-11-11 21:31                 ` philip yang
  0 siblings, 1 reply; 24+ messages in thread
From: Felix Kuehling @ 2021-11-11 13:57 UTC (permalink / raw)
  To: Christian König, philip yang, Philip Yang, amd-gfx

Am 2021-11-11 um 8:43 a.m. schrieb Christian König:
> Am 11.11.21 um 13:13 schrieb Felix Kuehling:
>> Am 2021-11-11 um 2:00 a.m. schrieb Christian König:
>>> Am 11.11.21 um 00:36 schrieb Felix Kuehling:
>>>> On 2021-11-10 9:31 a.m., Christian König wrote:
>>>> [SNIP]
>>>> Aren't we processing interrupts out-of-order in this case. We're
>>>> processing newer ones before older ones. Is that the root of the
>>>> problem because it confuses our interrupt draining function?
>>> Good point.
>>>
>>>> Maybe we need to detect overflows in the interrupt draining function
>>>> to make it wait longer in that case.
>>> Ideally we should use something which is completely separate from all
>>> those implementation details.
>>>
>>> Like for example using the timestamp or a separate indicator/counter
>>> instead.
>> Even a timestamp will be broken if the interrupt processing function
>> handles interrupts out of order.
>
> We can easily detect if the timestamp is going backwards and so filter
> out stale entries.
>
>> I think we need a different amdgpu_ih_process function for IH ring 1 the
>> way we set it up to handle overflows. Because IH is just overwriting
>> older entries, and we can't read entire IH entries atomically, we have
>> to use a watermark. If IH WPTR passes the watermark, we have to consider
>> the ring overflowed and reset our RPTR. We have to set RPTR far enough
>> "ahead" of the current WPTR to make sure WPTR is under the watermark.
>> And the watermark needs to be low enough to ensure amdgpu_irq_dispatch
>> can read out the next IH entry before the WPTR catches up with the RPTR.
>>
>> Since we don't read the WPTR on every iteration, and our page fault
>> handling code can take quite a while to process one fault, the watermark
>> needs to provide a lot of buffer. Maybe we also need to read the WPTR
>> register more frequently if the last read was more than a jiffy ago.
>
> I think trying to solve that with the IH code or hardware is the
> completely wrong approach.
>
> As I said before, we need something more robust, and using the
> timestamp sounds like the most logical approach to me.
>
> The only alternative I can see would be to have a hardware assisted
> flag which tells you if you had an overflow or not like we have for IH
> ring 0.

The *_ih_get_wptr functions already do some overflow handling. I think
we'll need a function to read the overflow bit that amdgpu_ih_process
can call separately, after reading IH entries.


>
> E.g. something like the following:
> 1. Copy the next N IV from the RPTR location.
> 2. Get the current WPTR.
> 3. If the overflow bit in the WPTR is set update the RPTR to something
> like WPTR+window, clear the overflow bit and repeat at #1.
> 4. Process the valid IVs.

OK. Currently, amdgpu_irq_dispatch reads directly from the IH ring. I think
you're proposing to move reading of the ring into amdgpu_ih_process
where we can discard the entries if an overflow is detected.

Then let amdgpu_irq_dispatch use a copy that's guaranteed to be consistent.


>
> The downside is that we are losing a lot of IVs with that. That is
> probably ok for the current approach, but most likely a bad idea if we
> enable the CAM.

Right. Once we use the CAM we cannot afford to lose faults. If we do, we
need to clear the CAM.

Regards,
  Felix


>
> Regards,
> Christian.
>
>>
>> Whenever an overflow (over the watermark) is detected, we can set a
>> sticky overflow bit that our page fault handling code can use to clean
>> up. E.g. once we start using the CAM filter, we'll have to invalidate
>> all CAM entries when this happens (although I'd expect overflows to
>> become impossible once we enable the CAM filter).
>>
>> Thanks,
>>    Felix
>>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/5] drm/amdgpu: handle IH ring1 overflow
  2021-11-11 13:57               ` Felix Kuehling
@ 2021-11-11 21:31                 ` philip yang
  0 siblings, 0 replies; 24+ messages in thread
From: philip yang @ 2021-11-11 21:31 UTC (permalink / raw)
  To: Felix Kuehling, Christian König, Philip Yang, amd-gfx

[-- Attachment #1: Type: text/html, Size: 6612 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/5] drm/amdkfd: check child range to drain retry fault
  2021-11-10  3:26   ` Felix Kuehling
@ 2021-11-11 21:55     ` philip yang
  0 siblings, 0 replies; 24+ messages in thread
From: philip yang @ 2021-11-11 21:55 UTC (permalink / raw)
  To: Felix Kuehling, Philip Yang, amd-gfx

[-- Attachment #1: Type: text/html, Size: 4119 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/5] drm/amdkfd: restore pages race with process termination
  2021-11-10  4:16   ` Felix Kuehling
@ 2021-11-15 20:50     ` philip yang
  0 siblings, 0 replies; 24+ messages in thread
From: philip yang @ 2021-11-15 20:50 UTC (permalink / raw)
  To: Felix Kuehling, Philip Yang, amd-gfx

[-- Attachment #1: Type: text/html, Size: 5188 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2021-11-15 20:50 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-09 23:04 [PATCH 1/5] drm/amdgpu: handle IH ring1 overflow Philip Yang
2021-11-09 23:04 ` [PATCH 2/5] drm/amdkfd: check child range to drain retry fault Philip Yang
2021-11-10  3:26   ` Felix Kuehling
2021-11-11 21:55     ` philip yang
2021-11-09 23:04 ` [PATCH 3/5] drm/amdkfd: restore pages race with vma remove Philip Yang
2021-11-10  3:52   ` Felix Kuehling
2021-11-09 23:04 ` [PATCH 4/5] drm/amdkfd: restore pages race with process termination Philip Yang
2021-11-10  4:16   ` Felix Kuehling
2021-11-15 20:50     ` philip yang
2021-11-09 23:04 ` [PATCH 5/5] drm/amdkfd: svm deferred work pin mm Philip Yang
2021-11-10  4:51   ` Felix Kuehling
2021-11-10 10:15 ` [PATCH 1/5] drm/amdgpu: handle IH ring1 overflow Christian König
2021-11-10 13:59   ` philip yang
2021-11-10 14:31     ` Christian König
2021-11-10 14:44       ` philip yang
2021-11-10 14:54         ` Christian König
2021-11-10 15:45           ` philip yang
2021-11-10 15:48             ` Christian König
2021-11-10 23:36       ` Felix Kuehling
2021-11-11  7:00         ` Christian König
2021-11-11 12:13           ` Felix Kuehling
2021-11-11 13:43             ` Christian König
2021-11-11 13:57               ` Felix Kuehling
2021-11-11 21:31                 ` philip yang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.