* [PATCH 1/2] drm/amdkfd: wait migration done only if migration starts
@ 2021-04-29  1:53 Philip Yang
  2021-04-29  1:53 ` [PATCH 2/2] drm/amdkfd: flush TLB after updating GPU page table Philip Yang
  2021-04-29  6:10 ` [PATCH 1/2] drm/amdkfd: wait migration done only if migration starts Felix Kuehling
  0 siblings, 2 replies; 5+ messages in thread
From: Philip Yang @ 2021-04-29  1:53 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang

If the migration VMA was set up but the migration failed before the SDMA
memory copy started (e.g. the process was killed), don't wait for the
SDMA fence to signal.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 6b810863f6ba..19b08247ba8a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -460,10 +460,12 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
 	}
 
 	if (migrate.cpages) {
-		svm_migrate_copy_to_vram(adev, prange, &migrate, &mfence,
-					 scratch);
-		migrate_vma_pages(&migrate);
-		svm_migrate_copy_done(adev, mfence);
+		r = svm_migrate_copy_to_vram(adev, prange, &migrate, &mfence,
+					     scratch);
+		if (!r) {
+			migrate_vma_pages(&migrate);
+			svm_migrate_copy_done(adev, mfence);
+		}
 		migrate_vma_finalize(&migrate);
 	}
 
@@ -663,10 +665,12 @@ svm_migrate_vma_to_ram(struct amdgpu_device *adev, struct svm_range *prange,
 	pr_debug("cpages %ld\n", migrate.cpages);
 
 	if (migrate.cpages) {
-		svm_migrate_copy_to_ram(adev, prange, &migrate, &mfence,
-					scratch);
-		migrate_vma_pages(&migrate);
-		svm_migrate_copy_done(adev, mfence);
+		r = svm_migrate_copy_to_ram(adev, prange, &migrate, &mfence,
+					    scratch);
+		if (!r) {
+			migrate_vma_pages(&migrate);
+			svm_migrate_copy_done(adev, mfence);
+		}
 		migrate_vma_finalize(&migrate);
 	} else {
 		pr_debug("failed collect migrate device pages [0x%lx 0x%lx]\n",
-- 
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

* [PATCH 2/2] drm/amdkfd: flush TLB after updating GPU page table
  2021-04-29  1:53 [PATCH 1/2] drm/amdkfd: wait migration done only if migration starts Philip Yang
@ 2021-04-29  1:53 ` Philip Yang
  2021-04-29  6:11   ` Felix Kuehling
  2021-04-29  6:10 ` [PATCH 1/2] drm/amdkfd: wait migration done only if migration starts Felix Kuehling
  1 sibling, 1 reply; 5+ messages in thread
From: Philip Yang @ 2021-04-29  1:53 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang

Work around the situation where VM retry faults keep coming after a page
table update. We are investigating the root cause, but once this issue
happens, the application gets stuck and a reboot is sometimes needed to
recover.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index d9111fea724b..a165e51c4a1c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1225,6 +1225,9 @@ static int svm_range_map_to_gpus(struct svm_range *prange,
 				break;
 			}
 		}
+
+		amdgpu_amdkfd_flush_gpu_tlb_pasid((struct kgd_dev *)adev,
+						  p->pasid);
 	}
 
 	return r;
-- 
2.17.1
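
The general idea behind flushing a TLB after a page table update can be
illustrated with a toy model. This is only a sketch under stated assumptions:
the names toy_mmu, toy_translate, toy_flush_tlb and toy_map are hypothetical
and do not exist in the driver, and the sketch does not claim to reproduce the
root cause of the retry faults, which the patch description says is still
under investigation. It only shows that a cached translation can stay stale
when the table changes without a flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NPAGES 4

/* Toy model only: a page table plus a translation cache (TLB).
 * All names here are hypothetical, not part of amdgpu/amdkfd. */
struct toy_mmu {
	int page_table[NPAGES];   /* virtual page -> physical frame */
	int tlb[NPAGES];          /* cached translations */
	bool tlb_valid[NPAGES];   /* whether the cached entry may be used */
};

/* Translate through the cache, filling it from the page table on a miss. */
static int toy_translate(struct toy_mmu *m, int vpage)
{
	assert(vpage >= 0 && vpage < NPAGES);
	if (!m->tlb_valid[vpage]) {
		m->tlb[vpage] = m->page_table[vpage];
		m->tlb_valid[vpage] = true;
	}
	return m->tlb[vpage];
}

/* Invalidate every cached translation. */
static void toy_flush_tlb(struct toy_mmu *m)
{
	memset(m->tlb_valid, 0, sizeof(m->tlb_valid));
}

/* Update the page table; optionally flush the cache afterwards. */
static void toy_map(struct toy_mmu *m, int vpage, int frame, bool flush)
{
	assert(vpage >= 0 && vpage < NPAGES);
	m->page_table[vpage] = frame;
	if (flush)
		toy_flush_tlb(m);
}
```

In this model, calling toy_map() with flush set to false after a translation
has been cached leaves toy_translate() returning the old frame, while calling
it with flush set to true makes the next lookup read the updated table.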


* Re: [PATCH 1/2] drm/amdkfd: wait migration done only if migration starts
  2021-04-29  1:53 [PATCH 1/2] drm/amdkfd: wait migration done only if migration starts Philip Yang
  2021-04-29  1:53 ` [PATCH 2/2] drm/amdkfd: flush TLB after updating GPU page table Philip Yang
@ 2021-04-29  6:10 ` Felix Kuehling
  2021-05-05 17:54   ` philip yang
  1 sibling, 1 reply; 5+ messages in thread
From: Felix Kuehling @ 2021-04-29  6:10 UTC (permalink / raw)
  To: amd-gfx, Yang, Philip

On 2021-04-28 at 9:53 p.m., Philip Yang wrote:

> If the migration VMA was set up but the migration failed before the SDMA
> memory copy started (e.g. the process was killed), don't wait for the
> SDMA fence to signal.

I think you could describe this more generally as "Handle errors
returned by svm_migrate_copy_to_vram/ram".


>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> index 6b810863f6ba..19b08247ba8a 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> @@ -460,10 +460,12 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
>  	}
>  
>  	if (migrate.cpages) {
> -		svm_migrate_copy_to_vram(adev, prange, &migrate, &mfence,
> -					 scratch);
> -		migrate_vma_pages(&migrate);
> -		svm_migrate_copy_done(adev, mfence);
> +		r = svm_migrate_copy_to_vram(adev, prange, &migrate, &mfence,
> +					     scratch);
> +		if (!r) {
> +			migrate_vma_pages(&migrate);
> +			svm_migrate_copy_done(adev, mfence);

I think there are failure cases where svm_migrate_copy_to_vram
successfully copies some pages but fails somewhere in the middle. I
think in those cases you still want to call migrate_vma_pages and
svm_migrate_copy_done. If the copy never started for some reason, there
should be no mfence and svm_migrate_copy_done should be a no-op.

I probably don't understand the failure scenario you encountered. Can
you explain that in more detail?

Thanks,
  Felix
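
The behaviour described in the review comment above can be sketched with mock
functions. This is a minimal illustration, not the driver's code: mock_fence,
mock_copy, mock_copy_done and migrate are hypothetical stand-ins, and the
assumption that a fence exists only once at least one page copy was submitted
is taken from the comment, not verified against the driver. The "done" step is
always called and is a no-op when no fence was produced, so a partial copy is
still waited on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a DMA fence; not the kernel's struct dma_fence. */
struct mock_fence {
	int signaled;
};

static int waits;  /* counts how many times a fence was actually waited on */

/*
 * Mock copy step: submits pages one by one and fails when it reaches
 * index fail_at (pass -1 for "never fail"). A fence is only produced
 * if at least one page was submitted.
 */
static int mock_copy(int npages, int fail_at, struct mock_fence *storage,
		     struct mock_fence **fence, int *copied)
{
	int i;

	assert(npages >= 0);
	*fence = NULL;
	*copied = 0;
	for (i = 0; i < npages; i++) {
		if (i == fail_at)
			return -1;
		storage->signaled = 0;
		*fence = storage;
		(*copied)++;
	}
	return 0;
}

/* Waiting is a no-op when no copy was ever submitted (fence is NULL). */
static void mock_copy_done(struct mock_fence *fence)
{
	if (!fence)
		return;
	waits++;
	fence->signaled = 1;
}

/*
 * Always call the "done" step, even on error, so that a partially
 * completed copy is still waited on. Returns the number of pages copied
 * and reports the copy status through *err.
 */
static int migrate(int npages, int fail_at, int *err)
{
	struct mock_fence storage = { 0 };
	struct mock_fence *fence;
	int copied;

	*err = mock_copy(npages, fail_at, &storage, &fence, &copied);
	mock_copy_done(fence);
	return copied;
}
```

With this shape, a failure before the first page leaves the fence NULL and
nothing is waited on, while a failure in the middle still waits for the pages
that were already submitted.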


> +		}
>  		migrate_vma_finalize(&migrate);
>  	}
>  
> @@ -663,10 +665,12 @@ svm_migrate_vma_to_ram(struct amdgpu_device *adev, struct svm_range *prange,
>  	pr_debug("cpages %ld\n", migrate.cpages);
>  
>  	if (migrate.cpages) {
> -		svm_migrate_copy_to_ram(adev, prange, &migrate, &mfence,
> -					scratch);
> -		migrate_vma_pages(&migrate);
> -		svm_migrate_copy_done(adev, mfence);
> +		r = svm_migrate_copy_to_ram(adev, prange, &migrate, &mfence,
> +					    scratch);
> +		if (!r) {
> +			migrate_vma_pages(&migrate);
> +			svm_migrate_copy_done(adev, mfence);
> +		}
>  		migrate_vma_finalize(&migrate);
>  	} else {
>  		pr_debug("failed collect migrate device pages [0x%lx 0x%lx]\n",

* Re: [PATCH 2/2] drm/amdkfd: flush TLB after updating GPU page table
  2021-04-29  1:53 ` [PATCH 2/2] drm/amdkfd: flush TLB after updating GPU page table Philip Yang
@ 2021-04-29  6:11   ` Felix Kuehling
  0 siblings, 0 replies; 5+ messages in thread
From: Felix Kuehling @ 2021-04-29  6:11 UTC (permalink / raw)
  To: Philip Yang, amd-gfx

On 2021-04-28 at 9:53 p.m., Philip Yang wrote:
> Work around the situation where VM retry faults keep coming after a page
> table update. We are investigating the root cause, but once this issue
> happens, the application gets stuck and a reboot is sometimes needed to
> recover.
>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>

This patch is

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>


> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index d9111fea724b..a165e51c4a1c 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -1225,6 +1225,9 @@ static int svm_range_map_to_gpus(struct svm_range *prange,
>  				break;
>  			}
>  		}
> +
> +		amdgpu_amdkfd_flush_gpu_tlb_pasid((struct kgd_dev *)adev,
> +						  p->pasid);
>  	}
>  
>  	return r;

* Re: [PATCH 1/2] drm/amdkfd: wait migration done only if migration starts
  2021-04-29  6:10 ` [PATCH 1/2] drm/amdkfd: wait migration done only if migration starts Felix Kuehling
@ 2021-05-05 17:54   ` philip yang
  0 siblings, 0 replies; 5+ messages in thread
From: philip yang @ 2021-05-05 17:54 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx, Yang, Philip

[-- Attachment #1: Type: text/html, Size: 14652 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

