* [PATCH] amdgpu/sriov Stop data exchange for wholegpu reset
@ 2021-01-07 10:46 Jack Zhang
  2021-01-08  3:07 ` Zhang, Jack (Jian)
  2021-01-12  9:16 ` Paul Menzel
  0 siblings, 2 replies; 5+ messages in thread
From: Jack Zhang @ 2021-01-07 10:46 UTC (permalink / raw)
  To: amd-gfx; +Cc: jazha, Jack Zhang, Jingwen Chen

[Why]
When the host triggers a whole-GPU reset, the guest keeps
waiting until the host finishes the reset. But there is a work
queue in the guest exchanging data between VF and PF, which needs
to access the frame buffer. During a whole-GPU reset, the frame
buffer is not accessible, and this causes the call trace.

[How]
After the VF gets the reset notification from the PF, stop data exchange.

Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 1 +
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c    | 1 +
 drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c    | 1 +
 3 files changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index 83ca5cbffe2c..3e212862cf5d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -571,6 +571,7 @@ void amdgpu_virt_fini_data_exchange(struct amdgpu_device *adev)
 		DRM_INFO("clean up the vf2pf work item\n");
 		flush_delayed_work(&adev->virt.vf2pf_work);
 		cancel_delayed_work_sync(&adev->virt.vf2pf_work);
+		adev->virt.vf2pf_update_interval_ms = 0;
 	}
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
index 7767ccca526b..3ee481557fc9 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
@@ -255,6 +255,7 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct *work)
 	if (!down_read_trylock(&adev->reset_sem))
 		return;
 
+	amdgpu_virt_fini_data_exchange(adev);
 	atomic_set(&adev->in_gpu_reset, 1);
 
 	do {
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
index dd5c1e6ce009..48e588d3c409 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
@@ -276,6 +276,7 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct *work)
 	if (!down_read_trylock(&adev->reset_sem))
 		return;
 
+	amdgpu_virt_fini_data_exchange(adev);
 	atomic_set(&adev->in_gpu_reset, 1);
 
 	do {
-- 
2.25.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
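
The ordering described under [Why] and [How] in the patch above can be
modelled in user space. What follows is a minimal sketch under stated
assumptions, not the driver code: struct fake_vf, vf2pf_tick(),
fini_data_exchange(), flr_notify() and run_reset_scenario() are
hypothetical names invented for illustration, and only the field name
vf2pf_update_interval_ms is taken from the patch. The sketch assumes that
the periodic work re-arms itself only while the interval is non-zero,
which the patch suggests but does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the guest (VF) state; not the amdgpu structs. */
struct fake_vf {
	bool fb_accessible;                    /* false while the host resets the GPU */
	bool work_pending;                     /* models a queued delayed work item */
	unsigned int vf2pf_update_interval_ms; /* 0 is assumed to mean "do not re-arm" */
	unsigned int exchanges;                /* exchanges done with the frame buffer up */
	unsigned int fb_faults;                /* exchanges attempted during the reset */
};

/* Arm the periodic exchange with the given interval. */
void init_data_exchange(struct fake_vf *vf, unsigned int interval_ms)
{
	assert(vf != NULL);
	vf->vf2pf_update_interval_ms = interval_ms;
	vf->work_pending = (interval_ms != 0);
}

/* One timer expiry: touch the frame buffer, then re-arm if the interval is set. */
void vf2pf_tick(struct fake_vf *vf)
{
	assert(vf != NULL);
	if (!vf->work_pending)
		return;
	if (vf->fb_accessible)
		vf->exchanges++;
	else
		vf->fb_faults++; /* stands in for the call trace from the report */
	vf->work_pending = (vf->vf2pf_update_interval_ms != 0);
}

/* Cancel the pending work and clear the interval so nothing re-arms it. */
void fini_data_exchange(struct fake_vf *vf)
{
	assert(vf != NULL);
	vf->work_pending = false;
	vf->vf2pf_update_interval_ms = 0;
}

/* Reset notification: optionally stop the exchange before the frame buffer goes away. */
void flr_notify(struct fake_vf *vf, bool stop_exchange_first)
{
	assert(vf != NULL);
	if (stop_exchange_first)
		fini_data_exchange(vf);
	vf->fb_accessible = false;
}

/* Run one reset scenario; return how often the frame buffer was touched mid-reset. */
unsigned int run_reset_scenario(bool stop_exchange_first)
{
	struct fake_vf vf = { .fb_accessible = true };

	init_data_exchange(&vf, 2000); /* 2000 ms is an arbitrary illustrative value */
	vf2pf_tick(&vf);               /* normal exchange before the reset */
	flr_notify(&vf, stop_exchange_first);
	vf2pf_tick(&vf);               /* timer expiries while the reset is in progress */
	vf2pf_tick(&vf);
	return vf.fb_faults;
}
```

In this model, run_reset_scenario(true) corresponds to the patched
ordering and returns 0, while run_reset_scenario(false) leaves the work
armed and returns 2, one per timer expiry during the reset. The real
driver relies on the kernel workqueue API rather than a boolean flag, so
the sketch only illustrates the ordering argument.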


* RE: [PATCH] amdgpu/sriov Stop data exchange for wholegpu reset
  2021-01-07 10:46 [PATCH] amdgpu/sriov Stop data exchange for wholegpu reset Jack Zhang
@ 2021-01-08  3:07 ` Zhang, Jack (Jian)
  2021-01-12  3:19   ` Zhang, Jack (Jian)
  2021-01-12  9:16 ` Paul Menzel
  1 sibling, 1 reply; 5+ messages in thread
From: Zhang, Jack (Jian) @ 2021-01-08  3:07 UTC (permalink / raw)
  To: amd-gfx, Liu, Monk, Chen, JingWen, Deucher, Alexander, Deng, Emily

Ping


* RE: [PATCH] amdgpu/sriov Stop data exchange for wholegpu reset
  2021-01-08  3:07 ` Zhang, Jack (Jian)
@ 2021-01-12  3:19   ` Zhang, Jack (Jian)
  2021-01-12  5:55     ` Liu, Monk
  0 siblings, 1 reply; 5+ messages in thread
From: Zhang, Jack (Jian) @ 2021-01-12  3:19 UTC (permalink / raw)
  To: Zhang, Hawking, amd-gfx, Liu, Monk, Chen, JingWen, Deucher,
	Alexander, Deng, Emily

[AMD Official Use Only - Internal Distribution Only]

Ping...


* RE: [PATCH] amdgpu/sriov Stop data exchange for wholegpu reset
  2021-01-12  3:19   ` Zhang, Jack (Jian)
@ 2021-01-12  5:55     ` Liu, Monk
  0 siblings, 0 replies; 5+ messages in thread
From: Liu, Monk @ 2021-01-12  5:55 UTC (permalink / raw)
  To: Zhang, Jack (Jian),
	Zhang, Hawking, amd-gfx, Chen, JingWen, Deucher, Alexander, Deng,
	Emily

[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Monk Liu <monk.liu@amd.com>

Thanks 

------------------------------------------
Monk Liu | Cloud-GPU Core team
------------------------------------------


* Re: [PATCH] amdgpu/sriov Stop data exchange for wholegpu reset
  2021-01-07 10:46 [PATCH] amdgpu/sriov Stop data exchange for wholegpu reset Jack Zhang
  2021-01-08  3:07 ` Zhang, Jack (Jian)
@ 2021-01-12  9:16 ` Paul Menzel
  1 sibling, 0 replies; 5+ messages in thread
From: Paul Menzel @ 2021-01-12  9:16 UTC (permalink / raw)
  To: Jack Zhang; +Cc: jazha, Jingwen Chen, amd-gfx

Dear Jack,


Thank you for your patch.

Please add a colon after amdgpu/sriov in the commit message summary.

On 07.01.21 at 11:46, Jack Zhang wrote:
> [Why]
> When host trigger a whole gpu reset, guest will keep

*hosts trigger* or *host triggers*

> waiting till host finish reset. But there's a work

finishes

> queue in guest exchanging data between vf&pf which need

needs

> to access frame buffer. During whole gpu reset, frame
> buffer is not accessable, and this causes the call trace.

accessible (a spell checker should have caught that)

Could you please paste part of the trace, so that users running into
this can easily find it?

> [How]
> After vf get reset notification from pf, stop data exchange.

How can this be reproduced and tested?


Kind regards,

Paul



