* [PATCH] drm/amdgpu: kfd_pre_reset outside req_full_gpu cause sriov hang
@ 2018-12-07  6:09 wentalou
       [not found] ` <1544162942-17349-1-git-send-email-Wentao.Lou-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: wentalou @ 2018-12-07  6:09 UTC (permalink / raw)
  To: amd-gfx@lists.freedesktop.org; +Cc: wentalou

The XGMI hive change put kfd_pre_reset into amdgpu_device_lock_adev,
but outside req_full_gpu of SRIOV.
This makes SRIOV hang during reset.

Change-Id: I5b3e2a42c77b3b9635419df4470d021df7be34d1
Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index ef36cc5..659dd40 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3474,14 +3474,16 @@ static void amdgpu_device_lock_adev(struct amdgpu_device *adev)
 	mutex_lock(&adev->lock_reset);
 	atomic_inc(&adev->gpu_reset_counter);
 	adev->in_gpu_reset = 1;
-	/* Block kfd */
-	amdgpu_amdkfd_pre_reset(adev);
+	/* Block kfd: SRIOV would do it separately */
+	if (!amdgpu_sriov_vf(adev))
+		amdgpu_amdkfd_pre_reset(adev);
 }
 
 static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
 {
-	/*unlock kfd */
-	amdgpu_amdkfd_post_reset(adev);
+	/*unlock kfd: SRIOV would do it separately */
+	if (!amdgpu_sriov_vf(adev))
+		amdgpu_amdkfd_post_reset(adev);
 	amdgpu_vf_error_trans_all(adev);
 	adev->in_gpu_reset = 0;
 	mutex_unlock(&adev->lock_reset);
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* RE: [PATCH] drm/amdgpu: kfd_pre_reset outside req_full_gpu cause sriov hang
       [not found] ` <1544162942-17349-1-git-send-email-Wentao.Lou-5C7GfCeVMHo@public.gmane.org>
@ 2018-12-10 16:09   ` Liu, Shaoyun
       [not found]     ` <BN4PR12MB0882CB80C6E0194436444AD4F4A50-aH9FTdWx9BYw01zZLexVOwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Liu, Shaoyun @ 2018-12-10 16:09 UTC (permalink / raw)
  To: amd-gfx@lists.freedesktop.org, Grodzovsky, Andrey,
	Kuehling, Felix
  Cc: Lou, Wentao

But KFD still needs to be notified during reset; the pre_reset call gives KFD a chance to suspend all running process queues. Did reset work normally on SRIOV before the refactoring change for XGMI support? We shouldn't change the logic.

Regards
shaoyun.liu


* RE: [PATCH] drm/amdgpu: kfd_pre_reset outside req_full_gpu cause sriov hang
       [not found]     ` <BN4PR12MB0882CB80C6E0194436444AD4F4A50-aH9FTdWx9BYw01zZLexVOwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2018-12-11  4:53       ` Lou, Wentao
       [not found]         ` <BYAPR12MB27424E2E6236D4E73E233E4083A60-ZGDeBxoHBPmbrehcvEBedAdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Lou, Wentao @ 2018-12-11  4:53 UTC (permalink / raw)
  To: Liu, Shaoyun, amd-gfx@lists.freedesktop.org,
	Grodzovsky, Andrey, Kuehling, Felix

SRIOV should not call amdgpu_amdkfd_pre_reset inside amdgpu_device_lock_adev,
nor amdgpu_amdkfd_post_reset inside amdgpu_device_unlock_adev.
In branch amd-staging-dkms-4.18, SRIOV already calls amdgpu_amdkfd_pre_reset and amdgpu_amdkfd_post_reset inside amdgpu_device_reset_sriov.
These two functions need to be called inside SRIOV's amdgpu_virt_request_full_gpu, or SRIOV hangs.
The amdgpu_amdkfd_pre_reset call inside amdgpu_device_lock_adev was a duplicate for SRIOV, and it caused the SRIOV hang on entering amdgpu_device_lock_adev.
That is the reason for adding "if (!amdgpu_sriov_vf(adev))", based on branch amd-staging-dkms-4.18.

BR,
Wentao
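
The ordering described above can be sketched as a small userspace model. This is only an illustration, not driver code: every `mock_` name, the `guard_sriov` flag, and the counter fields are hypothetical, and the exact placement of the KFD calls inside the SRIOV reset path is an assumption based on this message. The model counts KFD calls that a VF makes without holding full GPU access, which is the situation the thread says leads to the hang.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: none of these are the real amdgpu functions. */
struct mock_adev {
	bool is_sriov_vf;         /* stands in for amdgpu_sriov_vf(adev) */
	bool has_full_gpu;        /* true while full GPU access is held */
	int kfd_pre_reset_calls;  /* times KFD was blocked */
	int kfd_post_reset_calls; /* times KFD was unblocked */
	int unsafe_kfd_calls;     /* KFD calls made by a VF without full GPU access */
};

static void mock_kfd_pre_reset(struct mock_adev *adev)
{
	if (adev->is_sriov_vf && !adev->has_full_gpu)
		adev->unsafe_kfd_calls++;
	adev->kfd_pre_reset_calls++;
}

static void mock_kfd_post_reset(struct mock_adev *adev)
{
	if (adev->is_sriov_vf && !adev->has_full_gpu)
		adev->unsafe_kfd_calls++;
	adev->kfd_post_reset_calls++;
}

/* Models amdgpu_device_lock_adev; guard_sriov selects the patched behavior. */
static void mock_lock_adev(struct mock_adev *adev, bool guard_sriov)
{
	if (!guard_sriov || !adev->is_sriov_vf)
		mock_kfd_pre_reset(adev);
}

/* Models amdgpu_device_unlock_adev with the same optional guard. */
static void mock_unlock_adev(struct mock_adev *adev, bool guard_sriov)
{
	if (!guard_sriov || !adev->is_sriov_vf)
		mock_kfd_post_reset(adev);
}

/* Models the SRIOV reset path, which blocks and unblocks KFD on its own. */
static void mock_reset_sriov(struct mock_adev *adev)
{
	assert(adev->is_sriov_vf);
	adev->has_full_gpu = true;  /* models amdgpu_virt_request_full_gpu */
	mock_kfd_pre_reset(adev);
	/* the actual VF reset work would happen here */
	mock_kfd_post_reset(adev);
	adev->has_full_gpu = false; /* assumed: full GPU access is released afterwards */
}

/* Run one modeled recovery and return the resulting counters. */
struct mock_adev mock_gpu_recover(bool is_sriov_vf, bool guard_sriov)
{
	struct mock_adev adev = { .is_sriov_vf = is_sriov_vf };

	mock_lock_adev(&adev, guard_sriov);
	if (adev.is_sriov_vf)
		mock_reset_sriov(&adev);
	mock_unlock_adev(&adev, guard_sriov);
	return adev;
}
```

In this model, `mock_gpu_recover(true, false)` (a VF without the guard) records two pre_reset calls and two unsafe KFD calls, while `mock_gpu_recover(true, true)` records one pre_reset call and no unsafe calls. The bare-metal case is unaffected by the guard, which matches the intent of the patch.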


* RE: [PATCH] drm/amdgpu: kfd_pre_reset outside req_full_gpu cause sriov hang
       [not found]         ` <BYAPR12MB27424E2E6236D4E73E233E4083A60-ZGDeBxoHBPmbrehcvEBedAdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2018-12-11 16:21           ` Liu, Shaoyun
  0 siblings, 0 replies; 5+ messages in thread
From: Liu, Shaoyun @ 2018-12-11 16:21 UTC (permalink / raw)
  To: Lou, Wentao, amd-gfx@lists.freedesktop.org,
	Grodzovsky, Andrey, Kuehling, Felix

I see, so it is OK with me. You can add
Reviewed-by: Shaoyun.liu <Shaoyun.Liu@amd.com>



* [PATCH] drm/amdgpu: kfd_pre_reset outside req_full_gpu cause sriov hang
@ 2018-12-10  6:17 wentalou
  0 siblings, 0 replies; 5+ messages in thread
From: wentalou @ 2018-12-10  6:17 UTC (permalink / raw)
  To: amd-gfx@lists.freedesktop.org; +Cc: wentalou

The XGMI hive change put kfd_pre_reset into amdgpu_device_lock_adev,
but outside req_full_gpu of SRIOV.
This makes SRIOV hang during reset.

Change-Id: I5b3e2a42c77b3b9635419df4470d021df7be34d1
Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index ef36cc5..659dd40 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3474,14 +3474,16 @@ static void amdgpu_device_lock_adev(struct amdgpu_device *adev)
 	mutex_lock(&adev->lock_reset);
 	atomic_inc(&adev->gpu_reset_counter);
 	adev->in_gpu_reset = 1;
-	/* Block kfd */
-	amdgpu_amdkfd_pre_reset(adev);
+	/* Block kfd: SRIOV would do it separately */
+	if (!amdgpu_sriov_vf(adev))
+		amdgpu_amdkfd_pre_reset(adev);
 }
 
 static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
 {
-	/*unlock kfd */
-	amdgpu_amdkfd_post_reset(adev);
+	/*unlock kfd: SRIOV would do it separately */
+	if (!amdgpu_sriov_vf(adev))
+		amdgpu_amdkfd_post_reset(adev);
 	amdgpu_vf_error_trans_all(adev);
 	adev->in_gpu_reset = 0;
 	mutex_unlock(&adev->lock_reset);
-- 
2.7.4
