* [PATCH] drm/amdgpu: kfd_pre_reset outside req_full_gpu cause sriov hang
@ 2018-12-07 6:09 wentalou
From: wentalou @ 2018-12-07 6:09 UTC
To: amd-gfx@lists.freedesktop.org; +Cc: wentalou
The XGMI hive change put kfd_pre_reset into amdgpu_device_lock_adev,
which is outside req_full_gpu for SRIOV.
This makes SRIOV hang during reset.
Change-Id: I5b3e2a42c77b3b9635419df4470d021df7be34d1
Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index ef36cc5..659dd40 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3474,14 +3474,16 @@ static void amdgpu_device_lock_adev(struct amdgpu_device *adev)
 	mutex_lock(&adev->lock_reset);
 	atomic_inc(&adev->gpu_reset_counter);
 	adev->in_gpu_reset = 1;
-	/* Block kfd */
-	amdgpu_amdkfd_pre_reset(adev);
+	/* Block kfd: SRIOV would do it separately */
+	if (!amdgpu_sriov_vf(adev))
+		amdgpu_amdkfd_pre_reset(adev);
 }
 
 static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
 {
-	/*unlock kfd */
-	amdgpu_amdkfd_post_reset(adev);
+	/*unlock kfd: SRIOV would do it separately */
+	if (!amdgpu_sriov_vf(adev))
+		amdgpu_amdkfd_post_reset(adev);
 	amdgpu_vf_error_trans_all(adev);
 	adev->in_gpu_reset = 0;
 	mutex_unlock(&adev->lock_reset);
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
* RE: [PATCH] drm/amdgpu: kfd_pre_reset outside req_full_gpu cause sriov hang
@ 2018-12-10 16:09 Liu, Shaoyun
From: Liu, Shaoyun @ 2018-12-10 16:09 UTC
To: amd-gfx@lists.freedesktop.org, Grodzovsky, Andrey, Kuehling, Felix
Cc: Lou, Wentao
But KFD still needs to be notified during reset. The pre_reset call gives KFD a chance to suspend all the running process queues. Did reset work normally on SRIOV before the refactoring change for XGMI support? We shouldn't change the logic.
Regards
shaoyun.liu
* RE: [PATCH] drm/amdgpu: kfd_pre_reset outside req_full_gpu cause sriov hang
@ 2018-12-11 4:53 Lou, Wentao
From: Lou, Wentao @ 2018-12-11 4:53 UTC
To: Liu, Shaoyun, amd-gfx@lists.freedesktop.org, Grodzovsky, Andrey, Kuehling, Felix
SRIOV should not call amdgpu_amdkfd_pre_reset inside amdgpu_device_lock_adev,
nor amdgpu_amdkfd_post_reset inside amdgpu_device_unlock_adev.
In branch amd-staging-dkms-4.18, SRIOV already calls amdgpu_amdkfd_pre_reset and amdgpu_amdkfd_post_reset inside amdgpu_device_reset_sriov.
These two functions need to run inside SRIOV's amdgpu_virt_request_full_gpu window, or SRIOV hangs.
The amdgpu_amdkfd_pre_reset call inside amdgpu_device_lock_adev was a duplicate for SRIOV, and it caused the hang on entering amdgpu_device_lock_adev.
That's the reason for adding "if (!amdgpu_sriov_vf(adev))", based on branch amd-staging-dkms-4.18.
BR,
Wentao
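The ordering described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the driver code: every toy_* name and struct field below is hypothetical, the full-GPU window is reduced to a boolean, and the exact call order inside amdgpu_device_reset_sriov is assumed from the description in this mail rather than taken from the source tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct amdgpu_device; not the real layout. */
struct toy_dev {
	bool is_sriov_vf;     /* models the result of amdgpu_sriov_vf(adev) */
	bool has_full_gpu;    /* true while the VF holds full GPU access */
	int kfd_pre_calls;    /* times KFD was blocked */
	int kfd_post_calls;   /* times KFD was unblocked */
	int unsafe_kfd_calls; /* KFD calls made by a VF without full GPU access */
};

/* Models amdgpu_amdkfd_pre_reset(); records calls made outside the window. */
static void toy_kfd_pre_reset(struct toy_dev *d)
{
	if (d->is_sriov_vf && !d->has_full_gpu)
		d->unsafe_kfd_calls++;
	d->kfd_pre_calls++;
}

/* Models amdgpu_amdkfd_post_reset(); same bookkeeping as above. */
static void toy_kfd_post_reset(struct toy_dev *d)
{
	if (d->is_sriov_vf && !d->has_full_gpu)
		d->unsafe_kfd_calls++;
	d->kfd_post_calls++;
}

/* Models the patched amdgpu_device_lock_adev(): skip KFD on a VF. */
static void toy_lock_adev(struct toy_dev *d)
{
	if (!d->is_sriov_vf)
		toy_kfd_pre_reset(d);
}

/* Models the patched amdgpu_device_unlock_adev(): skip KFD on a VF. */
static void toy_unlock_adev(struct toy_dev *d)
{
	if (!d->is_sriov_vf)
		toy_kfd_post_reset(d);
}

/* Models the SRIOV reset path: KFD calls sit inside the full-GPU window. */
static void toy_reset_sriov(struct toy_dev *d)
{
	d->has_full_gpu = true;  /* assumed: request full GPU access first */
	toy_kfd_pre_reset(d);
	/* the actual reset work would happen here */
	toy_kfd_post_reset(d);
	d->has_full_gpu = false; /* assumed: release full GPU access last */
}

/* Models the overall recovery flow for either kind of device. */
void toy_gpu_recover(struct toy_dev *d)
{
	toy_lock_adev(d);
	if (d->is_sriov_vf)
		toy_reset_sriov(d);
	toy_unlock_adev(d);
	printf("pre=%d post=%d unsafe=%d\n",
	       d->kfd_pre_calls, d->kfd_post_calls, d->unsafe_kfd_calls);
}
```

In this model, calling toy_gpu_recover() on a device with is_sriov_vf set blocks and unblocks KFD exactly once, both times inside the full-GPU window, while a bare-metal device still gets its single pre/post pair from the lock/unlock helpers. Dropping the is_sriov_vf checks from the lock/unlock helpers would make the VF case register calls outside the window, which is the situation the patch avoids.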
* RE: [PATCH] drm/amdgpu: kfd_pre_reset outside req_full_gpu cause sriov hang
@ 2018-12-11 16:21 Liu, Shaoyun
From: Liu, Shaoyun @ 2018-12-11 16:21 UTC
To: Lou, Wentao, amd-gfx@lists.freedesktop.org, Grodzovsky, Andrey, Kuehling, Felix
I see, so it's OK with me. You can add:
Reviewed-by: Shaoyun.liu <Shaoyun.Liu@amd.com>
* [PATCH] drm/amdgpu: kfd_pre_reset outside req_full_gpu cause sriov hang
@ 2018-12-10 6:17 wentalou
From: wentalou @ 2018-12-10 6:17 UTC
To: amd-gfx@lists.freedesktop.org; +Cc: wentalou
The XGMI hive change put kfd_pre_reset into amdgpu_device_lock_adev,
which is outside req_full_gpu for SRIOV.
This makes SRIOV hang during reset.
Change-Id: I5b3e2a42c77b3b9635419df4470d021df7be34d1
Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index ef36cc5..659dd40 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3474,14 +3474,16 @@ static void amdgpu_device_lock_adev(struct amdgpu_device *adev)
 	mutex_lock(&adev->lock_reset);
 	atomic_inc(&adev->gpu_reset_counter);
 	adev->in_gpu_reset = 1;
-	/* Block kfd */
-	amdgpu_amdkfd_pre_reset(adev);
+	/* Block kfd: SRIOV would do it separately */
+	if (!amdgpu_sriov_vf(adev))
+		amdgpu_amdkfd_pre_reset(adev);
 }
 
 static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
 {
-	/*unlock kfd */
-	amdgpu_amdkfd_post_reset(adev);
+	/*unlock kfd: SRIOV would do it separately */
+	if (!amdgpu_sriov_vf(adev))
+		amdgpu_amdkfd_post_reset(adev);
 	amdgpu_vf_error_trans_all(adev);
 	adev->in_gpu_reset = 0;
 	mutex_unlock(&adev->lock_reset);
--
2.7.4