* [PATCH] drm/amdgpu: fix ib test hang with gfxoff enabled
From: Huang Rui @ 2018-06-01  6:41 UTC
To: Alex Deucher, Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Cc: Huang Rui, Hawking Zhang

After deferring the execution of the gfx/compute IB tests, the gfx
block has already gone into a "mid state" of gfxoff by the time they
run.

PWR_MISC_CNTL_STATUS: PWR_GFXOFF_STATUS field (bits 2:1)
0 = GFXOFF.
1 = Transition out of GFXOFF state.
2 = Not in GFXOFF.
3 = Transition into GFXOFF.

If we hit a mid state (1 or 3), the doorbell write interrupt cannot
wake the gfx block up successfully. The field value is 1 when we
issue the IB test, so we get the hang. This is the root cause of the
issue we encountered.

Meanwhile, we cannot set GFX clockgating once gfx is already in the
"off" state. So we should move gfx powergating and gfxoff enabling to
the end of initialization, after the IB tests and clockgating.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c        | 10 ++++++++++
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c             |  5 -----
 drivers/gpu/drm/amd/powerplay/amd_powerplay.c     |  2 +-
 drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c |  4 ++--
 4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f509d32..e1c8806 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1723,6 +1723,16 @@ static int amdgpu_device_ip_late_set_cg_state(struct amdgpu_device *adev)
 			}
 		}
 	}
+
+	if (adev->powerplay.pp_feature & PP_GFXOFF_MASK) {
+		amdgpu_device_ip_set_powergating_state(adev,
+						       AMD_IP_BLOCK_TYPE_GFX,
+						       AMD_CG_STATE_GATE);
+		amdgpu_device_ip_set_powergating_state(adev,
+						       AMD_IP_BLOCK_TYPE_SMC,
+						       AMD_CG_STATE_GATE);
+	}
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 2c5e2a4..31ecc86 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -3358,11 +3358,6 @@ static int gfx_v9_0_late_init(void *handle)
 	if (r)
 		return r;
 
-	r = amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_GFX,
-						   AMD_PG_STATE_GATE);
-	if (r)
-		return r;
-
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
index b493369..d0e6e2d 100644
--- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
@@ -245,7 +245,7 @@ static int pp_set_powergating_state(void *handle,
 	}
 
 	if (hwmgr->hwmgr_func->enable_per_cu_power_gating == NULL) {
-		pr_info("%s was not implemented.\n", __func__);
+		pr_debug("%s was not implemented.\n", __func__);
 		return 0;
 	}
 
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c
index 7712eb6..b72d089 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c
@@ -284,7 +284,7 @@ static int smu10_disable_gfx_off(struct pp_hwmgr *hwmgr)
 
 static int smu10_disable_dpm_tasks(struct pp_hwmgr *hwmgr)
 {
-	return smu10_disable_gfx_off(hwmgr);
+	return 0;
 }
 
 static int smu10_enable_gfx_off(struct pp_hwmgr *hwmgr)
@@ -299,7 +299,7 @@ static int smu10_enable_gfx_off(struct pp_hwmgr *hwmgr)
 
 static int smu10_enable_dpm_tasks(struct pp_hwmgr *hwmgr)
 {
-	return smu10_enable_gfx_off(hwmgr);
+	return 0;
 }
 
 static int smu10_gfx_off_control(struct pp_hwmgr *hwmgr, bool enable)
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
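The status encoding described in the commit message can be illustrated with a small decoder. This is only a sketch: the bit position (2:1) and the four values come from the commit message, while the macro, enum, and function names are hypothetical and not taken from the amdgpu sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption taken from the commit message: the field sits in bits 2:1. */
#define GFXOFF_STATUS_SHIFT 1u
#define GFXOFF_STATUS_MASK  (0x3u << GFXOFF_STATUS_SHIFT)

/* Field values as listed in the commit message; the names are hypothetical. */
enum gfxoff_status {
	GFXOFF_STATUS_OFF      = 0, /* GFXOFF */
	GFXOFF_STATUS_LEAVING  = 1, /* transition out of GFXOFF */
	GFXOFF_STATUS_ON       = 2, /* not in GFXOFF */
	GFXOFF_STATUS_ENTERING = 3, /* transition into GFXOFF */
};

/* Extract the 2-bit status field from a raw register value. */
static inline enum gfxoff_status gfxoff_status_decode(uint32_t reg)
{
	return (enum gfxoff_status)((reg & GFXOFF_STATUS_MASK) >> GFXOFF_STATUS_SHIFT);
}

/* A "mid state" is either of the two transition values (1 or 3). */
static inline bool gfxoff_status_is_mid_state(enum gfxoff_status s)
{
	assert((unsigned int)s <= 3u); /* only 2-bit values are meaningful */
	return s == GFXOFF_STATUS_LEAVING || s == GFXOFF_STATUS_ENTERING;
}
```

With this encoding, a raw register value of 0x2 decodes to field value 1 (transition out of GFXOFF), which is the state the commit message reports at the time of the hang.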
* Re: [PATCH] drm/amdgpu: fix ib test hang with gfxoff enabled
From: Christian König @ 2018-06-01  9:13 UTC
To: Huang Rui, Alex Deucher, Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Cc: Hawking Zhang

On 01.06.2018 at 08:41, Huang Rui wrote:
> After deferring the execution of the gfx/compute IB tests, the gfx
> block has already gone into a "mid state" of gfxoff by the time they
> run.
>
> PWR_MISC_CNTL_STATUS: PWR_GFXOFF_STATUS field (bits 2:1)
> 0 = GFXOFF.
> 1 = Transition out of GFXOFF state.
> 2 = Not in GFXOFF.
> 3 = Transition into GFXOFF.
>
> If we hit a mid state (1 or 3), the doorbell write interrupt cannot
> wake the gfx block up successfully. The field value is 1 when we
> issue the IB test, so we get the hang. This is the root cause of the
> issue we encountered.
>
> Meanwhile, we cannot set GFX clockgating once gfx is already in the
> "off" state. So we should move gfx powergating and gfxoff enabling to
> the end of initialization, after the IB tests and clockgating.

Mhm, that still looks like only a half-baked solution:

1. What prevents this bug from happening during "normal" IB submission
   from userspace?

2. Shouldn't we poll the PWR_MISC_CNTL_STATUS register to make sure we
   are not in any transition phase instead?

Regards,
Christian.
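The polling idea raised in the second question could look roughly like the following bounded loop. It is a sketch under stated assumptions, not driver code: the register accessor is a hypothetical callback, the fake register exists only so the loop can be exercised in user space, and a real driver would use its own MMIO helpers and insert a delay between reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register accessor; a real driver would use its MMIO helper. */
typedef uint32_t (*read_status_fn)(void *ctx);

/*
 * Poll until the 2-bit status field (bits 2:1) reads 0 or 2, i.e. a stable
 * state. Returns true on success, false if all max_polls reads saw a
 * transition value (1 or 3).
 */
static bool wait_for_stable_gfxoff(read_status_fn read_status, void *ctx,
				   unsigned int max_polls)
{
	assert(read_status != NULL);

	for (unsigned int i = 0; i < max_polls; i++) {
		uint32_t status = (read_status(ctx) >> 1) & 0x3u;

		if (status == 0 || status == 2)
			return true;
		/* A real driver would wait briefly here before retrying. */
	}
	return false;
}

/* Fake register for illustration: reports a transition for a few reads. */
struct fake_reg {
	unsigned int reads_until_stable;
};

static uint32_t fake_read_status(void *ctx)
{
	struct fake_reg *reg = ctx;

	if (reg->reads_until_stable > 0) {
		reg->reads_until_stable--;
		return 1u << 1; /* field value 1: transition out of GFXOFF */
	}
	return 2u << 1; /* field value 2: not in GFXOFF */
}
```

The bound on the number of reads matters: if the field never settles, the caller gets a failure it can report instead of spinning forever.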
* Re: [PATCH] drm/amdgpu: fix ib test hang with gfxoff enabled
From: Huang Rui @ 2018-06-01  9:29 UTC
To: Koenig, Christian
Cc: Deucher, Alexander, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Zhang, Hawking

On Fri, Jun 01, 2018 at 05:13:49PM +0800, Christian König wrote:
> On 01.06.2018 at 08:41, Huang Rui wrote:
> > After deferring the execution of the gfx/compute IB tests, the gfx
> > block has already gone into a "mid state" of gfxoff by the time they
> > run.
> >
> > PWR_MISC_CNTL_STATUS: PWR_GFXOFF_STATUS field (bits 2:1)
> > 0 = GFXOFF.
> > 1 = Transition out of GFXOFF state.
> > 2 = Not in GFXOFF.
> > 3 = Transition into GFXOFF.
> >
> > If we hit a mid state (1 or 3), the doorbell write interrupt cannot
> > wake the gfx block up successfully. The field value is 1 when we
> > issue the IB test, so we get the hang. This is the root cause of the
> > issue we encountered.
> >
> > Meanwhile, we cannot set GFX clockgating once gfx is already in the
> > "off" state. So we should move gfx powergating and gfxoff enabling to
> > the end of initialization, after the IB tests and clockgating.
>
> Mhm, that still looks like only a half-baked solution:
>
> 1. What prevents this bug from happening during "normal" IB submission
>    from userspace?
>
> 2. Shouldn't we poll the PWR_MISC_CNTL_STATUS register to make sure we
>    are not in any transition phase instead?
>

Yes, right. How about also adding polling of PWR_MISC_CNTL_STATUS in
amdgpu_ring_commit(), after set_wptr, to confirm that the status is "0"
or "2"?

Thanks,
Ray
* Re: [PATCH] drm/amdgpu: fix ib test hang with gfxoff enabled
From: Christian König @ 2018-06-01 10:09 UTC
To: Huang Rui
Cc: Deucher, Alexander, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Zhang, Hawking

On 01.06.2018 at 11:29, Huang Rui wrote:
> On Fri, Jun 01, 2018 at 05:13:49PM +0800, Christian König wrote:
>> On 01.06.2018 at 08:41, Huang Rui wrote:
>>> After deferring the execution of the gfx/compute IB tests, the gfx
>>> block has already gone into a "mid state" of gfxoff by the time they
>>> run.
>>>
>>> PWR_MISC_CNTL_STATUS: PWR_GFXOFF_STATUS field (bits 2:1)
>>> 0 = GFXOFF.
>>> 1 = Transition out of GFXOFF state.
>>> 2 = Not in GFXOFF.
>>> 3 = Transition into GFXOFF.
>>>
>>> If we hit a mid state (1 or 3), the doorbell write interrupt cannot
>>> wake the gfx block up successfully. The field value is 1 when we
>>> issue the IB test, so we get the hang. This is the root cause of the
>>> issue we encountered.
>>>
>>> Meanwhile, we cannot set GFX clockgating once gfx is already in the
>>> "off" state. So we should move gfx powergating and gfxoff enabling to
>>> the end of initialization, after the IB tests and clockgating.
>> Mhm, that still looks like only a half-baked solution:
>>
>> 1. What prevents this bug from happening during "normal" IB submission
>>    from userspace?
>>
>> 2. Shouldn't we poll the PWR_MISC_CNTL_STATUS register to make sure we
>>    are not in any transition phase instead?
>>
> Yes, right. How about also adding polling of PWR_MISC_CNTL_STATUS in
> amdgpu_ring_commit(), after set_wptr, to confirm that the status is "0"
> or "2"?

You could add an end_use() callback for that, but I think we rather
need to do this in gfx_v9_0_ring_set_wptr_gfx() before we write the
doorbell.

Christian.
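The difference between polling after set_wptr and polling before the doorbell write can be shown with a toy model. Everything here is hypothetical, including the assumption (taken from the commit message) that a doorbell written during a transition is lost; it is not a model of the real hardware or of gfx_v9_0_ring_set_wptr_gfx().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy hardware model; every name here is hypothetical. */
struct toy_gfx {
	unsigned int status;          /* 0..3, same encoding as PWR_GFXOFF_STATUS */
	unsigned int polls_to_settle; /* status reads left before a transition ends */
	uint64_t wptr_seen;           /* last write pointer the engine noticed */
};

/* Reading the status advances the toy transition by one step. */
static unsigned int toy_read_status(struct toy_gfx *gfx)
{
	if ((gfx->status == 1 || gfx->status == 3) && gfx->polls_to_settle > 0) {
		if (--gfx->polls_to_settle == 0)
			gfx->status = (gfx->status == 1) ? 2 : 0;
	}
	return gfx->status;
}

/* Assumed behaviour: a doorbell written during a transition is dropped. */
static void toy_write_doorbell(struct toy_gfx *gfx, uint64_t wptr)
{
	if (gfx->status == 1 || gfx->status == 3)
		return;
	gfx->status = 2; /* a doorbell in GFXOFF (0) wakes the engine */
	gfx->wptr_seen = wptr;
}

static bool toy_wait_stable(struct toy_gfx *gfx, unsigned int max_polls)
{
	assert(gfx != NULL);

	while (max_polls--) {
		unsigned int s = toy_read_status(gfx);

		if (s == 0 || s == 2)
			return true;
	}
	return false;
}

/* Wait first, then ring the doorbell. */
static bool toy_set_wptr_poll_first(struct toy_gfx *gfx, uint64_t wptr)
{
	if (!toy_wait_stable(gfx, 100))
		return false;
	toy_write_doorbell(gfx, wptr);
	return true;
}

/* Ring the doorbell, then wait: the doorbell may already be lost. */
static bool toy_set_wptr_poll_after(struct toy_gfx *gfx, uint64_t wptr)
{
	toy_write_doorbell(gfx, wptr);
	return toy_wait_stable(gfx, 100);
}
```

In this model both variants report a stable state at the end, but only the poll-first variant leaves the engine with the new write pointer, which is why the position of the check relative to the doorbell write matters.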
* Re: [PATCH] drm/amdgpu: fix ib test hang with gfxoff enabled
From: Felix Kuehling @ 2018-06-01 19:01 UTC
To: Christian König, Huang Rui
Cc: Deucher, Alexander, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Zhang, Hawking

On 2018-06-01 06:09 AM, Christian König wrote:
> On 01.06.2018 at 11:29, Huang Rui wrote:
>> On Fri, Jun 01, 2018 at 05:13:49PM +0800, Christian König wrote:
>>> On 01.06.2018 at 08:41, Huang Rui wrote:
>>>> After deferring the execution of the gfx/compute IB tests, the gfx
>>>> block has already gone into a "mid state" of gfxoff by the time they
>>>> run.
>>>>
>>>> PWR_MISC_CNTL_STATUS: PWR_GFXOFF_STATUS field (bits 2:1)
>>>> 0 = GFXOFF.
>>>> 1 = Transition out of GFXOFF state.
>>>> 2 = Not in GFXOFF.
>>>> 3 = Transition into GFXOFF.
>>>>
>>>> If we hit a mid state (1 or 3), the doorbell write interrupt cannot
>>>> wake the gfx block up successfully. The field value is 1 when we
>>>> issue the IB test, so we get the hang. This is the root cause of the
>>>> issue we encountered.
>>>>
>>>> Meanwhile, we cannot set GFX clockgating once gfx is already in the
>>>> "off" state. So we should move gfx powergating and gfxoff enabling to
>>>> the end of initialization, after the IB tests and clockgating.
>>> Mhm, that still looks like only a half-baked solution:
>>>
>>> 1. What prevents this bug from happening during "normal" IB submission
>>>    from userspace?
>>>
>>> 2. Shouldn't we poll the PWR_MISC_CNTL_STATUS register to make sure we
>>>    are not in any transition phase instead?
>>>
>> Yes, right. How about also adding polling of PWR_MISC_CNTL_STATUS in
>> amdgpu_ring_commit(), after set_wptr, to confirm that the status is "0"
>> or "2"?
>
> You could add an end_use() callback for that, but I think we rather
> need to do this in gfx_v9_0_ring_set_wptr_gfx() before we write the
> doorbell.

Isn't testing the status like this a potential race condition?

Having to do this at all is contrary to the documentation that I've
read. Writing a doorbell should wake up the GFX engine. Are we sure that
we understand the cause of the problem correctly? Does the IB test use
any MMIO? Maybe it's doing an HDP flush using MMIO for a ring that
doesn't support HDP flushing.

Regards,
  Felix
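The race Felix is asking about is the classic check-then-act window: the status can change between the moment the driver reads it and the moment it writes the doorbell. The sketch below makes that window explicit by injecting a "firmware step" between the two actions. The model and all names are hypothetical, and the dropped-doorbell behaviour is the assumption stated in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy state; hypothetical names, same 0..3 status encoding as above. */
struct race_gfx {
	unsigned int status;
	uint64_t wptr_seen;
};

/* Stand-in for whatever the firmware does between the check and the write. */
typedef void (*race_fw_step_fn)(struct race_gfx *gfx);

static void race_fw_idle(struct race_gfx *gfx)
{
	(void)gfx; /* firmware does nothing in the window */
}

/* Firmware decides to power down: "not in GFXOFF" (2) -> "entering" (3). */
static void race_fw_begin_gfxoff(struct race_gfx *gfx)
{
	if (gfx->status == 2)
		gfx->status = 3;
}

static bool race_is_mid_state(const struct race_gfx *gfx)
{
	return gfx->status == 1 || gfx->status == 3;
}

/* Returns true only if the doorbell was actually delivered. */
static bool race_checked_doorbell(struct race_gfx *gfx, uint64_t wptr,
				  race_fw_step_fn fw_between)
{
	assert(gfx != NULL && fw_between != NULL);

	if (race_is_mid_state(gfx))
		return false;  /* check: a caller would poll and retry */

	fw_between(gfx);       /* firmware may act in this window */

	if (race_is_mid_state(gfx))
		return false;  /* act: the doorbell is dropped (assumed) */

	gfx->status = 2;
	gfx->wptr_seen = wptr;
	return true;
}
```

When the firmware is idle in the window the doorbell goes through; when it starts a transition right after the check, the check has passed but the doorbell is still lost. A status poll alone therefore narrows the window without closing it.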
* Re: [PATCH] drm/amdgpu: fix ib test hang with gfxoff enabled
From: Huang Rui @ 2018-06-04  7:52 UTC
To: Kuehling, Felix
Cc: Deucher, Alexander, Pan, Morris, Koenig, Christian, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Zhang, Hawking

On Sat, Jun 02, 2018 at 03:01:57AM +0800, Kuehling, Felix wrote:
> On 2018-06-01 06:09 AM, Christian König wrote:
> > On 01.06.2018 at 11:29, Huang Rui wrote:
> >> On Fri, Jun 01, 2018 at 05:13:49PM +0800, Christian König wrote:
> >>> On 01.06.2018 at 08:41, Huang Rui wrote:
> >>>> After deferring the execution of the gfx/compute IB tests, the gfx
> >>>> block has already gone into a "mid state" of gfxoff by the time they
> >>>> run.
> >>>>
> >>>> PWR_MISC_CNTL_STATUS: PWR_GFXOFF_STATUS field (bits 2:1)
> >>>> 0 = GFXOFF.
> >>>> 1 = Transition out of GFXOFF state.
> >>>> 2 = Not in GFXOFF.
> >>>> 3 = Transition into GFXOFF.
> >>>>
> >>>> If we hit a mid state (1 or 3), the doorbell write interrupt cannot
> >>>> wake the gfx block up successfully. The field value is 1 when we
> >>>> issue the IB test, so we get the hang. This is the root cause of the
> >>>> issue we encountered.
> >>>>
> >>>> Meanwhile, we cannot set GFX clockgating once gfx is already in the
> >>>> "off" state. So we should move gfx powergating and gfxoff enabling to
> >>>> the end of initialization, after the IB tests and clockgating.
> >>> Mhm, that still looks like only a half-baked solution:
> >>>
> >>> 1. What prevents this bug from happening during "normal" IB submission
> >>>    from userspace?
> >>>
> >>> 2. Shouldn't we poll the PWR_MISC_CNTL_STATUS register to make sure we
> >>>    are not in any transition phase instead?
> >>>
> >> Yes, right. How about also adding polling of PWR_MISC_CNTL_STATUS in
> >> amdgpu_ring_commit(), after set_wptr, to confirm that the status is "0"
> >> or "2"?
> >
> > You could add an end_use() callback for that, but I think we rather
> > need to do this in gfx_v9_0_ring_set_wptr_gfx() before we write the
> > doorbell.
> Isn't testing the status like this a potential race condition?
>
> Having to do this at all is contrary to the documentation that I've
> read. Writing a doorbell should wake up the GFX engine. Are we sure that
> we understand the cause of the problem correctly? Does the IB test use
> any MMIO? Maybe it's doing an HDP flush using MMIO for a ring that
> doesn't support HDP flushing.
>

Felix, thanks for the reminder. I suppose you mention MMIO because we
should avoid gfx register access at runtime, right? Our IB test uses a
WRITE_DATA packet to write a specific pattern value into GART memory.
It doesn't use any gfx registers. And gfxoff is only supported on Raven;
we don't emit an HDP flush on APUs.

Actually, I also wondered whether this is caused by a race condition.
But the hang happens when we only modprobe the amdgpu module, without
running startx at that time, so there are no other commands from user
space.

+ Morris, who works on the Raven SMC firmware.
After discussing it with him, he suggested that we had better confirm
that GFXOFF_STATUS is 0 or 2 and only then write the doorbell. If
GFXOFF_STATUS is 1 or 3, the mid state (transition in progress), the SMC
will drop the doorbell interrupt. When the GFXOFF status is 0 or 2,
already in the target state, the SMC can respond to the interrupt at
once. Morris, please correct me if I am wrong.

Thanks,
Ray
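The shape of the IB test described here (seed memory with a sentinel, have the engine write a pattern, then check for it) can be sketched in user space. This is a toy: the command struct merely stands in for a WRITE_DATA packet, the sentinel and pattern values are illustrative, and whether the doorbell is delivered is passed in as a flag rather than modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values; any two distinct words would do. */
#define TOY_SENTINEL 0xCAFEDEADu
#define TOY_PATTERN  0xDEADBEEFu

/* Stand-in for a WRITE_DATA packet: "store value at dst". */
struct toy_write_cmd {
	uint32_t *dst;
	uint32_t value;
};

/* Hypothetical engine: executes its pending command only while awake. */
struct toy_engine {
	bool awake;
	bool has_pending;
	struct toy_write_cmd pending;
};

static void toy_engine_step(struct toy_engine *e)
{
	if (e->awake && e->has_pending) {
		*e->pending.dst = e->pending.value;
		e->has_pending = false;
	}
}

/* Returns 0 if the pattern showed up, -1 if the bounded wait timed out. */
static int toy_ib_test(struct toy_engine *e, bool doorbell_delivered)
{
	uint32_t scratch = TOY_SENTINEL;
	int ret = -1;

	assert(e != NULL);

	/* Queue the write, then ring the (possibly dropped) doorbell. */
	e->pending = (struct toy_write_cmd){ .dst = &scratch, .value = TOY_PATTERN };
	e->has_pending = true;
	if (doorbell_delivered)
		e->awake = true;

	for (int i = 0; i < 10; i++) {
		toy_engine_step(e);
		if (scratch == TOY_PATTERN) {
			ret = 0;
			break;
		}
	}

	/* Do not leave a pointer to this stack frame behind. */
	e->has_pending = false;
	e->pending.dst = NULL;
	return ret;
}
```

If the doorbell is dropped, the engine never wakes, the sentinel is never overwritten, and the test times out, which matches the symptom reported in this thread without the test itself touching any gfx register.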
* Re: [PATCH] drm/amdgpu: fix ib test hang with gfxoff enabled
From: Christian König @ 2018-06-04  7:53 UTC
To: Huang Rui, Kuehling, Felix
Cc: Deucher, Alexander, Pan, Morris, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Zhang, Hawking

On 04.06.2018 at 09:52, Huang Rui wrote:
> On Sat, Jun 02, 2018 at 03:01:57AM +0800, Kuehling, Felix wrote:
>> On 2018-06-01 06:09 AM, Christian König wrote:
>>> On 01.06.2018 at 11:29, Huang Rui wrote:
>>>> On Fri, Jun 01, 2018 at 05:13:49PM +0800, Christian König wrote:
>>>>> On 01.06.2018 at 08:41, Huang Rui wrote:
>>>>>> After deferring the execution of the gfx/compute IB tests, the gfx
>>>>>> block has already gone into a "mid state" of gfxoff by the time they
>>>>>> run.
>>>>>>
>>>>>> PWR_MISC_CNTL_STATUS: PWR_GFXOFF_STATUS field (bits 2:1)
>>>>>> 0 = GFXOFF.
>>>>>> 1 = Transition out of GFXOFF state.
>>>>>> 2 = Not in GFXOFF.
>>>>>> 3 = Transition into GFXOFF.
>>>>>>
>>>>>> If we hit a mid state (1 or 3), the doorbell write interrupt cannot
>>>>>> wake the gfx block up successfully. The field value is 1 when we
>>>>>> issue the IB test, so we get the hang. This is the root cause of the
>>>>>> issue we encountered.
>>>>>>
>>>>>> Meanwhile, we cannot set GFX clockgating once gfx is already in the
>>>>>> "off" state. So we should move gfx powergating and gfxoff enabling to
>>>>>> the end of initialization, after the IB tests and clockgating.
>>>>> Mhm, that still looks like only a half-baked solution:
>>>>>
>>>>> 1. What prevents this bug from happening during "normal" IB submission
>>>>>    from userspace?
>>>>>
>>>>> 2. Shouldn't we poll the PWR_MISC_CNTL_STATUS register to make sure we
>>>>>    are not in any transition phase instead?
>>>>>
>>>> Yes, right. How about also adding polling of PWR_MISC_CNTL_STATUS in
>>>> amdgpu_ring_commit(), after set_wptr, to confirm that the status is "0"
>>>> or "2"?
>>> You could add an end_use() callback for that, but I think we rather
>>> need to do this in gfx_v9_0_ring_set_wptr_gfx() before we write the
>>> doorbell.
>> Isn't testing the status like this a potential race condition?

Well, it could be when we use both GFX and compute at the same time.

>>
>> Having to do this at all is contrary to the documentation that I've
>> read. Writing a doorbell should wake up the GFX engine. Are we sure that
>> we understand the cause of the problem correctly? Does the IB test use
>> any MMIO? Maybe it's doing an HDP flush using MMIO for a ring that
>> doesn't support HDP flushing.
>>
> Felix, thanks for the reminder. I suppose you mention MMIO because we
> should avoid gfx register access at runtime, right? Our IB test uses a
> WRITE_DATA packet to write a specific pattern value into GART memory.
> It doesn't use any gfx registers. And gfxoff is only supported on Raven;
> we don't emit an HDP flush on APUs.
>
> Actually, I also wondered whether this is caused by a race condition.
> But the hang happens when we only modprobe the amdgpu module, without
> running startx at that time, so there are no other commands from user
> space.

Felix is perfectly right that this doesn't sound like a complete
solution to the problem.

The IB test only brings the issue to the surface; working around it by
delaying the enabling of gfxoff would only hide the real problem.

I'm pretty sure that the exact same race can happen with startx or other
command submissions as well.

> + Morris, who works on the Raven SMC firmware.
> After discussing it with him, he suggested that we had better confirm
> that GFXOFF_STATUS is 0 or 2 and only then write the doorbell. If
> GFXOFF_STATUS is 1 or 3, the mid state (transition in progress), the SMC
> will drop the doorbell interrupt. When the GFXOFF status is 0 or 2,
> already in the target state, the SMC can respond to the interrupt at
> once. Morris, please correct me if I am wrong.

That will be rather hard to guarantee and would completely circumvent
the idea behind gfxoff, e.g. the driver would need to assist the
firmware again in turning things on/off.

Regards,
Christian.