* [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list
@ 2019-04-01  8:58 wentalou
  To: amd-gfx@lists.freedesktop.org
  Cc: Lou, Wentao

amdgpu_bo_restore_shadow would assign zero to r if it succeeded.
r would remain zero if there is only one node in shadow_list.
The current code would always return failure when r <= 0.

Change-Id: Iae6880e7c78b71fde6a6754c69665c2e312a80a5
Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index c4c61e9..5cf21a4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3171,6 +3171,7 @@ static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
 	struct dma_fence *fence = NULL, *next = NULL;
 	struct amdgpu_bo *shadow;
 	long r = 1, tmo;
+	bool single_shadow = false;
 
 	if (amdgpu_sriov_runtime(adev))
 		tmo = msecs_to_jiffies(8000);
@@ -3194,10 +3195,12 @@ static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
 			r = dma_fence_wait_timeout(fence, false, tmo);
 			dma_fence_put(fence);
 			fence = next;
+			single_shadow = false;
 			if (r <= 0)
 				break;
 		} else {
 			fence = next;
+			single_shadow = true;
 		}
 	}
 	mutex_unlock(&adev->shadow_list_lock);
@@ -3206,7 +3209,8 @@ static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
 		tmo = dma_fence_wait_timeout(fence, false, tmo);
 	dma_fence_put(fence);
 
-	if (r <= 0 || tmo <= 0) {
+	/* r would be zero even if amdgpu_bo_restore_shadow succeeded when single shadow in list */
+	if (r < 0 || (r == 0 && !single_shadow) || tmo <= 0) {
 		DRM_ERROR("recover vram bo from shadow failed\n");
 		return -EIO;
 	}
-- 
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
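The failure mode the patch describes can be checked with a small userspace model of the recovery loop. This is an editor's sketch, not kernel code: `model_recover_vram` and its stand-in return values are hypothetical, with `0` modelling a successful amdgpu_bo_restore_shadow and a positive value a successful dma_fence_wait_timeout.

```c
/* Models the control flow of amdgpu_device_recover_vram() before the
 * patch: for each of n shadow BOs, restore_shadow succeeds (r = 0) and,
 * when a fence from the previous iteration exists, the wait also
 * succeeds (r > 0).  Returns the final value of r seen by the
 * "if (r <= 0 || tmo <= 0)" check. */
static long model_recover_vram(int n)
{
	long r = 1;          /* same initial value as in the driver */
	int have_fence = 0;  /* models fence != NULL */

	for (int i = 0; i < n; i++) {
		r = 0;                 /* amdgpu_bo_restore_shadow() succeeded */
		if (have_fence) {
			r = 100;       /* dma_fence_wait_timeout() succeeded */
			if (r <= 0)
				break;
		} else {
			have_fence = 1;  /* first entry only primes the fence */
		}
	}
	return r;
}
```

With one shadow the loop never reaches the inner wait, so r stays 0 and `r <= 0` misreports success as failure; with two or more shadows the last wait leaves r positive, matching the commit message.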
* RE: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list
@ 2019-04-02  4:03 Deng, Emily
  To: amd-gfx@lists.freedesktop.org
  Cc: Lou, Wentao

Maybe it will be better to add the following check, and change "if (r <= 0 || tmo <= 0)" to "if (r < 0 || tmo <= 0)":

	r = dma_fence_wait_timeout(f, false, timeout);
	if (r == 0) {
		r = -ETIMEDOUT;
		break;
	} else if (r < 0) {
		break;
	}

Best wishes
Emily Deng

>-----Original Message-----
>From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of wentalou
>Sent: Monday, April 1, 2019 4:59 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Lou, Wentao <Wentao.Lou@amd.com>
>Subject: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if
>only one node in shadow_list
>
>amdgpu_bo_restore_shadow would assign zero to r if it succeeded.
>r would remain zero if there is only one node in shadow_list.
>The current code would always return failure when r <= 0.
>
>[...]
* Re: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list
@ 2019-04-02  7:01 Christian König
  To: Deng, Emily; Lou, Wentao; amd-gfx@lists.freedesktop.org

Yeah, agree that is much closer to a correct solution.

But even better would be to correctly update the timeout as well, e.g.:

	tmo = dma_fence_wait_timeout(fence, false, tmo);
	dma_fence_put(fence);
	fence = next;
	if (tmo == 0) {
		r = -ETIMEDOUT;
		break;
	} else if (tmo < 0) {
		r = tmo;
		break;
	}

That we restart the timeout for each wait looks like a rather problematic bug to me as well.

Christian.

Am 02.04.19 um 06:03 schrieb Deng, Emily:
> Maybe it will be better to add the following check, and change "if (r <= 0 || tmo <= 0)" to "if (r < 0 || tmo <= 0)":
>	r = dma_fence_wait_timeout(f, false, timeout);
>	if (r == 0) {
>		r = -ETIMEDOUT;
>		break;
>	} else if (r < 0) {
>		break;
>	}
>
> Best wishes
> Emily Deng
>
> [...]
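Christian's suggestion combines two fixes: report a timeout explicitly instead of overloading r, and pass the remaining budget back into each wait. A hypothetical userspace model (editor's sketch: `fake_wait`, `wait_all`, and the fixed per-wait cost are made up; -110 is the Linux value of ETIMEDOUT):

```c
/* Stand-in for dma_fence_wait_timeout(): consumes `cost` jiffies of the
 * budget and returns the jiffies remaining, or 0 when the budget is
 * exhausted (a timeout). */
static long fake_wait(long tmo, long cost)
{
	return cost >= tmo ? 0 : tmo - cost;
}

/* Waits on n fences against one shared budget, as in Christian's
 * snippet: the remaining tmo is fed into the next wait, and a timeout
 * maps to -ETIMEDOUT instead of leaving r ambiguous. */
static long wait_all(int n, long tmo, long cost)
{
	for (int i = 0; i < n; i++) {
		tmo = fake_wait(tmo, cost);
		if (tmo == 0)
			return -110; /* -ETIMEDOUT */
		/* a negative tmo (interrupted wait) would be returned as-is */
	}
	return 0;
}
```

Three 20-jiffy waits fit in a 100-jiffy budget, while ten do not; with the original code each wait would restart with a fresh 100-jiffy budget and never time out here.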
* RE: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list
@ 2019-04-02  7:29 Lou, Wentao
  To: Koenig, Christian; Deng, Emily; amd-gfx@lists.freedesktop.org

Hi Christian,

If "tmo = dma_fence_wait_timeout(fence, false, tmo);" is executed inside list_for_each_entry, the value of tmo might be changed.
But "tmo = dma_fence_wait_timeout(fence, false, tmo);" is also called after exiting the list_for_each_entry loop.
That call might then pass a different value of tmo into dma_fence_wait_timeout.

BR,
Wentao

-----Original Message-----
From: Christian König <ckoenig.leichtzumerken@gmail.com>
Sent: Tuesday, April 2, 2019 3:01 PM
To: Deng, Emily <Emily.Deng@amd.com>; Lou, Wentao <Wentao.Lou@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list

Yeah, agree that is much closer to a correct solution.

But even better would be to correctly update the timeout as well, e.g.:

[...]

That we restart the timeout for each wait looks like a rather problematic bug to me as well.

Christian.

[...]
* Re: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list
@ 2019-04-02  7:39 Koenig, Christian
  To: Lou, Wentao; Deng, Emily; amd-gfx@lists.freedesktop.org

Yeah, exactly that's what should happen here.

The value of tmo SHOULD be changed, otherwise we wait tmo jiffies on each loop.

Christian.

Am 02.04.19 um 09:29 schrieb Lou, Wentao:
> Hi Christian,
>
> If "tmo = dma_fence_wait_timeout(fence, false, tmo);" is executed inside list_for_each_entry, the value of tmo might be changed.
> But "tmo = dma_fence_wait_timeout(fence, false, tmo);" is also called after exiting the list_for_each_entry loop.
> That call might then pass a different value of tmo into dma_fence_wait_timeout.
>
> BR,
> Wentao
>
> [...]
* RE: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list
@ 2019-04-02  9:23 Lou, Wentao
  To: Koenig, Christian; Deng, Emily; amd-gfx@lists.freedesktop.org

Thanks Christian.
You mean msecs_to_jiffies(8000) should be the time to complete the whole loop, not each iteration.
Just sent out another patch for review. Thanks.

BR,
Wentao

-----Original Message-----
From: Koenig, Christian <Christian.Koenig@amd.com>
Sent: Tuesday, April 2, 2019 3:39 PM
To: Lou, Wentao <Wentao.Lou@amd.com>; Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list

Yeah, exactly that's what should happen here.

The value of tmo SHOULD be changed, otherwise we wait tmo jiffies on each loop.

Christian.

[...]
* Re: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list
@ 2019-04-02 10:34 Koenig, Christian
  To: Lou, Wentao; Deng, Emily; amd-gfx@lists.freedesktop.org

Am 02.04.19 um 11:23 schrieb Lou, Wentao:
> Thanks Christian.
> You mean msecs_to_jiffies(8000) should be the time to complete the whole loop, not each iteration.

Yeah, a normal desktop system can easily have more than 10000 BOs in that list. So the total timeout could be more than a day, which is a bit long :)

> Just sent out another patch for review.

Going to take a look.

Regards,
Christian.

> Thanks.
>
> BR,
> Wentao
>
> [...]
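Christian's estimate can be spelled out as arithmetic. This is an editor's sketch with assumed numbers: a 10000-BO list and the 8000 ms SR-IOV timeout from the patch.

```c
/* Worst case when every wait restarts the full per-BO timeout:
 * num_bos * per_bo_ms, converted to whole seconds. */
static long worst_case_seconds(long num_bos, long per_bo_ms)
{
	return num_bos * per_bo_ms / 1000;
}

/* 10000 BOs * 8 s = 80000 s, roughly 22 hours of worst-case waiting,
 * versus a flat 8 s when the budget is shared across the whole loop. */
```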
end of thread, other threads: [~2019-04-02 10:34 UTC | newest]

Thread overview: 7 messages
2019-04-01  8:58 [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list wentalou
2019-04-02  4:03 ` Deng, Emily
2019-04-02  7:01   ` Christian König
2019-04-02  7:29     ` Lou, Wentao
2019-04-02  7:39       ` Koenig, Christian
2019-04-02  9:23         ` Lou, Wentao
2019-04-02 10:34           ` Koenig, Christian