* [PATCH] drm/amdgpu: Fix the dead lock issue. @ 2018-09-11 2:51 Emily Deng [not found] ` <1536634293-26099-1-git-send-email-Emily.Deng-5C7GfCeVMHo@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: Emily Deng @ 2018-09-11 2:51 UTC (permalink / raw) To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Emily Deng It will ramdomly have the dead lock issue when test TDR: 1. amdgpu_device_handle_vram_lost gets the lock shadow_list_lock 2. amdgpu_bo_create locked the bo's resv lock 3. amdgpu_bo_create_shadow is waiting for the shadow_list_lock 4. amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv lock. v2: Make a local copy of the list Signed-off-by: Emily Deng <Emily.Deng@amd.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 2a21267..8c81404 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3105,6 +3105,9 @@ static int amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) long r = 1; int i = 0; long tmo; + struct list_head local_shadow_list; + + INIT_LIST_HEAD(&local_shadow_list); if (amdgpu_sriov_runtime(adev)) tmo = msecs_to_jiffies(8000); @@ -3112,8 +3115,19 @@ static int amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) tmo = msecs_to_jiffies(100); DRM_INFO("recover vram bo from shadow start\n"); + + mutex_lock(&adev->shadow_list_lock); + list_splice_init(&adev->shadow_list, &local_shadow_list); + mutex_unlock(&adev->shadow_list_lock); + + mutex_lock(&adev->shadow_list_lock); - list_for_each_entry_safe(bo, tmp, &adev->shadow_list, shadow_list) { + list_for_each_entry_safe(bo, tmp, &local_shadow_list, shadow_list) { + mutex_unlock(&adev->shadow_list_lock); + + if (!bo) + continue; + next = NULL; amdgpu_device_recover_vram_from_shadow(adev, ring, bo, &next); if (fence) { @@ -3132,9 +3146,14 @@ static int amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) dma_fence_put(fence); fence = next; + mutex_lock(&adev->shadow_list_lock); } mutex_unlock(&adev->shadow_list_lock); + mutex_lock(&adev->shadow_list_lock); + list_splice_init(&local_shadow_list, &adev->shadow_list); + mutex_unlock(&adev->shadow_list_lock); + if (fence) { r = dma_fence_wait_timeout(fence, false, tmo); if (r == 0) -- 2.7.4 _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply related [flat|nested] 15+ messages in thread
[parent not found: <1536634293-26099-1-git-send-email-Emily.Deng-5C7GfCeVMHo@public.gmane.org>]
* Re: [PATCH] drm/amdgpu: Fix the dead lock issue. [not found] ` <1536634293-26099-1-git-send-email-Emily.Deng-5C7GfCeVMHo@public.gmane.org> @ 2018-09-11 3:01 ` Zhang, Jerry (Junwei) 2018-09-11 3:03 ` zhoucm1 1 sibling, 0 replies; 15+ messages in thread From: Zhang, Jerry (Junwei) @ 2018-09-11 3:01 UTC (permalink / raw) To: Emily Deng, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW On 09/11/2018 10:51 AM, Emily Deng wrote: > It will ramdomly have the dead lock issue when test TDR: > 1. amdgpu_device_handle_vram_lost gets the lock shadow_list_lock > 2. amdgpu_bo_create locked the bo's resv lock > 3. amdgpu_bo_create_shadow is waiting for the shadow_list_lock > 4. amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv > lock. > > v2: > Make a local copy of the list > > Signed-off-by: Emily Deng <Emily.Deng@amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 ++++++++++++++++++++- > 1 file changed, 20 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index 2a21267..8c81404 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -3105,6 +3105,9 @@ static int amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) > long r = 1; > int i = 0; > long tmo; > + struct list_head local_shadow_list; > + > + INIT_LIST_HEAD(&local_shadow_list); > > if (amdgpu_sriov_runtime(adev)) > tmo = msecs_to_jiffies(8000); > @@ -3112,8 +3115,19 @@ static int amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) > tmo = msecs_to_jiffies(100); > > DRM_INFO("recover vram bo from shadow start\n"); > + > + mutex_lock(&adev->shadow_list_lock); > + list_splice_init(&adev->shadow_list, &local_shadow_list); > + mutex_unlock(&adev->shadow_list_lock); > + > + > mutex_lock(&adev->shadow_list_lock); > - list_for_each_entry_safe(bo, tmp, &adev->shadow_list, shadow_list) { > + list_for_each_entry_safe(bo, tmp, &local_shadow_list, shadow_list) { > + mutex_unlock(&adev->shadow_list_lock); We may not to use shadow_list_lock when traverse the local shadow list. Regards, Jerry > + > + if (!bo) > + continue; > + > next = NULL; > amdgpu_device_recover_vram_from_shadow(adev, ring, bo, &next); > if (fence) { > @@ -3132,9 +3146,14 @@ static int amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) > > dma_fence_put(fence); > fence = next; > + mutex_lock(&adev->shadow_list_lock); > } > mutex_unlock(&adev->shadow_list_lock); > > + mutex_lock(&adev->shadow_list_lock); > + list_splice_init(&local_shadow_list, &adev->shadow_list); > + mutex_unlock(&adev->shadow_list_lock); > + > if (fence) { > r = dma_fence_wait_timeout(fence, false, tmo); > if (r == 0) > _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] drm/amdgpu: Fix the dead lock issue. [not found] ` <1536634293-26099-1-git-send-email-Emily.Deng-5C7GfCeVMHo@public.gmane.org> 2018-09-11 3:01 ` Zhang, Jerry (Junwei) @ 2018-09-11 3:03 ` zhoucm1 [not found] ` <bf049e4b-321f-33b4-dad2-708269515e90-5C7GfCeVMHo@public.gmane.org> 1 sibling, 1 reply; 15+ messages in thread From: zhoucm1 @ 2018-09-11 3:03 UTC (permalink / raw) To: Emily Deng, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW On 2018年09月11日 10:51, Emily Deng wrote: > It will ramdomly have the dead lock issue when test TDR: > 1. amdgpu_device_handle_vram_lost gets the lock shadow_list_lock > 2. amdgpu_bo_create locked the bo's resv lock > 3. amdgpu_bo_create_shadow is waiting for the shadow_list_lock > 4. amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv > lock. > > v2: > Make a local copy of the list > > Signed-off-by: Emily Deng <Emily.Deng@amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 ++++++++++++++++++++- > 1 file changed, 20 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index 2a21267..8c81404 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -3105,6 +3105,9 @@ static int amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) > long r = 1; > int i = 0; > long tmo; > + struct list_head local_shadow_list; > + > + INIT_LIST_HEAD(&local_shadow_list); > > if (amdgpu_sriov_runtime(adev)) > tmo = msecs_to_jiffies(8000); > @@ -3112,8 +3115,19 @@ static int amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) > tmo = msecs_to_jiffies(100); > > DRM_INFO("recover vram bo from shadow start\n"); > + > + mutex_lock(&adev->shadow_list_lock); > + list_splice_init(&adev->shadow_list, &local_shadow_list); > + mutex_unlock(&adev->shadow_list_lock); > + > + > mutex_lock(&adev->shadow_list_lock); local_shadow_list is a local variable, I think it doesn't need lock at all, no one change it. Otherwise looks good to me. Thanks, David Zhou > - list_for_each_entry_safe(bo, tmp, &adev->shadow_list, shadow_list) { > + list_for_each_entry_safe(bo, tmp, &local_shadow_list, shadow_list) { > + mutex_unlock(&adev->shadow_list_lock); > + > + if (!bo) > + continue; > + > next = NULL; > amdgpu_device_recover_vram_from_shadow(adev, ring, bo, &next); > if (fence) { > @@ -3132,9 +3146,14 @@ static int amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) > > dma_fence_put(fence); > fence = next; > + mutex_lock(&adev->shadow_list_lock); > } > mutex_unlock(&adev->shadow_list_lock); > > + mutex_lock(&adev->shadow_list_lock); > + list_splice_init(&local_shadow_list, &adev->shadow_list); > + mutex_unlock(&adev->shadow_list_lock); > + > if (fence) { > r = dma_fence_wait_timeout(fence, false, tmo); > if (r == 0) _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <bf049e4b-321f-33b4-dad2-708269515e90-5C7GfCeVMHo@public.gmane.org>]
* RE: [PATCH] drm/amdgpu: Fix the dead lock issue. [not found] ` <bf049e4b-321f-33b4-dad2-708269515e90-5C7GfCeVMHo@public.gmane.org> @ 2018-09-11 3:23 ` Deng, Emily [not found] ` <BN7PR12MB2644CEE78E8E6D658D1A686E8F040-Zx/IyJUqfGKoYeSiBV3SvgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: Deng, Emily @ 2018-09-11 3:23 UTC (permalink / raw) To: Zhou, David(ChunMing), amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW >-----Original Message----- >From: Zhou, David(ChunMing) >Sent: Tuesday, September 11, 2018 11:03 AM >To: Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org >Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue. > > > >On 2018年09月11日 10:51, Emily Deng wrote: >> It will ramdomly have the dead lock issue when test TDR: >> 1. amdgpu_device_handle_vram_lost gets the lock shadow_list_lock 2. >> amdgpu_bo_create locked the bo's resv lock 3. amdgpu_bo_create_shadow >> is waiting for the shadow_list_lock 4. >> amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv >> lock. >> >> v2: >> Make a local copy of the list >> >> Signed-off-by: Emily Deng <Emily.Deng@amd.com> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 >++++++++++++++++++++- >> 1 file changed, 20 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index 2a21267..8c81404 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -3105,6 +3105,9 @@ static int >amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >> long r = 1; >> int i = 0; >> long tmo; >> + struct list_head local_shadow_list; >> + >> + INIT_LIST_HEAD(&local_shadow_list); >> >> if (amdgpu_sriov_runtime(adev)) >> tmo = msecs_to_jiffies(8000); >> @@ -3112,8 +3115,19 @@ static int >amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >> tmo = msecs_to_jiffies(100); >> >> DRM_INFO("recover vram bo from shadow start\n"); >> + >> + mutex_lock(&adev->shadow_list_lock); >> + list_splice_init(&adev->shadow_list, &local_shadow_list); >> + mutex_unlock(&adev->shadow_list_lock); >> + >> + >> mutex_lock(&adev->shadow_list_lock); >local_shadow_list is a local variable, I think it doesn't need lock at all, no one >change it. Otherwise looks good to me. The bo->shadow_list which now is in local_shadow_list maybe destroy in case that it already in amdgpu_bo_destroy, then it will change local_shadow_list, so need lock the shadow_list_lock. Best wishes Emily Deng >Thanks, >David Zhou >> - list_for_each_entry_safe(bo, tmp, &adev->shadow_list, shadow_list) { >> + list_for_each_entry_safe(bo, tmp, &local_shadow_list, shadow_list) { >> + mutex_unlock(&adev->shadow_list_lock); >> + >> + if (!bo) >> + continue; >> + >> next = NULL; >> amdgpu_device_recover_vram_from_shadow(adev, ring, bo, >&next); >> if (fence) { >> @@ -3132,9 +3146,14 @@ static int >> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >> >> dma_fence_put(fence); >> fence = next; >> + mutex_lock(&adev->shadow_list_lock); >> } >> mutex_unlock(&adev->shadow_list_lock); >> >> + mutex_lock(&adev->shadow_list_lock); >> + list_splice_init(&local_shadow_list, &adev->shadow_list); >> + mutex_unlock(&adev->shadow_list_lock); >> + >> if (fence) { >> r = dma_fence_wait_timeout(fence, false, tmo); >> if (r == 0) _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <BN7PR12MB2644CEE78E8E6D658D1A686E8F040-Zx/IyJUqfGKoYeSiBV3SvgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>]
* Re: [PATCH] drm/amdgpu: Fix the dead lock issue. [not found] ` <BN7PR12MB2644CEE78E8E6D658D1A686E8F040-Zx/IyJUqfGKoYeSiBV3SvgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org> @ 2018-09-11 3:27 ` zhoucm1 [not found] ` <fcbc7544-d08d-4051-c554-f1f9cecb87e1-5C7GfCeVMHo@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: zhoucm1 @ 2018-09-11 3:27 UTC (permalink / raw) To: Deng, Emily, Zhou, David(ChunMing), amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW On 2018年09月11日 11:23, Deng, Emily wrote: >> -----Original Message----- >> From: Zhou, David(ChunMing) >> Sent: Tuesday, September 11, 2018 11:03 AM >> To: Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org >> Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue. >> >> >> >> On 2018年09月11日 10:51, Emily Deng wrote: >>> It will ramdomly have the dead lock issue when test TDR: >>> 1. amdgpu_device_handle_vram_lost gets the lock shadow_list_lock 2. >>> amdgpu_bo_create locked the bo's resv lock 3. amdgpu_bo_create_shadow >>> is waiting for the shadow_list_lock 4. >>> amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv >>> lock. >>> >>> v2: >>> Make a local copy of the list >>> >>> Signed-off-by: Emily Deng <Emily.Deng@amd.com> >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 >> ++++++++++++++++++++- >>> 1 file changed, 20 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> index 2a21267..8c81404 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> @@ -3105,6 +3105,9 @@ static int >> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >>> long r = 1; >>> int i = 0; >>> long tmo; >>> + struct list_head local_shadow_list; >>> + >>> + INIT_LIST_HEAD(&local_shadow_list); >>> >>> if (amdgpu_sriov_runtime(adev)) >>> tmo = msecs_to_jiffies(8000); >>> @@ -3112,8 +3115,19 @@ static int >> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >>> tmo = msecs_to_jiffies(100); >>> >>> DRM_INFO("recover vram bo from shadow start\n"); >>> + >>> + mutex_lock(&adev->shadow_list_lock); >>> + list_splice_init(&adev->shadow_list, &local_shadow_list); >>> + mutex_unlock(&adev->shadow_list_lock); >>> + >>> + >>> mutex_lock(&adev->shadow_list_lock); >> local_shadow_list is a local variable, I think it doesn't need lock at all, no one >> change it. Otherwise looks good to me. > The bo->shadow_list which now is in local_shadow_list maybe destroy in case that it already in amdgpu_bo_destroy, then it will > change local_shadow_list, so need lock the shadow_list_lock. Ah, sorry for noise, I forget you don't reference these BOs. Thanks, David Zhou > Best wishes > Emily Deng >> Thanks, >> David Zhou >>> - list_for_each_entry_safe(bo, tmp, &adev->shadow_list, shadow_list) { >>> + list_for_each_entry_safe(bo, tmp, &local_shadow_list, shadow_list) { >>> + mutex_unlock(&adev->shadow_list_lock); >>> + >>> + if (!bo) >>> + continue; >>> + >>> next = NULL; >>> amdgpu_device_recover_vram_from_shadow(adev, ring, bo, >> &next); >>> if (fence) { >>> @@ -3132,9 +3146,14 @@ static int >>> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >>> >>> dma_fence_put(fence); >>> fence = next; >>> + mutex_lock(&adev->shadow_list_lock); >>> } >>> mutex_unlock(&adev->shadow_list_lock); >>> >>> + mutex_lock(&adev->shadow_list_lock); >>> + list_splice_init(&local_shadow_list, &adev->shadow_list); >>> + mutex_unlock(&adev->shadow_list_lock); >>> + >>> if (fence) { >>> r = dma_fence_wait_timeout(fence, false, tmo); >>> if (r == 0) _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <fcbc7544-d08d-4051-c554-f1f9cecb87e1-5C7GfCeVMHo@public.gmane.org>]
* RE: [PATCH] drm/amdgpu: Fix the dead lock issue. [not found] ` <fcbc7544-d08d-4051-c554-f1f9cecb87e1-5C7GfCeVMHo@public.gmane.org> @ 2018-09-11 3:32 ` Deng, Emily [not found] ` <BN7PR12MB2644E374B512DC7EF10CA6D08F040-Zx/IyJUqfGKoYeSiBV3SvgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: Deng, Emily @ 2018-09-11 3:32 UTC (permalink / raw) To: Zhou, David(ChunMing) >-----Original Message----- >From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of >zhoucm1 >Sent: Tuesday, September 11, 2018 11:28 AM >To: Deng, Emily <Emily.Deng@amd.com>; Zhou, David(ChunMing) ><David1.Zhou@amd.com>; amd-gfx@lists.freedesktop.org >Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue. > > > >On 2018年09月11日 11:23, Deng, Emily wrote: >>> -----Original Message----- >>> From: Zhou, David(ChunMing) >>> Sent: Tuesday, September 11, 2018 11:03 AM >>> To: Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org >>> Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue. >>> >>> >>> >>> On 2018年09月11日 10:51, Emily Deng wrote: >>>> It will ramdomly have the dead lock issue when test TDR: >>>> 1. amdgpu_device_handle_vram_lost gets the lock shadow_list_lock 2. >>>> amdgpu_bo_create locked the bo's resv lock 3. >>>> amdgpu_bo_create_shadow is waiting for the shadow_list_lock 4. >>>> amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv >>>> lock. >>>> >>>> v2: >>>> Make a local copy of the list >>>> >>>> Signed-off-by: Emily Deng <Emily.Deng@amd.com> >>>> --- >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 >>> ++++++++++++++++++++- >>>> 1 file changed, 20 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> index 2a21267..8c81404 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> @@ -3105,6 +3105,9 @@ static int >>> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >>>> long r = 1; >>>> int i = 0; >>>> long tmo; >>>> + struct list_head local_shadow_list; >>>> + >>>> + INIT_LIST_HEAD(&local_shadow_list); >>>> >>>> if (amdgpu_sriov_runtime(adev)) >>>> tmo = msecs_to_jiffies(8000); >>>> @@ -3112,8 +3115,19 @@ static int >>> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >>>> tmo = msecs_to_jiffies(100); >>>> >>>> DRM_INFO("recover vram bo from shadow start\n"); >>>> + >>>> + mutex_lock(&adev->shadow_list_lock); >>>> + list_splice_init(&adev->shadow_list, &local_shadow_list); >>>> + mutex_unlock(&adev->shadow_list_lock); >>>> + >>>> + >>>> mutex_lock(&adev->shadow_list_lock); >>> local_shadow_list is a local variable, I think it doesn't need lock >>> at all, no one change it. Otherwise looks good to me. >> The bo->shadow_list which now is in local_shadow_list maybe destroy in >> case that it already in amdgpu_bo_destroy, then it will change >local_shadow_list, so need lock the shadow_list_lock. >Ah, sorry for noise, I forget you don't reference these BOs. Yes, I don't reference these Bos, as I found even reference these Bos, it still couldn't avoid the case that another process is already in amdgpu_bo_destroy. > >Thanks, >David Zhou >> Best wishes >> Emily Deng >>> Thanks, >>> David Zhou >>>> - list_for_each_entry_safe(bo, tmp, &adev->shadow_list, shadow_list) { >>>> + list_for_each_entry_safe(bo, tmp, &local_shadow_list, shadow_list) { >>>> + mutex_unlock(&adev->shadow_list_lock); >>>> + >>>> + if (!bo) >>>> + continue; >>>> + >>>> next = NULL; >>>> amdgpu_device_recover_vram_from_shadow(adev, ring, bo, >>> &next); >>>> if (fence) { >>>> @@ -3132,9 +3146,14 @@ static int >>>> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >>>> >>>> dma_fence_put(fence); >>>> fence = next; >>>> + mutex_lock(&adev->shadow_list_lock); >>>> } >>>> mutex_unlock(&adev->shadow_list_lock); >>>> >>>> + mutex_lock(&adev->shadow_list_lock); >>>> + list_splice_init(&local_shadow_list, &adev->shadow_list); >>>> + mutex_unlock(&adev->shadow_list_lock); >>>> + >>>> if (fence) { >>>> r = dma_fence_wait_timeout(fence, false, tmo); >>>> if (r == 0) > >_______________________________________________ >amd-gfx mailing list >amd-gfx@lists.freedesktop.org >https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <BN7PR12MB2644E374B512DC7EF10CA6D08F040-Zx/IyJUqfGKoYeSiBV3SvgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>]
* Re: [PATCH] drm/amdgpu: Fix the dead lock issue. [not found] ` <BN7PR12MB2644E374B512DC7EF10CA6D08F040-Zx/IyJUqfGKoYeSiBV3SvgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org> @ 2018-09-11 3:37 ` zhoucm1 [not found] ` <be67e50d-f161-c7da-256e-149ed1b56f26-5C7GfCeVMHo@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: zhoucm1 @ 2018-09-11 3:37 UTC (permalink / raw) To: Deng, Emily, Zhou, David(ChunMing), amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW On 2018年09月11日 11:32, Deng, Emily wrote: >> -----Original Message----- >> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of >> zhoucm1 >> Sent: Tuesday, September 11, 2018 11:28 AM >> To: Deng, Emily <Emily.Deng@amd.com>; Zhou, David(ChunMing) >> <David1.Zhou@amd.com>; amd-gfx@lists.freedesktop.org >> Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue. >> >> >> >> On 2018年09月11日 11:23, Deng, Emily wrote: >>>> -----Original Message----- >>>> From: Zhou, David(ChunMing) >>>> Sent: Tuesday, September 11, 2018 11:03 AM >>>> To: Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org >>>> Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue. >>>> >>>> >>>> >>>> On 2018年09月11日 10:51, Emily Deng wrote: >>>>> It will ramdomly have the dead lock issue when test TDR: >>>>> 1. amdgpu_device_handle_vram_lost gets the lock shadow_list_lock 2. >>>>> amdgpu_bo_create locked the bo's resv lock 3. >>>>> amdgpu_bo_create_shadow is waiting for the shadow_list_lock 4. >>>>> amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv >>>>> lock. >>>>> >>>>> v2: >>>>> Make a local copy of the list >>>>> >>>>> Signed-off-by: Emily Deng <Emily.Deng@amd.com> >>>>> --- >>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 >>>> ++++++++++++++++++++- >>>>> 1 file changed, 20 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>> index 2a21267..8c81404 100644 >>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>> @@ -3105,6 +3105,9 @@ static int >>>> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >>>>> long r = 1; >>>>> int i = 0; >>>>> long tmo; >>>>> + struct list_head local_shadow_list; >>>>> + >>>>> + INIT_LIST_HEAD(&local_shadow_list); >>>>> >>>>> if (amdgpu_sriov_runtime(adev)) >>>>> tmo = msecs_to_jiffies(8000); >>>>> @@ -3112,8 +3115,19 @@ static int >>>> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >>>>> tmo = msecs_to_jiffies(100); >>>>> >>>>> DRM_INFO("recover vram bo from shadow start\n"); >>>>> + >>>>> + mutex_lock(&adev->shadow_list_lock); >>>>> + list_splice_init(&adev->shadow_list, &local_shadow_list); >>>>> + mutex_unlock(&adev->shadow_list_lock); >>>>> + >>>>> + >>>>> mutex_lock(&adev->shadow_list_lock); >>>> local_shadow_list is a local variable, I think it doesn't need lock >>>> at all, no one change it. Otherwise looks good to me. >>> The bo->shadow_list which now is in local_shadow_list maybe destroy in >>> case that it already in amdgpu_bo_destroy, then it will change >> local_shadow_list, so need lock the shadow_list_lock. >> Ah, sorry for noise, I forget you don't reference these BOs. > Yes, I don't reference these Bos, as I found even reference these Bos, it still couldn't avoid the case that another process is already > in amdgpu_bo_destroy. ??? that shouldn't happen, the reference is belonged to list. But back to here, we don't need reference them. And since no shadow BO is added to local after splice, we'd better to use list_next_entry to iterate the local shadow list instead of list_for_each_entry_safe. Thanks, David Zhou >> Thanks, >> David Zhou >>> Best wishes >>> Emily Deng >>>> Thanks, >>>> David Zhou >>>>> - list_for_each_entry_safe(bo, tmp, &adev->shadow_list, shadow_list) { >>>>> + list_for_each_entry_safe(bo, tmp, &local_shadow_list, shadow_list) { >>>>> + mutex_unlock(&adev->shadow_list_lock); >>>>> + >>>>> + if (!bo) >>>>> + continue; >>>>> + >>>>> next = NULL; >>>>> amdgpu_device_recover_vram_from_shadow(adev, ring, bo, >>>> &next); >>>>> if (fence) { >>>>> @@ -3132,9 +3146,14 @@ static int >>>>> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >>>>> >>>>> dma_fence_put(fence); >>>>> fence = next; >>>>> + mutex_lock(&adev->shadow_list_lock); >>>>> } >>>>> mutex_unlock(&adev->shadow_list_lock); >>>>> >>>>> + mutex_lock(&adev->shadow_list_lock); >>>>> + list_splice_init(&local_shadow_list, &adev->shadow_list); >>>>> + mutex_unlock(&adev->shadow_list_lock); >>>>> + >>>>> if (fence) { >>>>> r = dma_fence_wait_timeout(fence, false, tmo); >>>>> if (r == 0) >> _______________________________________________ >> amd-gfx mailing list >> amd-gfx@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <be67e50d-f161-c7da-256e-149ed1b56f26-5C7GfCeVMHo@public.gmane.org>]
* Re: [PATCH] drm/amdgpu: Fix the dead lock issue. [not found] ` <be67e50d-f161-c7da-256e-149ed1b56f26-5C7GfCeVMHo@public.gmane.org> @ 2018-09-11 5:41 ` zhoucm1 [not found] ` <83f7a45f-1aee-a2a8-bc82-f3433157c6cb-5C7GfCeVMHo@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: zhoucm1 @ 2018-09-11 5:41 UTC (permalink / raw) To: Deng, Emily, Zhou, David(ChunMing), amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW [-- Attachment #1: Type: text/plain, Size: 5581 bytes --] On 2018年09月11日 11:37, zhoucm1 wrote: > > > On 2018年09月11日 11:32, Deng, Emily wrote: >>> -----Original Message----- >>> From: amd-gfx <amd-gfx-bounces-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org> On Behalf Of >>> zhoucm1 >>> Sent: Tuesday, September 11, 2018 11:28 AM >>> To: Deng, Emily <Emily.Deng-5C7GfCeVMHo@public.gmane.org>; Zhou, David(ChunMing) >>> <David1.Zhou-5C7GfCeVMHo@public.gmane.org>; amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org >>> Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue. >>> >>> >>> >>> On 2018年09月11日 11:23, Deng, Emily wrote: >>>>> -----Original Message----- >>>>> From: Zhou, David(ChunMing) >>>>> Sent: Tuesday, September 11, 2018 11:03 AM >>>>> To: Deng, Emily <Emily.Deng-5C7GfCeVMHo@public.gmane.org>; amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org >>>>> Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue. >>>>> >>>>> >>>>> >>>>> On 2018年09月11日 10:51, Emily Deng wrote: >>>>>> It will ramdomly have the dead lock issue when test TDR: >>>>>> 1. amdgpu_device_handle_vram_lost gets the lock shadow_list_lock 2. >>>>>> amdgpu_bo_create locked the bo's resv lock 3. >>>>>> amdgpu_bo_create_shadow is waiting for the shadow_list_lock 4. >>>>>> amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv >>>>>> lock. >>>>>> >>>>>> v2: >>>>>> Make a local copy of the list >>>>>> >>>>>> Signed-off-by: Emily Deng <Emily.Deng-5C7GfCeVMHo@public.gmane.org> >>>>>> --- >>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 >>>>> ++++++++++++++++++++- >>>>>> 1 file changed, 20 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>>> index 2a21267..8c81404 100644 >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>>> @@ -3105,6 +3105,9 @@ static int >>>>> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >>>>>> long r = 1; >>>>>> int i = 0; >>>>>> long tmo; >>>>>> + struct list_head local_shadow_list; >>>>>> + >>>>>> + INIT_LIST_HEAD(&local_shadow_list); >>>>>> >>>>>> if (amdgpu_sriov_runtime(adev)) >>>>>> tmo = msecs_to_jiffies(8000); >>>>>> @@ -3112,8 +3115,19 @@ static int >>>>> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >>>>>> tmo = msecs_to_jiffies(100); >>>>>> >>>>>> DRM_INFO("recover vram bo from shadow start\n"); >>>>>> + >>>>>> + mutex_lock(&adev->shadow_list_lock); >>>>>> + list_splice_init(&adev->shadow_list, &local_shadow_list); >>>>>> + mutex_unlock(&adev->shadow_list_lock); >>>>>> + >>>>>> + >>>>>> mutex_lock(&adev->shadow_list_lock); >>>>> local_shadow_list is a local variable, I think it doesn't need lock >>>>> at all, no one change it. Otherwise looks good to me. >>>> The bo->shadow_list which now is in local_shadow_list maybe destroy in >>>> case that it already in amdgpu_bo_destroy, then it will change >>> local_shadow_list, so need lock the shadow_list_lock. >>> Ah, sorry for noise, I forget you don't reference these BOs. >> Yes, I don't reference these Bos, as I found even reference these >> Bos, it still couldn't avoid the case that another process is already >> in amdgpu_bo_destroy. > ??? that shouldn't happen, the reference is belonged to list. But back > to here, we don't need reference them. > And since no shadow BO is added to local after splice, we'd better to > use list_next_entry to iterate the local shadow list instead of > list_for_each_entry_safe. > > Thanks, > David Zhou >>> Thanks, >>> David Zhou >>>> Best wishes >>>> Emily Deng >>>>> Thanks, >>>>> David Zhou >>>>>> - list_for_each_entry_safe(bo, tmp, &adev->shadow_list, >>>>>> shadow_list) { >>>>>> + list_for_each_entry_safe(bo, tmp, &local_shadow_list, >>>>>> shadow_list) { because shadow list doesn't take bo reference, we should give a amdgpu_bo_ref(bo) with attached patch before unlock. You can have a try. Thanks, David Zhou >>>>>> + mutex_unlock(&adev->shadow_list_lock); >>>>>> + >>>>>> + if (!bo) >>>>>> + continue; >>>>>> + >>>>>> next = NULL; >>>>>> amdgpu_device_recover_vram_from_shadow(adev, ring, bo, >>>>> &next); >>>>>> if (fence) { >>>>>> @@ -3132,9 +3146,14 @@ static int >>>>>> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >>>>>> >>>>>> dma_fence_put(fence); >>>>>> fence = next; >>>>>> + mutex_lock(&adev->shadow_list_lock); >>>>>> } >>>>>> mutex_unlock(&adev->shadow_list_lock); >>>>>> >>>>>> + mutex_lock(&adev->shadow_list_lock); >>>>>> + list_splice_init(&local_shadow_list, &adev->shadow_list); >>>>>> + mutex_unlock(&adev->shadow_list_lock); >>>>>> + >>>>>> if (fence) { >>>>>> r = dma_fence_wait_timeout(fence, false, tmo); >>>>>> if (r == 0) >>> _______________________________________________ >>> amd-gfx mailing list >>> amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org >>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx > > _______________________________________________ > amd-gfx mailing list > amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx [-- Attachment #2: 0001-drm-amdgpu-changing-of-shadow-bo-reference-should-ta.patch --] [-- Type: text/x-patch, Size: 1694 bytes --] >From 7676fc70738699dddca5627e21be0d82e91aea05 Mon Sep 17 00:00:00 2001 From: Chunming Zhou <david1.zhou-5C7GfCeVMHo@public.gmane.org> Date: Tue, 11 Sep 2018 13:37:31 +0800 Subject: [PATCH] drm/amdgpu: changing of shadow bo reference should take shadow lock because shadow list doesn't reference shadow bo. Signed-off-by: Chunming Zhou <david1.zhou-5C7GfCeVMHo@public.gmane.org> --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index b0e14a3d54ef..50651157203b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -101,11 +101,9 @@ static void amdgpu_bo_destroy(struct ttm_buffer_object *tbo) drm_prime_gem_destroy(&bo->gem_base, bo->tbo.sg); drm_gem_object_release(&bo->gem_base); amdgpu_bo_unref(&bo->parent); - if (!list_empty(&bo->shadow_list)) { - mutex_lock(&adev->shadow_list_lock); + if (!list_empty(&bo->shadow_list)) list_del_init(&bo->shadow_list); - mutex_unlock(&adev->shadow_list_lock); - } + kfree(bo->metadata); kfree(bo); } @@ -838,13 +836,21 @@ struct amdgpu_bo *amdgpu_bo_ref(struct amdgpu_bo *bo) void amdgpu_bo_unref(struct amdgpu_bo **bo) { struct ttm_buffer_object *tbo; + bool locked = false; if ((*bo) == NULL) return; + if (bo->parent && bo->parent->shadow == bo && + !list_empty(&bo->parent->shadow_list)) { + mutex_lock(&adev->shadow_list_lock); + locked = true; + } tbo = &((*bo)->tbo); ttm_bo_put(tbo); *bo = NULL; + if (locked) + mutex_unlock(&adev->shadow_list_lock); } /** -- 2.17.1 [-- Attachment #3: Type: text/plain, Size: 154 bytes --] _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply related [flat|nested] 15+ messages in thread
[parent not found: <83f7a45f-1aee-a2a8-bc82-f3433157c6cb-5C7GfCeVMHo@public.gmane.org>]
* Re: [PATCH] drm/amdgpu: Fix the dead lock issue. [not found] ` <83f7a45f-1aee-a2a8-bc82-f3433157c6cb-5C7GfCeVMHo@public.gmane.org> @ 2018-09-11 6:40 ` Christian König [not found] ` <c67be83d-aecd-a291-a564-15f1fe501680-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: Christian König @ 2018-09-11 6:40 UTC (permalink / raw) To: zhoucm1, Deng, Emily, Zhou, David(ChunMing), amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW [-- Attachment #1.1: Type: text/plain, Size: 6337 bytes --] That won't work correctly. The TTM BO is unreferenced in a couple of more places which we don't have control over. To make it even worse we actually can't take the reservation lock during GPU reset because the reservation object might already be destroyed when we remove the BO from the list. I will take a look at this myself today to find a solution which should work. Regards, Christian. Am 11.09.2018 um 07:41 schrieb zhoucm1: > > > On 2018年09月11日 11:37, zhoucm1 wrote: >> >> >> On 2018年09月11日 11:32, Deng, Emily wrote: >>>> -----Original Message----- >>>> From: amd-gfx <amd-gfx-bounces-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org> On Behalf Of >>>> zhoucm1 >>>> Sent: Tuesday, September 11, 2018 11:28 AM >>>> To: Deng, Emily <Emily.Deng-5C7GfCeVMHo@public.gmane.org>; Zhou, David(ChunMing) >>>> <David1.Zhou-5C7GfCeVMHo@public.gmane.org>; amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org >>>> Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue. >>>> >>>> >>>> >>>> On 2018年09月11日 11:23, Deng, Emily wrote: >>>>>> -----Original Message----- >>>>>> From: Zhou, David(ChunMing) >>>>>> Sent: Tuesday, September 11, 2018 11:03 AM >>>>>> To: Deng, Emily <Emily.Deng-5C7GfCeVMHo@public.gmane.org>; amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org >>>>>> Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue. >>>>>> >>>>>> >>>>>> >>>>>> On 2018年09月11日 10:51, Emily Deng wrote: >>>>>>> It will ramdomly have the dead lock issue when test TDR: >>>>>>> 1. amdgpu_device_handle_vram_lost gets the lock shadow_list_lock 2. >>>>>>> amdgpu_bo_create locked the bo's resv lock 3. >>>>>>> amdgpu_bo_create_shadow is waiting for the shadow_list_lock 4. >>>>>>> amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv >>>>>>> lock. >>>>>>> >>>>>>> v2: >>>>>>> Make a local copy of the list >>>>>>> >>>>>>> Signed-off-by: Emily Deng <Emily.Deng-5C7GfCeVMHo@public.gmane.org> >>>>>>> --- >>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 >>>>>> ++++++++++++++++++++- >>>>>>> 1 file changed, 20 insertions(+), 1 deletion(-) >>>>>>> >>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>>>> index 2a21267..8c81404 100644 >>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>>>> @@ -3105,6 +3105,9 @@ static int >>>>>> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >>>>>>> long r = 1; >>>>>>> int i = 0; >>>>>>> long tmo; >>>>>>> + struct list_head local_shadow_list; >>>>>>> + >>>>>>> + INIT_LIST_HEAD(&local_shadow_list); >>>>>>> >>>>>>> if (amdgpu_sriov_runtime(adev)) >>>>>>> tmo = msecs_to_jiffies(8000); >>>>>>> @@ -3112,8 +3115,19 @@ static int >>>>>> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >>>>>>> tmo = msecs_to_jiffies(100); >>>>>>> >>>>>>> DRM_INFO("recover vram bo from shadow start\n"); >>>>>>> + >>>>>>> + mutex_lock(&adev->shadow_list_lock); >>>>>>> + list_splice_init(&adev->shadow_list, &local_shadow_list); >>>>>>> + mutex_unlock(&adev->shadow_list_lock); >>>>>>> + >>>>>>> + >>>>>>> mutex_lock(&adev->shadow_list_lock); >>>>>> local_shadow_list is a local variable, I think it doesn't need lock >>>>>> at all, no one change it. Otherwise looks good to me. >>>>> The bo->shadow_list which now is in local_shadow_list maybe >>>>> destroy in >>>>> case that it already in amdgpu_bo_destroy, then it will change >>>> local_shadow_list, so need lock the shadow_list_lock. >>>> Ah, sorry for noise, I forget you don't reference these BOs. >>> Yes, I don't reference these Bos, as I found even reference these >>> Bos, it still couldn't avoid the case that another process is already >>> in amdgpu_bo_destroy. >> ??? that shouldn't happen, the reference is belonged to list. But >> back to here, we don't need reference them. >> And since no shadow BO is added to local after splice, we'd better to >> use list_next_entry to iterate the local shadow list instead of >> list_for_each_entry_safe. >> >> Thanks, >> David Zhou >>>> Thanks, >>>> David Zhou >>>>> Best wishes >>>>> Emily Deng >>>>>> Thanks, >>>>>> David Zhou >>>>>>> - list_for_each_entry_safe(bo, tmp, &adev->shadow_list, >>>>>>> shadow_list) { >>>>>>> + list_for_each_entry_safe(bo, tmp, &local_shadow_list, >>>>>>> shadow_list) { > because shadow list doesn't take bo reference, we should give a > amdgpu_bo_ref(bo) with attached patch before unlock. > You can have a try. > > Thanks, > David Zhou >>>>>>> + mutex_unlock(&adev->shadow_list_lock); >>>>>>> + >>>>>>> + if (!bo) >>>>>>> + continue; >>>>>>> + >>>>>>> next = NULL; >>>>>>> amdgpu_device_recover_vram_from_shadow(adev, ring, bo, >>>>>> &next); >>>>>>> if (fence) { >>>>>>> @@ -3132,9 +3146,14 @@ static int >>>>>>> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >>>>>>> >>>>>>> dma_fence_put(fence); >>>>>>> fence = next; >>>>>>> + mutex_lock(&adev->shadow_list_lock); >>>>>>> } >>>>>>> mutex_unlock(&adev->shadow_list_lock); >>>>>>> >>>>>>> + mutex_lock(&adev->shadow_list_lock); >>>>>>> + list_splice_init(&local_shadow_list, &adev->shadow_list); >>>>>>> + mutex_unlock(&adev->shadow_list_lock); >>>>>>> + >>>>>>> if (fence) { >>>>>>> r = dma_fence_wait_timeout(fence, false, tmo); >>>>>>> if (r == 0) >>>> _______________________________________________ >>>> amd-gfx mailing list >>>> amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org >>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx >> >> _______________________________________________ >> amd-gfx mailing list >> amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx > > > > _______________________________________________ > amd-gfx mailing list > amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx [-- Attachment #1.2: Type: text/html, Size: 13384 bytes --] [-- Attachment #2: Type: text/plain, Size: 154 bytes --] _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <c67be83d-aecd-a291-a564-15f1fe501680-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]
* RE: [PATCH] drm/amdgpu: Fix the dead lock issue. [not found] ` <c67be83d-aecd-a291-a564-15f1fe501680-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2018-09-11 6:55 ` Deng, Emily 0 siblings, 0 replies; 15+ messages in thread From: Deng, Emily @ 2018-09-11 6:55 UTC (permalink / raw) To: Koenig, Christian, Zhou, David(ChunMing), amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW [-- Attachment #1.1: Type: text/plain, Size: 5916 bytes --] From: Christian König <ckoenig.leichtzumerken@gmail.com> Sent: Tuesday, September 11, 2018 2:40 PM To: Zhou, David(ChunMing) <David1.Zhou@amd.com>; Deng, Emily <Emily.Deng@amd.com>; Zhou, David(ChunMing) <David1.Zhou@amd.com>; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue. That won't work correctly. The TTM BO is unreferenced in a couple of more places which we don't have control over. To make it even worse we actually can't take the reservation lock during GPU reset because the reservation object might already be destroyed when we remove the BO from the list. I will take a look at this myself today to find a solution which should work. Ok, thanks very much. Best wishes Emily Deng Regards, Christian. Am 11.09.2018 um 07:41 schrieb zhoucm1: On 2018年09月11日 11:37, zhoucm1 wrote: On 2018年09月11日 11:32, Deng, Emily wrote: -----Original Message----- From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org><mailto:amd-gfx-bounces@lists.freedesktop.org> On Behalf Of zhoucm1 Sent: Tuesday, September 11, 2018 11:28 AM To: Deng, Emily <Emily.Deng@amd.com><mailto:Emily.Deng@amd.com>; Zhou, David(ChunMing) <David1.Zhou@amd.com><mailto:David1.Zhou@amd.com>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue. On 2018年09月11日 11:23, Deng, Emily wrote: -----Original Message----- From: Zhou, David(ChunMing) Sent: Tuesday, September 11, 2018 11:03 AM To: Deng, Emily <Emily.Deng@amd.com><mailto:Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue. On 2018年09月11日 10:51, Emily Deng wrote: It will ramdomly have the dead lock issue when test TDR: 1. amdgpu_device_handle_vram_lost gets the lock shadow_list_lock 2. amdgpu_bo_create locked the bo's resv lock 3. amdgpu_bo_create_shadow is waiting for the shadow_list_lock 4. amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv lock. v2: Make a local copy of the list Signed-off-by: Emily Deng <Emily.Deng@amd.com><mailto:Emily.Deng@amd.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 2a21267..8c81404 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3105,6 +3105,9 @@ static int amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) long r = 1; int i = 0; long tmo; + struct list_head local_shadow_list; + + INIT_LIST_HEAD(&local_shadow_list); if (amdgpu_sriov_runtime(adev)) tmo = msecs_to_jiffies(8000); @@ -3112,8 +3115,19 @@ static int amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) tmo = msecs_to_jiffies(100); DRM_INFO("recover vram bo from shadow start\n"); + + mutex_lock(&adev->shadow_list_lock); + list_splice_init(&adev->shadow_list, &local_shadow_list); + mutex_unlock(&adev->shadow_list_lock); + + mutex_lock(&adev->shadow_list_lock); local_shadow_list is a local variable, I think it doesn't need lock at all, no one change it. Otherwise looks good to me. The bo->shadow_list which now is in local_shadow_list maybe destroy in case that it already in amdgpu_bo_destroy, then it will change local_shadow_list, so need lock the shadow_list_lock. Ah, sorry for noise, I forget you don't reference these BOs. Yes, I don't reference these Bos, as I found even reference these Bos, it still couldn't avoid the case that another process is already in amdgpu_bo_destroy. ??? that shouldn't happen, the reference is belonged to list. But back to here, we don't need reference them. And since no shadow BO is added to local after splice, we'd better to use list_next_entry to iterate the local shadow list instead of list_for_each_entry_safe. Thanks, David Zhou Thanks, David Zhou Best wishes Emily Deng Thanks, David Zhou - list_for_each_entry_safe(bo, tmp, &adev->shadow_list, shadow_list) { + list_for_each_entry_safe(bo, tmp, &local_shadow_list, shadow_list) { because shadow list doesn't take bo reference, we should give a amdgpu_bo_ref(bo) with attached patch before unlock. You can have a try. Thanks, David Zhou + mutex_unlock(&adev->shadow_list_lock); + + if (!bo) + continue; + next = NULL; amdgpu_device_recover_vram_from_shadow(adev, ring, bo, &next); if (fence) { @@ -3132,9 +3146,14 @@ static int amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) dma_fence_put(fence); fence = next; + mutex_lock(&adev->shadow_list_lock); } mutex_unlock(&adev->shadow_list_lock); + mutex_lock(&adev->shadow_list_lock); + list_splice_init(&local_shadow_list, &adev->shadow_list); + mutex_unlock(&adev->shadow_list_lock); + if (fence) { r = dma_fence_wait_timeout(fence, false, tmo); if (r == 0) _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> https://lists.freedesktop.org/mailman/listinfo/amd-gfx [-- Attachment #1.2: Type: text/html, Size: 15557 bytes --] [-- Attachment #2: Type: text/plain, Size: 154 bytes --] _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH] drm/amdgpu: Fix the dead lock issue. @ 2018-09-10 4:07 Emily Deng [not found] ` <1536552453-18595-1-git-send-email-Emily.Deng-5C7GfCeVMHo@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: Emily Deng @ 2018-09-10 4:07 UTC (permalink / raw) To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Emily Deng It will ramdomly have the dead lock issue when test TDR: 1. amdgpu_device_handle_vram_lost gets the lock shadow_list_lock 2. amdgpu_bo_create locked the bo's resv lock 3. amdgpu_bo_create_shadow is waiting for the shadow_list_lock 4. amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv lock. Signed-off-by: Emily Deng <Emily.Deng@amd.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index de990bd..c75447d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -557,12 +557,8 @@ static int amdgpu_bo_create_shadow(struct amdgpu_device *adev, bp.resv = bo->tbo.resv; r = amdgpu_bo_do_create(adev, &bp, &bo->shadow); - if (!r) { + if (!r) bo->shadow->parent = amdgpu_bo_ref(bo); - mutex_lock(&adev->shadow_list_lock); - list_add_tail(&bo->shadow_list, &adev->shadow_list); - mutex_unlock(&adev->shadow_list_lock); - } return r; } @@ -603,6 +599,12 @@ int amdgpu_bo_create(struct amdgpu_device *adev, if (!bp->resv) reservation_object_unlock((*bo_ptr)->tbo.resv); + if (!r) { + mutex_lock(&adev->shadow_list_lock); + list_add_tail(&(*bo_ptr)->shadow_list, &adev->shadow_list); + mutex_unlock(&adev->shadow_list_lock); + } + if (r) amdgpu_bo_unref(bo_ptr); } -- 2.7.4 _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply related [flat|nested] 15+ messages in thread
[parent not found: <1536552453-18595-1-git-send-email-Emily.Deng-5C7GfCeVMHo@public.gmane.org>]
* Re: [PATCH] drm/amdgpu: Fix the dead lock issue. [not found] ` <1536552453-18595-1-git-send-email-Emily.Deng-5C7GfCeVMHo@public.gmane.org> @ 2018-09-10 7:06 ` Christian König [not found] ` <fe055a6c-0859-7404-6236-2c22eb8e3df5-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: Christian König @ 2018-09-10 7:06 UTC (permalink / raw) To: Emily Deng, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW Am 10.09.2018 um 06:07 schrieb Emily Deng: > It will ramdomly have the dead lock issue when test TDR: > 1. amdgpu_device_handle_vram_lost gets the lock shadow_list_lock > 2. amdgpu_bo_create locked the bo's resv lock > 3. amdgpu_bo_create_shadow is waiting for the shadow_list_lock > 4. amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv > lock. > > Signed-off-by: Emily Deng <Emily.Deng@amd.com> Good catch, problem is the fix doesn't work like this because the lock won't be dropped at this point in most cases. Instead you need to fix amdgpu_device_recover_vram_from_shadow to make a local copy of the list. You can do this by grabbing a reference to each BO and moving it to a local list. Regards, Christian. > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 12 +++++++----- > 1 file changed, 7 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > index de990bd..c75447d 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > @@ -557,12 +557,8 @@ static int amdgpu_bo_create_shadow(struct amdgpu_device *adev, > bp.resv = bo->tbo.resv; > > r = amdgpu_bo_do_create(adev, &bp, &bo->shadow); > - if (!r) { > + if (!r) > bo->shadow->parent = amdgpu_bo_ref(bo); > - mutex_lock(&adev->shadow_list_lock); > - list_add_tail(&bo->shadow_list, &adev->shadow_list); > - mutex_unlock(&adev->shadow_list_lock); > - } > > return r; > } > @@ -603,6 +599,12 @@ int amdgpu_bo_create(struct amdgpu_device *adev, > if (!bp->resv) > reservation_object_unlock((*bo_ptr)->tbo.resv); > > + if (!r) { > + mutex_lock(&adev->shadow_list_lock); > + list_add_tail(&(*bo_ptr)->shadow_list, &adev->shadow_list); > + mutex_unlock(&adev->shadow_list_lock); > + } > + > if (r) > amdgpu_bo_unref(bo_ptr); > } _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <fe055a6c-0859-7404-6236-2c22eb8e3df5-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]
* RE: [PATCH] drm/amdgpu: Fix the dead lock issue. [not found] ` <fe055a6c-0859-7404-6236-2c22eb8e3df5-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2018-09-10 7:19 ` Deng, Emily [not found] ` <BN7PR12MB2644EBCEAD5A7EAD4700FDD28F050-Zx/IyJUqfGKoYeSiBV3SvgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: Deng, Emily @ 2018-09-10 7:19 UTC (permalink / raw) To: Koenig, Christian, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW >-----Original Message----- >From: Christian König <ckoenig.leichtzumerken@gmail.com> >Sent: Monday, September 10, 2018 3:06 PM >To: Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org >Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue. > >Am 10.09.2018 um 06:07 schrieb Emily Deng: >> It will ramdomly have the dead lock issue when test TDR: >> 1. amdgpu_device_handle_vram_lost gets the lock shadow_list_lock 2. >> amdgpu_bo_create locked the bo's resv lock 3. amdgpu_bo_create_shadow >> is waiting for the shadow_list_lock 4. >> amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv >> lock. >> >> Signed-off-by: Emily Deng <Emily.Deng@amd.com> > >Good catch, problem is the fix doesn't work like this because the lock won't be >dropped at this point in most cases. Sorry, could you explain more, why the lock won't be dropped after calling reservation_object_unlock? > >Instead you need to fix amdgpu_device_recover_vram_from_shadow to make >a local copy of the list. > >You can do this by grabbing a reference to each BO and moving it to a local >list. > >Regards, >Christian. > >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 12 +++++++----- >> 1 file changed, 7 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >> index de990bd..c75447d 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >> @@ -557,12 +557,8 @@ static int amdgpu_bo_create_shadow(struct >amdgpu_device *adev, >> bp.resv = bo->tbo.resv; >> >> r = amdgpu_bo_do_create(adev, &bp, &bo->shadow); >> - if (!r) { >> + if (!r) >> bo->shadow->parent = amdgpu_bo_ref(bo); >> - mutex_lock(&adev->shadow_list_lock); >> - list_add_tail(&bo->shadow_list, &adev->shadow_list); >> - mutex_unlock(&adev->shadow_list_lock); >> - } >> >> return r; >> } >> @@ -603,6 +599,12 @@ int amdgpu_bo_create(struct amdgpu_device >*adev, >> if (!bp->resv) >> reservation_object_unlock((*bo_ptr)->tbo.resv); >> >> + if (!r) { >> + mutex_lock(&adev->shadow_list_lock); >> + list_add_tail(&(*bo_ptr)->shadow_list, &adev- >>shadow_list); >> + mutex_unlock(&adev->shadow_list_lock); >> + } >> + >> if (r) >> amdgpu_bo_unref(bo_ptr); >> } _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <BN7PR12MB2644EBCEAD5A7EAD4700FDD28F050-Zx/IyJUqfGKoYeSiBV3SvgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>]
* Re: [PATCH] drm/amdgpu: Fix the dead lock issue. [not found] ` <BN7PR12MB2644EBCEAD5A7EAD4700FDD28F050-Zx/IyJUqfGKoYeSiBV3SvgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org> @ 2018-09-10 7:23 ` Christian König [not found] ` <6798b857-49ec-06b3-f25f-f9df91a1ae20-5C7GfCeVMHo@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: Christian König @ 2018-09-10 7:23 UTC (permalink / raw) To: Deng, Emily, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW Am 10.09.2018 um 09:19 schrieb Deng, Emily: >> -----Original Message----- >> From: Christian König <ckoenig.leichtzumerken@gmail.com> >> Sent: Monday, September 10, 2018 3:06 PM >> To: Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org >> Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue. >> >> Am 10.09.2018 um 06:07 schrieb Emily Deng: >>> It will ramdomly have the dead lock issue when test TDR: >>> 1. amdgpu_device_handle_vram_lost gets the lock shadow_list_lock 2. >>> amdgpu_bo_create locked the bo's resv lock 3. amdgpu_bo_create_shadow >>> is waiting for the shadow_list_lock 4. >>> amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv >>> lock. >>> >>> Signed-off-by: Emily Deng <Emily.Deng@amd.com> >> Good catch, problem is the fix doesn't work like this because the lock won't be >> dropped at this point in most cases. > Sorry, could you explain more, why the lock won't be dropped after calling reservation_object_unlock? reservation_object_unlock won't be called for most BOs. E.g. the shadow is used for page directories and all page directories use the root reservation object and so have bp->resv set here. Christian. >> Instead you need to fix amdgpu_device_recover_vram_from_shadow to make >> a local copy of the list. >> >> You can do this by grabbing a reference to each BO and moving it to a local >> list. >> >> Regards, >> Christian. >> >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 12 +++++++----- >>> 1 file changed, 7 insertions(+), 5 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>> index de990bd..c75447d 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>> @@ -557,12 +557,8 @@ static int amdgpu_bo_create_shadow(struct >> amdgpu_device *adev, >>> bp.resv = bo->tbo.resv; >>> >>> r = amdgpu_bo_do_create(adev, &bp, &bo->shadow); >>> - if (!r) { >>> + if (!r) >>> bo->shadow->parent = amdgpu_bo_ref(bo); >>> - mutex_lock(&adev->shadow_list_lock); >>> - list_add_tail(&bo->shadow_list, &adev->shadow_list); >>> - mutex_unlock(&adev->shadow_list_lock); >>> - } >>> >>> return r; >>> } >>> @@ -603,6 +599,12 @@ int amdgpu_bo_create(struct amdgpu_device >> *adev, >>> if (!bp->resv) >>> reservation_object_unlock((*bo_ptr)->tbo.resv); >>> >>> + if (!r) { >>> + mutex_lock(&adev->shadow_list_lock); >>> + list_add_tail(&(*bo_ptr)->shadow_list, &adev- >>> shadow_list); >>> + mutex_unlock(&adev->shadow_list_lock); >>> + } >>> + >>> if (r) >>> amdgpu_bo_unref(bo_ptr); >>> } _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <6798b857-49ec-06b3-f25f-f9df91a1ae20-5C7GfCeVMHo@public.gmane.org>]
* RE: [PATCH] drm/amdgpu: Fix the dead lock issue. [not found] ` <6798b857-49ec-06b3-f25f-f9df91a1ae20-5C7GfCeVMHo@public.gmane.org> @ 2018-09-10 7:57 ` Deng, Emily 0 siblings, 0 replies; 15+ messages in thread From: Deng, Emily @ 2018-09-10 7:57 UTC (permalink / raw) To: Koenig, Christian, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW >-----Original Message----- >From: Koenig, Christian >Sent: Monday, September 10, 2018 3:23 PM >To: Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org >Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue. > >Am 10.09.2018 um 09:19 schrieb Deng, Emily: >>> -----Original Message----- >>> From: Christian König <ckoenig.leichtzumerken@gmail.com> >>> Sent: Monday, September 10, 2018 3:06 PM >>> To: Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org >>> Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue. >>> >>> Am 10.09.2018 um 06:07 schrieb Emily Deng: >>>> It will ramdomly have the dead lock issue when test TDR: >>>> 1. amdgpu_device_handle_vram_lost gets the lock shadow_list_lock 2. >>>> amdgpu_bo_create locked the bo's resv lock 3. >>>> amdgpu_bo_create_shadow is waiting for the shadow_list_lock 4. >>>> amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv >>>> lock. >>>> >>>> Signed-off-by: Emily Deng <Emily.Deng@amd.com> >>> Good catch, problem is the fix doesn't work like this because the >>> lock won't be dropped at this point in most cases. >> Sorry, could you explain more, why the lock won't be dropped after calling >reservation_object_unlock? > >reservation_object_unlock won't be called for most BOs. > >E.g. the shadow is used for page directories and all page directories use the >root reservation object and so have bp->resv set here. Thanks, will modify as your suggestion. Best wishes Emily Deng >Christian. > >>> Instead you need to fix amdgpu_device_recover_vram_from_shadow to >>> make a local copy of the list. >>> >>> You can do this by grabbing a reference to each BO and moving it to a >>> local list. >>> >>> Regards, >>> Christian. >>> >>>> --- >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 12 +++++++----- >>>> 1 file changed, 7 insertions(+), 5 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>>> index de990bd..c75447d 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>>> @@ -557,12 +557,8 @@ static int amdgpu_bo_create_shadow(struct >>> amdgpu_device *adev, >>>> bp.resv = bo->tbo.resv; >>>> >>>> r = amdgpu_bo_do_create(adev, &bp, &bo->shadow); >>>> - if (!r) { >>>> + if (!r) >>>> bo->shadow->parent = amdgpu_bo_ref(bo); >>>> - mutex_lock(&adev->shadow_list_lock); >>>> - list_add_tail(&bo->shadow_list, &adev->shadow_list); >>>> - mutex_unlock(&adev->shadow_list_lock); >>>> - } >>>> >>>> return r; >>>> } >>>> @@ -603,6 +599,12 @@ int amdgpu_bo_create(struct amdgpu_device >>> *adev, >>>> if (!bp->resv) >>>> reservation_object_unlock((*bo_ptr)->tbo.resv); >>>> >>>> + if (!r) { >>>> + mutex_lock(&adev->shadow_list_lock); >>>> + list_add_tail(&(*bo_ptr)->shadow_list, &adev- >>>> shadow_list); >>>> + mutex_unlock(&adev->shadow_list_lock); >>>> + } >>>> + >>>> if (r) >>>> amdgpu_bo_unref(bo_ptr); >>>> } _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2018-09-11 6:55 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-09-11 2:51 [PATCH] drm/amdgpu: Fix the dead lock issue Emily Deng [not found] ` <1536634293-26099-1-git-send-email-Emily.Deng-5C7GfCeVMHo@public.gmane.org> 2018-09-11 3:01 ` Zhang, Jerry (Junwei) 2018-09-11 3:03 ` zhoucm1 [not found] ` <bf049e4b-321f-33b4-dad2-708269515e90-5C7GfCeVMHo@public.gmane.org> 2018-09-11 3:23 ` Deng, Emily [not found] ` <BN7PR12MB2644CEE78E8E6D658D1A686E8F040-Zx/IyJUqfGKoYeSiBV3SvgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org> 2018-09-11 3:27 ` zhoucm1 [not found] ` <fcbc7544-d08d-4051-c554-f1f9cecb87e1-5C7GfCeVMHo@public.gmane.org> 2018-09-11 3:32 ` Deng, Emily [not found] ` <BN7PR12MB2644E374B512DC7EF10CA6D08F040-Zx/IyJUqfGKoYeSiBV3SvgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org> 2018-09-11 3:37 ` zhoucm1 [not found] ` <be67e50d-f161-c7da-256e-149ed1b56f26-5C7GfCeVMHo@public.gmane.org> 2018-09-11 5:41 ` zhoucm1 [not found] ` <83f7a45f-1aee-a2a8-bc82-f3433157c6cb-5C7GfCeVMHo@public.gmane.org> 2018-09-11 6:40 ` Christian König [not found] ` <c67be83d-aecd-a291-a564-15f1fe501680-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 2018-09-11 6:55 ` Deng, Emily -- strict thread matches above, loose matches on Subject: below -- 2018-09-10 4:07 Emily Deng [not found] ` <1536552453-18595-1-git-send-email-Emily.Deng-5C7GfCeVMHo@public.gmane.org> 2018-09-10 7:06 ` Christian König [not found] ` <fe055a6c-0859-7404-6236-2c22eb8e3df5-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 2018-09-10 7:19 ` Deng, Emily [not found] ` <BN7PR12MB2644EBCEAD5A7EAD4700FDD28F050-Zx/IyJUqfGKoYeSiBV3SvgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org> 2018-09-10 7:23 ` Christian König [not found] ` <6798b857-49ec-06b3-f25f-f9df91a1ae20-5C7GfCeVMHo@public.gmane.org> 2018-09-10 7:57 ` Deng, Emily
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.