All of lore.kernel.org
 help / color / mirror / Atom feed
* [resend PATCH] drm/ttm: Fix a deadlock if the target BO is not idle during swap
@ 2021-09-07  4:08 xinhui pan
  2021-09-07  6:22 ` Christian König
  0 siblings, 1 reply; 9+ messages in thread
From: xinhui pan @ 2021-09-07  4:08 UTC (permalink / raw)
  To: amd-gfx; +Cc: christian.koenig, dri-devel, xinhui pan

The ret value might be -EBUSY, caller will think lru lock is still
locked but actually NOT. So return -ENOSPC instead. Otherwise we hit
list corruption.

ttm_bo_cleanup_refs might fail too if BO is not idle. If we return 0,
caller(ttm_tt_populate -> ttm_global_swapout ->ttm_device_swapout) will
be stuck as we actually did not free any BO memory. This usually happens
when the fence is not signaled for a long time.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 8d7fd65ccced..23f906941ac9 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1152,9 +1152,9 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
 	}
 
 	if (bo->deleted) {
-		ttm_bo_cleanup_refs(bo, false, false, locked);
+		ret = ttm_bo_cleanup_refs(bo, false, false, locked);
 		ttm_bo_put(bo);
-		return 0;
+		return ret == -EBUSY ? -ENOSPC : ret;
 	}
 
 	ttm_bo_del_from_lru(bo);
@@ -1208,7 +1208,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
 	if (locked)
 		dma_resv_unlock(bo->base.resv);
 	ttm_bo_put(bo);
-	return ret;
+	return ret == -EBUSY ? -ENOSPC : ret;
 }
 
 void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [resend PATCH] drm/ttm: Fix a deadlock if the target BO is not idle during swap
  2021-09-07  4:08 [resend PATCH] drm/ttm: Fix a deadlock if the target BO is not idle during swap xinhui pan
@ 2021-09-07  6:22 ` Christian König
  2021-09-07  9:05   ` Daniel Vetter
  0 siblings, 1 reply; 9+ messages in thread
From: Christian König @ 2021-09-07  6:22 UTC (permalink / raw)
  To: xinhui pan, amd-gfx; +Cc: dri-devel

Added a Fixes tag and pushed this to drm-misc-fixes.

It will take a while until it cycles back into the development branches, 
so feel free to push some version to amd-staging-drm-next as well. Just 
ping Alex when you do this.

Thanks,
Christian.

Am 07.09.21 um 06:08 schrieb xinhui pan:
> The ret value might be -EBUSY, caller will think lru lock is still
> locked but actually NOT. So return -ENOSPC instead. Otherwise we hit
> list corruption.
>
> ttm_bo_cleanup_refs might fail too if BO is not idle. If we return 0,
> caller(ttm_tt_populate -> ttm_global_swapout ->ttm_device_swapout) will
> be stuck as we actually did not free any BO memory. This usually happens
> when the fence is not signaled for a long time.
>
> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
> Reviewed-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 8d7fd65ccced..23f906941ac9 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -1152,9 +1152,9 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
>   	}
>   
>   	if (bo->deleted) {
> -		ttm_bo_cleanup_refs(bo, false, false, locked);
> +		ret = ttm_bo_cleanup_refs(bo, false, false, locked);
>   		ttm_bo_put(bo);
> -		return 0;
> +		return ret == -EBUSY ? -ENOSPC : ret;
>   	}
>   
>   	ttm_bo_del_from_lru(bo);
> @@ -1208,7 +1208,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
>   	if (locked)
>   		dma_resv_unlock(bo->base.resv);
>   	ttm_bo_put(bo);
> -	return ret;
> +	return ret == -EBUSY ? -ENOSPC : ret;
>   }
>   
>   void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [resend PATCH] drm/ttm: Fix a deadlock if the target BO is not idle during swap
  2021-09-07  6:22 ` Christian König
@ 2021-09-07  9:05   ` Daniel Vetter
  2021-09-07  9:28     ` Christian König
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Vetter @ 2021-09-07  9:05 UTC (permalink / raw)
  To: Christian König; +Cc: xinhui pan, amd-gfx, dri-devel

On Tue, Sep 07, 2021 at 08:22:20AM +0200, Christian König wrote:
> Added a Fixes tag and pushed this to drm-misc-fixes.

We're in the merge window, this should have been drm-misc-next-fixes. I'll
poke misc maintainers so it's not lost.
-Daniel

> 
> It will take a while until it cycles back into the development branches, so
> feel free to push some version to amd-staging-drm-next as well. Just ping
> Alex when you do this.
> 
> Thanks,
> Christian.
> 
> Am 07.09.21 um 06:08 schrieb xinhui pan:
> > The ret value might be -EBUSY, caller will think lru lock is still
> > locked but actually NOT. So return -ENOSPC instead. Otherwise we hit
> > list corruption.
> > 
> > ttm_bo_cleanup_refs might fail too if BO is not idle. If we return 0,
> > caller(ttm_tt_populate -> ttm_global_swapout ->ttm_device_swapout) will
> > be stuck as we actually did not free any BO memory. This usually happens
> > when the fence is not signaled for a long time.
> > 
> > Signed-off-by: xinhui pan <xinhui.pan@amd.com>
> > Reviewed-by: Christian König <christian.koenig@amd.com>
> > ---
> >   drivers/gpu/drm/ttm/ttm_bo.c | 6 +++---
> >   1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> > index 8d7fd65ccced..23f906941ac9 100644
> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > @@ -1152,9 +1152,9 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
> >   	}
> >   	if (bo->deleted) {
> > -		ttm_bo_cleanup_refs(bo, false, false, locked);
> > +		ret = ttm_bo_cleanup_refs(bo, false, false, locked);
> >   		ttm_bo_put(bo);
> > -		return 0;
> > +		return ret == -EBUSY ? -ENOSPC : ret;
> >   	}
> >   	ttm_bo_del_from_lru(bo);
> > @@ -1208,7 +1208,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
> >   	if (locked)
> >   		dma_resv_unlock(bo->base.resv);
> >   	ttm_bo_put(bo);
> > -	return ret;
> > +	return ret == -EBUSY ? -ENOSPC : ret;
> >   }
> >   void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [resend PATCH] drm/ttm: Fix a deadlock if the target BO is not idle during swap
  2021-09-07  9:05   ` Daniel Vetter
@ 2021-09-07  9:28     ` Christian König
  2021-09-08 18:27       ` Daniel Vetter
  0 siblings, 1 reply; 9+ messages in thread
From: Christian König @ 2021-09-07  9:28 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: xinhui pan, amd-gfx, dri-devel

Am 07.09.21 um 11:05 schrieb Daniel Vetter:
> On Tue, Sep 07, 2021 at 08:22:20AM +0200, Christian König wrote:
>> Added a Fixes tag and pushed this to drm-misc-fixes.
> We're in the merge window, this should have been drm-misc-next-fixes. I'll
> poke misc maintainers so it's not lost.

Hui? It's a fix for a problem in stable and not in drm-misc-next.

Christian.

> -Daniel
>
>> It will take a while until it cycles back into the development branches, so
>> feel free to push some version to amd-staging-drm-next as well. Just ping
>> Alex when you do this.
>>
>> Thanks,
>> Christian.
>>
>> Am 07.09.21 um 06:08 schrieb xinhui pan:
>>> The ret value might be -EBUSY, caller will think lru lock is still
>>> locked but actually NOT. So return -ENOSPC instead. Otherwise we hit
>>> list corruption.
>>>
>>> ttm_bo_cleanup_refs might fail too if BO is not idle. If we return 0,
>>> caller(ttm_tt_populate -> ttm_global_swapout ->ttm_device_swapout) will
>>> be stuck as we actually did not free any BO memory. This usually happens
>>> when the fence is not signaled for a long time.
>>>
>>> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
>>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>> ---
>>>    drivers/gpu/drm/ttm/ttm_bo.c | 6 +++---
>>>    1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index 8d7fd65ccced..23f906941ac9 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -1152,9 +1152,9 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
>>>    	}
>>>    	if (bo->deleted) {
>>> -		ttm_bo_cleanup_refs(bo, false, false, locked);
>>> +		ret = ttm_bo_cleanup_refs(bo, false, false, locked);
>>>    		ttm_bo_put(bo);
>>> -		return 0;
>>> +		return ret == -EBUSY ? -ENOSPC : ret;
>>>    	}
>>>    	ttm_bo_del_from_lru(bo);
>>> @@ -1208,7 +1208,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
>>>    	if (locked)
>>>    		dma_resv_unlock(bo->base.resv);
>>>    	ttm_bo_put(bo);
>>> -	return ret;
>>> +	return ret == -EBUSY ? -ENOSPC : ret;
>>>    }
>>>    void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [resend PATCH] drm/ttm: Fix a deadlock if the target BO is not idle during swap
  2021-09-07  9:28     ` Christian König
@ 2021-09-08 18:27       ` Daniel Vetter
  2021-09-09  7:10         ` Christian König
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Vetter @ 2021-09-08 18:27 UTC (permalink / raw)
  To: Christian König; +Cc: Daniel Vetter, xinhui pan, amd-gfx, dri-devel

On Tue, Sep 07, 2021 at 11:28:23AM +0200, Christian König wrote:
> Am 07.09.21 um 11:05 schrieb Daniel Vetter:
> > On Tue, Sep 07, 2021 at 08:22:20AM +0200, Christian König wrote:
> > > Added a Fixes tag and pushed this to drm-misc-fixes.
> > We're in the merge window, this should have been drm-misc-next-fixes. I'll
> > poke misc maintainers so it's not lost.
> 
> Hui? It's a fix for a problem in stable and not in drm-misc-next.

Ah the flow chart is confusing. There is no current -rc, so it's always
-next-fixes. Or you're running the risk that it's lost until after -rc1.
Maybe we should clarify that "is the bug in current -rc?" only applies if
there is a current -rc.

Anyway Thomas sent out a pr, so it's all good.
-Daniel

> 
> Christian.
> 
> > -Daniel
> > 
> > > It will take a while until it cycles back into the development branches, so
> > > feel free to push some version to amd-staging-drm-next as well. Just ping
> > > Alex when you do this.
> > > 
> > > Thanks,
> > > Christian.
> > > 
> > > Am 07.09.21 um 06:08 schrieb xinhui pan:
> > > > The ret value might be -EBUSY, caller will think lru lock is still
> > > > locked but actually NOT. So return -ENOSPC instead. Otherwise we hit
> > > > list corruption.
> > > > 
> > > > ttm_bo_cleanup_refs might fail too if BO is not idle. If we return 0,
> > > > caller(ttm_tt_populate -> ttm_global_swapout ->ttm_device_swapout) will
> > > > be stuck as we actually did not free any BO memory. This usually happens
> > > > when the fence is not signaled for a long time.
> > > > 
> > > > Signed-off-by: xinhui pan <xinhui.pan@amd.com>
> > > > Reviewed-by: Christian König <christian.koenig@amd.com>
> > > > ---
> > > >    drivers/gpu/drm/ttm/ttm_bo.c | 6 +++---
> > > >    1 file changed, 3 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> > > > index 8d7fd65ccced..23f906941ac9 100644
> > > > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > > > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > > > @@ -1152,9 +1152,9 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
> > > >    	}
> > > >    	if (bo->deleted) {
> > > > -		ttm_bo_cleanup_refs(bo, false, false, locked);
> > > > +		ret = ttm_bo_cleanup_refs(bo, false, false, locked);
> > > >    		ttm_bo_put(bo);
> > > > -		return 0;
> > > > +		return ret == -EBUSY ? -ENOSPC : ret;
> > > >    	}
> > > >    	ttm_bo_del_from_lru(bo);
> > > > @@ -1208,7 +1208,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
> > > >    	if (locked)
> > > >    		dma_resv_unlock(bo->base.resv);
> > > >    	ttm_bo_put(bo);
> > > > -	return ret;
> > > > +	return ret == -EBUSY ? -ENOSPC : ret;
> > > >    }
> > > >    void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [resend PATCH] drm/ttm: Fix a deadlock if the target BO is not idle during swap
  2021-09-08 18:27       ` Daniel Vetter
@ 2021-09-09  7:10         ` Christian König
  2021-09-14 13:50           ` Daniel Vetter
  0 siblings, 1 reply; 9+ messages in thread
From: Christian König @ 2021-09-09  7:10 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: xinhui pan, amd-gfx, dri-devel

Am 08.09.21 um 20:27 schrieb Daniel Vetter:
> On Tue, Sep 07, 2021 at 11:28:23AM +0200, Christian König wrote:
>> Am 07.09.21 um 11:05 schrieb Daniel Vetter:
>>> On Tue, Sep 07, 2021 at 08:22:20AM +0200, Christian König wrote:
>>>> Added a Fixes tag and pushed this to drm-misc-fixes.
>>> We're in the merge window, this should have been drm-misc-next-fixes. I'll
>>> poke misc maintainers so it's not lost.
>> Hui? It's a fix for a problem in stable and not in drm-misc-next.
> Ah the flow chart is confusing. There is no current -rc, so it's always
> -next-fixes. Or you're running the risk that it's lost until after -rc1.
> Maybe we should clarify that "is the bug in current -rc?" only applies if
> there is a current -rc.

Yeah, I've noticed this as well.

But when there is no current -rc because we are in the merge window then 
the question is how do I submit patches to the current stable?

In other words this patch here is really for 5.14 and should then be 
backported to 5.13 and maybe even 5.10 as well.

The code was restructured for 5.15 and I even need to double check if 
that still applies there as well.

Or should I send patches like those directly to Greg?

Regards,
Christian.

>
> Anyway Thomas sent out a pr, so it's all good.
> -Daniel
>
>> Christian.
>>
>>> -Daniel
>>>
>>>> It will take a while until it cycles back into the development branches, so
>>>> feel free to push some version to amd-staging-drm-next as well. Just ping
>>>> Alex when you do this.
>>>>
>>>> Thanks,
>>>> Christian.
>>>>
>>>> Am 07.09.21 um 06:08 schrieb xinhui pan:
>>>>> The ret value might be -EBUSY, caller will think lru lock is still
>>>>> locked but actually NOT. So return -ENOSPC instead. Otherwise we hit
>>>>> list corruption.
>>>>>
>>>>> ttm_bo_cleanup_refs might fail too if BO is not idle. If we return 0,
>>>>> caller(ttm_tt_populate -> ttm_global_swapout ->ttm_device_swapout) will
>>>>> be stuck as we actually did not free any BO memory. This usually happens
>>>>> when the fence is not signaled for a long time.
>>>>>
>>>>> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
>>>>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>>>> ---
>>>>>     drivers/gpu/drm/ttm/ttm_bo.c | 6 +++---
>>>>>     1 file changed, 3 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>> index 8d7fd65ccced..23f906941ac9 100644
>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>> @@ -1152,9 +1152,9 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
>>>>>     	}
>>>>>     	if (bo->deleted) {
>>>>> -		ttm_bo_cleanup_refs(bo, false, false, locked);
>>>>> +		ret = ttm_bo_cleanup_refs(bo, false, false, locked);
>>>>>     		ttm_bo_put(bo);
>>>>> -		return 0;
>>>>> +		return ret == -EBUSY ? -ENOSPC : ret;
>>>>>     	}
>>>>>     	ttm_bo_del_from_lru(bo);
>>>>> @@ -1208,7 +1208,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
>>>>>     	if (locked)
>>>>>     		dma_resv_unlock(bo->base.resv);
>>>>>     	ttm_bo_put(bo);
>>>>> -	return ret;
>>>>> +	return ret == -EBUSY ? -ENOSPC : ret;
>>>>>     }
>>>>>     void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [resend PATCH] drm/ttm: Fix a deadlock if the target BO is not idle during swap
  2021-09-09  7:10         ` Christian König
@ 2021-09-14 13:50           ` Daniel Vetter
  2021-09-15 10:05             ` Christian König
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Vetter @ 2021-09-14 13:50 UTC (permalink / raw)
  To: Christian König; +Cc: Daniel Vetter, xinhui pan, amd-gfx, dri-devel

On Thu, Sep 09, 2021 at 09:10:39AM +0200, Christian König wrote:
> Am 08.09.21 um 20:27 schrieb Daniel Vetter:
> > On Tue, Sep 07, 2021 at 11:28:23AM +0200, Christian König wrote:
> > > Am 07.09.21 um 11:05 schrieb Daniel Vetter:
> > > > On Tue, Sep 07, 2021 at 08:22:20AM +0200, Christian König wrote:
> > > > > Added a Fixes tag and pushed this to drm-misc-fixes.
> > > > We're in the merge window, this should have been drm-misc-next-fixes. I'll
> > > > poke misc maintainers so it's not lost.
> > > Hui? It's a fix for a problem in stable and not in drm-misc-next.
> > Ah the flow chart is confusing. There is no current -rc, so it's always
> > -next-fixes. Or you're running the risk that it's lost until after -rc1.
> > Maybe we should clarify that "is the bug in current -rc?" only applies if
> > there is a current -rc.
> 
> Yeah, I've noticed this as well.
> 
> But when there is no current -rc because we are in the merge window then the
> question is how do I submit patches to the current stable?

You never submit patches directly to stable. It's always "get it into
Linus' tree asap" plus either Cc: stable or a Fixes: line. During merge
window "get into Linus' tree asap" means "put it into drm-misc-next-fixes"

> In other words this patch here is really for 5.14 and should then be
> backported to 5.13 and maybe even 5.10 as well.
> 
> The code was restructured for 5.15 and I even need to double check if that
> still applies there as well.
> 
> Or should I send patches like those directly to Greg?

Nope. Just fastest path into Linus' tree is good enough. Greg picks up
patches directly from the merge window if it has one of the tags. There's
occasionally a bit of grumbling because there's so many stable patches
coming in during the merge window, but otherwise it should be in stable in
the next release like during -rc phase.
-Daniel

> Regards,
> Christian.
> 
> > 
> > Anyway Thomas sent out a pr, so it's all good.
> > -Daniel
> > 
> > > Christian.
> > > 
> > > > -Daniel
> > > > 
> > > > > It will take a while until it cycles back into the development branches, so
> > > > > feel free to push some version to amd-staging-drm-next as well. Just ping
> > > > > Alex when you do this.
> > > > > 
> > > > > Thanks,
> > > > > Christian.
> > > > > 
> > > > > Am 07.09.21 um 06:08 schrieb xinhui pan:
> > > > > > The ret value might be -EBUSY, caller will think lru lock is still
> > > > > > locked but actually NOT. So return -ENOSPC instead. Otherwise we hit
> > > > > > list corruption.
> > > > > > 
> > > > > > ttm_bo_cleanup_refs might fail too if BO is not idle. If we return 0,
> > > > > > caller(ttm_tt_populate -> ttm_global_swapout ->ttm_device_swapout) will
> > > > > > be stuck as we actually did not free any BO memory. This usually happens
> > > > > > when the fence is not signaled for a long time.
> > > > > > 
> > > > > > Signed-off-by: xinhui pan <xinhui.pan@amd.com>
> > > > > > Reviewed-by: Christian König <christian.koenig@amd.com>
> > > > > > ---
> > > > > >     drivers/gpu/drm/ttm/ttm_bo.c | 6 +++---
> > > > > >     1 file changed, 3 insertions(+), 3 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> > > > > > index 8d7fd65ccced..23f906941ac9 100644
> > > > > > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > > > > > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > > > > > @@ -1152,9 +1152,9 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
> > > > > >     	}
> > > > > >     	if (bo->deleted) {
> > > > > > -		ttm_bo_cleanup_refs(bo, false, false, locked);
> > > > > > +		ret = ttm_bo_cleanup_refs(bo, false, false, locked);
> > > > > >     		ttm_bo_put(bo);
> > > > > > -		return 0;
> > > > > > +		return ret == -EBUSY ? -ENOSPC : ret;
> > > > > >     	}
> > > > > >     	ttm_bo_del_from_lru(bo);
> > > > > > @@ -1208,7 +1208,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
> > > > > >     	if (locked)
> > > > > >     		dma_resv_unlock(bo->base.resv);
> > > > > >     	ttm_bo_put(bo);
> > > > > > -	return ret;
> > > > > > +	return ret == -EBUSY ? -ENOSPC : ret;
> > > > > >     }
> > > > > >     void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [resend PATCH] drm/ttm: Fix a deadlock if the target BO is not idle during swap
  2021-09-14 13:50           ` Daniel Vetter
@ 2021-09-15 10:05             ` Christian König
  2021-09-16 12:22               ` Daniel Vetter
  0 siblings, 1 reply; 9+ messages in thread
From: Christian König @ 2021-09-15 10:05 UTC (permalink / raw)
  To: Daniel Vetter, Christian König; +Cc: xinhui pan, amd-gfx, dri-devel

Am 14.09.21 um 15:50 schrieb Daniel Vetter:
> On Thu, Sep 09, 2021 at 09:10:39AM +0200, Christian König wrote:
>> Am 08.09.21 um 20:27 schrieb Daniel Vetter:
>>> On Tue, Sep 07, 2021 at 11:28:23AM +0200, Christian König wrote:
>>>> Am 07.09.21 um 11:05 schrieb Daniel Vetter:
>>>>> On Tue, Sep 07, 2021 at 08:22:20AM +0200, Christian König wrote:
>>>>>> Added a Fixes tag and pushed this to drm-misc-fixes.
>>>>> We're in the merge window, this should have been drm-misc-next-fixes. I'll
>>>>> poke misc maintainers so it's not lost.
>>>> Hui? It's a fix for a problem in stable and not in drm-misc-next.
>>> Ah the flow chart is confusing. There is no current -rc, so it's always
>>> -next-fixes. Or you're running the risk that it's lost until after -rc1.
>>> Maybe we should clarify that "is the bug in current -rc?" only applies if
>>> there is a current -rc.
>> Yeah, I've noticed this as well.
>>
>> But when there is no current -rc because we are in the merge window then the
>> question is how do I submit patches to the current stable?
> You never submit patches directly to stable. It's always "get it into
> Linus' tree asap" plus either Cc: stable or a Fixes: line.

But what if the code in drm-misc-next-fixes has been restructured and 
doesn't have that issue any more?

How do I get the patch into stable then? Submitting directly to Greg?

Thanks,
Christian.

>   During merge
> window "get into Linus' tree asap" means "put it into drm-misc-next-fixes"
>
>> In other words this patch here is really for 5.14 and should then be
>> backported to 5.13 and maybe even 5.10 as well.
>>
>> The code was restructured for 5.15 and I even need to double check if that
>> still applies there as well.
>>
>> Or should I send patches like those directly to Greg?
> Nope. Just fastest path into Linus' tree is good enough. Greg picks up
> patches directly from the merge window if it has one of the tags. There's
> occasionally a bit of grumbling because there's so many stable patches
> coming in during the merge window, but otherwise it should be in stable in
> the next release like during -rc phase.
> -Daniel
>
>> Regards,
>> Christian.
>>
>>> Anyway Thomas sent out a pr, so it's all good.
>>> -Daniel
>>>
>>>> Christian.
>>>>
>>>>> -Daniel
>>>>>
>>>>>> It will take a while until it cycles back into the development branches, so
>>>>>> feel free to push some version to amd-staging-drm-next as well. Just ping
>>>>>> Alex when you do this.
>>>>>>
>>>>>> Thanks,
>>>>>> Christian.
>>>>>>
>>>>>> Am 07.09.21 um 06:08 schrieb xinhui pan:
>>>>>>> The ret value might be -EBUSY, caller will think lru lock is still
>>>>>>> locked but actually NOT. So return -ENOSPC instead. Otherwise we hit
>>>>>>> list corruption.
>>>>>>>
>>>>>>> ttm_bo_cleanup_refs might fail too if BO is not idle. If we return 0,
>>>>>>> caller(ttm_tt_populate -> ttm_global_swapout ->ttm_device_swapout) will
>>>>>>> be stuck as we actually did not free any BO memory. This usually happens
>>>>>>> when the fence is not signaled for a long time.
>>>>>>>
>>>>>>> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
>>>>>>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>>>>>> ---
>>>>>>>      drivers/gpu/drm/ttm/ttm_bo.c | 6 +++---
>>>>>>>      1 file changed, 3 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>> index 8d7fd65ccced..23f906941ac9 100644
>>>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>> @@ -1152,9 +1152,9 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
>>>>>>>      	}
>>>>>>>      	if (bo->deleted) {
>>>>>>> -		ttm_bo_cleanup_refs(bo, false, false, locked);
>>>>>>> +		ret = ttm_bo_cleanup_refs(bo, false, false, locked);
>>>>>>>      		ttm_bo_put(bo);
>>>>>>> -		return 0;
>>>>>>> +		return ret == -EBUSY ? -ENOSPC : ret;
>>>>>>>      	}
>>>>>>>      	ttm_bo_del_from_lru(bo);
>>>>>>> @@ -1208,7 +1208,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
>>>>>>>      	if (locked)
>>>>>>>      		dma_resv_unlock(bo->base.resv);
>>>>>>>      	ttm_bo_put(bo);
>>>>>>> -	return ret;
>>>>>>> +	return ret == -EBUSY ? -ENOSPC : ret;
>>>>>>>      }
>>>>>>>      void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [resend PATCH] drm/ttm: Fix a deadlock if the target BO is not idle during swap
  2021-09-15 10:05             ` Christian König
@ 2021-09-16 12:22               ` Daniel Vetter
  0 siblings, 0 replies; 9+ messages in thread
From: Daniel Vetter @ 2021-09-16 12:22 UTC (permalink / raw)
  To: Christian König
  Cc: Daniel Vetter, Christian König, xinhui pan, amd-gfx, dri-devel

On Wed, Sep 15, 2021 at 12:05:30PM +0200, Christian König wrote:
> Am 14.09.21 um 15:50 schrieb Daniel Vetter:
> > On Thu, Sep 09, 2021 at 09:10:39AM +0200, Christian König wrote:
> > > Am 08.09.21 um 20:27 schrieb Daniel Vetter:
> > > > On Tue, Sep 07, 2021 at 11:28:23AM +0200, Christian König wrote:
> > > > > Am 07.09.21 um 11:05 schrieb Daniel Vetter:
> > > > > > On Tue, Sep 07, 2021 at 08:22:20AM +0200, Christian König wrote:
> > > > > > > Added a Fixes tag and pushed this to drm-misc-fixes.
> > > > > > We're in the merge window, this should have been drm-misc-next-fixes. I'll
> > > > > > poke misc maintainers so it's not lost.
> > > > > Hui? It's a fix for a problem in stable and not in drm-misc-next.
> > > > Ah the flow chart is confusing. There is no current -rc, so it's always
> > > > -next-fixes. Or you're running the risk that it's lost until after -rc1.
> > > > Maybe we should clarify that "is the bug in current -rc?" only applies if
> > > > there is a current -rc.
> > > Yeah, I've noticed this as well.
> > > 
> > > But when there is no current -rc because we are in the merge window then the
> > > question is how do I submit patches to the current stable?
> > You never submit patches directly to stable. It's always "get it into
> > Linus' tree asap" plus either Cc: stable or a Fixes: line.
> 
> But what if the code in drm-misc-next-fixes has been restructured and
> doesn't have that issue any more?
> 
> How do I get the patch into stable then? Submitting directly to Greg?

Yes, that's the exception. With a reference to the commit in Linus' tree
that resolves the issue, and an explainer why you need something else as a
backport.

https://dri.freedesktop.org/docs/drm/process/stable-kernel-rules.html#option-3

Cheers, Daniel

> 
> Thanks,
> Christian.
> 
> >   During merge
> > window "get into Linus' tree asap" means "put it into drm-misc-next-fixes"
> > 
> > > In other words this patch here is really for 5.14 and should then be
> > > backported to 5.13 and maybe even 5.10 as well.
> > > 
> > > The code was restructured for 5.15 and I even need to double check if that
> > > still applies there as well.
> > > 
> > > Or should I send patches like those directly to Greg?
> > Nope. Just fastest path into Linus' tree is good enough. Greg picks up
> > patches directly from the merge window if it has one of the tags. There's
> > occasionally a bit of grumbling because there's so many stable patches
> > coming in during the merge window, but otherwise it should be in stable in
> > the next release like during -rc phase.
> > -Daniel
> > 
> > > Regards,
> > > Christian.
> > > 
> > > > Anyway Thomas sent out a pr, so it's all good.
> > > > -Daniel
> > > > 
> > > > > Christian.
> > > > > 
> > > > > > -Daniel
> > > > > > 
> > > > > > > It will take a while until it cycles back into the development branches, so
> > > > > > > feel free to push some version to amd-staging-drm-next as well. Just ping
> > > > > > > Alex when you do this.
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > Christian.
> > > > > > > 
> > > > > > > Am 07.09.21 um 06:08 schrieb xinhui pan:
> > > > > > > > The ret value might be -EBUSY, caller will think lru lock is still
> > > > > > > > locked but actually NOT. So return -ENOSPC instead. Otherwise we hit
> > > > > > > > list corruption.
> > > > > > > > 
> > > > > > > > ttm_bo_cleanup_refs might fail too if BO is not idle. If we return 0,
> > > > > > > > caller(ttm_tt_populate -> ttm_global_swapout ->ttm_device_swapout) will
> > > > > > > > be stuck as we actually did not free any BO memory. This usually happens
> > > > > > > > when the fence is not signaled for a long time.
> > > > > > > > 
> > > > > > > > Signed-off-by: xinhui pan <xinhui.pan@amd.com>
> > > > > > > > Reviewed-by: Christian König <christian.koenig@amd.com>
> > > > > > > > ---
> > > > > > > >      drivers/gpu/drm/ttm/ttm_bo.c | 6 +++---
> > > > > > > >      1 file changed, 3 insertions(+), 3 deletions(-)
> > > > > > > > 
> > > > > > > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> > > > > > > > index 8d7fd65ccced..23f906941ac9 100644
> > > > > > > > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > > > > > > > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > > > > > > > @@ -1152,9 +1152,9 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
> > > > > > > >      	}
> > > > > > > >      	if (bo->deleted) {
> > > > > > > > -		ttm_bo_cleanup_refs(bo, false, false, locked);
> > > > > > > > +		ret = ttm_bo_cleanup_refs(bo, false, false, locked);
> > > > > > > >      		ttm_bo_put(bo);
> > > > > > > > -		return 0;
> > > > > > > > +		return ret == -EBUSY ? -ENOSPC : ret;
> > > > > > > >      	}
> > > > > > > >      	ttm_bo_del_from_lru(bo);
> > > > > > > > @@ -1208,7 +1208,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
> > > > > > > >      	if (locked)
> > > > > > > >      		dma_resv_unlock(bo->base.resv);
> > > > > > > >      	ttm_bo_put(bo);
> > > > > > > > -	return ret;
> > > > > > > > +	return ret == -EBUSY ? -ENOSPC : ret;
> > > > > > > >      }
> > > > > > > >      void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-09-16 12:22 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-07  4:08 [resend PATCH] drm/ttm: Fix a deadlock if the target BO is not idle during swap xinhui pan
2021-09-07  6:22 ` Christian König
2021-09-07  9:05   ` Daniel Vetter
2021-09-07  9:28     ` Christian König
2021-09-08 18:27       ` Daniel Vetter
2021-09-09  7:10         ` Christian König
2021-09-14 13:50           ` Daniel Vetter
2021-09-15 10:05             ` Christian König
2021-09-16 12:22               ` Daniel Vetter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.