* [PATCH] nouveau: fix client work fence deletion race
@ 2023-06-15  2:40 Dave Airlie
  2023-06-15 10:45 ` Karol Herbst
  2023-06-15 11:10 ` Lucas Stach
  0 siblings, 2 replies; 3+ messages in thread
From: Dave Airlie @ 2023-06-15  2:40 UTC (permalink / raw)
  To: dri-devel

From: Dave Airlie <airlied@redhat.com>

This seems to have existed forever, but is now more apparent after
commit 9bff18d13473 ("drm/ttm: use per BO cleanup workers").

My analysis:
Two threads are running. One, in the IRQ signalling the fence, is in
dma_fence_signal_timestamp_locked(); it has set DMA_FENCE_FLAG_SIGNALED_BIT,
but hasn't yet reached the callbacks.

The second thread, in nouveau_cli_work_ready(), sees the fence is signalled,
so it puts the fence, cleans up the object, and frees the work item, which
contains the callback.

Thread one then goes on to invoke the callback, causing a use-after-free.

Proposed fix:
Take the fence lock around the signalled check in nouveau_cli_work_ready(),
so that the callbacks have completed before the check can succeed and the
memory can be freed.
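The serialization the fix relies on can be modeled in plain userspace C (an illustrative analogue only: `model_fence` and the other names are made up, and a pthread mutex stands in for the fence spinlock):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of the fixed scheme: the signalling side sets the
 * flag and runs the callback under the same lock that the ready
 * check takes, so a "true" result implies the callback already ran. */
struct model_fence {
	pthread_mutex_t lock;
	bool signaled;
	void (*cb)(void *);	/* lives in memory the consumer frees */
	void *cb_data;
};

/* Analogue of the signalling path: the flag is set and the callback
 * is invoked inside one critical section. */
static void model_fence_signal(struct model_fence *f)
{
	pthread_mutex_lock(&f->lock);
	f->signaled = true;
	if (f->cb)
		f->cb(f->cb_data);	/* consumer cannot free us here */
	pthread_mutex_unlock(&f->lock);
}

/* Analogue of the patched nouveau_cli_work_ready(): the check happens
 * under the lock, so it cannot observe the flag mid-signalling. */
static bool model_work_ready(struct model_fence *f)
{
	bool ret;

	pthread_mutex_lock(&f->lock);
	ret = f->signaled;
	pthread_mutex_unlock(&f->lock);
	return ret;
}
```

With this ordering, a consumer that sees model_work_ready() return true knows the callback has already finished, so freeing the work item afterwards is safe.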

Signed-off-by: Dave Airlie <airlied@redhat.com>
---
 drivers/gpu/drm/nouveau/nouveau_drm.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index cc7c5b4a05fd..1a45be769848 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -137,10 +137,16 @@ nouveau_name(struct drm_device *dev)
 static inline bool
 nouveau_cli_work_ready(struct dma_fence *fence)
 {
-	if (!dma_fence_is_signaled(fence))
-		return false;
-	dma_fence_put(fence);
-	return true;
+	unsigned long flags;
+	bool ret = true;
+	spin_lock_irqsave(fence->lock, flags);
+	if (!dma_fence_is_signaled_locked(fence))
+		ret = false;
+	spin_unlock_irqrestore(fence->lock, flags);
+
+	if (ret == true)
+		dma_fence_put(fence);
+	return ret;
 }
 
 static void
-- 
2.40.1



* Re: [PATCH] nouveau: fix client work fence deletion race
  2023-06-15  2:40 [PATCH] nouveau: fix client work fence deletion race Dave Airlie
@ 2023-06-15 10:45 ` Karol Herbst
  2023-06-15 11:10 ` Lucas Stach
  1 sibling, 0 replies; 3+ messages in thread
From: Karol Herbst @ 2023-06-15 10:45 UTC (permalink / raw)
  To: Dave Airlie; +Cc: dri-devel

On Thu, Jun 15, 2023 at 4:47 AM Dave Airlie <airlied@gmail.com> wrote:
>
> From: Dave Airlie <airlied@redhat.com>
>
> This seems to have existed forever, but is now more apparent after
> commit 9bff18d13473 ("drm/ttm: use per BO cleanup workers").
>
> My analysis:
> Two threads are running. One, in the IRQ signalling the fence, is in
> dma_fence_signal_timestamp_locked(); it has set DMA_FENCE_FLAG_SIGNALED_BIT,
> but hasn't yet reached the callbacks.
>
> The second thread, in nouveau_cli_work_ready(), sees the fence is signalled,
> so it puts the fence, cleans up the object, and frees the work item, which
> contains the callback.
>
> Thread one then goes on to invoke the callback, causing a use-after-free.
>
> Proposed fix:
> Take the fence lock around the signalled check in nouveau_cli_work_ready(),
> so that the callbacks have completed before the check can succeed and the
> memory can be freed.
>
> Signed-off-by: Dave Airlie <airlied@redhat.com>

Fixes: 11e451e74050 ("drm/nouveau: remove fence wait code from deferred client work handler")

is, I think, the most reasonable commit to tag to ensure backports.

> ---
>  drivers/gpu/drm/nouveau/nouveau_drm.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index cc7c5b4a05fd..1a45be769848 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -137,10 +137,16 @@ nouveau_name(struct drm_device *dev)
>  static inline bool
>  nouveau_cli_work_ready(struct dma_fence *fence)
>  {
> -       if (!dma_fence_is_signaled(fence))
> -               return false;
> -       dma_fence_put(fence);
> -       return true;
> +       unsigned long flags;
> +       bool ret = true;
> +       spin_lock_irqsave(fence->lock, flags);
> +       if (!dma_fence_is_signaled_locked(fence))
> +               ret = false;
> +       spin_unlock_irqrestore(fence->lock, flags);

I agree that the code was always broken, as the documentation for
`dma_fence_is_signaled_locked()` specifies: "This function requires
&dma_fence.lock to be held."
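A sketch of the problematic interleaving with the unpatched code, as the commit message's analysis describes it (function names as in the quoted diff):

```
Thread A (irq)                          Thread B (client worker)
dma_fence_signal_timestamp_locked()
  sets DMA_FENCE_FLAG_SIGNALED_BIT
                                        nouveau_cli_work_ready()
                                          dma_fence_is_signaled() -> true
                                            (tests the bit, no lock taken)
                                          dma_fence_put(fence)
                                          frees the work item + callback
  walks the callback list,
  invokes the callback          <------- use-after-free
```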

> +
> +       if (ret == true)
> +               dma_fence_put(fence);
> +       return ret;
>  }
>
>  static void
> --
> 2.40.1
>

Regardless, the patch fixes the issue users have seen:

Reviewed-by: Karol Herbst <kherbst@redhat.com>



* Re: [PATCH] nouveau: fix client work fence deletion race
  2023-06-15  2:40 [PATCH] nouveau: fix client work fence deletion race Dave Airlie
  2023-06-15 10:45 ` Karol Herbst
@ 2023-06-15 11:10 ` Lucas Stach
  1 sibling, 0 replies; 3+ messages in thread
From: Lucas Stach @ 2023-06-15 11:10 UTC (permalink / raw)
  To: Dave Airlie, dri-devel

Am Donnerstag, dem 15.06.2023 um 12:40 +1000 schrieb Dave Airlie:
> From: Dave Airlie <airlied@redhat.com>
> 
> This seems to have existed forever, but is now more apparent after
> commit 9bff18d13473 ("drm/ttm: use per BO cleanup workers").
> 
> My analysis:
> Two threads are running. One, in the IRQ signalling the fence, is in
> dma_fence_signal_timestamp_locked(); it has set DMA_FENCE_FLAG_SIGNALED_BIT,
> but hasn't yet reached the callbacks.
> 
> The second thread, in nouveau_cli_work_ready(), sees the fence is signalled,
> so it puts the fence, cleans up the object, and frees the work item, which
> contains the callback.
> 
> Thread one then goes on to invoke the callback, causing a use-after-free.
> 
> Proposed fix:
> Take the fence lock around the signalled check in nouveau_cli_work_ready(),
> so that the callbacks have completed before the check can succeed and the
> memory can be freed.
> 
> Signed-off-by: Dave Airlie <airlied@redhat.com>
> ---
>  drivers/gpu/drm/nouveau/nouveau_drm.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index cc7c5b4a05fd..1a45be769848 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -137,10 +137,16 @@ nouveau_name(struct drm_device *dev)
>  static inline bool
>  nouveau_cli_work_ready(struct dma_fence *fence)
>  {
> -	if (!dma_fence_is_signaled(fence))
> -		return false;
> -	dma_fence_put(fence);
> -	return true;
> +	unsigned long flags;
> +	bool ret = true;
> +	spin_lock_irqsave(fence->lock, flags);

This function is only ever called from worker (process) context, so
there is no point in saving the current IRQ state. This can be a plain
spin_lock_irq()/spin_unlock_irq().
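Taking that suggestion, the patched helper could be simplified to something like the following (an untested sketch against the posted patch, assuming the helper really is only reached from process context):

```
static inline bool
nouveau_cli_work_ready(struct dma_fence *fence)
{
	bool ret;

	/* Only called from the client work handler (process context),
	 * so the plain _irq variants suffice; no need to save flags. */
	spin_lock_irq(fence->lock);
	ret = dma_fence_is_signaled_locked(fence);
	spin_unlock_irq(fence->lock);

	if (ret)
		dma_fence_put(fence);
	return ret;
}
```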

Regards,
Lucas

> +	if (!dma_fence_is_signaled_locked(fence))
> +		ret = false;
> +	spin_unlock_irqrestore(fence->lock, flags);
> +
> +	if (ret == true)
> +		dma_fence_put(fence);
> +	return ret;
>  }
>  
>  static void



