stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4] media: uvcvideo: Fix race condition with usb_kill_urb
@ 2022-12-14 16:31 Ricardo Ribalda
  2022-12-27 14:24 ` Laurent Pinchart
  0 siblings, 1 reply; 6+ messages in thread
From: Ricardo Ribalda @ 2022-12-14 16:31 UTC (permalink / raw)
  To: Laurent Pinchart, Sergey Senozhatsky, Mauro Carvalho Chehab, Max Staudt
  Cc: linux-media, stable, Yunke Cao, Ricardo Ribalda, linux-kernel

usb_kill_urb warranties that all the handlers are finished when it
returns, but does not protect against threads that might be handling
asynchronously the urb.

For UVC, the function uvc_ctrl_status_event_async() takes care of
control changes asynchronously.

 If the code is executed in the following order:

CPU 0					CPU 1
===== 					=====
uvc_status_complete()
					uvc_status_stop()
uvc_ctrl_status_event_work()
					uvc_status_start() -> FAIL

Then uvc_status_start will keep failing and this error will be shown:

<4>[    5.540139] URB 0000000000000000 submitted while active
drivers/usb/core/urb.c:378 usb_submit_urb+0x4c3/0x528

Let's improve the current situation, by not re-submiting the urb if
we are stopping the status event. Also process the queued work
(if any) during stop.

CPU 0					CPU 1
===== 					=====
uvc_status_complete()
					uvc_status_stop()
					uvc_status_start()
uvc_ctrl_status_event_work() -> FAIL

Hopefully, with the usb layer protection this should be enough to cover
all the cases.

Cc: stable@vger.kernel.org
Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
Reviewed-by: Yunke Cao <yunkec@chromium.org>
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
---
uvc: Fix race condition on uvc

Make sure that all the async work is finished when we stop the status urb.

To: Yunke Cao <yunkec@chromium.org>
To: Sergey Senozhatsky <senozhatsky@chromium.org>
To: Max Staudt <mstaudt@google.com>
To: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
To: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-media@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes in v4:
- Replace bool with atomic_t to avoid compiler reordering
- First complete the async work and then kill the urb to avoid race (Thanks Laurent!)
- Link to v3: https://lore.kernel.org/r/20221212-uvc-race-v3-0-954efc752c9a@chromium.org

Changes in v3:
- Remove the patch for dev->status, makes more sense in another series, and makes
  the zero day less nervous.
- Update reviewed-by (thanks Yunke!).
- Link to v2: https://lore.kernel.org/r/20221212-uvc-race-v2-0-54496cc3b8ab@chromium.org

Changes in v2:
- Add a patch for not kalloc dev->status
- Redo the logic mechanism, so it also works with suspend (Thanks Yunke!)
- Link to v1: https://lore.kernel.org/r/20221212-uvc-race-v1-0-c52e1783c31d@chromium.org
---
 drivers/media/usb/uvc/uvc_ctrl.c   | 3 +++
 drivers/media/usb/uvc/uvc_status.c | 6 ++++++
 drivers/media/usb/uvc/uvcvideo.h   | 1 +
 3 files changed, 10 insertions(+)

diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
index c95a2229f4fa..1be6897a7d6d 100644
--- a/drivers/media/usb/uvc/uvc_ctrl.c
+++ b/drivers/media/usb/uvc/uvc_ctrl.c
@@ -1442,6 +1442,9 @@ static void uvc_ctrl_status_event_work(struct work_struct *work)
 
 	uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
 
+	if (atomic_read(&dev->flush_status))
+		return;
+
 	/* Resubmit the URB. */
 	w->urb->interval = dev->int_ep->desc.bInterval;
 	ret = usb_submit_urb(w->urb, GFP_KERNEL);
diff --git a/drivers/media/usb/uvc/uvc_status.c b/drivers/media/usb/uvc/uvc_status.c
index 7518ffce22ed..4a95850cdc1b 100644
--- a/drivers/media/usb/uvc/uvc_status.c
+++ b/drivers/media/usb/uvc/uvc_status.c
@@ -304,10 +304,16 @@ int uvc_status_start(struct uvc_device *dev, gfp_t flags)
 	if (dev->int_urb == NULL)
 		return 0;
 
+	atomic_set(&dev->flush_status, 0);
 	return usb_submit_urb(dev->int_urb, flags);
 }
 
 void uvc_status_stop(struct uvc_device *dev)
 {
+	struct uvc_ctrl_work *w = &dev->async_ctrl;
+
+	atomic_set(&dev->flush_status, 1);
+	if (cancel_work_sync(&w->work))
+		uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
 	usb_kill_urb(dev->int_urb);
 }
diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
index df93db259312..1274691f157f 100644
--- a/drivers/media/usb/uvc/uvcvideo.h
+++ b/drivers/media/usb/uvc/uvcvideo.h
@@ -560,6 +560,7 @@ struct uvc_device {
 	struct usb_host_endpoint *int_ep;
 	struct urb *int_urb;
 	u8 *status;
+	atomic_t flush_status;
 	struct input_dev *input;
 	char input_phys[64];
 

---
base-commit: 0ec5a38bf8499f403f81cb81a0e3a60887d1993c
change-id: 20221212-uvc-race-09276ea68bf8

Best regards,
-- 
Ricardo Ribalda <ribalda@chromium.org>

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v4] media: uvcvideo: Fix race condition with usb_kill_urb
  2022-12-14 16:31 [PATCH v4] media: uvcvideo: Fix race condition with usb_kill_urb Ricardo Ribalda
@ 2022-12-27 14:24 ` Laurent Pinchart
  2022-12-27 14:37   ` Laurent Pinchart
  0 siblings, 1 reply; 6+ messages in thread
From: Laurent Pinchart @ 2022-12-27 14:24 UTC (permalink / raw)
  To: Ricardo Ribalda
  Cc: Sergey Senozhatsky, Mauro Carvalho Chehab, Max Staudt,
	linux-media, stable, Yunke Cao, linux-kernel

Hi Ricardo,

Thank you for the patch.

On Wed, Dec 14, 2022 at 05:31:55PM +0100, Ricardo Ribalda wrote:
> usb_kill_urb warranties that all the handlers are finished when it
> returns, but does not protect against threads that might be handling
> asynchronously the urb.
> 
> For UVC, the function uvc_ctrl_status_event_async() takes care of
> control changes asynchronously.
> 
>  If the code is executed in the following order:
> 
> CPU 0					CPU 1
> ===== 					=====
> uvc_status_complete()
> 					uvc_status_stop()
> uvc_ctrl_status_event_work()
> 					uvc_status_start() -> FAIL
> 
> Then uvc_status_start will keep failing and this error will be shown:
> 
> <4>[    5.540139] URB 0000000000000000 submitted while active
> drivers/usb/core/urb.c:378 usb_submit_urb+0x4c3/0x528
> 
> Let's improve the current situation, by not re-submiting the urb if
> we are stopping the status event. Also process the queued work
> (if any) during stop.
> 
> CPU 0					CPU 1
> ===== 					=====
> uvc_status_complete()
> 					uvc_status_stop()
> 					uvc_status_start()
> uvc_ctrl_status_event_work() -> FAIL
> 
> Hopefully, with the usb layer protection this should be enough to cover
> all the cases.
> 
> Cc: stable@vger.kernel.org
> Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
> Reviewed-by: Yunke Cao <yunkec@chromium.org>
> Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
> ---
> uvc: Fix race condition on uvc
> 
> Make sure that all the async work is finished when we stop the status urb.
> 
> To: Yunke Cao <yunkec@chromium.org>
> To: Sergey Senozhatsky <senozhatsky@chromium.org>
> To: Max Staudt <mstaudt@google.com>
> To: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
> To: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: linux-media@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
> Changes in v4:
> - Replace bool with atomic_t to avoid compiler reordering
> - First complete the async work and then kill the urb to avoid race (Thanks Laurent!)
> - Link to v3: https://lore.kernel.org/r/20221212-uvc-race-v3-0-954efc752c9a@chromium.org
> 
> Changes in v3:
> - Remove the patch for dev->status, makes more sense in another series, and makes
>   the zero day less nervous.
> - Update reviewed-by (thanks Yunke!).
> - Link to v2: https://lore.kernel.org/r/20221212-uvc-race-v2-0-54496cc3b8ab@chromium.org
> 
> Changes in v2:
> - Add a patch for not kalloc dev->status
> - Redo the logic mechanism, so it also works with suspend (Thanks Yunke!)
> - Link to v1: https://lore.kernel.org/r/20221212-uvc-race-v1-0-c52e1783c31d@chromium.org
> ---
>  drivers/media/usb/uvc/uvc_ctrl.c   | 3 +++
>  drivers/media/usb/uvc/uvc_status.c | 6 ++++++
>  drivers/media/usb/uvc/uvcvideo.h   | 1 +
>  3 files changed, 10 insertions(+)
> 
> diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
> index c95a2229f4fa..1be6897a7d6d 100644
> --- a/drivers/media/usb/uvc/uvc_ctrl.c
> +++ b/drivers/media/usb/uvc/uvc_ctrl.c
> @@ -1442,6 +1442,9 @@ static void uvc_ctrl_status_event_work(struct work_struct *work)
>  
>  	uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
>  
> +	if (atomic_read(&dev->flush_status))
> +		return;
> +
>  	/* Resubmit the URB. */
>  	w->urb->interval = dev->int_ep->desc.bInterval;
>  	ret = usb_submit_urb(w->urb, GFP_KERNEL);
> diff --git a/drivers/media/usb/uvc/uvc_status.c b/drivers/media/usb/uvc/uvc_status.c
> index 7518ffce22ed..4a95850cdc1b 100644
> --- a/drivers/media/usb/uvc/uvc_status.c
> +++ b/drivers/media/usb/uvc/uvc_status.c
> @@ -304,10 +304,16 @@ int uvc_status_start(struct uvc_device *dev, gfp_t flags)
>  	if (dev->int_urb == NULL)
>  		return 0;
>  
> +	atomic_set(&dev->flush_status, 0);
>  	return usb_submit_urb(dev->int_urb, flags);
>  }
>  
>  void uvc_status_stop(struct uvc_device *dev)
>  {
> +	struct uvc_ctrl_work *w = &dev->async_ctrl;
> +
> +	atomic_set(&dev->flush_status, 1);

Note that atomic_read() and atomic_set() do no imply any memory barrier
on most architectures, as far as I can tell. They essentially become
READ_ONCE() and WRITE_ONCE() calls, which guarantee that the compiler
will not merge or split loads or stores, or reorder them with respect to
load and stores to the *same* memory location, but nothing else. I think
you need to add proper barriers, and you can then probably also drop
usage of atomic_t.

> +	if (cancel_work_sync(&w->work))
> +		uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
>  	usb_kill_urb(dev->int_urb);

This should get rid of the main race (possibly save the barrier issue),
but it's not the most efficient option, and can still be problematic.
Consider the following case:

CPU0							CPU1
----							----

void uvc_status_stop(struct uvc_device *dev)		uvc_ctrl_status_event_async()
{							{
	...
	atomic_set(&dev->flush_status, 1);			...
	if (cancel_work_sync())
		...
								schedule_work();
	usb_kill_urb();					}
}

The usb_kill_urb() call ensures that uvc_ctrl_status_event_async()
completes before uvc_status_stop() returns, but there will still be work
scheduled in that case. uvc_ctrl_status_event_work() will be run later,
and as flush_status is set to 1, the function will not resubmit the URB.
That fixes the main race, but leaves the asynchronous control status
event handling for after uvc_status_stop() returns, which isn't great.

Now, if we consider that uvc_status_start() could be called shortly
after uvc_status_stop(), we may get in a situation where
uvc_status_start() will reset flush_status to 0 before
uvc_ctrl_status_event_async() runs. Both uvc_ctrl_status_event_async()
and uvc_status_start() will thus submit the same URB.

You can't fix this by first killing the URB and then cancelling the
work, as there would then be a risk that uvc_ctrl_status_event_work()
would be running in parallel, going past the flush_status check before
flush_status gets set to 1 in uvc_status_stop(), and submitting the URB
after usb_kill_urb() is called.

I think a good fix would be to check flush_status in
uvc_ctrl_status_event_async() and avoid the schedule_work() call in that
case. You could then also move the atomic_set(..., 0) call from
uvc_status_start() to the end of uvc_status_stop() (again with proper
barriers).

Could you please check the memory barriers and test the above proposal
(after double-checking it of course, I may have missed something) ?

>  }
> diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
> index df93db259312..1274691f157f 100644
> --- a/drivers/media/usb/uvc/uvcvideo.h
> +++ b/drivers/media/usb/uvc/uvcvideo.h
> @@ -560,6 +560,7 @@ struct uvc_device {
>  	struct usb_host_endpoint *int_ep;
>  	struct urb *int_urb;
>  	u8 *status;
> +	atomic_t flush_status;
>  	struct input_dev *input;
>  	char input_phys[64];
>  

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4] media: uvcvideo: Fix race condition with usb_kill_urb
  2022-12-27 14:24 ` Laurent Pinchart
@ 2022-12-27 14:37   ` Laurent Pinchart
  2023-01-02 12:45     ` Ricardo Ribalda
  0 siblings, 1 reply; 6+ messages in thread
From: Laurent Pinchart @ 2022-12-27 14:37 UTC (permalink / raw)
  To: Ricardo Ribalda
  Cc: Sergey Senozhatsky, Mauro Carvalho Chehab, Max Staudt,
	linux-media, stable, Yunke Cao, linux-kernel

On Tue, Dec 27, 2022 at 04:25:01PM +0200, Laurent Pinchart wrote:
> Hi Ricardo,
> 
> Thank you for the patch.
> 
> On Wed, Dec 14, 2022 at 05:31:55PM +0100, Ricardo Ribalda wrote:
> > usb_kill_urb warranties that all the handlers are finished when it
> > returns, but does not protect against threads that might be handling
> > asynchronously the urb.
> > 
> > For UVC, the function uvc_ctrl_status_event_async() takes care of
> > control changes asynchronously.
> > 
> >  If the code is executed in the following order:
> > 
> > CPU 0					CPU 1
> > ===== 					=====
> > uvc_status_complete()
> > 					uvc_status_stop()
> > uvc_ctrl_status_event_work()
> > 					uvc_status_start() -> FAIL
> > 
> > Then uvc_status_start will keep failing and this error will be shown:
> > 
> > <4>[    5.540139] URB 0000000000000000 submitted while active
> > drivers/usb/core/urb.c:378 usb_submit_urb+0x4c3/0x528
> > 
> > Let's improve the current situation, by not re-submiting the urb if
> > we are stopping the status event. Also process the queued work
> > (if any) during stop.
> > 
> > CPU 0					CPU 1
> > ===== 					=====
> > uvc_status_complete()
> > 					uvc_status_stop()
> > 					uvc_status_start()
> > uvc_ctrl_status_event_work() -> FAIL
> > 
> > Hopefully, with the usb layer protection this should be enough to cover
> > all the cases.
> > 
> > Cc: stable@vger.kernel.org
> > Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
> > Reviewed-by: Yunke Cao <yunkec@chromium.org>
> > Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
> > ---
> > uvc: Fix race condition on uvc
> > 
> > Make sure that all the async work is finished when we stop the status urb.
> > 
> > To: Yunke Cao <yunkec@chromium.org>
> > To: Sergey Senozhatsky <senozhatsky@chromium.org>
> > To: Max Staudt <mstaudt@google.com>
> > To: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
> > To: Mauro Carvalho Chehab <mchehab@kernel.org>
> > Cc: linux-media@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > ---
> > Changes in v4:
> > - Replace bool with atomic_t to avoid compiler reordering
> > - First complete the async work and then kill the urb to avoid race (Thanks Laurent!)
> > - Link to v3: https://lore.kernel.org/r/20221212-uvc-race-v3-0-954efc752c9a@chromium.org
> > 
> > Changes in v3:
> > - Remove the patch for dev->status, makes more sense in another series, and makes
> >   the zero day less nervous.
> > - Update reviewed-by (thanks Yunke!).
> > - Link to v2: https://lore.kernel.org/r/20221212-uvc-race-v2-0-54496cc3b8ab@chromium.org
> > 
> > Changes in v2:
> > - Add a patch for not kalloc dev->status
> > - Redo the logic mechanism, so it also works with suspend (Thanks Yunke!)
> > - Link to v1: https://lore.kernel.org/r/20221212-uvc-race-v1-0-c52e1783c31d@chromium.org
> > ---
> >  drivers/media/usb/uvc/uvc_ctrl.c   | 3 +++
> >  drivers/media/usb/uvc/uvc_status.c | 6 ++++++
> >  drivers/media/usb/uvc/uvcvideo.h   | 1 +
> >  3 files changed, 10 insertions(+)
> > 
> > diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
> > index c95a2229f4fa..1be6897a7d6d 100644
> > --- a/drivers/media/usb/uvc/uvc_ctrl.c
> > +++ b/drivers/media/usb/uvc/uvc_ctrl.c
> > @@ -1442,6 +1442,9 @@ static void uvc_ctrl_status_event_work(struct work_struct *work)
> >  
> >  	uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
> >  
> > +	if (atomic_read(&dev->flush_status))
> > +		return;
> > +
> >  	/* Resubmit the URB. */
> >  	w->urb->interval = dev->int_ep->desc.bInterval;
> >  	ret = usb_submit_urb(w->urb, GFP_KERNEL);
> > diff --git a/drivers/media/usb/uvc/uvc_status.c b/drivers/media/usb/uvc/uvc_status.c
> > index 7518ffce22ed..4a95850cdc1b 100644
> > --- a/drivers/media/usb/uvc/uvc_status.c
> > +++ b/drivers/media/usb/uvc/uvc_status.c
> > @@ -304,10 +304,16 @@ int uvc_status_start(struct uvc_device *dev, gfp_t flags)
> >  	if (dev->int_urb == NULL)
> >  		return 0;
> >  
> > +	atomic_set(&dev->flush_status, 0);
> >  	return usb_submit_urb(dev->int_urb, flags);
> >  }
> >  
> >  void uvc_status_stop(struct uvc_device *dev)
> >  {
> > +	struct uvc_ctrl_work *w = &dev->async_ctrl;
> > +
> > +	atomic_set(&dev->flush_status, 1);
> 
> Note that atomic_read() and atomic_set() do no imply any memory barrier
> on most architectures, as far as I can tell. They essentially become
> READ_ONCE() and WRITE_ONCE() calls, which guarantee that the compiler
> will not merge or split loads or stores, or reorder them with respect to
> load and stores to the *same* memory location, but nothing else. I think
> you need to add proper barriers, and you can then probably also drop
> usage of atomic_t.
> 
> > +	if (cancel_work_sync(&w->work))
> > +		uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
> >  	usb_kill_urb(dev->int_urb);
> 
> This should get rid of the main race (possibly save the barrier issue),
> but it's not the most efficient option, and can still be problematic.
> Consider the following case:
> 
> CPU0							CPU1
> ----							----
> 
> void uvc_status_stop(struct uvc_device *dev)		uvc_ctrl_status_event_async()
> {							{
> 	...
> 	atomic_set(&dev->flush_status, 1);			...
> 	if (cancel_work_sync())
> 		...
> 								schedule_work();
> 	usb_kill_urb();					}
> }
> 
> The usb_kill_urb() call ensures that uvc_ctrl_status_event_async()
> completes before uvc_status_stop() returns, but there will still be work
> scheduled in that case. uvc_ctrl_status_event_work() will be run later,
> and as flush_status is set to 1, the function will not resubmit the URB.
> That fixes the main race, but leaves the asynchronous control status
> event handling for after uvc_status_stop() returns, which isn't great.
> 
> Now, if we consider that uvc_status_start() could be called shortly
> after uvc_status_stop(), we may get in a situation where
> uvc_status_start() will reset flush_status to 0 before
> uvc_ctrl_status_event_async() runs. Both uvc_ctrl_status_event_async()
> and uvc_status_start() will thus submit the same URB.
> 
> You can't fix this by first killing the URB and then cancelling the
> work, as there would then be a risk that uvc_ctrl_status_event_work()
> would be running in parallel, going past the flush_status check before
> flush_status gets set to 1 in uvc_status_stop(), and submitting the URB
> after usb_kill_urb() is called.
> 
> I think a good fix would be to check flush_status in
> uvc_ctrl_status_event_async() and avoid the schedule_work() call in that
> case. You could then also move the atomic_set(..., 0) call from
> uvc_status_start() to the end of uvc_status_stop() (again with proper
> barriers).

Also, as all of this is tricky, comments in appropriate places in the
code would be very helpful to avoid breaking things later.

> Could you please check the memory barriers and test the above proposal
> (after double-checking it of course, I may have missed something) ?
> 
> >  }
> > diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
> > index df93db259312..1274691f157f 100644
> > --- a/drivers/media/usb/uvc/uvcvideo.h
> > +++ b/drivers/media/usb/uvc/uvcvideo.h
> > @@ -560,6 +560,7 @@ struct uvc_device {
> >  	struct usb_host_endpoint *int_ep;
> >  	struct urb *int_urb;
> >  	u8 *status;
> > +	atomic_t flush_status;
> >  	struct input_dev *input;
> >  	char input_phys[64];
> >  

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4] media: uvcvideo: Fix race condition with usb_kill_urb
  2022-12-27 14:37   ` Laurent Pinchart
@ 2023-01-02 12:45     ` Ricardo Ribalda
  2023-01-02 12:57       ` Laurent Pinchart
  0 siblings, 1 reply; 6+ messages in thread
From: Ricardo Ribalda @ 2023-01-02 12:45 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Sergey Senozhatsky, Mauro Carvalho Chehab, Max Staudt,
	linux-media, stable, Yunke Cao, linux-kernel

Hi Laurent

Thanks for your review!

On Tue, 27 Dec 2022 at 15:37, Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> On Tue, Dec 27, 2022 at 04:25:01PM +0200, Laurent Pinchart wrote:
> > Hi Ricardo,
> >
> > Thank you for the patch.
> >
> > On Wed, Dec 14, 2022 at 05:31:55PM +0100, Ricardo Ribalda wrote:
> > > usb_kill_urb warranties that all the handlers are finished when it
> > > returns, but does not protect against threads that might be handling
> > > asynchronously the urb.
> > >
> > > For UVC, the function uvc_ctrl_status_event_async() takes care of
> > > control changes asynchronously.
> > >
> > >  If the code is executed in the following order:
> > >
> > > CPU 0                                       CPU 1
> > > =====                                       =====
> > > uvc_status_complete()
> > >                                     uvc_status_stop()
> > > uvc_ctrl_status_event_work()
> > >                                     uvc_status_start() -> FAIL
> > >
> > > Then uvc_status_start will keep failing and this error will be shown:
> > >
> > > <4>[    5.540139] URB 0000000000000000 submitted while active
> > > drivers/usb/core/urb.c:378 usb_submit_urb+0x4c3/0x528
> > >
> > > Let's improve the current situation, by not re-submiting the urb if
> > > we are stopping the status event. Also process the queued work
> > > (if any) during stop.
> > >
> > > CPU 0                                       CPU 1
> > > =====                                       =====
> > > uvc_status_complete()
> > >                                     uvc_status_stop()
> > >                                     uvc_status_start()
> > > uvc_ctrl_status_event_work() -> FAIL
> > >
> > > Hopefully, with the usb layer protection this should be enough to cover
> > > all the cases.
> > >
> > > Cc: stable@vger.kernel.org
> > > Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
> > > Reviewed-by: Yunke Cao <yunkec@chromium.org>
> > > Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
> > > ---
> > > uvc: Fix race condition on uvc
> > >
> > > Make sure that all the async work is finished when we stop the status urb.
> > >
> > > To: Yunke Cao <yunkec@chromium.org>
> > > To: Sergey Senozhatsky <senozhatsky@chromium.org>
> > > To: Max Staudt <mstaudt@google.com>
> > > To: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
> > > To: Mauro Carvalho Chehab <mchehab@kernel.org>
> > > Cc: linux-media@vger.kernel.org
> > > Cc: linux-kernel@vger.kernel.org
> > > ---
> > > Changes in v4:
> > > - Replace bool with atomic_t to avoid compiler reordering
> > > - First complete the async work and then kill the urb to avoid race (Thanks Laurent!)
> > > - Link to v3: https://lore.kernel.org/r/20221212-uvc-race-v3-0-954efc752c9a@chromium.org
> > >
> > > Changes in v3:
> > > - Remove the patch for dev->status, makes more sense in another series, and makes
> > >   the zero day less nervous.
> > > - Update reviewed-by (thanks Yunke!).
> > > - Link to v2: https://lore.kernel.org/r/20221212-uvc-race-v2-0-54496cc3b8ab@chromium.org
> > >
> > > Changes in v2:
> > > - Add a patch for not kalloc dev->status
> > > - Redo the logic mechanism, so it also works with suspend (Thanks Yunke!)
> > > - Link to v1: https://lore.kernel.org/r/20221212-uvc-race-v1-0-c52e1783c31d@chromium.org
> > > ---
> > >  drivers/media/usb/uvc/uvc_ctrl.c   | 3 +++
> > >  drivers/media/usb/uvc/uvc_status.c | 6 ++++++
> > >  drivers/media/usb/uvc/uvcvideo.h   | 1 +
> > >  3 files changed, 10 insertions(+)
> > >
> > > diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
> > > index c95a2229f4fa..1be6897a7d6d 100644
> > > --- a/drivers/media/usb/uvc/uvc_ctrl.c
> > > +++ b/drivers/media/usb/uvc/uvc_ctrl.c
> > > @@ -1442,6 +1442,9 @@ static void uvc_ctrl_status_event_work(struct work_struct *work)
> > >
> > >     uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
> > >
> > > +   if (atomic_read(&dev->flush_status))
> > > +           return;
> > > +
> > >     /* Resubmit the URB. */
> > >     w->urb->interval = dev->int_ep->desc.bInterval;
> > >     ret = usb_submit_urb(w->urb, GFP_KERNEL);
> > > diff --git a/drivers/media/usb/uvc/uvc_status.c b/drivers/media/usb/uvc/uvc_status.c
> > > index 7518ffce22ed..4a95850cdc1b 100644
> > > --- a/drivers/media/usb/uvc/uvc_status.c
> > > +++ b/drivers/media/usb/uvc/uvc_status.c
> > > @@ -304,10 +304,16 @@ int uvc_status_start(struct uvc_device *dev, gfp_t flags)
> > >     if (dev->int_urb == NULL)
> > >             return 0;
> > >
> > > +   atomic_set(&dev->flush_status, 0);
> > >     return usb_submit_urb(dev->int_urb, flags);
> > >  }
> > >
> > >  void uvc_status_stop(struct uvc_device *dev)
> > >  {
> > > +   struct uvc_ctrl_work *w = &dev->async_ctrl;
> > > +
> > > +   atomic_set(&dev->flush_status, 1);
> >
> > Note that atomic_read() and atomic_set() do no imply any memory barrier
> > on most architectures, as far as I can tell. They essentially become
> > READ_ONCE() and WRITE_ONCE() calls, which guarantee that the compiler
> > will not merge or split loads or stores, or reorder them with respect to
> > load and stores to the *same* memory location, but nothing else. I think
> > you need to add proper barriers, and you can then probably also drop
> > usage of atomic_t.

You are absolutely right! Only a subset of atomic implies memory barriers.

Found this doc particularly good:
https://www.kernel.org/doc/html/v5.10/core-api/atomic_ops.html


> >
> > > +   if (cancel_work_sync(&w->work))
> > > +           uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
> > >     usb_kill_urb(dev->int_urb);
> >
> > This should get rid of the main race (possibly save the barrier issue),
> > but it's not the most efficient option, and can still be problematic.
> > Consider the following case:
> >
> > CPU0                                                  CPU1
> > ----                                                  ----
> >
> > void uvc_status_stop(struct uvc_device *dev)          uvc_ctrl_status_event_async()
> > {                                                     {
> >       ...
> >       atomic_set(&dev->flush_status, 1);                      ...
> >       if (cancel_work_sync())
> >               ...
> >                                                               schedule_work();
> >       usb_kill_urb();                                 }
> > }
> >
> > The usb_kill_urb() call ensures that uvc_ctrl_status_event_async()
> > completes before uvc_status_stop() returns, but there will still be work
> > scheduled in that case. uvc_ctrl_status_event_work() will be run later,
> > and as flush_status is set to 1, the function will not resubmit the URB.
> > That fixes the main race, but leaves the asynchronous control status
> > event handling for after uvc_status_stop() returns, which isn't great.
> >
> > Now, if we consider that uvc_status_start() could be called shortly
> > after uvc_status_stop(), we may get in a situation where
> > uvc_status_start() will reset flush_status to 0 before
> > uvc_ctrl_status_event_async() runs. Both uvc_ctrl_status_event_async()
> > and uvc_status_start() will thus submit the same URB.
> >
> > You can't fix this by first killing the URB and then cancelling the
> > work, as there would then be a risk that uvc_ctrl_status_event_work()
> > would be running in parallel, going past the flush_status check before
> > flush_status gets set to 1 in uvc_status_stop(), and submitting the URB
> > after usb_kill_urb() is called.
> >
> > I think a good fix would be to check flush_status in
> > uvc_ctrl_status_event_async() and avoid the schedule_work() call in that
> > case.

If we do that, then we will be losing an event. I would rather call
cancel_work_sync() again after usb_kill_urb().


> >You could then also move the atomic_set(..., 0) call from
> > uvc_status_start() to the end of uvc_status_stop() (again with proper
> > barriers).

Will do that, it is more "elegant".

>
> Also, as all of this is tricky, comments in appropriate places in the
> code would be very helpful to avoid breaking things later.
>
> > Could you please check the memory barriers and test the above proposal
> > (after double-checking it of course, I may have missed something) ?

> >
> > >  }
> > > diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
> > > index df93db259312..1274691f157f 100644
> > > --- a/drivers/media/usb/uvc/uvcvideo.h
> > > +++ b/drivers/media/usb/uvc/uvcvideo.h
> > > @@ -560,6 +560,7 @@ struct uvc_device {
> > >     struct usb_host_endpoint *int_ep;
> > >     struct urb *int_urb;
> > >     u8 *status;
> > > +   atomic_t flush_status;
> > >     struct input_dev *input;
> > >     char input_phys[64];
> > >
>
> --
> Regards,
>
> Laurent Pinchart



-- 
Ricardo Ribalda

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4] media: uvcvideo: Fix race condition with usb_kill_urb
  2023-01-02 12:45     ` Ricardo Ribalda
@ 2023-01-02 12:57       ` Laurent Pinchart
  2023-01-02 13:03         ` Ricardo Ribalda
  0 siblings, 1 reply; 6+ messages in thread
From: Laurent Pinchart @ 2023-01-02 12:57 UTC (permalink / raw)
  To: Ricardo Ribalda
  Cc: Sergey Senozhatsky, Mauro Carvalho Chehab, Max Staudt,
	linux-media, stable, Yunke Cao, linux-kernel

Hi Ricardo,

On Mon, Jan 02, 2023 at 01:45:06PM +0100, Ricardo Ribalda wrote:
> On Tue, 27 Dec 2022 at 15:37, Laurent Pinchart wrote:
> > On Tue, Dec 27, 2022 at 04:25:01PM +0200, Laurent Pinchart wrote:
> > > On Wed, Dec 14, 2022 at 05:31:55PM +0100, Ricardo Ribalda wrote:
> > > > usb_kill_urb warranties that all the handlers are finished when it
> > > > returns, but does not protect against threads that might be handling
> > > > asynchronously the urb.
> > > >
> > > > For UVC, the function uvc_ctrl_status_event_async() takes care of
> > > > control changes asynchronously.
> > > >
> > > >  If the code is executed in the following order:
> > > >
> > > > CPU 0                                       CPU 1
> > > > =====                                       =====
> > > > uvc_status_complete()
> > > >                                     uvc_status_stop()
> > > > uvc_ctrl_status_event_work()
> > > >                                     uvc_status_start() -> FAIL
> > > >
> > > > Then uvc_status_start will keep failing and this error will be shown:
> > > >
> > > > <4>[    5.540139] URB 0000000000000000 submitted while active
> > > > drivers/usb/core/urb.c:378 usb_submit_urb+0x4c3/0x528
> > > >
> > > > Let's improve the current situation, by not re-submiting the urb if
> > > > we are stopping the status event. Also process the queued work
> > > > (if any) during stop.
> > > >
> > > > CPU 0                                       CPU 1
> > > > =====                                       =====
> > > > uvc_status_complete()
> > > >                                     uvc_status_stop()
> > > >                                     uvc_status_start()
> > > > uvc_ctrl_status_event_work() -> FAIL
> > > >
> > > > Hopefully, with the usb layer protection this should be enough to cover
> > > > all the cases.
> > > >
> > > > Cc: stable@vger.kernel.org
> > > > Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
> > > > Reviewed-by: Yunke Cao <yunkec@chromium.org>
> > > > Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
> > > > ---
> > > > uvc: Fix race condition on uvc
> > > >
> > > > Make sure that all the async work is finished when we stop the status urb.
> > > >
> > > > To: Yunke Cao <yunkec@chromium.org>
> > > > To: Sergey Senozhatsky <senozhatsky@chromium.org>
> > > > To: Max Staudt <mstaudt@google.com>
> > > > To: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
> > > > To: Mauro Carvalho Chehab <mchehab@kernel.org>
> > > > Cc: linux-media@vger.kernel.org
> > > > Cc: linux-kernel@vger.kernel.org
> > > > ---
> > > > Changes in v4:
> > > > - Replace bool with atomic_t to avoid compiler reordering
> > > > - First complete the async work and then kill the urb to avoid race (Thanks Laurent!)
> > > > - Link to v3: https://lore.kernel.org/r/20221212-uvc-race-v3-0-954efc752c9a@chromium.org
> > > >
> > > > Changes in v3:
> > > > - Remove the patch for dev->status, makes more sense in another series, and makes
> > > >   the zero day less nervous.
> > > > - Update reviewed-by (thanks Yunke!).
> > > > - Link to v2: https://lore.kernel.org/r/20221212-uvc-race-v2-0-54496cc3b8ab@chromium.org
> > > >
> > > > Changes in v2:
> > > > - Add a patch for not kalloc dev->status
> > > > - Redo the logic mechanism, so it also works with suspend (Thanks Yunke!)
> > > > - Link to v1: https://lore.kernel.org/r/20221212-uvc-race-v1-0-c52e1783c31d@chromium.org
> > > > ---
> > > >  drivers/media/usb/uvc/uvc_ctrl.c   | 3 +++
> > > >  drivers/media/usb/uvc/uvc_status.c | 6 ++++++
> > > >  drivers/media/usb/uvc/uvcvideo.h   | 1 +
> > > >  3 files changed, 10 insertions(+)
> > > >
> > > > diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
> > > > index c95a2229f4fa..1be6897a7d6d 100644
> > > > --- a/drivers/media/usb/uvc/uvc_ctrl.c
> > > > +++ b/drivers/media/usb/uvc/uvc_ctrl.c
> > > > @@ -1442,6 +1442,9 @@ static void uvc_ctrl_status_event_work(struct work_struct *work)
> > > >
> > > >     uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
> > > >
> > > > +   if (atomic_read(&dev->flush_status))
> > > > +           return;
> > > > +
> > > >     /* Resubmit the URB. */
> > > >     w->urb->interval = dev->int_ep->desc.bInterval;
> > > >     ret = usb_submit_urb(w->urb, GFP_KERNEL);
> > > > diff --git a/drivers/media/usb/uvc/uvc_status.c b/drivers/media/usb/uvc/uvc_status.c
> > > > index 7518ffce22ed..4a95850cdc1b 100644
> > > > --- a/drivers/media/usb/uvc/uvc_status.c
> > > > +++ b/drivers/media/usb/uvc/uvc_status.c
> > > > @@ -304,10 +304,16 @@ int uvc_status_start(struct uvc_device *dev, gfp_t flags)
> > > >     if (dev->int_urb == NULL)
> > > >             return 0;
> > > >
> > > > +   atomic_set(&dev->flush_status, 0);
> > > >     return usb_submit_urb(dev->int_urb, flags);
> > > >  }
> > > >
> > > >  void uvc_status_stop(struct uvc_device *dev)
> > > >  {
> > > > +   struct uvc_ctrl_work *w = &dev->async_ctrl;
> > > > +
> > > > +   atomic_set(&dev->flush_status, 1);
> > >
> > > Note that atomic_read() and atomic_set() do no imply any memory barrier
> > > on most architectures, as far as I can tell. They essentially become
> > > READ_ONCE() and WRITE_ONCE() calls, which guarantee that the compiler
> > > will not merge or split loads or stores, or reorder them with respect to
> > > load and stores to the *same* memory location, but nothing else. I think
> > > you need to add proper barriers, and you can then probably also drop
> > > usage of atomic_t.
> 
> You are absolutely right! Only a subset of atomic implies memory barriers.
> 
> Found this doc particularly good:
> https://www.kernel.org/doc/html/v5.10/core-api/atomic_ops.html
> 
> 
> > >
> > > > +   if (cancel_work_sync(&w->work))
> > > > +           uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
> > > >     usb_kill_urb(dev->int_urb);
> > >
> > > This should get rid of the main race (possibly save the barrier issue),
> > > but it's not the most efficient option, and can still be problematic.
> > > Consider the following case:
> > >
> > > CPU0                                                  CPU1
> > > ----                                                  ----
> > >
> > > void uvc_status_stop(struct uvc_device *dev)          uvc_ctrl_status_event_async()
> > > {                                                     {
> > >       ...
> > >       atomic_set(&dev->flush_status, 1);                      ...
> > >       if (cancel_work_sync())
> > >               ...
> > >                                                               schedule_work();
> > >       usb_kill_urb();                                 }
> > > }
> > >
> > > The usb_kill_urb() call ensures that uvc_ctrl_status_event_async()
> > > completes before uvc_status_stop() returns, but there will still be work
> > > scheduled in that case. uvc_ctrl_status_event_work() will be run later,
> > > and as flush_status is set to 1, the function will not resubmit the URB.
> > > That fixes the main race, but leaves the asynchronous control status
> > > event handling for after uvc_status_stop() returns, which isn't great.
> > >
> > > Now, if we consider that uvc_status_start() could be called shortly
> > > after uvc_status_stop(), we may get in a situation where
> > > uvc_status_start() will reset flush_status to 0 before
> > > uvc_ctrl_status_event_async() runs. Both uvc_ctrl_status_event_async()
> > > and uvc_status_start() will thus submit the same URB.
> > >
> > > You can't fix this by first killing the URB and then cancelling the
> > > work, as there would then be a risk that uvc_ctrl_status_event_work()
> > > would be running in parallel, going past the flush_status check before
> > > flush_status gets set to 1 in uvc_status_stop(), and submitting the URB
> > > after usb_kill_urb() is called.
> > >
> > > I think a good fix would be to check flush_status in
> > > uvc_ctrl_status_event_async() and avoid the schedule_work() call in that
> > > case.
> 
> If we do that, then we will be losing an event. I would rather call
> cancel_work_sync() again after usb_kill_urb().

I've thought about that, but I don't think losing the event is an issue.
We're stopping event handling in the first place, there's no
synchronization guarantee with the camera. For all we know the camera
could have generate the event right after we stop instead of right
before. There's of course no reason to drop the event for the sake of
it, but if it leads to simpler code, there's no reason to process it
either.

> > > You could then also move the atomic_set(..., 0) call from
> > > uvc_status_start() to the end of uvc_status_stop() (again with proper
> > > barriers).
> 
> Will do that, it is more "elegant".
> 
> > Also, as all of this is tricky, comments in appropriate places in the
> > code would be very helpful to avoid breaking things later.
> >
> > > Could you please check the memory barriers and test the above proposal
> > > (after double-checking it of course, I may have missed something) ?
> > >
> > > >  }
> > > > diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
> > > > index df93db259312..1274691f157f 100644
> > > > --- a/drivers/media/usb/uvc/uvcvideo.h
> > > > +++ b/drivers/media/usb/uvc/uvcvideo.h
> > > > @@ -560,6 +560,7 @@ struct uvc_device {
> > > >     struct usb_host_endpoint *int_ep;
> > > >     struct urb *int_urb;
> > > >     u8 *status;
> > > > +   atomic_t flush_status;
> > > >     struct input_dev *input;
> > > >     char input_phys[64];
> > > >

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4] media: uvcvideo: Fix race condition with usb_kill_urb
  2023-01-02 12:57       ` Laurent Pinchart
@ 2023-01-02 13:03         ` Ricardo Ribalda
  0 siblings, 0 replies; 6+ messages in thread
From: Ricardo Ribalda @ 2023-01-02 13:03 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Sergey Senozhatsky, Mauro Carvalho Chehab, Max Staudt,
	linux-media, stable, Yunke Cao, linux-kernel

On Mon, 2 Jan 2023 at 13:57, Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> Hi Ricardo,
>
> On Mon, Jan 02, 2023 at 01:45:06PM +0100, Ricardo Ribalda wrote:
> > On Tue, 27 Dec 2022 at 15:37, Laurent Pinchart wrote:
> > > On Tue, Dec 27, 2022 at 04:25:01PM +0200, Laurent Pinchart wrote:
> > > > On Wed, Dec 14, 2022 at 05:31:55PM +0100, Ricardo Ribalda wrote:
> > > > > usb_kill_urb warranties that all the handlers are finished when it
> > > > > returns, but does not protect against threads that might be handling
> > > > > asynchronously the urb.
> > > > >
> > > > > For UVC, the function uvc_ctrl_status_event_async() takes care of
> > > > > control changes asynchronously.
> > > > >
> > > > >  If the code is executed in the following order:
> > > > >
> > > > > CPU 0                                       CPU 1
> > > > > =====                                       =====
> > > > > uvc_status_complete()
> > > > >                                     uvc_status_stop()
> > > > > uvc_ctrl_status_event_work()
> > > > >                                     uvc_status_start() -> FAIL
> > > > >
> > > > > Then uvc_status_start will keep failing and this error will be shown:
> > > > >
> > > > > <4>[    5.540139] URB 0000000000000000 submitted while active
> > > > > drivers/usb/core/urb.c:378 usb_submit_urb+0x4c3/0x528
> > > > >
> > > > > Let's improve the current situation, by not re-submiting the urb if
> > > > > we are stopping the status event. Also process the queued work
> > > > > (if any) during stop.
> > > > >
> > > > > CPU 0                                       CPU 1
> > > > > =====                                       =====
> > > > > uvc_status_complete()
> > > > >                                     uvc_status_stop()
> > > > >                                     uvc_status_start()
> > > > > uvc_ctrl_status_event_work() -> FAIL
> > > > >
> > > > > Hopefully, with the usb layer protection this should be enough to cover
> > > > > all the cases.
> > > > >
> > > > > Cc: stable@vger.kernel.org
> > > > > Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
> > > > > Reviewed-by: Yunke Cao <yunkec@chromium.org>
> > > > > Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
> > > > > ---
> > > > > uvc: Fix race condition on uvc
> > > > >
> > > > > Make sure that all the async work is finished when we stop the status urb.
> > > > >
> > > > > To: Yunke Cao <yunkec@chromium.org>
> > > > > To: Sergey Senozhatsky <senozhatsky@chromium.org>
> > > > > To: Max Staudt <mstaudt@google.com>
> > > > > To: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
> > > > > To: Mauro Carvalho Chehab <mchehab@kernel.org>
> > > > > Cc: linux-media@vger.kernel.org
> > > > > Cc: linux-kernel@vger.kernel.org
> > > > > ---
> > > > > Changes in v4:
> > > > > - Replace bool with atomic_t to avoid compiler reordering
> > > > > - First complete the async work and then kill the urb to avoid race (Thanks Laurent!)
> > > > > - Link to v3: https://lore.kernel.org/r/20221212-uvc-race-v3-0-954efc752c9a@chromium.org
> > > > >
> > > > > Changes in v3:
> > > > > - Remove the patch for dev->status, makes more sense in another series, and makes
> > > > >   the zero day less nervous.
> > > > > - Update reviewed-by (thanks Yunke!).
> > > > > - Link to v2: https://lore.kernel.org/r/20221212-uvc-race-v2-0-54496cc3b8ab@chromium.org
> > > > >
> > > > > Changes in v2:
> > > > > - Add a patch for not kalloc dev->status
> > > > > - Redo the logic mechanism, so it also works with suspend (Thanks Yunke!)
> > > > > - Link to v1: https://lore.kernel.org/r/20221212-uvc-race-v1-0-c52e1783c31d@chromium.org
> > > > > ---
> > > > >  drivers/media/usb/uvc/uvc_ctrl.c   | 3 +++
> > > > >  drivers/media/usb/uvc/uvc_status.c | 6 ++++++
> > > > >  drivers/media/usb/uvc/uvcvideo.h   | 1 +
> > > > >  3 files changed, 10 insertions(+)
> > > > >
> > > > > diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
> > > > > index c95a2229f4fa..1be6897a7d6d 100644
> > > > > --- a/drivers/media/usb/uvc/uvc_ctrl.c
> > > > > +++ b/drivers/media/usb/uvc/uvc_ctrl.c
> > > > > @@ -1442,6 +1442,9 @@ static void uvc_ctrl_status_event_work(struct work_struct *work)
> > > > >
> > > > >     uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
> > > > >
> > > > > +   if (atomic_read(&dev->flush_status))
> > > > > +           return;
> > > > > +
> > > > >     /* Resubmit the URB. */
> > > > >     w->urb->interval = dev->int_ep->desc.bInterval;
> > > > >     ret = usb_submit_urb(w->urb, GFP_KERNEL);
> > > > > diff --git a/drivers/media/usb/uvc/uvc_status.c b/drivers/media/usb/uvc/uvc_status.c
> > > > > index 7518ffce22ed..4a95850cdc1b 100644
> > > > > --- a/drivers/media/usb/uvc/uvc_status.c
> > > > > +++ b/drivers/media/usb/uvc/uvc_status.c
> > > > > @@ -304,10 +304,16 @@ int uvc_status_start(struct uvc_device *dev, gfp_t flags)
> > > > >     if (dev->int_urb == NULL)
> > > > >             return 0;
> > > > >
> > > > > +   atomic_set(&dev->flush_status, 0);
> > > > >     return usb_submit_urb(dev->int_urb, flags);
> > > > >  }
> > > > >
> > > > >  void uvc_status_stop(struct uvc_device *dev)
> > > > >  {
> > > > > +   struct uvc_ctrl_work *w = &dev->async_ctrl;
> > > > > +
> > > > > +   atomic_set(&dev->flush_status, 1);
> > > >
> > > > Note that atomic_read() and atomic_set() do no imply any memory barrier
> > > > on most architectures, as far as I can tell. They essentially become
> > > > READ_ONCE() and WRITE_ONCE() calls, which guarantee that the compiler
> > > > will not merge or split loads or stores, or reorder them with respect to
> > > > load and stores to the *same* memory location, but nothing else. I think
> > > > you need to add proper barriers, and you can then probably also drop
> > > > usage of atomic_t.
> >
> > You are absolutely right! Only a subset of atomic implies memory barriers.
> >
> > Found this doc particularly good:
> > https://www.kernel.org/doc/html/v5.10/core-api/atomic_ops.html
> >
> >
> > > >
> > > > > +   if (cancel_work_sync(&w->work))
> > > > > +           uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
> > > > >     usb_kill_urb(dev->int_urb);
> > > >
> > > > This should get rid of the main race (possibly save the barrier issue),
> > > > but it's not the most efficient option, and can still be problematic.
> > > > Consider the following case:
> > > >
> > > > CPU0                                                  CPU1
> > > > ----                                                  ----
> > > >
> > > > void uvc_status_stop(struct uvc_device *dev)          uvc_ctrl_status_event_async()
> > > > {                                                     {
> > > >       ...
> > > >       atomic_set(&dev->flush_status, 1);                      ...
> > > >       if (cancel_work_sync())
> > > >               ...
> > > >                                                               schedule_work();
> > > >       usb_kill_urb();                                 }
> > > > }
> > > >
> > > > The usb_kill_urb() call ensures that uvc_ctrl_status_event_async()
> > > > completes before uvc_status_stop() returns, but there will still be work
> > > > scheduled in that case. uvc_ctrl_status_event_work() will be run later,
> > > > and as flush_status is set to 1, the function will not resubmit the URB.
> > > > That fixes the main race, but leaves the asynchronous control status
> > > > event handling for after uvc_status_stop() returns, which isn't great.
> > > >
> > > > Now, if we consider that uvc_status_start() could be called shortly
> > > > after uvc_status_stop(), we may get in a situation where
> > > > uvc_status_start() will reset flush_status to 0 before
> > > > uvc_ctrl_status_event_async() runs. Both uvc_ctrl_status_event_async()
> > > > and uvc_status_start() will thus submit the same URB.
> > > >
> > > > You can't fix this by first killing the URB and then cancelling the
> > > > work, as there would then be a risk that uvc_ctrl_status_event_work()
> > > > would be running in parallel, going past the flush_status check before
> > > > flush_status gets set to 1 in uvc_status_stop(), and submitting the URB
> > > > after usb_kill_urb() is called.
> > > >
> > > > I think a good fix would be to check flush_status in
> > > > uvc_ctrl_status_event_async() and avoid the schedule_work() call in that
> > > > case.
> >
> > If we do that, then we will be losing an event. I would rather call
> > cancel_work_sync() again after usb_kill_urb().
>
> I've thought about that, but I don't think losing the event is an issue.
> We're stopping event handling in the first place, there's no
> synchronization guarantee with the camera. For all we know the camera
> could have generate the event right after we stop instead of right
> before. There's of course no reason to drop the event for the sake of
> it, but if it leads to simpler code, there's no reason to process it
> either.

Please take a look to v5, it does not look complicated, and we have
all the synchronization code in two functions:
uvc_status_stop() and uvc_ctrl_status_event_work(), instead of 3. Plus
we do not lose any received event.

Regards!

>
> > > > You could then also move the atomic_set(..., 0) call from
> > > > uvc_status_start() to the end of uvc_status_stop() (again with proper
> > > > barriers).
> >
> > Will do that, it is more "elegant".
> >
> > > Also, as all of this is tricky, comments in appropriate places in the
> > > code would be very helpful to avoid breaking things later.
> > >
> > > > Could you please check the memory barriers and test the above proposal
> > > > (after double-checking it of course, I may have missed something) ?
> > > >
> > > > >  }
> > > > > diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
> > > > > index df93db259312..1274691f157f 100644
> > > > > --- a/drivers/media/usb/uvc/uvcvideo.h
> > > > > +++ b/drivers/media/usb/uvc/uvcvideo.h
> > > > > @@ -560,6 +560,7 @@ struct uvc_device {
> > > > >     struct usb_host_endpoint *int_ep;
> > > > >     struct urb *int_urb;
> > > > >     u8 *status;
> > > > > +   atomic_t flush_status;
> > > > >     struct input_dev *input;
> > > > >     char input_phys[64];
> > > > >
>
> --
> Regards,
>
> Laurent Pinchart



-- 
Ricardo Ribalda

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-01-02 13:03 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-14 16:31 [PATCH v4] media: uvcvideo: Fix race condition with usb_kill_urb Ricardo Ribalda
2022-12-27 14:24 ` Laurent Pinchart
2022-12-27 14:37   ` Laurent Pinchart
2023-01-02 12:45     ` Ricardo Ribalda
2023-01-02 12:57       ` Laurent Pinchart
2023-01-02 13:03         ` Ricardo Ribalda

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).