linux-media.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
To: Guenter Roeck <linux@roeck-us.net>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>,
	Sakari Ailus <sakari.ailus@iki.fi>,
	linux-uvc-devel@lists.sourceforge.net, linux-usb@vger.kernel.org,
	linux-media@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/5] media: uvcvideo: Fix race conditions
Date: Mon, 31 Aug 2020 00:36:21 +0300	[thread overview]
Message-ID: <20200830213621.GC6043@pendragon.ideasonboard.com> (raw)
In-Reply-To: <ac2080a1-3b00-ac9e-cd49-d1ee84c6ca25@roeck-us.net>

Hi Guenter,

On Sun, Aug 30, 2020 at 01:48:24PM -0700, Guenter Roeck wrote:
> On 8/30/20 8:58 AM, Laurent Pinchart wrote:
> > On Sun, Aug 30, 2020 at 08:04:38AM -0700, Guenter Roeck wrote:
> >> The uvcvideo code has no lock protection against USB disconnects
> >> while video operations are ongoing. This has resulted in random
> >> error reports, typically pointing to a crash in usb_ifnum_to_if(),
> >> called from usb_hcd_alloc_bandwidth(). A typical traceback is as
> >> follows.
> >>
> >> usb 1-4: USB disconnect, device number 3
> >> BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> >> PGD 0 P4D 0
> >> Oops: 0000 [#1] PREEMPT SMP PTI
> >> CPU: 0 PID: 5633 Comm: V4L2CaptureThre Not tainted 4.19.113-08536-g5d29ca36db06 #1
> >> Hardware name: GOOGLE Edgar, BIOS Google_Edgar.7287.167.156 03/25/2019
> >> RIP: 0010:usb_ifnum_to_if+0x29/0x40
> >> Code: <...>
> >> RSP: 0018:ffffa46f42a47a80 EFLAGS: 00010246
> >> RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff904a396c9000
> >> RDX: ffff904a39641320 RSI: 0000000000000001 RDI: 0000000000000000
> >> RBP: ffffa46f42a47a80 R08: 0000000000000002 R09: 0000000000000000
> >> R10: 0000000000009975 R11: 0000000000000009 R12: 0000000000000000
> >> R13: ffff904a396b3800 R14: ffff904a39e88000 R15: 0000000000000000
> >> FS: 00007f396448e700(0000) GS:ffff904a3ba00000(0000) knlGS:0000000000000000
> >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> CR2: 0000000000000000 CR3: 000000016cb46000 CR4: 00000000001006f0
> >> Call Trace:
> >>  usb_hcd_alloc_bandwidth+0x1ee/0x30f
> >>  usb_set_interface+0x1a3/0x2b7
> >>  uvc_video_start_transfer+0x29b/0x4b8 [uvcvideo]
> >>  uvc_video_start_streaming+0x91/0xdd [uvcvideo]
> >>  uvc_start_streaming+0x28/0x5d [uvcvideo]
> >>  vb2_start_streaming+0x61/0x143 [videobuf2_common]
> >>  vb2_core_streamon+0xf7/0x10f [videobuf2_common]
> >>  uvc_queue_streamon+0x2e/0x41 [uvcvideo]
> >>  uvc_ioctl_streamon+0x42/0x5c [uvcvideo]
> >>  __video_do_ioctl+0x33d/0x42a
> >>  video_usercopy+0x34e/0x5ff
> >>  ? video_ioctl2+0x16/0x16
> >>  v4l2_ioctl+0x46/0x53
> >>  do_vfs_ioctl+0x50a/0x76f
> >>  ksys_ioctl+0x58/0x83
> >>  __x64_sys_ioctl+0x1a/0x1e
> >>  do_syscall_64+0x54/0xde
> >>
> >> While this is problem rarely observed in the field, it is relatively easy
> >> to reproduce by adding msleep() calls into the code.
> >>
> >> I don't presume to claim that I found every issue, but this patch series
> >> should fix at least the major problems.
> >>
> >> The patch series was tested exensively on a Chromebook running chromeos-4.19
> >> and on a Linux system running a v5.8.y based kernel.
> > 
> > I'll review each patch individually, but I think 2/5, 4/5 and 5/5 should
> > be handled in the V4L2 core, not the uvcvideo driver. Otherwise we would
> > have to replicate that logic in all drivers, while I think it can easily
> > be implemented in a generic fashion as previously discussed.
> > 
> The problem is that the v4l2 core already does support locking. There is
> a global lock, in struct video_device, a queue lock in struct v4l2_m2m_ctx,
> and another queue lock in struct vb2_queue. However, all of those have
> to be initialized from the driver. The uvcvideo driver uses its own locks and
> does not set the lock pointers in the various generic structures. I was able
> to figure out how to use the uvcvideo specific locks in the uvcvideo
> driver, but all my attempts to initialize and use the generic locks failed.
> 
> It may well be that the generic code isn't entirely clean - for example
> I am not sure if the lock protection in v4l2_open() is complete since
> it doesn't handle disconnects after checking if the video device is still
> registered (and I don't really see the point of the second video_is_registered()
> call in v4l2_open). However, that may just be a lack of understanding on my
> side on how the code is supposed to work. Maybe the actual device open function
> is expected to have its own protection against underlying hardware removal
> and video device unregistration while opening the device.
> 
> [ Regarding the second call to video_is_registered() in v4l2_open():
>   Add msleep(5000) between it and the call to the driver open function,
>   disconnect the device during the sleep, and it will happily call the device
>   open function on a non-registered video device. That is what patch 5/5 tries
>   to fix or the uvcvideo driver.
>   The same problem applies to other file operations in v4l2-dev.c: They all
>   check if the video device is registered before calling the device
>   specific code, but I don't really see the point of doing that because
>   there is no protection against unregistration after the check was made
>   and before/while the device specific code is running.
>   Patch 4/5 tries to fix this for the uvcvideo driver.
>   If that is a bug in the v4l2 code, I'll be happy to work on a fix,
>   but the only generic fix I could think of would be to utilize the lock in
>   struct video_device ... but that lock isn't initialized by the uvcvideo
>   driver.
> ]
> 
> Either case, I don't think my understanding of the interaction between
> v4l2 and uvcvideo is good enough to make more invasive changes. I _think_
> any generic improvement should start with refactoring the uvcvideo code to
> use the v4l2 locking mechanism. However, from the exchange here, my
> understanding is that this locking mechanism is not used on purpose. That
> means we'll have a uvcvideo specific locking mechanism, period, and I don't
> think it is even possible to solve the problem without utilizing this locking
> mechanism.
> 
> Of course, it may as well be that I am completely off track and clueless.
> After all, the first time I looked into this code was about two weeks ago.
> So please bear with me if I talk nonsense.

It would be rather impolite to claim you're clueless, given that you
managed to write this patch series only two weeks after first looking
into the problem :-)

I'll try to prototype what I envision would be a good solution in the
V4L2 core. If stars align, I may even try to push it one level up, to
the chardev layer. Would you then be able to test it ?

> >> ----------------------------------------------------------------
> >> Guenter Roeck (5):
> >>       media: uvcvideo: Cancel async worker earlier
> >>       media: uvcvideo: Lock video streams and queues while unregistering
> >>       media: uvcvideo: Release stream queue when unregistering video device
> >>       media: uvcvideo: Protect uvc queue file operations against disconnect
> >>       media: uvcvideo: In uvc_v4l2_open, check if video device is registered
> >>
> >>  drivers/media/usb/uvc/uvc_ctrl.c   | 11 ++++++----
> >>  drivers/media/usb/uvc/uvc_driver.c | 12 ++++++++++
> >>  drivers/media/usb/uvc/uvc_queue.c  | 32 +++++++++++++++++++++++++--
> >>  drivers/media/usb/uvc/uvc_v4l2.c   | 45 ++++++++++++++++++++++++++++++++++++--
> >>  drivers/media/usb/uvc/uvcvideo.h   |  1 +
> >>  5 files changed, 93 insertions(+), 8 deletions(-)

-- 
Regards,

Laurent Pinchart

  reply	other threads:[~2020-08-30 21:36 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-30 15:04 [PATCH 0/5] media: uvcvideo: Fix race conditions Guenter Roeck
2020-08-30 15:04 ` [PATCH 1/5] media: uvcvideo: Cancel async worker earlier Guenter Roeck
2020-08-30 15:04 ` [PATCH 2/5] media: uvcvideo: Lock video streams and queues while unregistering Guenter Roeck
2020-08-30 15:04 ` [PATCH 3/5] media: uvcvideo: Release stream queue when unregistering video device Guenter Roeck
2020-08-30 15:04 ` [PATCH 4/5] media: uvcvideo: Protect uvc queue file operations against disconnect Guenter Roeck
2020-09-01 16:51   ` kernel test robot
2020-09-01 16:58     ` Guenter Roeck
2020-08-30 15:04 ` [PATCH 5/5] media: uvcvideo: In uvc_v4l2_open, check if video device is registered Guenter Roeck
2020-08-30 15:58 ` [PATCH 0/5] media: uvcvideo: Fix race conditions Laurent Pinchart
2020-08-30 20:48   ` Guenter Roeck
2020-08-30 21:36     ` Laurent Pinchart [this message]
2020-08-31  0:10       ` Guenter Roeck
2020-09-03  3:19         ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200830213621.GC6043@pendragon.ideasonboard.com \
    --to=laurent.pinchart@ideasonboard.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=linux-uvc-devel@lists.sourceforge.net \
    --cc=linux@roeck-us.net \
    --cc=mchehab@kernel.org \
    --cc=sakari.ailus@iki.fi \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).