All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hans Verkuil <hverkuil@xs4all.nl>
To: Greg KH <greg@kroah.com>
Cc: linux-kernel@vger.kernel.org, v4l <video4linux-list@redhat.com>,
	Laurent Pinchart <laurent.pinchart@skynet.be>
Subject: [BUG] cdev_put() race condition
Date: Mon, 8 Dec 2008 21:56:26 +0100	[thread overview]
Message-ID: <200812082156.26522.hverkuil@xs4all.nl> (raw)

Hi Greg,

Laurent found a race condition in the uvc driver that we traced to the 
way chrdev_open and cdev_put/get work.

You need the following ingredients to reproduce it:

1) a hot-pluggable char device like an USB webcam.
2) a manually created device node for such a webcam instead of relying 
on udev.

In order to easily force this situation you would also need to add a  
delay to the char device's release() function. For webcams that would 
be at the top of v4l2_chardev_release() in 
drivers/media/video/v4l2-dev.c. But adding a delay to e.g. cdev_purge 
would have the same effect.

The sequence of events in the case of a webcam is as follows:

1) The USB device is removed, causing a disconnect.

2) The webcam driver unregisters the video device which in turn calls 
cdev_del().

3) When the last application using the device is closed, the cdev is 
released when the kref of the cdev's kobject goes to 0.

4) If the kref's release() call takes a while due to e.g. extra cleanup 
in the case of a webcam, then another application can try to open the 
video device. Note that this requires a device node created with mknod, 
otherwise the device nodes would already have been removed by udev.

5) chrdev_open checks inode->i_cdev. If this is NULL (i.e. this device 
node was never accessed before), then all is fine since kobj_lookup 
will fail because cdev_del() has been called earlier. However, if this 
device node was used earlier, then the else part is called: 
cdev_get(p). This 'p' is the cdev that is being released. Since the 
kref count is 0 you will get a WARN message from kref_get, but the code 
continues on, the f_op->open will (hopefully) return more-or-less 
gracefully with an error and the cdev_put at the end will cause the 
refcount to go to 0 again, which results in a SECOND call to the kref's 
release function!

See this link for the original discussion on the v4l list containing 
stack traces an a patch that you need if you want to (and can) test 
this with the uvc driver:

http://www.spinics.net/lists/vfl/msg39967.html

Reading Documentation/kref.txt leads me to the conclusion that a mutex 
should be introduced to prevent cdev_get from being called while 
cdev_put is in progress. It should be a mutex instead of a spinlock 
because the kref's release() can do all sorts of cleanups, some of 
which might involve waits.

I think that replacing cdev_lock with a mutex, adding it to cdev_put(), 
cdev_del() and removing it from cdev_purge() should do the trick (I 
hope).

BTW: shouldn't cdev_del() call cdev_put() instead of kobject_put()? If I 
understand it correctly then cdev_add calls cdev_get (through 
exact_lock()), so cdev_del should do the reverse, right?

Regards,

	Hans

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG

WARNING: multiple messages have this Message-ID (diff)
From: Hans Verkuil <hverkuil@xs4all.nl>
To: Greg KH <greg@kroah.com>
Cc: v4l <video4linux-list@redhat.com>, linux-kernel@vger.kernel.org
Subject: [BUG] cdev_put() race condition
Date: Mon, 8 Dec 2008 21:56:26 +0100	[thread overview]
Message-ID: <200812082156.26522.hverkuil@xs4all.nl> (raw)

Hi Greg,

Laurent found a race condition in the uvc driver that we traced to the 
way chrdev_open and cdev_put/get work.

You need the following ingredients to reproduce it:

1) a hot-pluggable char device like an USB webcam.
2) a manually created device node for such a webcam instead of relying 
on udev.

In order to easily force this situation you would also need to add a  
delay to the char device's release() function. For webcams that would 
be at the top of v4l2_chardev_release() in 
drivers/media/video/v4l2-dev.c. But adding a delay to e.g. cdev_purge 
would have the same effect.

The sequence of events in the case of a webcam is as follows:

1) The USB device is removed, causing a disconnect.

2) The webcam driver unregisters the video device which in turn calls 
cdev_del().

3) When the last application using the device is closed, the cdev is 
released when the kref of the cdev's kobject goes to 0.

4) If the kref's release() call takes a while due to e.g. extra cleanup 
in the case of a webcam, then another application can try to open the 
video device. Note that this requires a device node created with mknod, 
otherwise the device nodes would already have been removed by udev.

5) chrdev_open checks inode->i_cdev. If this is NULL (i.e. this device 
node was never accessed before), then all is fine since kobj_lookup 
will fail because cdev_del() has been called earlier. However, if this 
device node was used earlier, then the else part is called: 
cdev_get(p). This 'p' is the cdev that is being released. Since the 
kref count is 0 you will get a WARN message from kref_get, but the code 
continues on, the f_op->open will (hopefully) return more-or-less 
gracefully with an error and the cdev_put at the end will cause the 
refcount to go to 0 again, which results in a SECOND call to the kref's 
release function!

See this link for the original discussion on the v4l list containing 
stack traces an a patch that you need if you want to (and can) test 
this with the uvc driver:

http://www.spinics.net/lists/vfl/msg39967.html

Reading Documentation/kref.txt leads me to the conclusion that a mutex 
should be introduced to prevent cdev_get from being called while 
cdev_put is in progress. It should be a mutex instead of a spinlock 
because the kref's release() can do all sorts of cleanups, some of 
which might involve waits.

I think that replacing cdev_lock with a mutex, adding it to cdev_put(), 
cdev_del() and removing it from cdev_purge() should do the trick (I 
hope).

BTW: shouldn't cdev_del() call cdev_put() instead of kobject_put()? If I 
understand it correctly then cdev_add calls cdev_get (through 
exact_lock()), so cdev_del should do the reverse, right?

Regards,

	Hans

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG

--
video4linux-list mailing list
Unsubscribe mailto:video4linux-list-request@redhat.com?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/video4linux-list

             reply	other threads:[~2008-12-08 20:57 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-12-08 20:56 Hans Verkuil [this message]
2008-12-08 20:56 ` [BUG] cdev_put() race condition Hans Verkuil
2008-12-16 10:06 ` Hans Verkuil
2008-12-16 10:06   ` Hans Verkuil
2008-12-16 20:22 ` Greg KH
2008-12-16 21:00   ` Hans Verkuil
2008-12-16 21:00     ` Hans Verkuil
2008-12-16 21:21     ` Greg KH
2008-12-16 23:23       ` Hans Verkuil
2008-12-16 23:23         ` Hans Verkuil
2008-12-16 23:30         ` Greg KH
2008-12-17 13:37           ` Hans Verkuil
2008-12-17 13:37             ` Hans Verkuil
2008-12-17 14:52             ` Boaz Harrosh
2008-12-17 15:07               ` Hans Verkuil
2008-12-17 15:07                 ` Hans Verkuil
2008-12-17 16:09                 ` Boaz Harrosh
2008-12-17 17:33                   ` Hans Verkuil
2008-12-17 17:33                     ` Hans Verkuil
2008-12-17 18:08                     ` Al Viro
2008-12-18  8:12                       ` Boaz Harrosh
2008-12-18  8:25                     ` Boaz Harrosh
2008-12-17 18:16             ` Greg KH
2008-12-17 19:27               ` Laurent Pinchart
2008-12-17 19:27                 ` Laurent Pinchart
2008-12-17 19:35                 ` Greg KH
2008-12-17 19:30               ` Hans Verkuil
2008-12-17 19:30                 ` Hans Verkuil
2008-12-17 19:38                 ` Greg KH
2008-12-17 19:39                 ` Hans Verkuil
2008-12-17 19:39                   ` Hans Verkuil
2008-12-17 19:53                   ` Greg KH
2008-12-17 20:18                     ` Hans Verkuil
2008-12-17 20:18                       ` Hans Verkuil
2008-12-17 20:52                       ` Greg KH

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200812082156.26522.hverkuil@xs4all.nl \
    --to=hverkuil@xs4all.nl \
    --cc=greg@kroah.com \
    --cc=laurent.pinchart@skynet.be \
    --cc=linux-kernel@vger.kernel.org \
    --cc=video4linux-list@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.