linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg KH <gregkh@linuxfoundation.org>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "Chen, Mike Ximing" <mike.ximing.chen@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"arnd@arndb.de" <arnd@arndb.de>,
	"pierre-louis.bossart@linux.intel.com" 
	<pierre-louis.bossart@linux.intel.com>
Subject: Re: [PATCH v8 04/20] dlb: add device ioctl layer and first three ioctls
Date: Wed, 13 Jan 2021 10:57:47 +0100	[thread overview]
Message-ID: <X/7EG46Z20F8QFIX@kroah.com> (raw)
In-Reply-To: <CAPcyv4hGxLZGEkfnqdLfF-a1CzfEjLux-TBxXztbknFhEe9mYA@mail.gmail.com>

On Tue, Jan 12, 2021 at 01:03:08PM -0800, Dan Williams wrote:
> On Sun, Jan 10, 2021 at 7:06 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Sat, Jan 09, 2021 at 01:49:42PM -0800, Dan Williams wrote:
> > > On Sat, Jan 9, 2021 at 12:34 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> > > >
> > > > On Sat, Jan 09, 2021 at 07:49:24AM +0000, Chen, Mike Ximing wrote:
> > > > > > > +static int dlb_ioctl_arg_size[NUM_DLB_CMD] = {
> > > > > > > + sizeof(struct dlb_get_device_version_args),
> > > > > > > + sizeof(struct dlb_create_sched_domain_args),
> > > > > > > + sizeof(struct dlb_get_num_resources_args)
> > > > > >
> > > > > > That list.
> > > > > >
> > > > > > Ugh, no.  that's no way to write maintainable code that you will be able
> > > > > > to understand in 2 years.
> > > > > >
> > > > >
> > > > > dlb_ioctl() was implemented with switch-case and real function calls previously.
> > > > > We changed to the table/list implementation during a code restructure. I will move
> > > > > back to the old implementation.
> > > >
> > > > Who said to change this?  And why did they say that?  Please go back to
> > > > those developers and point them at this thread so that doesn't happen
> > > > again...
> > > >
> > > > > > > +{
> > > > > > > + struct dlb *dlb;
> > > > > > > + dlb_ioctl_fn_t fn;
> > > > > > > + u32 cmd_nr;
> > > > > > > + void *karg;
> > > > > > > + int size;
> > > > > > > + int ret;
> > > > > > > +
> > > > > > > + dlb = f->private_data;
> > > > > > > +
> > > > > > > + if (!dlb) {
> > > > > > > +         pr_err("dlb: [%s()] Invalid DLB data\n", __func__);
> > > > > > > +         return -EFAULT;
> > > > > >
> > > > > > This error value is only allowed if you really do have a memory fault.
> > > > > >
> > > > > > Hint, you do not here.
> > > > > >
> > > > > > How can that value ever be NULL?
> > > > > >
> > > > >
> > > > > It is targeted at some very rare cases, in which an ioctl command are called immediately after a device unbind (by a different process/application).
> > > >
> > > > And how can that happen?  If it really can happen, where is the lock
> > > > that you are holding to keep that pointer from becoming "stale" right
> > > > after you assign it?
> > > >
> > > > So either this never can happen, or your logic here for this type of
> > > > thing is totally wrong.  Please work to determine which it is.
> > >
> > > I would have preferred a chance to offer a reviewed-by on this set
> > > before it went out (per the required process) to validate that the
> > > feedback on the lifetime handling was properly addressed, it wasn't,
> > > but lets deal with this on the list now.
> > >
> > > The race to handle is the one identified by cdev_del():
> > >
> > >  * NOTE: This guarantees that cdev device will no longer be able to be
> > >  * opened, however any cdevs already open will remain and their fops will
> > >  * still be callable even after cdev_del returns.
> > >
> > > This means that the dlb->private_data is pointing to a live device, a
> > > dying device, or NULL. Without revalidating to the dlb pointer under a
> > > lock, or some other coordinated reference cout, it can transition
> > > states underneath the running ioctl.
> >
> > But, that's only the case if this is the last cdev reference held here,
> > right?  How can a close be called if a filehandle is still open such
> > that an ioctl can be called?
> >
> > This should just be a "simple" char device operation, with no need to be
> > fancy or anything odd like that, right?  If not, then yes, this really
> > does need a real lock.
> >
> > > Greg, I'm thinking of taking a shot at a document, "botching up device
> > > lifetimes",  in the same spirit as
> > > Documentation/process/botching-up-ioctls.rst to lay out the different
> > > schemes for revalidating driver private data in ioctls.
> >
> > Sure, but again, it should be "simple" in that an ioctl can't be called
> > after close() happens.
> 
> Yes, for checking that file->private_data is not NULL it is sufficient
> to just assume that ioctl() can't be called after close().
> 
> That's not my concern though. The open race that cdev_del() does not
> address is ioctl() called after device-unbind. The open fd is never
> revoked and can live past device_unregister() in which case the ioctl
> needs to revalidate the device, or (not recommended) block unbind /
> driver-remove while open file descriptors are outstanding.

A cdev is to track the character device, so the open/close/ioctl issue
should not be relevant here.

Device unbind is totally different and has nothing to do with the
character device node, except where you are trying to tie the two
together.  And yes, you do have to be aware of that, but usually is it
quite simple.  Complex examples are the v4l layer where the distance
between the two devices is great, so the middle layer has to handle
things carefully.

For a "simple" driver like this one, there shouldn't be any issues and
it should be hard to get this wrong :)

> > > It strikes me that a helper like this might address many of the common patterns:
> > >
> > > +/**
> > > + * get_live_device() - increment reference count for device iff !dead
> > > + * @dev: device.
> > > + *
> > > + * Forward the call to get_device() if the device is still alive. If
> > > + * this is called with the device_lock() held then the device is
> > > + * guaranteed to not die until the device_lock() is dropped.
> > > + */
> > > +struct device *get_live_device(struct device *dev)
> > > +{
> > > +       return dev && !dev->p->dead ? get_device(dev) : NULL;
> > > +}
> > > +EXPORT_SYMBOL_GPL(get_live_device);
> >
> > Ick, no, that's still racy :(
> 
> Hence the comment about needing to synchronize with the driver doing
> device_unregister().

If you save off your device pointer properly on probe (i.e. you grab a
reference count as you "know" it is live), then you don't have any of
these races or need to synchronize.  So again, this should be hard to
get wrong, unless you have a "heavy" middle layer between the char
device node and the device structure.

> > And a cdev is not a "real" struct device, despite looking like one if
> > you squint at it.  The patches from Christoph should be merged soon that
> > remove the last remants of the logic that kind of looked like a struct
> > device operation (with a kobject), and after that, I will clean it out
> > to keep it from looking like it ties into the driver model entirely, as
> > many people get this wrong, because it does not.
> 
> Agree, but many drivers still tie cdev lifetime to a struct device.

True, and that's normally fine.  Do you have examples of where this is
wrong in the tree?

> > > Alternatively, if device_lock() is too awkward for a driver it could
> > > use its own lock and kill_device().
> > >
> > > ...am I off base thinking that cdev_del vs fops liveness is a
> > > widespread problem worth documenting new design patterns?
> >
> > It shouldn't be a problem, again, because who would be able to close a
> > char device node and still be able to call ioctl on it?  The VFS layer
> > should prevent that from happening, right?
> 
> Per above, unbind vs and revoking new ioctl() invocations is the concern.

Luckily unbind is a very rare occurance (with the exception of the
virtio drivers, who seem to love it and use it as their only user/kernel
api), so this shouldn't be an issue in normal operations of a system.

thanks,

greg k-h

  reply	other threads:[~2021-01-13  9:57 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-05  2:58 [PATCH v8 00/20] dlb: introduce DLB device driver Mike Ximing Chen
2021-01-05  2:58 ` [PATCH v8 01/20] dlb: add skeleton for DLB driver Mike Ximing Chen
2021-01-07 19:35   ` Greg KH
2021-01-09  6:09     ` Chen, Mike Ximing
2021-01-05  2:58 ` [PATCH v8 02/20] dlb: initialize device Mike Ximing Chen
2021-01-05  2:58 ` [PATCH v8 03/20] dlb: add resource and device initialization Mike Ximing Chen
2021-01-07 19:40   ` Greg KH
2021-01-10 17:22     ` Chen, Mike Ximing
2021-01-05  2:58 ` [PATCH v8 04/20] dlb: add device ioctl layer and first three ioctls Mike Ximing Chen
2021-01-07 19:41   ` Greg KH
2021-01-09  6:17     ` Chen, Mike Ximing
2021-01-07 19:41   ` Greg KH
2021-01-09  7:55     ` Chen, Mike Ximing
2021-01-07 19:50   ` Greg KH
2021-01-09  7:49     ` Chen, Mike Ximing
2021-01-09  8:33       ` Greg KH
2021-01-09 21:49         ` Dan Williams
2021-01-10 15:07           ` Greg KH
2021-01-12 21:03             ` Dan Williams
2021-01-13  9:57               ` Greg KH [this message]
2021-01-21 23:14                 ` Dan Williams
2021-01-10  4:30         ` Chen, Mike Ximing
2021-01-05  2:58 ` [PATCH v8 05/20] dlb: add scheduling domain configuration Mike Ximing Chen
2021-01-05  2:58 ` [PATCH v8 06/20] dlb: add domain software reset Mike Ximing Chen
2021-01-05  2:58 ` [PATCH v8 07/20] dlb: add low-level register reset operations Mike Ximing Chen
2021-01-05  2:58 ` [PATCH v8 08/20] dlb: add runtime power-management support Mike Ximing Chen
2021-01-05  2:58 ` [PATCH v8 09/20] dlb: add queue create, reset, get-depth ioctls Mike Ximing Chen
2021-01-05  2:58 ` [PATCH v8 10/20] dlb: add register operations for queue management Mike Ximing Chen
2021-01-05  2:58 ` [PATCH v8 11/20] dlb: add ioctl to configure ports and query poll mode Mike Ximing Chen
2021-01-05  2:58 ` [PATCH v8 12/20] dlb: add register operations for port management Mike Ximing Chen
2021-01-05  2:58 ` [PATCH v8 13/20] dlb: add port mmap support Mike Ximing Chen
2021-01-05  2:58 ` [PATCH v8 14/20] dlb: add start domain ioctl Mike Ximing Chen
2021-01-05  2:58 ` [PATCH v8 15/20] dlb: add queue map, unmap, and pending unmap operations Mike Ximing Chen
2021-01-05  2:58 ` [PATCH v8 16/20] dlb: add port map/unmap state machine Mike Ximing Chen
2021-01-05  2:58 ` [PATCH v8 17/20] dlb: add static queue map register operations Mike Ximing Chen
2021-01-05  2:58 ` [PATCH v8 18/20] dlb: add dynamic " Mike Ximing Chen
2021-01-05  2:58 ` [PATCH v8 19/20] dlb: add queue unmap " Mike Ximing Chen
2021-01-05  2:58 ` [PATCH v8 20/20] dlb: queue map/unmap workqueue Mike Ximing Chen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=X/7EG46Z20F8QFIX@kroah.com \
    --to=gregkh@linuxfoundation.org \
    --cc=arnd@arndb.de \
    --cc=dan.j.williams@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mike.ximing.chen@intel.com \
    --cc=pierre-louis.bossart@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).