ksummit.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
To: Greg KH <greg@kroah.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>,
	Mark Brown <broonie@kernel.org>,
	Linus Walleij <linus.walleij@linaro.org>,
	Roland Dreier <roland@kernel.org>,
	Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>,
	ksummit@lists.linux.dev, Daniel Vetter <daniel@ffwll.ch>
Subject: Re: [TECH TOPIC] Rust for Linux
Date: Wed, 7 Jul 2021 20:17:08 +0300	[thread overview]
Message-ID: <YOXhlDsMAZUn1EBg@pendragon.ideasonboard.com> (raw)
In-Reply-To: <YOWh0Dq+2v+wH3B4@kroah.com>

On Wed, Jul 07, 2021 at 02:45:04PM +0200, Greg KH wrote:
> On Wed, Jul 07, 2021 at 01:38:44PM +0100, James Bottomley wrote:
> > On Wed, 2021-07-07 at 14:20 +0200, Greg KH wrote:
> > > On Wed, Jul 07, 2021 at 12:34:31PM +0100, James Bottomley wrote:
> > > > On Wed, 2021-07-07 at 11:50 +0100, Mark Brown wrote:
> > > > > On Tue, Jul 06, 2021 at 10:36:14PM +0200, Linus Walleij wrote:
> > > > > > On Tue, Jul 6, 2021 at 10:00 PM Roland Dreier wrote:
> > > > > > > "devres" / devm_xxx was an attempt to deal with this in C,
> > > > > > > but it only solves some cases of this and has not received a
> > > > > > > lot of adoption (we can argue about the reasons).
> > > > > >  
> > > > > > Really? From my point of view that is adopted all over the map.
> > > > > > I add new users all the time and use it as much as I can when
> > > > > > writing new drivers.
> > > > > 
> > > > > Yes, it's *super* widely used in most of the kernel.  Perhaps
> > > > > there's some subsystems that reject it for some reason.
> > > > > 
> > > > > > I think it's a formidable success, people just need to learn to
> > > > > > do it more.
> > > > > 
> > > > > There *are* issues with people adopting it too enthusiastically -
> > > > > as well as the memory lifetime issues that Laurent mentioned it's
> > > > > easy for it to cause problems with interrupt handlers that are
> > > > > left live longer than they should be and try to use things that
> > > > > were already deallocated.

(CC'ing Daniel Vetter as the author of the DRM devres-based resource
management)

I've given this lots of thoughts lately, in the context of V4L2, but it
should be roughly the same for character devices in general. Here's what
I think should be done for drivers that expose character devices.

- Drivers must stop allocating their internal data structure (the one
  they set as device data with dev_set_drvdata()) with devm_kzalloc().
  The data structure must instead be allocated with a plain kzalloc()
  and reference-counted.

  Most drivers will register a single character device using a
  subsystem-specific API (e.g. video_register_device() in V4L2). The
  subsystem needs to provide a .release() callback, called when the
  last reference to the character device is released. Drivers must
  implement this, and can simply free their internal data structure at
  this point.

  For drivers that register multiple character devices, or in general
  expose multiple interfaces to userspace or other parts of the kernel,
  the internal data structure must be properly reference-counted, with a
  reference released in each .release() callback. There may be ways to
  simplify this.

  This can be seen as going back to the pre-devm_kzalloc() era, but it's
  only about undoing a mistake that was done way too often (to be fair,
  many drivers used to just kfree() the data in the driver's .remove()
  operation, so it wasn't always a regression, only enshrining a
  preexisting bad practice).

  This is only part of the puzzle though. There's a remove/use race that
  still needs to be solved.

- In .remove(), drivers must inform the character device that new access
  from userspace are not allowed anymore. This would set a flag in
  struct cdev that would result in all new calls from userspace through
  file operations to be rejected (with -ENXIO for instance). This should
  be wrapped in subsystem-specific functions (e.g.
  video_device_disconnect() wrapping cdev_disconnect()). From now on, no
  new calls are possible, but existing calls may be in progress.

- Drivers then need to cancel all pending I/O and wait for completion.
  I/O (either direct memory-mapped I/O or through bus APIs, such as I2C
  or USB transactions) are not allowed anymore after .remove() returns.
  This will have the side effect of waking up userspace contexts that
  are waiting for I/O completion (through a blocking file I/O operation
  for instance). Existing calls from userspace will start completing.

  This is also a good place to start shutting down the device, for
  instance disabling interrupts.

- The next step is for drivers to wait until all calls from userspace
  complete. This should be done with a call count in struct cdev that is
  updated upon entry and exit of calls from userspace, and a wait queue
  to wait for that count to go to 0. This should be wrapped in
  subsystem-specific APIs. As the flag that indicates device removal is
  set, no new calls are allowed so the counter can only decrease, and as
  all pending I/O have terminated or have been cancelled, no pending
  calls should be blocked.

- Finally, drivers should unregister the character device, through the
  appropriate subsystem API.

  At this point, memory mappings and file handles referencing the device
  may still exist, but no file operation is in progress. The device is
  quiescent.

  Care needs to be taken in drivers and subsystems to not start any I/O
  operation when handling the file .release() operation or the
  destruction of memory mappings. Overall I don't expect much issues,
  but I'm sure some drivers do strange things in those code paths.

- When the least memory mapping is gone and the last file handle is
  closed, the subsystem will call the driver's .release() callback. At
  this point, the driver will perform the operations listed in the first
  item of this list.


The challenge here will be to make this as easy as possible for drivers,
to avoid risk of drivers getting it wrong. The DRM/KMS subsystem has
created a devres-based system to handle this, with the devres associated
with the drm_device (the abstraction of the cdevs exposed by DRM to
userspace), *not* the physical device. It has a drawback though, it
assumes that a DRM driver will only ever want to register a drm_device
and nothing else, and hardcodes that assumption in the way it releases
resources. That's fine for most DRM drivers, but if a driver was to
register a drm_device and something else (such as a V4L2 video_device,
an input device, ...), the DRM subsystem will get in the way.

I have two questions:

- Does the above make sense ?
- Assuming it does, how do we get from the current mess to a situation
  where writing a driver will be a pleasure, not a punishment ? :-)

> > > > Fixing this should not be beyond the wit of humankind, though.  If
> > > > we delayed deallocation to module release, that would fix the
> > > > interrupt issue, wouldn't it?  Perhaps all devres memory for
> > > > devices should live until then anyway and thus be automatically
> > > > deallocated instead of needing an explicit free ... the problem
> > > > with that being compiled in devices currently optimize away the
> > > > module refcounting, but that should be fixable.
> > > 
> > > module code lifespans are different than device structure lifespans,
> > > it's when people get them confused, as here, that we have problems.
> > 
> > I'm not claiming they are.  What I am claiming is that module lifetime
> > must always encompass the device lifetimes.  Therefore, you can never
> > be incorrect by using a module lifetime for anything attached to a
> > device, just inefficient for using memory longer than potentially
> > needed.  However, in a lot of use cases, the device is created on
> > module init and destroyed on module exit, so the inefficiency is barely
> > noticeable.
> 
> In almost no use case is the device created on module init of a driver.
> devices are created by busses, look at the USB code as an example.  The
> usb bus creates the devices and then individual modules bind to that
> device as needed.  If the device is removed from the system, wonderful,
> the device is unbound, but the module is still loaded.  So if you really
> wanted to, with your change, just do a insert/remove of a USB device a
> few zillion times and then memory is all gone :(
> 
> > The question I'm asking is shouldn't we optimize for this?
> 
> No.
> 
> > so let people allocate devm memory safe in the knowledge it will be
> > freed on module release?
> 
> No, see above.  Modules are never removed in a normal system.  devices
> are.
> 
> And the drm developers have done great work in unwinding some of these
> types of mistakes in their drivers, let's not go backwards please.

-- 
Regards,

Laurent Pinchart

  reply	other threads:[~2021-07-07 17:18 UTC|newest]

Thread overview: 201+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-25 22:09 [TECH TOPIC] Rust for Linux Miguel Ojeda
2021-07-05 23:51 ` Linus Walleij
2021-07-06  4:30   ` Leon Romanovsky
2021-07-06  9:55     ` Linus Walleij
2021-07-06 10:16       ` Geert Uytterhoeven
2021-07-06 17:59         ` Linus Walleij
2021-07-06 18:36           ` Miguel Ojeda
2021-07-06 19:12             ` Linus Walleij
2021-07-06 21:32               ` Miguel Ojeda
2021-07-07 14:10             ` Arnd Bergmann
2021-07-07 15:28               ` Miguel Ojeda
2021-07-07 15:50                 ` Andrew Lunn
2021-07-07 16:34                   ` Miguel Ojeda
2021-07-07 16:55                 ` Arnd Bergmann
2021-07-07 17:54                   ` Miguel Ojeda
2021-07-06 10:22       ` Leon Romanovsky
2021-07-06 14:30       ` Miguel Ojeda
2021-07-06 14:32         ` Miguel Ojeda
2021-07-06 15:03         ` Sasha Levin
2021-07-06 15:33           ` Miguel Ojeda
2021-07-06 15:42             ` Laurent Pinchart
2021-07-06 16:09               ` Mike Rapoport
2021-07-06 18:29               ` Miguel Ojeda
2021-07-06 18:38                 ` Laurent Pinchart
2021-07-06 19:45                   ` Steven Rostedt
2021-07-06 19:59                   ` Miguel Ojeda
2021-07-06 18:53             ` Sasha Levin
2021-07-06 21:50               ` Miguel Ojeda
2021-07-07  4:57                 ` Leon Romanovsky
2021-07-07 13:39                 ` Alexandre Belloni
2021-07-07 13:50                   ` Miguel Ojeda
2021-07-06 18:26         ` Linus Walleij
2021-07-06 19:11           ` Miguel Ojeda
2021-07-06 19:13         ` Johannes Berg
2021-07-06 19:43           ` Miguel Ojeda
2021-07-06 10:20     ` James Bottomley
2021-07-06 14:55       ` Miguel Ojeda
2021-07-06 15:01         ` Sasha Levin
2021-07-06 15:36           ` Miguel Ojeda
2021-07-09 10:02         ` Marco Elver
2021-07-09 16:02           ` Miguel Ojeda
2021-07-06 18:09       ` Linus Walleij
2021-07-06 14:24     ` Miguel Ojeda
2021-07-06 14:33       ` Laurent Pinchart
2021-07-06 14:56       ` Leon Romanovsky
2021-07-06 15:29         ` Miguel Ojeda
2021-07-07  4:38           ` Leon Romanovsky
2021-07-06 20:00   ` Roland Dreier
2021-07-06 20:36     ` Linus Walleij
2021-07-06 22:00       ` Laurent Pinchart
2021-07-07  7:27         ` Julia Lawall
2021-07-07  7:45           ` Greg KH
2021-07-07  7:52             ` James Bottomley
2021-07-07 13:49               ` Miguel Ojeda
2021-07-07 14:08                 ` James Bottomley
2021-07-07 15:15                   ` Miguel Ojeda
2021-07-07 15:44                     ` Greg KH
2021-07-07 17:01                       ` Wedson Almeida Filho
2021-07-07 17:20                         ` Greg KH
2021-07-07 19:19                           ` Wedson Almeida Filho
2021-07-07 20:38                             ` Jan Kara
2021-07-07 23:09                               ` Wedson Almeida Filho
2021-07-08  6:11                                 ` Greg KH
2021-07-08 13:36                                   ` Wedson Almeida Filho
2021-07-08 18:51                                     ` Greg KH
2021-07-08 19:31                                       ` Andy Lutomirski
2021-07-08 19:35                                         ` Geert Uytterhoeven
2021-07-08 21:56                                           ` Andy Lutomirski
2021-07-08 19:49                                       ` Linus Walleij
2021-07-08 20:34                                         ` Miguel Ojeda
2021-07-08 22:13                                           ` Linus Walleij
2021-07-09  7:24                                             ` Geert Uytterhoeven
2021-07-19 12:24                                             ` Wedson Almeida Filho
2021-07-19 13:15                                               ` Wedson Almeida Filho
2021-07-19 14:02                                                 ` Arnd Bergmann
2021-07-19 14:13                                                   ` Linus Walleij
2021-07-19 21:32                                                     ` Arnd Bergmann
2021-07-19 21:33                                                     ` Arnd Bergmann
2021-07-20  1:46                                                       ` Miguel Ojeda
2021-07-20  6:43                                                         ` Johannes Berg
2021-07-19 14:43                                                   ` Geert Uytterhoeven
2021-07-19 18:24                                                     ` Miguel Ojeda
2021-07-19 18:47                                                       ` Steven Rostedt
2021-07-19 14:54                                                   ` Miguel Ojeda
2021-07-19 17:32                                                   ` Wedson Almeida Filho
2021-07-19 21:31                                                     ` Arnd Bergmann
2021-07-19 17:37                                                   ` Miguel Ojeda
2021-07-19 16:02                                                 ` Vegard Nossum
2021-07-19 17:45                                                   ` Miguel Ojeda
2021-07-19 17:54                                                     ` Miguel Ojeda
2021-07-19 18:06                                                   ` Wedson Almeida Filho
2021-07-19 19:37                                                     ` Laurent Pinchart
2021-07-19 21:09                                                       ` Wedson Almeida Filho
2021-07-20 23:54                                                         ` Laurent Pinchart
2021-07-21  1:33                                                           ` Andy Lutomirski
2021-07-21  1:42                                                             ` Laurent Pinchart
2021-07-21 13:54                                                               ` Linus Walleij
2021-07-21 14:13                                                                 ` Wedson Almeida Filho
2021-07-21 14:19                                                                   ` Linus Walleij
2021-07-22 11:33                                                                     ` Wedson Almeida Filho
2021-07-23  0:45                                                                       ` Linus Walleij
2021-07-21  4:39                                                             ` Wedson Almeida Filho
2021-07-23  1:04                                                               ` Laurent Pinchart
2021-07-21  4:23                                                           ` Wedson Almeida Filho
2021-07-23  1:13                                                             ` Laurent Pinchart
2021-07-19 22:57                                                 ` Alexandre Belloni
2021-07-20  7:15                                                   ` Miguel Ojeda
2021-07-20  9:39                                                     ` Alexandre Belloni
2021-07-20 12:10                                                       ` Miguel Ojeda
2021-07-19 13:53                                               ` Linus Walleij
2021-07-19 14:42                                                 ` Wedson Almeida Filho
2021-07-19 22:16                                                   ` Linus Walleij
2021-07-20  1:20                                                     ` Wedson Almeida Filho
2021-07-20 13:21                                                       ` Andrew Lunn
2021-07-20 13:38                                                         ` Miguel Ojeda
2021-07-20 14:04                                                           ` Andrew Lunn
2021-07-20 13:55                                                         ` Greg KH
2021-07-20  1:21                                                     ` Miguel Ojeda
2021-07-20 16:00                                                       ` Mark Brown
2021-07-20 22:42                                                       ` Linus Walleij
2021-07-19 14:43                                                 ` Miguel Ojeda
2021-07-19 15:15                                                   ` Andrew Lunn
2021-07-19 15:43                                                     ` Miguel Ojeda
2021-07-09  7:03                                         ` Viresh Kumar
2021-07-09 17:06                                         ` Mark Brown
2021-07-09 17:43                                           ` Miguel Ojeda
2021-07-10  9:53                                             ` Jonathan Cameron
2021-07-10 20:09                                         ` Kees Cook
2021-07-08 13:55                                   ` Miguel Ojeda
2021-07-08 14:58                                     ` Greg KH
2021-07-08 15:02                                       ` Mark Brown
2021-07-08 16:38                                       ` Andy Lutomirski
2021-07-08 18:01                                         ` Greg KH
2021-07-08 18:00                                       ` Miguel Ojeda
2021-07-08 18:44                                         ` Greg KH
2021-07-08 23:09                                           ` Miguel Ojeda
2021-07-08  7:20                                 ` Geert Uytterhoeven
2021-07-08 13:41                                   ` Wedson Almeida Filho
2021-07-08 13:43                                     ` Geert Uytterhoeven
2021-07-08 13:54                                       ` Wedson Almeida Filho
2021-07-08 14:16                                         ` Geert Uytterhoeven
2021-07-08 14:24                                           ` Wedson Almeida Filho
2021-07-09  7:04                                             ` Jerome Glisse
2021-07-08 14:04                                       ` Miguel Ojeda
2021-07-08 14:18                                         ` Geert Uytterhoeven
2021-07-08 14:28                                           ` Miguel Ojeda
2021-07-08 14:33                                             ` Geert Uytterhoeven
2021-07-08 14:35                                               ` Miguel Ojeda
2021-07-09 11:55                                                 ` Geert Uytterhoeven
2021-07-08 16:07                                               ` Andy Lutomirski
2021-07-07 20:58                           ` Miguel Ojeda
2021-07-07 21:47                             ` Laurent Pinchart
2021-07-07 22:44                               ` Miguel Ojeda
2021-07-07 17:01           ` Miguel Ojeda
2021-07-07 10:50       ` Mark Brown
2021-07-07 10:56         ` Julia Lawall
2021-07-07 11:27           ` James Bottomley
2021-07-07 11:34         ` James Bottomley
2021-07-07 12:20           ` Greg KH
2021-07-07 12:38             ` James Bottomley
2021-07-07 12:45               ` Greg KH
2021-07-07 17:17                 ` Laurent Pinchart [this message]
2021-07-08  6:49                   ` cdev/devm_* issues (was Re: [TECH TOPIC] Rust for Linux) Greg KH
2021-07-08  8:23                     ` Laurent Pinchart
2021-07-08 23:06                     ` Linus Walleij
2021-07-09  0:02                       ` Dan Williams
2021-07-09 16:53                       ` Wedson Almeida Filho
2021-07-13  8:59                         ` Linus Walleij
     [not found]                           ` <CAHp75VfW7PxAyU=eYPNWFU_oUY=aStz-4W5gX87KSo402YhMXQ@mail.gmail.com>
2021-07-21 13:46                             ` Linus Walleij
2021-07-21 15:49                               ` Andy Shevchenko
2021-07-10  7:09                     ` Dan Carpenter
2021-07-12 13:42                       ` Jason Gunthorpe
2021-07-15  9:54                     ` Daniel Vetter
2021-07-21  9:08                       ` Dan Carpenter
2021-07-22  9:56                         ` Daniel Vetter
2021-07-22 10:09                           ` Dan Carpenter
2021-07-08  9:08                   ` [TECH TOPIC] Rust for Linux Mauro Carvalho Chehab
2021-07-10 16:42                     ` Laurent Pinchart
2021-07-10 17:18                       ` Andy Lutomirski
2021-07-07 15:17           ` Mark Brown
2021-07-06 21:45     ` Bart Van Assche
2021-07-06 23:08       ` Stephen Hemminger
2021-07-07  2:41         ` Bart Van Assche
2021-07-07 18:57           ` Linus Torvalds
2021-07-07 20:32             ` Bart Van Assche
2021-07-07 20:39               ` Linus Torvalds
2021-07-07 21:40                 ` Laurent Pinchart
2021-07-08  7:22                 ` Geert Uytterhoeven
2021-07-07 21:02               ` Laurent Pinchart
2021-07-07 22:11               ` Miguel Ojeda
2021-07-07 22:43                 ` Laurent Pinchart
2021-07-07 23:21                   ` Miguel Ojeda
2021-07-07 23:40                     ` Laurent Pinchart
2021-07-08  0:27                       ` Miguel Ojeda
2021-07-08  0:56                         ` Laurent Pinchart
2021-07-08  6:26             ` Alexey Dobriyan
2021-07-06 19:05 ` Bart Van Assche
2021-07-06 19:27   ` Miguel Ojeda
2021-07-07 15:48 ` Steven Rostedt
2021-07-07 16:44   ` Miguel Ojeda
2023-08-07 10:03 Miguel Ojeda

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YOXhlDsMAZUn1EBg@pendragon.ideasonboard.com \
    --to=laurent.pinchart@ideasonboard.com \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=broonie@kernel.org \
    --cc=daniel@ffwll.ch \
    --cc=greg@kroah.com \
    --cc=ksummit@lists.linux.dev \
    --cc=linus.walleij@linaro.org \
    --cc=miguel.ojeda.sandonis@gmail.com \
    --cc=roland@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).