From: Jens Axboe <firstname.lastname@example.org>
To: Matteo Croce <email@example.com>,
Christoph Hellwig <firstname.lastname@example.org>
"Lennart Poettering" <email@example.com>,
"Luca Boccassi" <firstname.lastname@example.org>,
"Alexander Viro" <email@example.com>,
"Damien Le Moal" <firstname.lastname@example.org>,
"Tejun Heo" <email@example.com>,
"Javier González" <firstname.lastname@example.org>,
"Niklas Cassel" <email@example.com>,
"Johannes Thumshirn" <firstname.lastname@example.org>,
"Hannes Reinecke" <email@example.com>,
"Matthew Wilcox" <firstname.lastname@example.org>,
Subject: Re: [PATCH v5 0/5] block: add a sequence number to disks
Date: Wed, 28 Jul 2021 13:22:18 -0600
Message-ID: <email@example.com>
On 7/12/21 5:05 PM, Matteo Croce wrote:
> From: Matteo Croce <firstname.lastname@example.org>
> Associating uevents with block devices in userspace is difficult and racy:
> the uevent netlink socket is lossy, and on slow and overloaded systems has
> a very high latency. Block devices do not have exclusive owners in
> userspace; any process can set one up (e.g. loop devices). Moreover, device
> names can be reused (e.g. loop0 can be reused again and again). A userspace
> process setting up a block device and watching for its events thus cannot
> reliably tell whether an event relates to the device it just set up or
> another earlier instance with the same name.
> Being able to set a UUID on a loop device would solve the race conditions.
> But it does not allow deriving an ordering from uevents: if you see a uevent
> with a UUID that does not match the device you are waiting for, you cannot
> tell whether it's because the right uevent has not arrived yet, or it was
> already sent and you missed it. So you cannot tell whether you should wait
> for it or not.
> Being able to set devices up in a namespace would solve the race conditions
> too, but it can work only if being namespaced is feasible in the first
> place. Many userspace processes need to set devices up for the root
> namespace, so this solution cannot always work.
> Changing the loop device naming implementation to always use
> monotonically increasing device numbers, instead of reusing the lowest
> free number, would also solve the problem, but it would be very disruptive
> to userspace and likely break many existing use cases. It would also be
> quite awkward to use on long-running machines, as the loop device name
> would quickly grow to many digits.
> Furthermore, this problem does not affect only loop devices - partition
> probing is asynchronous and very slow on busy systems. It is very easy to
> enter races when using LO_FLAGS_PARTSCAN and watching for the partitions to
> show up, as it can take a long time for the uevents to be delivered after
> setting them up.
> Associating a unique, monotonically increasing sequence number with the
> lifetime of each block device, which can be retrieved with an ioctl
> immediately upon setting it up, makes it possible to solve the race conditions with
> uevents, and also allows userspace processes to know whether they should
> wait for the uevent they need or if it was dropped and thus they should
> move on.
> This does not benefit only loop devices and block devices with multiple
> partitions, but for example also removable media such as USB sticks or
> The first patch is the core one, patches 2..4 expose the information in
> different ways, and the last one makes the loop device generate a media
> changed event upon attach, detach or reconfigure, so the sequence number
> is increased.
> If merged, this feature will immediately be used by userspace:
Applied for 5.15, with #2 done manually since it didn't apply cleanly.
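
The wait-or-move-on decision described in the quoted cover letter can be
sketched as follows. This is a minimal illustration under stated assumptions,
not code from the series: it assumes the uevent carries the sequence number
in a property named DISKSEQ, and the names Verdict, parse_diskseq and
classify_uevent are hypothetical helpers invented for this example. The
caller is assumed to have already read its own sequence number (for instance
via the ioctl from patch 3) right after setting the device up.

```python
from enum import Enum
from typing import Mapping, Optional


class Verdict(Enum):
    STALE = "stale"    # event belongs to an earlier instance: keep waiting
    MATCH = "match"    # event belongs to the device we set up
    MISSED = "missed"  # a later instance exists: our event was lost, move on


def parse_diskseq(props: Mapping[str, str]) -> Optional[int]:
    """Extract the sequence number from a uevent property map.

    Assumption: the property is named DISKSEQ and holds a decimal integer.
    Returns None when the property is absent or malformed.
    """
    raw = props.get("DISKSEQ")
    if raw is None:
        return None
    try:
        return int(raw)
    except ValueError:
        return None


def classify_uevent(our_diskseq: int, event_diskseq: int) -> Verdict:
    """Compare the sequence number we obtained at setup time with the one
    carried by a uevent for the same device name.

    Because the numbers only ever increase, a smaller value must come from
    an earlier use of the name, and a larger value from a later one.
    """
    if event_diskseq < our_diskseq:
        return Verdict.STALE
    if event_diskseq == our_diskseq:
        return Verdict.MATCH
    return Verdict.MISSED


if __name__ == "__main__":
    ours = 7  # hypothetical value read right after attaching the device
    for props in ({"DEVNAME": "loop0", "DISKSEQ": "5"},
                  {"DEVNAME": "loop0", "DISKSEQ": "7"}):
        seq = parse_diskseq(props)
        if seq is not None:
            print(props["DEVNAME"], seq, classify_uevent(ours, seq).value)
```

A UUID alone could only answer "is this my event or not"; the ordering is
what lets a listener distinguish "not yet arrived" from "already gone".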
Thread overview: 12+ messages
2021-07-12 23:05 [PATCH v5 0/5] block: add a sequence number to disks Matteo Croce
2021-07-12 23:05 ` [PATCH v5 1/6] block: add disk sequence number Matteo Croce
2021-07-12 23:05 ` [PATCH v5 2/6] block: export the diskseq in uevents Matteo Croce
2021-07-12 23:05 ` [PATCH v5 3/6] block: add ioctl to read the disk sequence number Matteo Croce
2021-07-12 23:05 ` [PATCH v5 4/6] block: export diskseq in sysfs Matteo Croce
2021-07-12 23:05 ` [PATCH v5 5/6] block: add a helper to raise a media changed event Matteo Croce
2021-07-12 23:05 ` [PATCH v5 6/6] loop: raise media_change event Matteo Croce
2021-07-13 6:03 ` Christoph Hellwig
2021-07-20 17:27 ` [PATCH v5 0/5] block: add a sequence number to disks Luca Boccassi
2021-07-22 11:41 ` Matteo Croce
2021-07-28 19:01 ` Lennart Poettering
2021-07-28 19:22 ` Jens Axboe [this message]