From: Lennart Poettering <>
To: Hannes Reinecke <>
Cc: Matteo Croce <>,
	Christoph Hellwig <>,
	Jens Axboe <>,
	Linux Kernel Mailing List <>,
	Luca Boccassi <>,
	Alexander Viro <>,
	Damien Le Moal <>, Tejun Heo <>,
	Javier González <>,
	Niklas Cassel <>,
	Johannes Thumshirn <>,
	Matthew Wilcox <>,
	JeffleXu <>
Subject: Re: [PATCH v3 1/6] block: add disk sequence number
Date: Wed, 23 Jun 2021 17:48:57 +0200
Message-ID: <YNNX6U5Ui95ZEJnw@gardel-login>
In-Reply-To: <>

On Wed, 23.06.21 17:02, Hannes Reinecke wrote:

> > you imply it was easy to know which device use a uevent belongs
> > to. But that's the problem: it is not possible to do so safely. If I
> > see a uevent for a block device "loop0" I cannot tell if it was from
> > my own use of the device or for some previous user of it.
> >
> > And that's what we'd like to see fixed: i.e. we query the block device
> > for the seqno now used and then we can use that to filter the uevents
> > and ignore the ones that do not carry the same sequence number as we
> > got assigned for our user.
> It is notoriously tricky to monitor the intended use-case for kernel
> devices, precisely because we do _not_ attach any additional information to
> it.
> I have sent a proposal for LSF to implement block-namespaces, the prime
> use-case of which is indeed attaching cgroup/namespace information to block
> devices such that we _can_ match (block) devices to specific
> contexts.

The goal of the patchset is to make loopback block devices (and
similar) safely and robustly concurrently allocatable from the main OS
namespace, without any cgroup/containerization logic.

In systemd we want to be able to allocate loopback block devices from
any context, and concurrently without having to set up a
cgroup/namespace first for each user for it. Any approach that binds
two distinct subsystems like this together (e.g. "you need to set up
cgroups to safely allocate loopback block devices") is really
problematic for us, since we manage both, but independently and always
with minimal privileges.

> Which I rather prefer than adding sequence numbers to block devices;
> incidentally you could solve the same problem by _not_ reusing numbers
> aggressively but rather allocate the next free one after the most recently
> allocated one.

You are suggesting that instead of allocating loopback block devices
always from the "bottom", i.e. always handing out from "loop0" on,
with the lowest preferred, we'd just always hand out "loop1", "loop2",
… with strictly monotonically increasing numbers and never reuse
"loop0" or any other name we already handed out? That would
certainly work, but it would require quite some kernel rework, since
the loopback allocation API is really not designed to work like that
right now.
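
The difference between the two allocation policies can be sketched
in a few lines. This is only an illustration of the naming behaviour,
not the kernel's loop allocation code; the class and method names
(LowestFreeAllocator, MonotonicAllocator, allocate, release) are
hypothetical.

```python
class LowestFreeAllocator:
    """Hands out the lowest unused index, so a released name is reused."""

    def __init__(self) -> None:
        self.in_use = set()

    def allocate(self) -> str:
        index = 0
        while index in self.in_use:
            index += 1
        self.in_use.add(index)
        return f"loop{index}"

    def release(self, name: str) -> None:
        self.in_use.discard(int(name[len("loop"):]))


class MonotonicAllocator:
    """Never reuses an index: every allocation gets a fresh, larger number."""

    def __init__(self) -> None:
        self.next_index = 0

    def allocate(self) -> str:
        name = f"loop{self.next_index}"
        self.next_index += 1
        return name

    def release(self, name: str) -> None:
        # Nothing to do: released names are retired for good.
        pass
```

With the first policy a name seen in a uevent may belong to an earlier
user of the same name; with the second policy a name identifies exactly
one use, at the cost of numbers that grow without bound.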

Moreover, the proposed sequence number stuff also covers
floppies/cdroms and other stuff nicely, i.e. where drives stick around
but their media changes. Also, USB sticks are currently effectively
always called /dev/sda. It would be great to be able to distinguish
each plug/replug too. Of course, you could argue that there too
/dev/sda should never be reused, and that names should instead be
strictly monotonically increasing, /dev/sdb, /dev/sdc, … and so on,
and I'd sympathize with that, but that makes it a major kernel rework,
because basically every block subsystem would have to be reworked to
never reuse block device names.

Also, I doubt people would be happy if they then regularly would have
to deal with device names such as /dev/loop84763874658743 or
/dev/sdzbghz just because their system has been running for a while.

> The better alternative here would be to extend the loop ioctl to pass in an
> UUID when allocating the device.
> That way you can easily figure out whether the loop device has been
> modified.

UUIDs instead of sequence numbers would mostly solve our problems
too, i.e. chaotic, randomized assignment of identifiers instead of
linearly progressing assignment of identifiers. However, I prefer
sequence numbers, as discussed in this thread before: they allow us to
derive ordering from things. Thus if you see a uevent with a seqnum
smaller than the one you are interested in, you know it's worth waiting
for the ones you are looking for to appear. But if you see a uevent
with a seqnum greater than the one you are interested in, then you know
it's pointless to wait: the device has already been acquired by
someone else. With randomized UUIDs you can't know that, since uses by
other participants are only recognizable as distinct from your own but
don't tell you if they are earlier or later than your own. After all,
the AF_NETLINK/uevent socket is lossy, so you must be prepared for
dropped messages, hence it's great if we can easily resync when your
own messages get dropped.
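
The ordering argument can be sketched as a small filter over incoming
events. This is a minimal illustration only: it assumes uevents have
already been parsed into (device name, sequence number) pairs, and the
names Verdict, classify and wait_for_ours are hypothetical, not part
of the patchset or of any existing API.

```python
from enum import Enum
from typing import Iterable, Optional, Tuple


class Verdict(Enum):
    STALE = "stale"  # from an earlier use of the device: keep waiting
    OURS = "ours"    # carries the sequence number we were assigned
    LOST = "lost"    # a later use exists: waiting is pointless


def classify(event_seqnum: int, our_seqnum: int) -> Verdict:
    """Compare a uevent's sequence number with the one we were assigned."""
    if event_seqnum < our_seqnum:
        return Verdict.STALE
    if event_seqnum == our_seqnum:
        return Verdict.OURS
    return Verdict.LOST


def wait_for_ours(
    events: Iterable[Tuple[str, int]], devname: str, our_seqnum: int
) -> Optional[Verdict]:
    """Scan a possibly lossy event stream until it becomes decisive.

    Returns OURS or LOST as soon as one is seen for our device, or None
    if the stream ends with nothing but unrelated or stale events.
    """
    for name, seqnum in events:
        if name != devname:
            continue  # event for some other block device
        verdict = classify(seqnum, our_seqnum)
        if verdict is not Verdict.STALE:
            return verdict
    return None
```

If our own event was dropped, the first event with a larger sequence
number still ends the wait with LOST. With unordered UUIDs the same
loop could only ever answer "ours" or "not ours".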


Lennart Poettering, Berlin

