From: Lennart Poettering <firstname.lastname@example.org>
To: Hannes Reinecke <email@example.com>
Cc: Matteo Croce <firstname.lastname@example.org>, Christoph Hellwig <email@example.com>, firstname.lastname@example.org, email@example.com, Jens Axboe <firstname.lastname@example.org>, Linux Kernel Mailing List <email@example.com>, Luca Boccassi <firstname.lastname@example.org>, Alexander Viro <email@example.com>, Damien Le Moal <firstname.lastname@example.org>, Tejun Heo <email@example.com>, Javier González <firstname.lastname@example.org>, Niklas Cassel <email@example.com>, Johannes Thumshirn <firstname.lastname@example.org>, Matthew Wilcox <email@example.com>, JeffleXu <firstname.lastname@example.org>
Subject: Re: [PATCH v3 1/6] block: add disk sequence number
Date: Wed, 23 Jun 2021 17:48:57 +0200
Message-ID: <YNNX6U5Ui95ZEJnw@gardel-login>
In-Reply-To: <email@example.com>

On Wed, 23.06.21 17:02, Hannes Reinecke (firstname.lastname@example.org) wrote:

> > you imply it was easy to know which device use a uevent belongs
> > to. But that's the problem: it is not possible to do so safely. If I
> > see a uevent for a block device "loop0" I cannot tell if it was from
> > my own use of the device or for some previous user of it.
> >
> > And that's what we'd like to see fixed: i.e. we query the block device
> > for the seqnum now used and then we can use that to filter the uevents
> > and ignore the ones that do not carry the same sequence number as we
> > got assigned for our user.
>
> It is notoriously tricky to monitor the intended use-case for kernel
> devices, precisely because we do _not_ attach any additional information to
> it.
> I have sent a proposal for LSF to implement block-namespaces, the prime
> use-case of which is indeed attaching cgroup/namespace information to block
> devices such that we _can_ match (block) devices to specific
> contexts.
The goal of the patchset is to make loopback block devices (and similar) safely and robustly allocatable concurrently from the main OS namespace, without any cgroup/containerization logic. In systemd we want to be able to allocate loopback block devices from any context, concurrently, without first having to set up a cgroup/namespace for each user. Any approach that binds two distinct subsystems together like this (e.g. "you need to set up cgroups to safely allocate loopback block devices") is really problematic for us, since we manage both, but independently and always with minimal privileges.

> Which I rather prefer than adding sequence numbers to block devices;
> incidentally you could solve the same problem by _not_ reusing numbers
> aggressively but rather allocate the next free one after the most recently
> allocated one.

You are suggesting that instead of always allocating loopback block devices from the "bottom", i.e. always handing out "loop0" first, with the lowest free number preferred, we'd just always hand out "loop1", "loop2", … with strictly monotonically increasing numbers and never reuse "loop0" or any other name we already handed out?

That would certainly work, but it would require quite some kernel rework, since the loopback allocation API is really not designed to work like that right now.

Moreover, the proposed sequence number stuff also covers floppies/cdroms and similar devices nicely, i.e. where the drives stick around but their media changes. Also, USB sticks are currently effectively always called /dev/sda. It would be great to be able to distinguish each plug/replug too. Of course, you could argue that there, too, /dev/sda should never be reused and strictly monotonically increasing /dev/sdb, /dev/sdc, … and so on should be used instead, and I'd sympathize with that, but that makes it a major kernel rework, because basically every block subsystem would have to be reworked to never reuse block device names anymore.
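To make the naming problem concrete, here is a toy Python model. It is hypothetical: it is not the kernel's loop allocator, and the class and method names are invented for illustration. Device names are handed out lowest-free-first, as described above, while each attachment is stamped with a value from one global, strictly increasing counter, which is roughly the idea behind the proposed disk sequence number. Two successive users end up with the same name but different sequence numbers:

```python
import itertools


class LoopPool:
    """Toy model (hypothetical, not the kernel's allocator): device names
    are handed out lowest-free-first, so names get reused, while every
    attachment is stamped with a value from one global, strictly
    increasing counter."""

    def __init__(self, size=8):
        self.size = size
        self._seq = itertools.count(1)
        self._in_use = {}  # device index -> sequence number of current use

    def attach(self):
        """Return (name, seq) for the lowest free device."""
        for i in range(self.size):
            if i not in self._in_use:
                seq = next(self._seq)
                self._in_use[i] = seq
                return f"loop{i}", seq
        raise RuntimeError("no free loop device")

    def detach(self, name):
        del self._in_use[int(name[len("loop"):])]


pool = LoopPool()
name_a, seq_a = pool.attach()  # first user
pool.detach(name_a)
name_b, seq_b = pool.attach()  # second user, after the first detached
print(name_a, seq_a, name_b, seq_b)  # loop0 1 loop0 2
```

Matching uevents by name cannot tell user A's events from user B's here; the sequence number can, without changing the allocation policy at all.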
Also, I doubt people would be happy if they then regularly had to deal with device names such as /dev/loop84763874658743 or /dev/sdzbghz just because their system had been running for a while.

> The better alternative here would be to extend the loop ioctl to pass in an
> UUID when allocating the device.
> That way you can easily figure out whether the loop device has been
> modified.

UUIDs instead of sequence numbers would mostly solve our problems too, i.e. chaotic, randomized assignment of identifiers instead of linearly progressing assignment of identifiers. However, I prefer sequence numbers, as discussed earlier in this thread: they allow us to derive ordering. If you see a uevent with a seqnum smaller than the one you are interested in, you know it's worth waiting for the ones you are looking for to appear. But if you see a uevent with a seqnum greater than the one you are interested in, then you know it's pointless to wait: the device has already been acquired by someone else. With randomized UUIDs you can't know that, since uses by other participants are only recognizable as distinct from your own, but don't tell you whether they are earlier or later than your own. After all, the AF_NETLINK/uevent socket is lossy, so you must be prepared for dropped messages, hence it's great if we can easily resync when your own messages get dropped.

Lennart

--
Lennart Poettering, Berlin
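The ordering argument above can be sketched as a small decision function. This is a minimal Python illustration under stated assumptions: uevents are modeled as (device, seq) tuples, `my_seq` is the sequence number assigned to our own use of the device, and sequence numbers strictly increase with each new use. The names `Verdict`, `classify` and `wait_for_mine` are hypothetical and do not correspond to any systemd or kernel API.

```python
from enum import Enum


class Verdict(Enum):
    MINE = "mine"          # event belongs to our own use of the device
    KEEP_WAITING = "wait"  # event from an earlier use; ours may still come
    GIVE_UP = "give_up"    # event from a later use; device was reused


def classify(event_seq, my_seq):
    """Apply the ordering rule: compare the event's sequence number with
    the one assigned to our own use of the device."""
    if event_seq == my_seq:
        return Verdict.MINE
    if event_seq < my_seq:
        return Verdict.KEEP_WAITING
    return Verdict.GIVE_UP


def wait_for_mine(events, my_dev, my_seq):
    """Scan a (possibly lossy) stream of (device, seq) uevents.

    Returns our event if it shows up, or None as soon as a newer sequence
    number proves that waiting is pointless (or the stream ends)."""
    for dev, seq in events:
        if dev != my_dev:
            continue  # uevent for some other device, irrelevant here
        verdict = classify(seq, my_seq)
        if verdict is Verdict.MINE:
            return dev, seq
        if verdict is Verdict.GIVE_UP:
            return None
        # KEEP_WAITING: stale event from a previous user, keep scanning
    return None


print(wait_for_mine([("loop0", 41), ("loop1", 7), ("loop0", 42)], "loop0", 42))  # ('loop0', 42)
print(wait_for_mine([("loop0", 41), ("loop0", 44)], "loop0", 42))  # None
```

The second call models a dropped uevent: our own event (42) never arrives, but seeing 44 tells us immediately that waiting is pointless. With random UUIDs the same stream would only say "not ours", leaving no way to decide between waiting and giving up.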
Thread overview: 31+ messages
2021-06-23 10:58 [PATCH v3 0/6] block: add a sequence number to disks Matteo Croce
2021-06-23 10:58 ` [PATCH v3 1/6] block: add disk sequence number Matteo Croce
2021-06-23 11:48 ` Christoph Hellwig
2021-06-23 13:10 ` Matteo Croce
2021-06-23 13:51 ` Lennart Poettering
2021-06-23 14:01 ` Hannes Reinecke
2021-06-23 14:07 ` Luca Boccassi
2021-06-23 14:21 ` Hannes Reinecke
2021-06-23 14:34 ` Luca Boccassi
2021-06-23 14:55 ` Lennart Poettering
2021-06-23 14:12 ` Lennart Poettering
2021-06-23 15:02 ` Hannes Reinecke
2021-06-23 15:34 ` Luca Boccassi
2021-06-23 15:48 ` Lennart Poettering [this message]
2021-06-23 14:28 ` Christoph Hellwig
2021-06-23 10:58 ` [PATCH v3 2/6] block: add ioctl to read the " Matteo Croce
2021-06-23 10:58 ` [PATCH v3 3/6] block: refactor sysfs code Matteo Croce
2021-06-23 11:52 ` Christoph Hellwig
2021-06-23 19:03 ` Matteo Croce
2021-06-24 6:12 ` Christoph Hellwig
2021-06-23 10:58 ` [PATCH v3 4/6] block: export diskseq in sysfs Matteo Croce
2021-06-23 10:58 ` [PATCH v3 5/6] block: increment sequence number Matteo Croce
2021-06-23 10:58 ` [PATCH v3 6/6] loop: " Matteo Croce
2021-06-23 11:57 ` Christoph Hellwig
2021-06-23 13:13 ` Luca Boccassi
2021-06-23 14:25 ` Christoph Hellwig
2021-06-23 15:29 ` Lennart Poettering
2021-06-24 6:11 ` Christoph Hellwig
2021-06-23 12:03 ` [PATCH v3 0/6] block: add a sequence number to disks Hannes Reinecke
2021-06-23 12:46 ` Luca Boccassi
2021-06-23 14:07 ` Lennart Poettering