linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Lennart Poettering <mzxreary@0pointer.de>
To: Hannes Reinecke <hare@suse.de>
Cc: Luca Boccassi <bluca@debian.org>,
	Matteo Croce <mcroce@linux.microsoft.com>,
	Christoph Hellwig <hch@infradead.org>,
	linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Jens Axboe <axboe@kernel.dk>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Damien Le Moal <damien.lemoal@wdc.com>, Tejun Heo <tj@kernel.org>,
	Javier Gonz??lez <javier@javigon.com>,
	Niklas Cassel <niklas.cassel@wdc.com>,
	Johannes Thumshirn <johannes.thumshirn@wdc.com>,
	Matthew Wilcox <willy@infradead.org>,
	JeffleXu <jefflexu@linux.alibaba.com>
Subject: Re: [PATCH v3 1/6] block: add disk sequence number
Date: Wed, 23 Jun 2021 16:55:14 +0200	[thread overview]
Message-ID: <YNNLUnGMO/gNNIJK@gardel-login> (raw)
In-Reply-To: <f84cab19-fb5c-634b-d1ca-51404907a623@suse.de>

On Mi, 23.06.21 16:21, Hannes Reinecke (hare@suse.de) wrote:

> > We need this so that we can reliably correlate events to instances of a
> > device. Events alone cannot solve this problem, because events _are_
> > the problem.
> >
> In which sense?
> Yes, events can be delayed (if you list to uevents), but if you listen to
> kernel events there shouldn't be a delay, right?

uevents are delivered to userpace via an AF_NETLINK socket. The
AF_NETLINK socket is basically an asynchronous buffer.

I mean, consider what you are saying: you establish the AF_NETLINK
uevent watching socket, then you allocate /dev/loop0. Since you cannot
do that atomically, you'll first have to do one, and then the
other. But if you do that in two steps, then in the middle some other
process might get scheduled that quickly allocates /dev/loop0 and
releases it again, before your code gets to run. So now you have in
your AF_NETLINK socket buffer the uevents for that other process' use
of the device, and you cannot sanely distinguish them from your own.

of course you could do it the other way: allocate the device first,
and only then allocate the AF_NETLINK uevent socket. But then you
might or might not lose the "add" event for the device you just
allocated. And you don't know if you should wait for it or not.

This isn't even a constructed issue, this is the common case if you
have multiple processes all simultaneously trying to acquire a loopback
block device, because they all will end up eying /dev/loop0 at the same time.

But it gets worse IRL because of various factors. For example,
partition probing is asynchronous, so if you use LO_FLAGS_PARTSCAN and
want to watch for some partition device associated to your loopback
block device to show up, this can take *really* long, so the race
window is large. Or you actually use udev (like most userspace
probably should) because you want the metainfo it collects about the
device, in which case it will take even longer for the uevent to reach
you, i.e. the time window where a previous user's uevents and your own
for the same loopback device "overlap" can be quite large and you
cannot determine if they are yours or the previous user's uevents —
unless we have these new sequence numbers.

Lennart

--
Lennart Poettering, Berlin

  parent reply	other threads:[~2021-06-23 14:55 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-23 10:58 [PATCH v3 0/6] block: add a sequence number to disks Matteo Croce
2021-06-23 10:58 ` [PATCH v3 1/6] block: add disk sequence number Matteo Croce
2021-06-23 11:48   ` Christoph Hellwig
2021-06-23 13:10     ` Matteo Croce
2021-06-23 13:51       ` Lennart Poettering
2021-06-23 14:01         ` Hannes Reinecke
2021-06-23 14:07           ` Luca Boccassi
2021-06-23 14:21             ` Hannes Reinecke
2021-06-23 14:34               ` Luca Boccassi
2021-06-23 14:55               ` Lennart Poettering [this message]
2021-06-23 14:12           ` Lennart Poettering
2021-06-23 15:02             ` Hannes Reinecke
2021-06-23 15:34               ` Luca Boccassi
2021-06-23 15:48               ` Lennart Poettering
2021-06-23 14:28       ` Christoph Hellwig
2021-06-23 10:58 ` [PATCH v3 2/6] block: add ioctl to read the " Matteo Croce
2021-06-23 10:58 ` [PATCH v3 3/6] block: refactor sysfs code Matteo Croce
2021-06-23 11:52   ` Christoph Hellwig
2021-06-23 19:03     ` Matteo Croce
2021-06-24  6:12       ` Christoph Hellwig
2021-06-23 10:58 ` [PATCH v3 4/6] block: export diskseq in sysfs Matteo Croce
2021-06-23 10:58 ` [PATCH v3 5/6] block: increment sequence number Matteo Croce
2021-06-23 10:58 ` [PATCH v3 6/6] loop: " Matteo Croce
2021-06-23 11:57   ` Christoph Hellwig
2021-06-23 13:13     ` Luca Boccassi
2021-06-23 14:25       ` Christoph Hellwig
2021-06-23 15:29         ` Lennart Poettering
2021-06-24  6:11           ` Christoph Hellwig
2021-06-23 12:03 ` [PATCH v3 0/6] block: add a sequence number to disks Hannes Reinecke
2021-06-23 12:46   ` Luca Boccassi
2021-06-23 14:07   ` Lennart Poettering

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YNNLUnGMO/gNNIJK@gardel-login \
    --to=mzxreary@0pointer.de \
    --cc=axboe@kernel.dk \
    --cc=bluca@debian.org \
    --cc=damien.lemoal@wdc.com \
    --cc=hare@suse.de \
    --cc=hch@infradead.org \
    --cc=javier@javigon.com \
    --cc=jefflexu@linux.alibaba.com \
    --cc=johannes.thumshirn@wdc.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mcroce@linux.microsoft.com \
    --cc=niklas.cassel@wdc.com \
    --cc=tj@kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).