linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Lennart Poettering <mzxreary@0pointer.de>
To: Christoph Hellwig <hch@infradead.org>
Cc: Luca Boccassi <bluca@debian.org>,
	Matteo Croce <mcroce@linux.microsoft.com>,
	linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Jens Axboe <axboe@kernel.dk>,
	linux-kernel@vger.kernel.org,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Damien Le Moal <damien.lemoal@wdc.com>, Tejun Heo <tj@kernel.org>,
	Javier Gonz??lez <javier@javigon.com>,
	Niklas Cassel <niklas.cassel@wdc.com>,
	Johannes Thumshirn <johannes.thumshirn@wdc.com>,
	Hannes Reinecke <hare@suse.de>,
	Matthew Wilcox <willy@infradead.org>,
	JeffleXu <jefflexu@linux.alibaba.com>
Subject: Re: [PATCH v3 6/6] loop: increment sequence number
Date: Wed, 23 Jun 2021 17:29:17 +0200	[thread overview]
Message-ID: <YNNTTUYRlpXDqMgX@gardel-login> (raw)
In-Reply-To: <YNNEdbr+0p+PzinQ@infradead.org>

On Mi, 23.06.21 15:25, Christoph Hellwig (hch@infradead.org) wrote:

> On Wed, Jun 23, 2021 at 02:13:25PM +0100, Luca Boccassi wrote:
> > On Wed, 2021-06-23 at 12:57 +0100, Christoph Hellwig wrote:
> > > On Wed, Jun 23, 2021 at 12:58:58PM +0200, Matteo Croce wrote:
> > > > From: Matteo Croce <mcroce@microsoft.com>
> > > >
> > > > On a very loaded system, if there are many events queued up from multiple
> > > > attach/detach cycles, it's impossible to match them up with the
> > > > LOOP_CONFIGURE or LOOP_SET_FD call, since we don't know where the position
> > > > of our own association in the queue is[1].
> > > > Not even an empty uevent queue is a reliable indication that we already
> > > > received the uevent we were waiting for, since with multi-partition block
> > > > devices each partition's event is queued asynchronously and might be
> > > > delivered later.
> > > >
> > > > Increment the disk sequence number when setting or changing the backing
> > > > file, so the userspace knows which backing file generated the event:
> > >
> > > Instead of manually incrementing the sequence here, can we make loop
> > > generate the DISK_EVENT_MEDIA_CHANGE event on a backing device (aka
> > > media) change?
> >
> > Hi,
> >
> > This was answered in the v1 thread:
> >
> > https://lore.kernel.org/linux-fsdevel/20210315201331.GA2577561@casper.infradead.org/t/#m8a677028572e826352cbb1e19d1b9c1f3b6bff4b
> >
> > The fundamental issue is that we'd be back at trying to correlate
> > events to loopdev instances, which does not work reliably - hence this
> > patch series. With the new ioctl, we can get the id immediately and
> > without delay when we create the device, with no possible races. Then
> > we can handle events reliably, as we can correlate correctly in all
> > cases.
>
> I very much disagree with your reply there.  The device now points to
> a different media.  Both for the loop device, a floppy or a CD changer
> probably by some kind of user action.  In the last cast it might even
> by done entirely locally through a script just like the loop device.

I am not sure I grok your point.

but let me try to explain why I think it's better to make media
changes *also* trigger seqno changes, and not make media change events
the *only* way to trigger seqno changes.

1. First of all, loopback devices currently don't hook into the media
   change logic (which so far is focussed on time-based polling
   actually, for both CDs and floppies). Adding this would change
   semantics visibly to userspace (since userspace would suddenly see
   another action=change + DISK_MEDIA_CHANGE=1 uevent suddenly that it
   needs to handle correctly). One can certainly argue that userspace
   must be ready to get additional uevents like this any time, but
   knowing userspace a bit I am pretty sure this will confuse some
   userspace that doesn't expect this. I mean loopback block devices
   already issue "change" uevents on attachment and detachment, one
   that userpace typically expects, but adding the media change one
   probably means sending two (?) of these out for each
   attachment. One being the old one from the loopback device itself,
   and then another one for the media change from the mdia change
   logic. That's not just noisy, but also ugly.

2. We want seqnums to be allocated for devices not only when doing
   media change (e.g. when attaching or detaching a loopback device)
   but also when allocating a block device, so that even before the
   first media change event a block device has a sequence number. This
   means allocating a sequence number for block devices won't be
   limited to the media change code anyway.

3. Doing the sequence number bumping in media change code exclusively
   kinda suggests this was something we could properly abstract away,
   to be done only there, and that the rest of the block subsystems
   wouldn#t have to bother much. But I am pretty sure that's not
   correct: in particular in the loopback block device code (but in
   other block subsystems too I guess) we really want to be able to
   atomically attach a loopback block device and return the seqnum of
   that very attachmnt so that we can focus on uevents for it. Thus,
   attachment, allocation and returning the seqnum to userspace in the
   LOOP_CONFIGURE ioctl (or whatever is appropriate) kinda go hand in
   hand.

4. The media change protocol follows a protocol along with the eject
   button handling (DISK_EVENT_EJECT_REQUEST), the locking of the tray
   and the time based polling. i.e. it's specifically focussed on
   certain hw features, none of which really apply to loopback block
   devices, which have no trays to lock, but eject buttons to handle,
   and where media change is always triggered internally by privileged
   code, never externally by the user. I doubt it makes sense to mix
   these up. Currently udev+udisks in userspace implement that
   procotol for cdroms/floppies, and I doubt we would want to bother
   to extend this for loopback devices in userspace. In fact, if media
   change events are added to loopback devices, we probably would have
   to change userspace to ignore them.

TLDR: making loopback block devices generate media is a (probably
minor, but unclear) API break, probably means duplicating uevent
traffic for it, won't abstract things away anyway, won't allocate a
seqnum on device allocation, and won't actually use anything of the
real media change functionality.

Or to say this differently: if the goal is to make loopback block
devices to send CDROM compatible media change events to userspace,
then I think it would probably make more sense to attach the
DISK_MEDIA_CHANGE=1 property to the attachment/detachment uevents the
loopback devices *already* generate, rather than to try to shoehorn the
existing media change infrastructure into the loopback device logic,
trying to reuse it even though nothing of it is really needed.

But that said, I am not aware of anyone ever asking for CDROM
compatible EDISK_MEDIA_CHANGE=1 uevents for loopback devices. I really
wouldn't bother with that. Sounds like nothing anyone want with a
chance of breaking things everywhere. (i.e. remember the
"bind"/"unbind" uevent fiasco?)

Lennart

--
Lennart Poettering, Berlin

  reply	other threads:[~2021-06-23 15:29 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-23 10:58 [PATCH v3 0/6] block: add a sequence number to disks Matteo Croce
2021-06-23 10:58 ` [PATCH v3 1/6] block: add disk sequence number Matteo Croce
2021-06-23 11:48   ` Christoph Hellwig
2021-06-23 13:10     ` Matteo Croce
2021-06-23 13:51       ` Lennart Poettering
2021-06-23 14:01         ` Hannes Reinecke
2021-06-23 14:07           ` Luca Boccassi
2021-06-23 14:21             ` Hannes Reinecke
2021-06-23 14:34               ` Luca Boccassi
2021-06-23 14:55               ` Lennart Poettering
2021-06-23 14:12           ` Lennart Poettering
2021-06-23 15:02             ` Hannes Reinecke
2021-06-23 15:34               ` Luca Boccassi
2021-06-23 15:48               ` Lennart Poettering
2021-06-23 14:28       ` Christoph Hellwig
2021-06-23 10:58 ` [PATCH v3 2/6] block: add ioctl to read the " Matteo Croce
2021-06-23 10:58 ` [PATCH v3 3/6] block: refactor sysfs code Matteo Croce
2021-06-23 11:52   ` Christoph Hellwig
2021-06-23 19:03     ` Matteo Croce
2021-06-24  6:12       ` Christoph Hellwig
2021-06-23 10:58 ` [PATCH v3 4/6] block: export diskseq in sysfs Matteo Croce
2021-06-23 10:58 ` [PATCH v3 5/6] block: increment sequence number Matteo Croce
2021-06-23 10:58 ` [PATCH v3 6/6] loop: " Matteo Croce
2021-06-23 11:57   ` Christoph Hellwig
2021-06-23 13:13     ` Luca Boccassi
2021-06-23 14:25       ` Christoph Hellwig
2021-06-23 15:29         ` Lennart Poettering [this message]
2021-06-24  6:11           ` Christoph Hellwig
2021-06-23 12:03 ` [PATCH v3 0/6] block: add a sequence number to disks Hannes Reinecke
2021-06-23 12:46   ` Luca Boccassi
2021-06-23 14:07   ` Lennart Poettering

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YNNTTUYRlpXDqMgX@gardel-login \
    --to=mzxreary@0pointer.de \
    --cc=axboe@kernel.dk \
    --cc=bluca@debian.org \
    --cc=damien.lemoal@wdc.com \
    --cc=hare@suse.de \
    --cc=hch@infradead.org \
    --cc=javier@javigon.com \
    --cc=jefflexu@linux.alibaba.com \
    --cc=johannes.thumshirn@wdc.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mcroce@linux.microsoft.com \
    --cc=niklas.cassel@wdc.com \
    --cc=tj@kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).