From: Demi Marie Obenour <demi@invisiblethingslab.com>
To: James Bottomley <James.Bottomley@hansenpartnership.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux Block Mailing List <linux-block@vger.kernel.org>,
	Linux Filesystem Mailing List <linux-fsdevel@vger.kernel.org>
Subject: Re: Race-free block device opening
Date: Sat, 7 May 2022 07:35:45 -0400
Message-ID: <YnZZlR7BV/cyn8xS@itl-email>
In-Reply-To: <d134571f381868e1cec74aca905012d8aec9fec8.camel@HansenPartnership.com>

On Wed, Apr 27, 2022 at 09:29:12AM -0400, James Bottomley wrote:
> On Tue, 2022-04-26 at 14:12 -0400, Demi Marie Obenour wrote:
> > Right now, opening block devices in a race-free way is incredibly
> > hard.
> 
> Could you be more specific about what the race you're having problems
> with is?  What is racing.

If I open /dev/mapper/qubes_dom0-vm--sys--net--private, it is possible
that something has destroyed the corresponding device and created a new
one with the same kernel name, *before* udev has managed to unlink the
device node.  As a result, I wind up opening the wrong device.

> > The only reasonable approach I know of is sd_device_new_from_path() +
> > sd_device_open(), and is only available in systemd git main.  It also
> > requires waiting on systemd-udev to have processed udev rules, which
> > can be a bottleneck.
> 
> This doesn't actually seem to be in my copy of systemd.

That’s because it is not in any release yet.

> >   There are better approaches in various special cases, such as using
> > device-mapper ioctls to check that the device one has opened still
> > has the name and/or UUID one expects.  However, none of them works
> > for a plain call to open(2).
> 
> Just so we're clear: if you call open on, say /dev/sdb1 and something
> happens to hot unplug and then replug a different device under that
> node, the file descriptor you got at open does *not* point to the new
> node.  It points to a dead device responder that errors everything.
> 
> The point being once you open() something, the file descriptor is
> guaranteed to point to the same device (or error).

That doesn’t help if the unplug and replug happen after the path has
been handed to whatever will open it, but before udev has purged the
now-stale symlink.

> > A much better approach would be for udev to point its symlinks at
> > "/dev/disk/by-diskseq/$DISKSEQ" for non-partition disk devices, or at
> > "/dev/disk/by-diskseq/${DISKSEQ}p${PARTITION}" for partitions.  A
> > filesystem would then be mounted at "/dev/disk/by-diskseq" that
> > provides for race-free opening of these paths.  This could be
> > implemented in userspace using FUSE, either with difficulty using the
> > current kernel API, or easily and efficiently using a new kernel API
> > for opening a block device by diskseq + partition.  However, I think
> > this should be handled by the Linux kernel itself.
> > 
> > What would be necessary to get this into the kernel?  I would like to
> > implement this, but I don’t have the time to do so anytime soon.  Is
> > anyone else interested in taking this on?  I suspect the kernel code
> > needed to implement this would be quite a bit smaller than the FUSE
> > implementation.
> 
> So it sounds like the problem is you want to be sure that the device
> doesn't change after you've called libblkid to identify it but before
> you call open?  If that's so, the way you do this in userspace is to
> call libblkid again after the open.  If the before and after id match,
> you're as sure as you can be the open was of the right device.

The devices I am working with are raw-format VM disks that contain
untrusted data.  They are identified not by their content, which the VM
has complete control over, but by various sysfs attributes such as
dm/name and dm/uuid.  And they need to be passed to interfaces, such as
libvirt and cryptsetup, that only accept device paths.

I can work around this in the case of cryptsetup by using the
libcryptsetup library and/or holding a file descriptor open, but neither
of those will work for libvirt since libvirtd is a separate process and
I cannot pass a file descriptor to it.  Furthermore, there is no way to
make libvirtd do any post-open() checking on the file descriptor it has
obtained.  While I plan to add a workaround in libxl and blkback for
loop and device-mapper devices, it is not reasonable to expect every
userspace tool to do the same.  
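
For a tool that does control its own open(), the post-open check
described above can be sketched roughly as follows.  This is only a
minimal illustration: open_verified() and dm_identity() are hypothetical
helper names, and it assumes that the open file descriptor keeps the
device's major:minor from being reused while the sysfs attributes are
read, which should be confirmed for the driver in question.

```python
import os
import stat


def dm_identity(rdev, sysfs_root="/sys"):
    """Return (dm/name, dm/uuid) for the block device numbered rdev."""
    base = os.path.join(sysfs_root, "dev", "block",
                        f"{os.major(rdev)}:{os.minor(rdev)}", "dm")

    def read(attr):
        with open(os.path.join(base, attr)) as f:
            return f.read().rstrip("\n")

    return read("name"), read("uuid")


def open_verified(path, expected_name, expected_uuid, sysfs_root="/sys"):
    """Open path, then check the sysfs identity of what was opened.

    The check is made on the descriptor (via fstat), not on the path,
    so a stale device node or symlink is detected after the fact.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        st = os.fstat(fd)
        if not stat.S_ISBLK(st.st_mode):
            raise ValueError(f"{path} is not a block device")
        if dm_identity(st.st_rdev, sysfs_root) != (expected_name, expected_uuid):
            raise ValueError(f"{path} no longer refers to {expected_name}")
        return fd
    except BaseException:
        os.close(fd)
        raise
```

The caller keeps the returned descriptor open for as long as it uses the
device.  A separate process that is only handed the path gains nothing
from this check.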

The approach I am suggesting avoids this problem entirely, because
/dev/mapper/qubes_dom0-vm--sys--net--private would become a symlink to
the device node /dev/disk/by-diskseq/$DISKSEQ.  Those paths are never,
ever reused.  When the device goes away, the device node goes away too,
and so any attempt to open the symlink (without O_PATH|O_NOFOLLOW) gets
-ENOENT as it should.
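
Until something like this exists, a tool that performs its own open()
can approximate the same guarantee by comparing the sequence number of
the descriptor it obtained against the one it expected.  A minimal
sketch, assuming Linux 5.15 or later for the BLKGETDISKSEQ ioctl and the
common ioctl number encoding (x86-64, arm64); open_by_diskseq() is a
hypothetical helper name, not an existing API:

```python
import fcntl
import os
import struct

# BLKGETDISKSEQ is _IOR(0x12, 128, __u64) in <linux/fs.h>.  The value
# below assumes the generic ioctl encoding; some architectures differ.
BLKGETDISKSEQ = 0x80081280


def get_diskseq(fd):
    """Return the disk sequence number of an open block device."""
    buf = fcntl.ioctl(fd, BLKGETDISKSEQ, bytes(8))
    return struct.unpack("=Q", buf)[0]


def open_by_diskseq(path, expected_diskseq):
    """Open path and fail unless it is the disk the caller expected.

    A missing node surfaces as FileNotFoundError (ENOENT) from os.open;
    a node that now belongs to a different disk raises ValueError.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        actual = get_diskseq(fd)
        if actual != expected_diskseq:
            raise ValueError(
                f"{path}: diskseq {actual}, expected {expected_diskseq}")
        return fd
    except BaseException:
        os.close(fd)
        raise
```

As with any post-open check, this only helps the process that calls
open() itself.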
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

Thread overview: 6+ messages
2022-04-26 18:12 Race-free block device opening Demi Marie Obenour
2022-04-26 18:35 ` Greg Kroah-Hartman
2022-04-26 21:31   ` Demi Marie Obenour
2022-04-26 22:07     ` Demi Marie Obenour
2022-04-27 13:29 ` James Bottomley
2022-05-07 11:35   ` Demi Marie Obenour [this message]
