xen-devel.lists.xenproject.org archive mirror
From: Demi Marie Obenour <demi@invisiblethingslab.com>
To: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
Cc: Ian Jackson <iwj@xenproject.org>, Wei Liu <wl@xen.org>,
	Anthony PERARD <anthony.perard@citrix.com>,
	xen-devel@lists.xenproject.org
Subject: Re: [RFC PATCH 0/2] libxl: support common cases without block script
Date: Wed, 28 Apr 2021 02:48:47 -0400
Message-ID: <41538969-c59d-acc5-9eee-0dffca50d6ac@invisiblethingslab.com>
In-Reply-To: <cover.3a5d506462133586bd805b72a226916af6a33799.1619482896.git-series.marmarek@invisiblethingslab.com>

When it comes to file-based block devices, the major difficulty is
the extremely bad kernel API.  The only fully safe way to use loop
devices is to use LOOP_CONFIGURE with LO_FLAGS_AUTOCLEAR and hold a
file descriptor open to the device until another piece of code (either
another userspace program or the kernel) has grabbed a reference to it.
Everything else risks either using a freed loop device (which might now
be attached to a different file) or leaking loop devices on unclean exit.
The only exception is if one can make certain assumptions, such as no
other program freeing loop devices for the file in question.  This is
a reasonable assumption for Qubes dom0, but not for Qubes domU or for
Xen dom0 in general.  Nevertheless, this is effectively what the
current block script does: if I understand the code correctly, there
is a race where badly timed calls to losetup by another process could
result in the block script freeing the wrong loop device.
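
As a rough illustration of the LOOP_CONFIGURE approach described above,
here is a minimal sketch.  The helper names loop_config_init and
loop_attach_autoclear are hypothetical, the retry limit of 16 is
arbitrary, and the code assumes kernel headers from Linux 5.8 or later
(where LOOP_CONFIGURE was added).  It is not taken from libxl.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/loop.h>

/* Build a loop_config that binds backing_fd and sets autoclear, so the
 * kernel frees the loop device once its last reference is dropped. */
void loop_config_init(struct loop_config *cfg, int backing_fd)
{
    assert(cfg != NULL && backing_fd >= 0);
    memset(cfg, 0, sizeof(*cfg));
    cfg->fd = (__u32)backing_fd;
    cfg->info.lo_flags = LO_FLAGS_AUTOCLEAR;
}

/* Hypothetical helper: attach backing_fd to a free loop device.
 * Returns an open fd on the loop device, or -1 with errno set.  The
 * caller keeps that fd open until the consumer holds its own reference. */
int loop_attach_autoclear(int backing_fd)
{
    int ctl = open("/dev/loop-control", O_RDWR | O_CLOEXEC);
    if (ctl < 0)
        return -1;

    int saved = EBUSY;
    for (int tries = 0; tries < 16; tries++) {
        char path[32];
        struct loop_config cfg;

        int nr = ioctl(ctl, LOOP_CTL_GET_FREE);
        if (nr < 0) {
            saved = errno;
            break;
        }

        snprintf(path, sizeof(path), "/dev/loop%d", nr);
        int loop_fd = open(path, O_RDWR | O_CLOEXEC);
        if (loop_fd < 0) {
            saved = errno;
            break;
        }

        /* Configure in a single ioctl; there is no window in which the
         * device is bound but autoclear is not yet set. */
        loop_config_init(&cfg, backing_fd);
        if (ioctl(loop_fd, LOOP_CONFIGURE, &cfg) == 0) {
            close(ctl);
            return loop_fd;
        }

        saved = errno;
        close(loop_fd);
        if (saved != EBUSY)   /* EBUSY: lost a race for this device, retry */
            break;
    }

    close(ctl);
    errno = saved;
    return -1;
}
```

Because the returned descriptor is what keeps the device alive, closing
it on any exit path (clean or not) lets autoclear release the device.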

Worse, a write to XenStore only causes Linux to take a reference to
the device at some unspecified point in the future, rather than
synchronously.  Linux is given only a major and minor number, which
means we need to hold a reference to the relevant loop device ourselves.
FreeBSD solves this by having XenStore include a path to the device
and/or regular file, but on Linux this leads to awkward issues with
namespaces.  Instead, I recommend that Linux gain an ioctl-based
interface in the future, which takes a file descriptor to the device
to use.  The kernel would then do the writes itself.
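
To make the major/minor point concrete, here is a small sketch that
derives the device number from a file descriptor we already hold,
rather than from a path that could be reused.  The function names are
hypothetical, and the hex "major:minor" output format is my assumption
about what the Linux block scripts write to XenStore; please check it
against the backend before relying on it.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Format a device number as hex "major:minor" (assumed XenStore form).
 * Returns 0 on success, -1 with errno set if buf is too small. */
int format_devnum(dev_t rdev, char *buf, size_t len)
{
    assert(buf != NULL);
    int n = snprintf(buf, len, "%x:%x", major(rdev), minor(rdev));
    if (n < 0 || (size_t)n >= len) {
        errno = ENAMETOOLONG;
        return -1;
    }
    return 0;
}

/* Read the device number from an fd we hold open, so the number is
 * guaranteed to describe the device that fd keeps alive. */
int devnum_from_fd(int fd, char *buf, size_t len)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    if (!S_ISBLK(st.st_mode)) {
        errno = ENOTBLK;
        return -1;
    }
    return format_devnum(st.st_rdev, buf, len);
}
```

The number stays meaningful only while the descriptor passed to
devnum_from_fd remains open, which is exactly the reference we have to
hold until the kernel takes its own.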

Thankfully, not all hope is lost, even with the current kernel API.
We can use sd_pid_notify_with_fds to stash the file descriptors in PID
1, which will never exit.  We can give those file descriptors a name,
so that we know which is which if we are restarted.  And we can close
devices that we know are not in use by any VMs.  The cache will allow
us to avoid duplicating devices, which is actually quite important:
QubesOS doesn’t want each qube to have a separate file descriptor for
its kernel, for example.
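
For the naming part, a sketch of building the state string that would
accompany a stored descriptor.  This covers only the string; it does not
link against libsystemd.  The helper names are hypothetical, the name
rules follow my reading of sd_notify(3) (ASCII, no control characters,
no ':', at most 255 characters), and rejecting an empty name is a local
choice.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define FDNAME_MAX 255

/* Return 1 if name is usable as an fd store name, else 0. */
int fdname_is_valid(const char *name)
{
    assert(name != NULL);
    size_t n = strlen(name);

    if (n == 0 || n > FDNAME_MAX)   /* empty names rejected by choice */
        return 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)name[i];
        if (c < ' ' || c >= 127 || c == ':')
            return 0;
    }
    return 1;
}

/* Write "FDSTORE=1\nFDNAME=<name>" into buf.
 * Returns 0 on success, -1 with errno set otherwise. */
int fdstore_state(char *buf, size_t len, const char *name)
{
    if (!fdname_is_valid(name)) {
        errno = EINVAL;
        return -1;
    }
    int n = snprintf(buf, len, "FDSTORE=1\nFDNAME=%s", name);
    if (n < 0 || (size_t)n >= len) {
        errno = ENAMETOOLONG;
        return -1;
    }
    return 0;
}
```

The resulting string would be passed as the state argument of
sd_pid_notify_with_fds, with the loop device descriptor in the fds
array.  The service's unit also needs a nonzero FileDescriptorStoreMax=
for systemd to keep anything.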

Initially, I recommend focusing on handling the case where the process
using libxl is not restarted.  That is the simpler case, by far.
I suggest starting by just setting up a loop device prior to attaching
it, and destroying it when the device is detached.  Caching can be
added as the next step.
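
One possible shape for the later caching step is a small
reference-counted table.  Everything here is an assumption for
illustration: the names, the fixed slot count, and the choice to key
entries by the backing file's (st_dev, st_ino) pair.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define LOOP_CACHE_SLOTS 16

struct loop_entry {
    dev_t dev;        /* st_dev of the backing file */
    ino_t ino;        /* st_ino of the backing file */
    int loop_fd;      /* fd that keeps the loop device alive */
    unsigned refs;    /* 0 means the slot is free */
};

/* Zero-initialise before use. */
struct loop_cache {
    struct loop_entry slot[LOOP_CACHE_SLOTS];
};

/* On a hit, take a reference and return the cached loop fd; else -1. */
int loop_cache_get(struct loop_cache *c, dev_t dev, ino_t ino)
{
    assert(c != NULL);
    for (size_t i = 0; i < LOOP_CACHE_SLOTS; i++) {
        struct loop_entry *e = &c->slot[i];
        if (e->refs > 0 && e->dev == dev && e->ino == ino) {
            e->refs++;
            return e->loop_fd;
        }
    }
    return -1;
}

/* Record a newly attached loop fd with one reference.
 * Returns 0 on success, -1 if the table is full. */
int loop_cache_add(struct loop_cache *c, dev_t dev, ino_t ino, int loop_fd)
{
    assert(c != NULL);
    for (size_t i = 0; i < LOOP_CACHE_SLOTS; i++) {
        struct loop_entry *e = &c->slot[i];
        if (e->refs == 0) {
            e->dev = dev;
            e->ino = ino;
            e->loop_fd = loop_fd;
            e->refs = 1;
            return 0;
        }
    }
    return -1;
}

/* Drop one reference.  At zero the fd is closed, and with autoclear
 * set the kernel can then free the device.  Returns the remaining
 * reference count, or -1 if loop_fd is not in the table. */
int loop_cache_put(struct loop_cache *c, int loop_fd)
{
    assert(c != NULL);
    for (size_t i = 0; i < LOOP_CACHE_SLOTS; i++) {
        struct loop_entry *e = &c->slot[i];
        if (e->refs > 0 && e->loop_fd == loop_fd) {
            if (--e->refs == 0)
                close(e->loop_fd);
            return (int)e->refs;
        }
    }
    return -1;
}
```

Attach would call loop_cache_get first and only set up a new loop
device on a miss; detach would call loop_cache_put.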

Demi Marie Obenour
QubesOS Developer, Invisible Things Lab

Thread overview: 5+ messages
2021-04-27  0:22 [RFC PATCH 0/2] libxl: support common cases without block script Marek Marczykowski-Górecki
2021-04-27  0:22 ` [RFC PATCH 1/2] libxl: rename 'error' label to 'out' as it is used for success too Marek Marczykowski-Górecki
2021-04-27  0:22 ` [RFC PATCH 2/2] libxl: allow to skip block script completely Marek Marczykowski-Górecki
2021-04-28  6:48 ` Demi Marie Obenour [this message]
2021-04-28 12:26   ` [RFC PATCH 0/2] libxl: support common cases without block script Jason Andryuk
