xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/2] libxl: support common cases without block script
@ 2021-04-27  0:22 Marek Marczykowski-Górecki
  2021-04-27  0:22 ` [RFC PATCH 1/2] libxl: rename 'error' label to 'out' as it is used for success too Marek Marczykowski-Górecki
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-04-27  0:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Marek Marczykowski-Górecki, Ian Jackson, Wei Liu, Anthony PERARD

This series in an attempt to speed up the domain start by removing slow block
script from the picture. The current RFC covers the simplest possible case only
- target being a block device directly. This case does not require locking at
all. Further version will cover also setting up a loop device.

This, compared to the default block script, saves about 0.5s of domain start
time, per disk. Similar speedup can be achieved with a trivial lock-less script
too. But for file-based disks, it won't be that simple with a script - setting
up a loop device lock-less is tricky and ability to keep an FD open and call
different ioctls on it greatly helps. Furthermore reusing the same loop device
for the same file can be done significantly better with a cache (which can be
stored in the libxl hosting process - like xl devd, or libvirt).

This surely isn't the only option to improve disk setup time, but is a very
atractive one. Few questions:
1. Is it acceptable approach at all?
2. Is empty 'script' parameter value going to fly? Unfortunately, NULL is
   already taken as "use default".

Marek Marczykowski-Górecki (2):
  libxl: rename 'error' label to 'out' as it is used for success too
  libxl: allow to skip block script completely

 docs/man/xl-disk-configuration.5.pod.in |  4 ++-
 tools/libs/light/libxl_disk.c           |  7 ++-
 tools/libs/light/libxl_linux.c          | 68 ++++++++++++++++++++++++--
 3 files changed, 74 insertions(+), 5 deletions(-)

base-commit: bea65a212c0581520203b6ad0d07615693f42f73
-- 
git-series 0.9.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [RFC PATCH 1/2] libxl: rename 'error' label to 'out' as it is used for success too
  2021-04-27  0:22 [RFC PATCH 0/2] libxl: support common cases without block script Marek Marczykowski-Górecki
@ 2021-04-27  0:22 ` Marek Marczykowski-Górecki
  2021-04-27  0:22 ` [RFC PATCH 2/2] libxl: allow to skip block script completely Marek Marczykowski-Górecki
  2021-04-28  6:48 ` [RFC PATCH 0/2] libxl: support common cases without block script Demi Marie Obenour
  2 siblings, 0 replies; 5+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-04-27  0:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Marek Marczykowski-Górecki, Ian Jackson, Wei Liu, Anthony PERARD

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
 tools/libs/light/libxl_linux.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/libs/light/libxl_linux.c b/tools/libs/light/libxl_linux.c
index 8d62dfd255cb..cc8baf5c3eae 100644
--- a/tools/libs/light/libxl_linux.c
+++ b/tools/libs/light/libxl_linux.c
@@ -174,14 +174,14 @@ static int libxl__hotplug_disk(libxl__gc *gc, libxl__device *dev,
         LOGEVD(ERROR, errno, dev->domid,
                "unable to read script from %s", be_path);
         rc = ERROR_FAIL;
-        goto error;
+        goto out;
     }
 
     *env = get_hotplug_env(gc, script, dev);
     if (!*env) {
         LOGD(ERROR, dev->domid, "Failed to get hotplug environment");
         rc = ERROR_FAIL;
-        goto error;
+        goto out;
     }
 
     const int arraysize = 3;
@@ -194,7 +194,7 @@ static int libxl__hotplug_disk(libxl__gc *gc, libxl__device *dev,
     LOGD(DEBUG, dev->domid, "Args and environment ready");
     rc = 1;
 
-error:
+out:
     return rc;
 }
 
-- 
git-series 0.9.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [RFC PATCH 2/2] libxl: allow to skip block script completely
  2021-04-27  0:22 [RFC PATCH 0/2] libxl: support common cases without block script Marek Marczykowski-Górecki
  2021-04-27  0:22 ` [RFC PATCH 1/2] libxl: rename 'error' label to 'out' as it is used for success too Marek Marczykowski-Górecki
@ 2021-04-27  0:22 ` Marek Marczykowski-Górecki
  2021-04-28  6:48 ` [RFC PATCH 0/2] libxl: support common cases without block script Demi Marie Obenour
  2 siblings, 0 replies; 5+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-04-27  0:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Marek Marczykowski-Górecki, Ian Jackson, Wei Liu, Anthony PERARD

Default block script is quite slow and requires global lock which slows
it down even more (for domains with multiple disks). The common case of
a block device-based disk is trivial to handle and does not require
locking. This can be handled directly within libxl, to avoid slow script
execution and waiting for it to finish. This, depending on the hardware,
may save about 0.5s of domain start time per disk.

Allow setting script name to empty string to skip executing the script
at all, and use target name as a block device path directly.

This does skip two functions of the block script:
 - checking if device isn't used anywhere else (including mounted in
   dom0)
 - setting up loop device for a file-based disk

The former is expected to be done by the operator manually (or by a
higher level management stack, that calls into libxl). The later is a
limitation of the current implementation, but should be possible to
extend in the future.

The code to fill 'physical-device' xenstore node is added via
libxl__hotplug_disk() (in libxl_linux.c) intentionally. This code is
called in device backend domain (which may be not dom0), contrary to
device_disk_add() which fills all the other xenstore entries, but is
called always in the toolstack domain (dom0).
libxl__hoplug_disk() is called from libxl__get_hotplug_script_info(),
which may not sound like the most logical place to actually change
some state, but it is called when we need it.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
 docs/man/xl-disk-configuration.5.pod.in |  4 ++-
 tools/libs/light/libxl_disk.c           |  7 ++-
 tools/libs/light/libxl_linux.c          | 62 ++++++++++++++++++++++++++-
 3 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/docs/man/xl-disk-configuration.5.pod.in b/docs/man/xl-disk-configuration.5.pod.in
index 71d0e86e3d63..3b3b3c329e13 100644
--- a/docs/man/xl-disk-configuration.5.pod.in
+++ b/docs/man/xl-disk-configuration.5.pod.in
@@ -261,6 +261,10 @@ information to be interpreted by the executable program I<SCRIPT>,
 
 These scripts are normally called "block-I<SCRIPT>".
 
+Setting empty script avoids calling the hoplug script at all (including
+the default one). It skips device sharing safety check and
+requires the target to point at a block device. Empty script value is supported
+on Linux backend domain only.
 
 =item B<direct-io-safe>
 
diff --git a/tools/libs/light/libxl_disk.c b/tools/libs/light/libxl_disk.c
index 411ffeaca6ce..4278db449a2f 100644
--- a/tools/libs/light/libxl_disk.c
+++ b/tools/libs/light/libxl_disk.c
@@ -324,8 +324,11 @@ static void device_disk_add(libxl__egc *egc, uint32_t domid,
                 flexarray_append(back, "params");
                 flexarray_append(back, dev);
 
-                script = libxl__abs_path(gc, disk->script?: "block",
-                                         libxl__xen_script_dir_path());
+                if (disk->script && !disk->script[0])
+                    script = "";
+                else
+                    script = libxl__abs_path(gc, disk->script?: "block",
+                                             libxl__xen_script_dir_path());
                 flexarray_append_pair(back, "script", script);
 
                 assert(device->backend_kind == LIBXL__DEVICE_KIND_VBD);
diff --git a/tools/libs/light/libxl_linux.c b/tools/libs/light/libxl_linux.c
index cc8baf5c3eae..8832f2c6483a 100644
--- a/tools/libs/light/libxl_linux.c
+++ b/tools/libs/light/libxl_linux.c
@@ -160,6 +160,62 @@ out:
     return rc;
 }
 
+static int libxl__hotplug_disk_direct_add(libxl__gc *gc, libxl__device *dev,
+                                      const char *be_path)
+{
+    char *params;
+    xs_transaction_t t = XBT_NULL;
+    struct stat st;
+    int rc = 0;
+    char *xs_kvs[] = { "physical-device", NULL,
+                       "physical-device-path", NULL,
+                       "hotplug-status", "connected",
+                       NULL, NULL, };
+
+    for (;;) {
+        rc = libxl__xs_transaction_start(gc, &t);
+        if (rc) goto error;
+
+        params = libxl__xs_read(gc, t,
+                                GCSPRINTF("%s/%s", be_path, "params"));
+        if (!params) {
+            rc = ERROR_FAIL;
+            goto error;
+        }
+
+        if (stat(params, &st) == -1) {
+            LOGED(ERROR, dev->domid, "failed to stat %s", params);
+            rc = ERROR_FAIL;
+            goto error;
+        }
+
+        if ((st.st_mode & S_IFMT) != S_IFBLK) {
+            LOGD(ERROR, dev->domid, "not a block device: %s", params);
+            rc = ERROR_FAIL;
+            goto error;
+        }
+
+        xs_kvs[1] = GCSPRINTF("%x:%x", major(st.st_rdev), minor(st.st_rdev));
+        xs_kvs[3] = params;
+        rc = libxl__xs_writev(gc, t, be_path, xs_kvs);
+        if (rc) goto error;
+
+        rc = libxl__xs_transaction_commit(gc, &t);
+        if (!rc) break;
+        if (rc < 0) goto error;
+    }
+
+    return 0;
+
+error:
+    libxl__xs_transaction_abort(gc, &t);
+    if (libxl__xs_write_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/%s", be_path, "hotplug-status"),
+                                "error"))
+        LOGD(ERROR, dev->domid, "failed to write 'hotplug-status'");
+    return rc;
+}
+
 static int libxl__hotplug_disk(libxl__gc *gc, libxl__device *dev,
                                char ***args, char ***env,
                                libxl__device_action action)
@@ -177,6 +233,12 @@ static int libxl__hotplug_disk(libxl__gc *gc, libxl__device *dev,
         goto out;
     }
 
+    if (!script[0]) {
+        if (action == LIBXL__DEVICE_ACTION_ADD)
+            rc = libxl__hotplug_disk_direct_add(gc, dev, be_path);
+        goto out;
+    }
+
     *env = get_hotplug_env(gc, script, dev);
     if (!*env) {
         LOGD(ERROR, dev->domid, "Failed to get hotplug environment");
-- 
git-series 0.9.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH 0/2] libxl: support common cases without block script
  2021-04-27  0:22 [RFC PATCH 0/2] libxl: support common cases without block script Marek Marczykowski-Górecki
  2021-04-27  0:22 ` [RFC PATCH 1/2] libxl: rename 'error' label to 'out' as it is used for success too Marek Marczykowski-Górecki
  2021-04-27  0:22 ` [RFC PATCH 2/2] libxl: allow to skip block script completely Marek Marczykowski-Górecki
@ 2021-04-28  6:48 ` Demi Marie Obenour
  2021-04-28 12:26   ` Jason Andryuk
  2 siblings, 1 reply; 5+ messages in thread
From: Demi Marie Obenour @ 2021-04-28  6:48 UTC (permalink / raw)
  To: Marek Marczycowski-Górecki
  Cc: Ian Jackson, Wei Liu, Anthony PERARD, xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 2485 bytes --]

When it comes to file-based block devices, the major difficulty is
the extremely bad kernel API.  The only fully safe way to use loop
devices is to use LOOP_CONFIGURE with LO_FLAGS_AUTOCLEAR and hold a
file descriptor open to the device until another piece of code (either
another userspace program or the kernel) has grabbed a reference to it.
Everything else risks either using a freed loop device (that might now
be attached to a different file) or risks leaking them on unclean exit.
The only exception is if one can make certain assumptions, such as no
other program freeing loop devices for the file in question.  This is
a reasonable assumption for Qubes dom0, but neither for Qubes domU nor
for Xen dom0 in general.  Nevertheless, this is effectively what the
current block script does: if I understand the code correctly, there
is a race where badly timed calls to losetup by another process could
result in the block script freeing the wrong loop device.

Worse, writes to XenStore only cause Linux to take a reference to
the device at some unspecified point in the future, rather than
synchronously.  It takes a major and minor number, which means we
need to hold a reference to the relevant loop device ourselves.
FreeBSD solves this by having XenStore include a path to the device
and/or regular file, but on Linux this leads to awkward issues with
namespaces.  Instead, I recommend that Linux gain an ioctl-based
interface in the future, which takes a file descriptor to the device
to use.  The kernel would then do the writes itself.

Thankfully, not all hope is lost, even with the current kernel API.
We can use sd_pid_notify_with_fds to stash the file descriptors in PID
1, which will never exit.  We can give those file descriptors a name,
so that we know which is which if we are restarted.  And we can close
devices that we know are not in use by any VMs.  The cache will allow
us to avoid duplicating devices, which is actually quite important ―
QubesOS doesn’t want each qube to have a separate file descriptor 
for
its kernel, for example.

Initially, I recommend focusing on handle the case where the process
using libxl is not restarted.  That is the simpler case, by far.
I suggest starting by just setting up a loop device prior to attaching
it, and destroying it when the device is detached.  Caching can be
added as the next step.
-- 
Demi Marie Obenour
she/her/hers
QubesOS Developer, Invisible Things Lab

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 4941 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH 0/2] libxl: support common cases without block script
  2021-04-28  6:48 ` [RFC PATCH 0/2] libxl: support common cases without block script Demi Marie Obenour
@ 2021-04-28 12:26   ` Jason Andryuk
  0 siblings, 0 replies; 5+ messages in thread
From: Jason Andryuk @ 2021-04-28 12:26 UTC (permalink / raw)
  To: Demi Marie Obenour
  Cc: Marek Marczycowski-Górecki, Ian Jackson, Wei Liu,
	Anthony PERARD, xen-devel

On Wed, Apr 28, 2021 at 3:00 AM Demi Marie Obenour
<demi@invisiblethingslab.com> wrote:
>
> When it comes to file-based block devices, the major difficulty is
> the extremely bad kernel API.  The only fully safe way to use loop
> devices is to use LOOP_CONFIGURE with LO_FLAGS_AUTOCLEAR and hold a
> file descriptor open to the device until another piece of code (either
> another userspace program or the kernel) has grabbed a reference to it.
> Everything else risks either using a freed loop device (that might now
> be attached to a different file) or risks leaking them on unclean exit.
> The only exception is if one can make certain assumptions, such as no
> other program freeing loop devices for the file in question.  This is
> a reasonable assumption for Qubes dom0, but neither for Qubes domU nor
> for Xen dom0 in general.  Nevertheless, this is effectively what the
> current block script does: if I understand the code correctly, there
> is a race where badly timed calls to losetup by another process could
> result in the block script freeing the wrong loop device.

I posted this a while ago, but didn't get any response:

https://lore.kernel.org/xen-devel/CAKf6xpv-U91nF2Fik7GRN3SFeOWWcdR5R+ZcK5fgojE+-D43sg@mail.gmail.com/

tl;dr: AFAICT, the block script check_sharing function doesn't work
for loop devices

Regards,
Jason


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-04-28 12:26 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-27  0:22 [RFC PATCH 0/2] libxl: support common cases without block script Marek Marczykowski-Górecki
2021-04-27  0:22 ` [RFC PATCH 1/2] libxl: rename 'error' label to 'out' as it is used for success too Marek Marczykowski-Górecki
2021-04-27  0:22 ` [RFC PATCH 2/2] libxl: allow to skip block script completely Marek Marczykowski-Górecki
2021-04-28  6:48 ` [RFC PATCH 0/2] libxl: support common cases without block script Demi Marie Obenour
2021-04-28 12:26   ` Jason Andryuk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).