qemu-devel.nongnu.org archive mirror
* [PATCH 0/9] migration/mapped-ram: Add direct-io support
@ 2024-04-26 14:20 Fabiano Rosas
  2024-04-26 14:20 ` [PATCH 1/9] monitor: Honor QMP request for fd removal immediately Fabiano Rosas
                   ` (9 more replies)
  0 siblings, 10 replies; 57+ messages in thread
From: Fabiano Rosas @ 2024-04-26 14:20 UTC
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana, Jim Fehlig

Hi everyone, here's the rest of the migration "mapped-ram" feature
that didn't get merged for 9.0. This series adds support for direct
I/O, the missing piece to get the desired performance improvements.

There are three parts to this:

1- The plumbing for the new "direct-io" migration parameter. With this
   we can already use direct-io with the file transport + multifd +
   mapped-ram. Patches 1-3.

Due to the alignment requirements of O_DIRECT and the fact that
multifd runs the channels in parallel with the migration thread, we
must open the migration file twice: once with O_DIRECT set and once
with it clear.

If the user is not passing in a file name that QEMU can open at will,
we must require the user to pass the two file descriptors with the
flags already properly set. We'll use the existing fdset + QMP add-fd
infrastructure for this.

2- Changes to the fdset infrastructure to support O_DIRECT. We need
   these to be able to select, from the user-provided fdset, the file
   descriptor that has the O_DIRECT flag set. Patches 4-5.

3- Some fdset validation to make sure the two-fds requirement is being
   met. Patches 6-7.
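For item 2, the core operation is choosing, among the descriptors in
an fdset, the one whose file status flags include (or lack) O_DIRECT.
Below is a hypothetical sketch of that matching over a plain array; it
is not the monitor/fds.c implementation, and pick_fd_by_flag() is an
illustrative name. It reads the flags with fcntl(F_GETFL), so it works
for any status flag the kernel reports, O_DIRECT included on
filesystems that support it.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the first fd in @fds whose file status
 * flags have @mask set (want == true) or clear (want == false).
 * Returns -1 if no fd matches.
 */
static int pick_fd_by_flag(const int *fds, size_t n, int mask, bool want)
{
    for (size_t i = 0; i < n; i++) {
        int fl = fcntl(fds[i], F_GETFL);

        if (fl == -1) {
            continue; /* not a valid fd, skip it */
        }
        if (((fl & mask) != 0) == want) {
            return fds[i];
        }
    }
    return -1;
}
```

With two descriptors for the same file, one opened with the flag and
one without, asking for want == true returns the former and
want == false the latter.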

CI run: https://gitlab.com/farosas/qemu/-/pipelines/1269352083

Fabiano Rosas (9):
  monitor: Honor QMP request for fd removal immediately
  migration: Fix file migration with fdset
  tests/qtest/migration: Fix file migration offset check
  migration: Add direct-io parameter
  migration/multifd: Add direct-io support
  tests/qtest/migration: Add tests for file migration with direct-io
  monitor: fdset: Match against O_DIRECT
  migration: Add support for fdset with multifd + file
  tests/qtest/migration: Add a test for mapped-ram with passing of fds

 docs/devel/migration/main.rst       |  18 +++
 docs/devel/migration/mapped-ram.rst |   6 +-
 include/qemu/osdep.h                |   2 +
 migration/file.c                    | 108 ++++++++++++++-
 migration/migration-hmp-cmds.c      |  11 ++
 migration/migration.c               |  23 ++++
 migration/options.c                 |  30 +++++
 migration/options.h                 |   1 +
 monitor/fds.c                       |  13 +-
 qapi/migration.json                 |  18 ++-
 tests/qtest/migration-helpers.c     |  42 ++++++
 tests/qtest/migration-helpers.h     |   1 +
 tests/qtest/migration-test.c        | 202 +++++++++++++++++++++++++++-
 util/osdep.c                        |   9 ++
 14 files changed, 465 insertions(+), 19 deletions(-)


base-commit: a118c4aff4087eafb68f7132b233ad548cf16376
-- 
2.35.3




* [PATCH 1/9] monitor: Honor QMP request for fd removal immediately
  2024-04-26 14:20 [PATCH 0/9] migration/mapped-ram: Add direct-io support Fabiano Rosas
@ 2024-04-26 14:20 ` Fabiano Rosas
  2024-05-03 16:02   ` Peter Xu
  2024-05-08  7:17   ` Daniel P. Berrangé
  2024-04-26 14:20 ` [PATCH 2/9] migration: Fix file migration with fdset Fabiano Rosas
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 57+ messages in thread
From: Fabiano Rosas @ 2024-04-26 14:20 UTC
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana, Jim Fehlig

We're enabling the use of the fdset interface to pass file
descriptors to the migration code. Since migrations can happen more
than once during the VM's lifetime, we need a way to remove an fd from
the fdset at the end of migration.

The current code only removes an fd from the fdset if the VM is
running. This causes a QMP call to "remove-fd" to not actually remove
the fd if the VM happens to be stopped.

While the fd would eventually be removed when monitor_fdset_cleanup()
is called again, the user request should be honored and the fd
actually removed. Calling remove-fd + query-fdset shows a recently
removed fd still present.

The runstate_is_running() check was introduced by commit ebe52b592d
("monitor: Prevent removing fd from set during init"), which, judging
by the shortlog, was trying to avoid removing a yet-unduplicated fd
too early.

I don't see why an fd explicitly removed with qmp_remove_fd() should
be subject to runstate_is_running(). I'm assuming this was a mistake
when adding the parentheses around the expression.

Move the runstate_is_running() check to apply only to the
QLIST_EMPTY(dup_fds) side of the expression and ignore it when
mon_fdset_fd->removed has been explicitly set.
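To make the change concrete, here is a standalone sketch of the close
decision in monitor_fdset_cleanup() before and after this patch. The
struct and function names are hypothetical; the fields stand in for
mon_fdset_fd->removed, QLIST_EMPTY(&mon_fdset->dup_fds), mon_refcount
and runstate_is_running().

```c
#include <stdbool.h>

/* Hypothetical model of the inputs to the close decision. */
struct fd_state {
    bool removed;       /* set by an explicit remove-fd */
    bool dup_fds_empty; /* no duplicates of the fdset handed out */
    int mon_refcount;   /* monitors still referencing fdsets */
    bool running;       /* runstate_is_running() */
};

/* Before: every branch, including explicit removal, needs a running VM. */
static bool should_close_old(const struct fd_state *s)
{
    return (s->removed ||
            (s->dup_fds_empty && s->mon_refcount == 0)) &&
           s->running;
}

/* After: an explicit removal is honored regardless of the runstate. */
static bool should_close_new(const struct fd_state *s)
{
    return s->removed ||
           (s->dup_fds_empty && s->mon_refcount == 0 && s->running);
}
```

The only input combination whose outcome changes is removed == true
with a stopped VM: the old condition keeps the fd, the new one closes
it.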

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 monitor/fds.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/monitor/fds.c b/monitor/fds.c
index d86c2c674c..4ec3b7eea9 100644
--- a/monitor/fds.c
+++ b/monitor/fds.c
@@ -173,9 +173,9 @@ static void monitor_fdset_cleanup(MonFdset *mon_fdset)
     MonFdsetFd *mon_fdset_fd_next;
 
     QLIST_FOREACH_SAFE(mon_fdset_fd, &mon_fdset->fds, next, mon_fdset_fd_next) {
-        if ((mon_fdset_fd->removed ||
-                (QLIST_EMPTY(&mon_fdset->dup_fds) && mon_refcount == 0)) &&
-                runstate_is_running()) {
+        if (mon_fdset_fd->removed ||
+            (QLIST_EMPTY(&mon_fdset->dup_fds) && mon_refcount == 0 &&
+             runstate_is_running())) {
             close(mon_fdset_fd->fd);
             g_free(mon_fdset_fd->opaque);
             QLIST_REMOVE(mon_fdset_fd, next);
-- 
2.35.3




* [PATCH 2/9] migration: Fix file migration with fdset
  2024-04-26 14:20 [PATCH 0/9] migration/mapped-ram: Add direct-io support Fabiano Rosas
  2024-04-26 14:20 ` [PATCH 1/9] monitor: Honor QMP request for fd removal immediately Fabiano Rosas
@ 2024-04-26 14:20 ` Fabiano Rosas
  2024-05-03 16:23   ` Peter Xu
  2024-05-08  8:00   ` Daniel P. Berrangé
  2024-04-26 14:20 ` [PATCH 3/9] tests/qtest/migration: Fix file migration offset check Fabiano Rosas
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 57+ messages in thread
From: Fabiano Rosas @ 2024-04-26 14:20 UTC
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana, Jim Fehlig

When migration using the "file:" URI was implemented, I don't think
any of us noticed that passing in a file name of the form
"/dev/fdset/N" allows a file descriptor to be passed in to QEMU, which
then behaves just like the "fd:" URI. So "file:" support was added
without regard for the fdset case and we got some things wrong.

The first issue is that we should not truncate the migration file if
we're allowing an fd + offset. We need to leave the file contents
untouched.

The second issue is that there's an expectation that QEMU removes the
fd after the migration has finished. That's what the "fd:" code
does. Otherwise a second migration on the same VM could attempt to
provide an fdset with the same name and QEMU would reject it.

We can fix the first issue by detecting when we're using the fdset
vs. the plain file name. This requires storing the fdset_id
somewhere. We can then use this stored fdset_id to do cleanup at the
end and also fix the second issue.
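A standalone approximation of the detection logic follows. It uses
strncmp()/strtoll() in place of QEMU's strstart() and qemu_parse_fd(),
so the helper names and exact error handling are assumptions rather
than the patch's code. It also shows the resulting open flags: O_TRUNC
is added only when QEMU opens the file by name.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FDSET_PREFIX "/dev/fdset/"

/*
 * Hypothetical stand-in for file_parse_fdset(): set *fdset_id to the
 * number after "/dev/fdset/", or -1 for an ordinary path. Returns
 * false if the prefix is present but the id is not a valid number.
 */
static bool parse_fdset(const char *filename, int64_t *fdset_id)
{
    size_t plen = strlen(FDSET_PREFIX);
    const char *s;
    char *end;
    long long v;

    *fdset_id = -1;
    if (strncmp(filename, FDSET_PREFIX, plen) != 0) {
        return true; /* plain file name: QEMU opens it itself */
    }

    s = filename + plen;
    errno = 0;
    v = strtoll(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || v < 0) {
        return false;
    }
    *fdset_id = v;
    return true;
}

/* Only truncate when QEMU itself opens the file by name. */
static int outgoing_open_flags(int64_t fdset_id)
{
    int flags = O_CREAT | O_WRONLY;

    if (fdset_id == -1) {
        flags |= O_TRUNC;
    }
    return flags;
}
```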

Fixes: 385f510df5 ("migration: file URI offset")
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/file.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 46 insertions(+), 2 deletions(-)

diff --git a/migration/file.c b/migration/file.c
index ab18ba505a..8f30999400 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -10,6 +10,7 @@
 #include "qemu/cutils.h"
 #include "qemu/error-report.h"
 #include "qapi/error.h"
+#include "qapi/qapi-commands-misc.h"
 #include "channel.h"
 #include "file.h"
 #include "migration.h"
@@ -23,6 +24,7 @@
 
 static struct FileOutgoingArgs {
     char *fname;
+    int64_t fdset_id;
 } outgoing_args;
 
 /* Remove the offset option from @filespec and return it in @offsetp. */
@@ -44,10 +46,39 @@ int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp)
     return 0;
 }
 
+static void file_remove_fdset(void)
+{
+    if (outgoing_args.fdset_id != -1) {
+        qmp_remove_fd(outgoing_args.fdset_id, false, -1, NULL);
+        outgoing_args.fdset_id = -1;
+    }
+}
+
+static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
+                             Error **errp)
+{
+    const char *fdset_id_str;
+
+    *fdset_id = -1;
+
+    if (!strstart(filename, "/dev/fdset/", &fdset_id_str)) {
+        return true;
+    }
+
+    *fdset_id = qemu_parse_fd(fdset_id_str);
+    if (*fdset_id == -1) {
+        error_setg_errno(errp, EINVAL, "Could not parse fdset %s", fdset_id_str);
+        return false;
+    }
+
+    return true;
+}
+
 void file_cleanup_outgoing_migration(void)
 {
     g_free(outgoing_args.fname);
     outgoing_args.fname = NULL;
+    file_remove_fdset();
 }
 
 bool file_send_channel_create(gpointer opaque, Error **errp)
@@ -81,11 +112,24 @@ void file_start_outgoing_migration(MigrationState *s,
     g_autofree char *filename = g_strdup(file_args->filename);
     uint64_t offset = file_args->offset;
     QIOChannel *ioc;
+    int flags = O_CREAT | O_WRONLY;
 
     trace_migration_file_outgoing(filename);
 
-    fioc = qio_channel_file_new_path(filename, O_CREAT | O_WRONLY | O_TRUNC,
-                                     0600, errp);
+    if (!file_parse_fdset(filename, &outgoing_args.fdset_id, errp)) {
+        return;
+    }
+
+    /*
+     * Only truncate if it's QEMU opening the file. If an fd has been
+     * passed in the file will already contain data written by the
+     * management layer.
+     */
+    if (outgoing_args.fdset_id == -1) {
+        flags |= O_TRUNC;
+    }
+
+    fioc = qio_channel_file_new_path(filename, flags, 0600, errp);
     if (!fioc) {
         return;
     }
-- 
2.35.3




* [PATCH 3/9] tests/qtest/migration: Fix file migration offset check
  2024-04-26 14:20 [PATCH 0/9] migration/mapped-ram: Add direct-io support Fabiano Rosas
  2024-04-26 14:20 ` [PATCH 1/9] monitor: Honor QMP request for fd removal immediately Fabiano Rosas
  2024-04-26 14:20 ` [PATCH 2/9] migration: Fix file migration with fdset Fabiano Rosas
@ 2024-04-26 14:20 ` Fabiano Rosas
  2024-05-03 16:47   ` Peter Xu
  2024-04-26 14:20 ` [PATCH 4/9] migration: Add direct-io parameter Fabiano Rosas
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 57+ messages in thread
From: Fabiano Rosas @ 2024-04-26 14:20 UTC
  To: qemu-devel
  Cc: berrange, armbru, Peter Xu, Claudio Fontana, Jim Fehlig,
	Thomas Huth, Laurent Vivier, Paolo Bonzini

When doing file migration, QEMU accepts an offset that should be
skipped when writing the migration stream to the file. The purpose of
the offset is to allow the management layer to put its own metadata at
the start of the file.

We have tests for this in migration-test, but they only check that
the migration stream starts at the correct offset, not that the data
in the offset region is left intact. Unsurprisingly, there's been a
bug in that area that the tests didn't catch.

Fix the tests to write some data to the offset region and check that
it's actually there after the migration.
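The property the fixed test checks can be reproduced outside QEMU. In
this hypothetical sketch (header_survives() is not part of the test
suite), a "management layer" writes metadata at the start of a file,
then a second open() writes a payload starting at the offset. The
header only survives if that second open() omits O_TRUNC, which is
the issue addressed by the previous patch.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical check: write @hdr at offset 0 of @path, then write
 * @payload at @offset through a second open(). Returns true if the
 * header is still intact afterwards.
 */
static bool header_survives(const char *path, const char *hdr,
                            const char *payload, off_t offset,
                            bool truncate_on_open)
{
    size_t hlen = strlen(hdr);
    size_t plen = strlen(payload);
    char back[256];
    bool ok = false;
    int mgmt, mig;

    /* The header must fit in the skipped region and in our buffer. */
    if (hlen > sizeof(back) || (off_t)hlen > offset) {
        return false;
    }

    /* Management layer: create the file and write its metadata at 0. */
    mgmt = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (mgmt < 0) {
        return false;
    }
    if (pwrite(mgmt, hdr, hlen, 0) != (ssize_t)hlen) {
        goto out;
    }

    /* Migration side: a second open(), writing only from @offset on. */
    mig = open(path, O_WRONLY | (truncate_on_open ? O_TRUNC : 0));
    if (mig >= 0) {
        if (pwrite(mig, payload, plen, offset) == (ssize_t)plen &&
            pread(mgmt, back, hlen, 0) == (ssize_t)hlen) {
            ok = memcmp(back, hdr, hlen) == 0;
        }
        close(mig);
    }
out:
    close(mgmt);
    return ok;
}
```

With truncate_on_open set, the file is cut to zero length and the
region before the offset reads back as a hole full of zeros, so the
comparison fails.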

Fixes: 3dc35470c8 ("tests/qtest: migration-test: Add tests for file-based migration")
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 tests/qtest/migration-test.c | 70 +++++++++++++++++++++++++++++++++---
 1 file changed, 65 insertions(+), 5 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 5d6d8cd634..7b177686b4 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2081,6 +2081,63 @@ static void test_precopy_file(void)
     test_file_common(&args, true);
 }
 
+#ifndef _WIN32
+static void file_dirty_offset_region(void)
+{
+#if defined(__linux__)
+    g_autofree char *path = g_strdup_printf("%s/%s", tmpfs, FILE_TEST_FILENAME);
+    size_t size = FILE_TEST_OFFSET;
+    uintptr_t *addr, *p;
+    int fd;
+
+    fd = open(path, O_CREAT | O_RDWR, 0660);
+    g_assert(fd != -1);
+
+    g_assert(!ftruncate(fd, size));
+
+    addr = mmap(NULL, size, PROT_WRITE, MAP_SHARED, fd, 0);
+    g_assert(addr != MAP_FAILED);
+
+    /* ensure the skipped offset contains some data */
+    p = addr;
+    while (p < addr + FILE_TEST_OFFSET / sizeof(uintptr_t)) {
+        *p = (unsigned long) FILE_TEST_FILENAME;
+        p++;
+    }
+
+    munmap(addr, size);
+    fsync(fd);
+    close(fd);
+#endif
+}
+
+static void *file_offset_start_hook(QTestState *from, QTestState *to)
+{
+    g_autofree char *file = g_strdup_printf("%s/%s", tmpfs, FILE_TEST_FILENAME);
+    int src_flags = O_WRONLY;
+    int dst_flags = O_RDONLY;
+    int fds[2];
+
+    file_dirty_offset_region();
+
+    fds[0] = open(file, src_flags, 0660);
+    assert(fds[0] != -1);
+
+    fds[1] = open(file, dst_flags, 0660);
+    assert(fds[1] != -1);
+
+    qtest_qmp_fds_assert_success(from, &fds[0], 1, "{'execute': 'add-fd', "
+                                 "'arguments': {'fdset-id': 1}}");
+
+    qtest_qmp_fds_assert_success(to, &fds[1], 1, "{'execute': 'add-fd', "
+                                 "'arguments': {'fdset-id': 1}}");
+
+    close(fds[0]);
+    close(fds[1]);
+
+    return NULL;
+}
+
 static void file_offset_finish_hook(QTestState *from, QTestState *to,
                                     void *opaque)
 {
@@ -2096,12 +2153,12 @@ static void file_offset_finish_hook(QTestState *from, QTestState *to,
     g_assert(addr != MAP_FAILED);
 
     /*
-     * Ensure the skipped offset contains zeros and the migration
-     * stream starts at the right place.
+     * Ensure the skipped offset region's data has not been touched
+     * and the migration stream starts at the right place.
      */
     p = addr;
     while (p < addr + FILE_TEST_OFFSET / sizeof(uintptr_t)) {
-        g_assert(*p == 0);
+        g_assert_cmpstr((char *) *p, ==, FILE_TEST_FILENAME);
         p++;
     }
     g_assert_cmpint(cpu_to_be64(*p) >> 32, ==, QEMU_VM_FILE_MAGIC);
@@ -2113,17 +2170,18 @@ static void file_offset_finish_hook(QTestState *from, QTestState *to,
 
 static void test_precopy_file_offset(void)
 {
-    g_autofree char *uri = g_strdup_printf("file:%s/%s,offset=%d", tmpfs,
-                                           FILE_TEST_FILENAME,
+    g_autofree char *uri = g_strdup_printf("file:/dev/fdset/1,offset=%d",
                                            FILE_TEST_OFFSET);
     MigrateCommon args = {
         .connect_uri = uri,
         .listen_uri = "defer",
+        .start_hook = file_offset_start_hook,
         .finish_hook = file_offset_finish_hook,
     };
 
     test_file_common(&args, false);
 }
+#endif
 
 static void test_precopy_file_offset_bad(void)
 {
@@ -3636,8 +3694,10 @@ int main(int argc, char **argv)
 
     migration_test_add("/migration/precopy/file",
                        test_precopy_file);
+#ifndef _WIN32
     migration_test_add("/migration/precopy/file/offset",
                        test_precopy_file_offset);
+#endif
     migration_test_add("/migration/precopy/file/offset/bad",
                        test_precopy_file_offset_bad);
 
-- 
2.35.3




* [PATCH 4/9] migration: Add direct-io parameter
  2024-04-26 14:20 [PATCH 0/9] migration/mapped-ram: Add direct-io support Fabiano Rosas
                   ` (2 preceding siblings ...)
  2024-04-26 14:20 ` [PATCH 3/9] tests/qtest/migration: Fix file migration offset check Fabiano Rosas
@ 2024-04-26 14:20 ` Fabiano Rosas
  2024-04-26 14:33   ` Markus Armbruster
                     ` (2 more replies)
  2024-04-26 14:20 ` [PATCH 5/9] migration/multifd: Add direct-io support Fabiano Rosas
                   ` (5 subsequent siblings)
  9 siblings, 3 replies; 57+ messages in thread
From: Fabiano Rosas @ 2024-04-26 14:20 UTC
  To: qemu-devel
  Cc: berrange, armbru, Peter Xu, Claudio Fontana, Jim Fehlig, Eric Blake

Add the direct-io migration parameter that tells the migration code to
use O_DIRECT when opening the migration stream file whenever possible.

This is currently only used with mapped-ram migration, which has a
clear window in which writes are guaranteed to be aligned.
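The resulting precedence, as implemented by migrate_direct_io() in the
diff below, can be modeled with a tiny standalone sketch. The struct
is hypothetical and only captures the three inputs that function
consults: the mapped-ram capability, whether the parameter exists
(has_direct_io is initialized from qemu_has_direct_io()), and its
value.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down view of the relevant migration state. */
struct mig_opts {
    bool cap_mapped_ram; /* @mapped-ram capability enabled */
    bool has_direct_io;  /* parameter present (O_DIRECT known at build) */
    bool direct_io;      /* value set via migrate-set-parameters */
};

/* Same precedence as migrate_direct_io() in this patch. */
static bool effective_direct_io(const struct mig_opts *o)
{
    if (!o->cap_mapped_ram) {
        return false; /* O_DIRECT is only used together with mapped-ram */
    }
    return o->has_direct_io && o->direct_io;
}
```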

Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 include/qemu/osdep.h           |  2 ++
 migration/migration-hmp-cmds.c | 11 +++++++++++
 migration/options.c            | 30 ++++++++++++++++++++++++++++++
 migration/options.h            |  1 +
 qapi/migration.json            | 18 +++++++++++++++---
 util/osdep.c                   |  9 +++++++++
 6 files changed, 68 insertions(+), 3 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index c7053cdc2b..645c14a65d 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -612,6 +612,8 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive);
 bool qemu_has_ofd_lock(void);
 #endif
 
+bool qemu_has_direct_io(void);
+
 #if defined(__HAIKU__) && defined(__i386__)
 #define FMT_pid "%ld"
 #elif defined(WIN64)
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 7e96ae6ffd..8496a2b34e 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -397,6 +397,13 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
         monitor_printf(mon, "%s: %s\n",
             MigrationParameter_str(MIGRATION_PARAMETER_MODE),
             qapi_enum_lookup(&MigMode_lookup, params->mode));
+
+        if (params->has_direct_io) {
+            monitor_printf(mon, "%s: %s\n",
+                           MigrationParameter_str(
+                               MIGRATION_PARAMETER_DIRECT_IO),
+                           params->direct_io ? "on" : "off");
+        }
     }
 
     qapi_free_MigrationParameters(params);
@@ -690,6 +697,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
         p->has_mode = true;
         visit_type_MigMode(v, param, &p->mode, &err);
         break;
+    case MIGRATION_PARAMETER_DIRECT_IO:
+        p->has_direct_io = true;
+        visit_type_bool(v, param, &p->direct_io, &err);
+        break;
     default:
         assert(0);
     }
diff --git a/migration/options.c b/migration/options.c
index 239f5ecfb4..ae464aa4f2 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -826,6 +826,22 @@ int migrate_decompress_threads(void)
     return s->parameters.decompress_threads;
 }
 
+bool migrate_direct_io(void)
+{
+    MigrationState *s = migrate_get_current();
+
+    /* For now O_DIRECT is only supported with mapped-ram */
+    if (!s->capabilities[MIGRATION_CAPABILITY_MAPPED_RAM]) {
+        return false;
+    }
+
+    if (s->parameters.has_direct_io) {
+        return s->parameters.direct_io;
+    }
+
+    return false;
+}
+
 uint64_t migrate_downtime_limit(void)
 {
     MigrationState *s = migrate_get_current();
@@ -1061,6 +1077,11 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
     params->has_zero_page_detection = true;
     params->zero_page_detection = s->parameters.zero_page_detection;
 
+    if (s->parameters.has_direct_io) {
+        params->has_direct_io = true;
+        params->direct_io = s->parameters.direct_io;
+    }
+
     return params;
 }
 
@@ -1097,6 +1118,7 @@ void migrate_params_init(MigrationParameters *params)
     params->has_vcpu_dirty_limit = true;
     params->has_mode = true;
     params->has_zero_page_detection = true;
+    params->has_direct_io = qemu_has_direct_io();
 }
 
 /*
@@ -1416,6 +1438,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
     if (params->has_zero_page_detection) {
         dest->zero_page_detection = params->zero_page_detection;
     }
+
+    if (params->has_direct_io) {
+        dest->direct_io = params->direct_io;
+    }
 }
 
 static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
@@ -1570,6 +1596,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
     if (params->has_zero_page_detection) {
         s->parameters.zero_page_detection = params->zero_page_detection;
     }
+
+    if (params->has_direct_io) {
+        s->parameters.direct_io = params->direct_io;
+    }
 }
 
 void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
diff --git a/migration/options.h b/migration/options.h
index ab8199e207..aa5509cd2a 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -76,6 +76,7 @@ uint8_t migrate_cpu_throttle_increment(void);
 uint8_t migrate_cpu_throttle_initial(void);
 bool migrate_cpu_throttle_tailslow(void);
 int migrate_decompress_threads(void);
+bool migrate_direct_io(void);
 uint64_t migrate_downtime_limit(void);
 uint8_t migrate_max_cpu_throttle(void);
 uint64_t migrate_max_bandwidth(void);
diff --git a/qapi/migration.json b/qapi/migration.json
index 8c65b90328..1a8a4b114c 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -914,6 +914,9 @@
 #     See description in @ZeroPageDetection.  Default is 'multifd'.
 #     (since 9.0)
 #
+# @direct-io: Open migration files with O_DIRECT when possible. This
+#     requires that the @mapped-ram capability is enabled. (since 9.1)
+#
 # Features:
 #
 # @deprecated: Member @block-incremental is deprecated.  Use
@@ -948,7 +951,8 @@
            { 'name': 'x-vcpu-dirty-limit-period', 'features': ['unstable'] },
            'vcpu-dirty-limit',
            'mode',
-           'zero-page-detection'] }
+           'zero-page-detection',
+           'direct-io'] }
 
 ##
 # @MigrateSetParameters:
@@ -1122,6 +1126,9 @@
 #     See description in @ZeroPageDetection.  Default is 'multifd'.
 #     (since 9.0)
 #
+# @direct-io: Open migration files with O_DIRECT when possible. This
+#     requires that the @mapped-ram capability is enabled. (since 9.1)
+#
 # Features:
 #
 # @deprecated: Member @block-incremental is deprecated.  Use
@@ -1176,7 +1183,8 @@
                                             'features': [ 'unstable' ] },
             '*vcpu-dirty-limit': 'uint64',
             '*mode': 'MigMode',
-            '*zero-page-detection': 'ZeroPageDetection'} }
+            '*zero-page-detection': 'ZeroPageDetection',
+            '*direct-io': 'bool' } }
 
 ##
 # @migrate-set-parameters:
@@ -1354,6 +1362,9 @@
 #     See description in @ZeroPageDetection.  Default is 'multifd'.
 #     (since 9.0)
 #
+# @direct-io: Open migration files with O_DIRECT when possible. This
+#     requires that the @mapped-ram capability is enabled. (since 9.1)
+#
 # Features:
 #
 # @deprecated: Member @block-incremental is deprecated.  Use
@@ -1405,7 +1416,8 @@
                                             'features': [ 'unstable' ] },
             '*vcpu-dirty-limit': 'uint64',
             '*mode': 'MigMode',
-            '*zero-page-detection': 'ZeroPageDetection'} }
+            '*zero-page-detection': 'ZeroPageDetection',
+            '*direct-io': 'bool' } }
 
 ##
 # @query-migrate-parameters:
diff --git a/util/osdep.c b/util/osdep.c
index e996c4744a..d0227a60ab 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -277,6 +277,15 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive)
 }
 #endif
 
+bool qemu_has_direct_io(void)
+{
+#ifdef O_DIRECT
+    return true;
+#else
+    return false;
+#endif
+}
+
 static int qemu_open_cloexec(const char *name, int flags, mode_t mode)
 {
     int ret;
-- 
2.35.3




* [PATCH 5/9] migration/multifd: Add direct-io support
  2024-04-26 14:20 [PATCH 0/9] migration/mapped-ram: Add direct-io support Fabiano Rosas
                   ` (3 preceding siblings ...)
  2024-04-26 14:20 ` [PATCH 4/9] migration: Add direct-io parameter Fabiano Rosas
@ 2024-04-26 14:20 ` Fabiano Rosas
  2024-05-03 18:29   ` Peter Xu
  2024-05-08  8:27   ` Daniel P. Berrangé
  2024-04-26 14:20 ` [PATCH 6/9] tests/qtest/migration: Add tests for file migration with direct-io Fabiano Rosas
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 57+ messages in thread
From: Fabiano Rosas @ 2024-04-26 14:20 UTC
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana, Jim Fehlig

When multifd is used along with mapped-ram, we can take advantage of a
filesystem that supports the O_DIRECT flag and perform direct I/O in
the multifd threads. This brings a significant performance improvement
because direct I/O writes bypass the page cache, which would otherwise
be thrashed by multifd data that is unlikely to be needed again any
time soon.

To be able to use a multifd channel opened with O_DIRECT, we must
ensure that a certain alignment is used. Filesystems usually require
block-size alignment for direct I/O. The way to achieve this is by
enabling the mapped-ram feature, which already aligns its I/O properly
(see MAPPED_RAM_FILE_OFFSET_ALIGNMENT in ram.c).
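To make the alignment requirement concrete: O_DIRECT generally
requires the user buffer address, the file offset and the transfer
length to be multiples of the underlying logical block size (see the
open(2) man page; the exact rules vary by filesystem and kernel). A
hypothetical helper expressing that check, assuming a power-of-two
alignment:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if @buf, @offset and @len are all
 * multiples of @align, which must be a power of two.
 */
static bool dio_aligned(const void *buf, uint64_t offset, size_t len,
                        size_t align)
{
    uint64_t mask = (uint64_t)align - 1;

    return ((uintptr_t)buf & mask) == 0 &&
           (offset & mask) == 0 &&
           ((uint64_t)len & mask) == 0;
}
```

For example, with a 4096-byte alignment, a 4 KiB page written from a
page-aligned buffer at a 1 MiB-aligned file offset passes, while a
small header record written at an arbitrary offset does not.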

Once O_DIRECT is set on the multifd channels, all writes to the same
file descriptor need to be aligned as well, even the ones that come
from outside multifd, such as the QEMUFile I/O from the main migration
code. This makes it impossible to use the same file descriptor for the
QEMUFile and for the multifd channels. The various flags and metadata
written by the main migration code will always be unaligned by virtue
of their small size. To work around this issue, we'll require a second
file descriptor to be used exclusively for direct I/O.
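The second descriptor must come from a separate open() rather than a
dup(): file status flags such as O_DIRECT belong to the open file
description, which dup() shares, so toggling the flag on one
descriptor affects its duplicates too. A small sketch demonstrating
this; the helper names are hypothetical. On Linux, fcntl(F_SETFL) can
change O_DIRECT, but the usage below uses O_APPEND as a stand-in since
it can be toggled on any filesystem.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* True if @flag is set in @fd's file status flags. */
static bool fd_has_flag(int fd, int flag)
{
    int fl = fcntl(fd, F_GETFL);

    return fl != -1 && (fl & flag) != 0;
}

/*
 * Turn @flag on for @fd. The status flags live in the open file
 * description, so every dup() of @fd sees the change as well.
 */
static int fd_set_flag(int fd, int flag)
{
    int fl = fcntl(fd, F_GETFL);

    if (fl == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFL, fl | flag);
}
```

After fd_set_flag(a, O_APPEND), a dup() of a reports the flag, while a
descriptor obtained by opening the same path again does not.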

The second file descriptor can be obtained by QEMU by re-opening the
migration file (already possible), or by being provided by the user or
management application (support to be added in future patches).

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/file.c      | 22 +++++++++++++++++++---
 migration/migration.c | 23 +++++++++++++++++++++++
 2 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/migration/file.c b/migration/file.c
index 8f30999400..b9265b14dd 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -83,17 +83,33 @@ void file_cleanup_outgoing_migration(void)
 
 bool file_send_channel_create(gpointer opaque, Error **errp)
 {
-    QIOChannelFile *ioc;
+    QIOChannelFile *ioc = NULL;
     int flags = O_WRONLY;
-    bool ret = true;
+    bool ret = false;
+
+    if (migrate_direct_io()) {
+#ifdef O_DIRECT
+        /*
+         * Enable O_DIRECT for the secondary channels. These are used
+         * for sending ram pages and writes should be guaranteed to be
+         * aligned to at least page size.
+         */
+        flags |= O_DIRECT;
+#else
+        error_setg(errp, "System does not support O_DIRECT");
+        error_append_hint(errp,
+                          "Try disabling the direct-io migration parameter\n");
+        goto out;
+#endif
+    }
 
     ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, errp);
     if (!ioc) {
-        ret = false;
         goto out;
     }
 
     multifd_channel_connect(opaque, QIO_CHANNEL(ioc));
+    ret = true;
 
 out:
     /*
diff --git a/migration/migration.c b/migration/migration.c
index b5af6b5105..cb923a3f62 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -155,6 +155,16 @@ static bool migration_needs_seekable_channel(void)
     return migrate_mapped_ram();
 }
 
+static bool migration_needs_multiple_fds(void)
+{
+    /*
+     * When doing direct-io, multifd requires two different,
+     * non-duplicated file descriptors so we can use one of them for
+     * unaligned IO.
+     */
+    return migrate_multifd() && migrate_direct_io();
+}
+
 static bool transport_supports_seeking(MigrationAddress *addr)
 {
     if (addr->transport == MIGRATION_ADDRESS_TYPE_FILE) {
@@ -164,6 +174,12 @@ static bool transport_supports_seeking(MigrationAddress *addr)
     return false;
 }
 
+static bool transport_supports_multiple_fds(MigrationAddress *addr)
+{
+    /* file: works because QEMU can open it multiple times */
+    return addr->transport == MIGRATION_ADDRESS_TYPE_FILE;
+}
+
 static bool
 migration_channels_and_transport_compatible(MigrationAddress *addr,
                                             Error **errp)
@@ -180,6 +196,13 @@ migration_channels_and_transport_compatible(MigrationAddress *addr,
         return false;
     }
 
+    if (migration_needs_multiple_fds() &&
+        !transport_supports_multiple_fds(addr)) {
+        error_setg(errp,
+                   "Migration requires a transport that allows for multiple fds (e.g. file)");
+        return false;
+    }
+
     return true;
 }
 
-- 
2.35.3




* [PATCH 6/9] tests/qtest/migration: Add tests for file migration with direct-io
  2024-04-26 14:20 [PATCH 0/9] migration/mapped-ram: Add direct-io support Fabiano Rosas
                   ` (4 preceding siblings ...)
  2024-04-26 14:20 ` [PATCH 5/9] migration/multifd: Add direct-io support Fabiano Rosas
@ 2024-04-26 14:20 ` Fabiano Rosas
  2024-05-03 18:38   ` Peter Xu
  2024-05-08  8:34   ` Daniel P. Berrangé
  2024-04-26 14:20 ` [PATCH 7/9] monitor: fdset: Match against O_DIRECT Fabiano Rosas
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 57+ messages in thread
From: Fabiano Rosas @ 2024-04-26 14:20 UTC
  To: qemu-devel
  Cc: berrange, armbru, Peter Xu, Claudio Fontana, Jim Fehlig,
	Thomas Huth, Laurent Vivier, Paolo Bonzini

The tests only run on systems that know about the O_DIRECT flag and
on filesystems that support it.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 tests/qtest/migration-helpers.c | 42 +++++++++++++++++++++++++++++++++
 tests/qtest/migration-helpers.h |  1 +
 tests/qtest/migration-test.c    | 42 +++++++++++++++++++++++++++++++++
 3 files changed, 85 insertions(+)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index ce6d6615b5..356cd4fa8c 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -473,3 +473,45 @@ void migration_test_add(const char *path, void (*fn)(void))
     qtest_add_data_func_full(path, test, migration_test_wrapper,
                              migration_test_destroy);
 }
+
+#ifdef O_DIRECT
+/*
+ * Probe for O_DIRECT support on the filesystem. Since this is used
+ * for tests, be conservative, if anything fails, assume it's
+ * unsupported.
+ */
+bool probe_o_direct_support(const char *tmpfs)
+{
+    g_autofree char *filename = g_strdup_printf("%s/probe-o-direct", tmpfs);
+    int fd, flags = O_CREAT | O_RDWR | O_TRUNC | O_DIRECT;
+    void *buf;
+    ssize_t ret, len;
+    uint64_t offset;
+
+    fd = open(filename, flags, 0660);
+    if (fd < 0) {
+        unlink(filename);
+        return false;
+    }
+
+    /*
+     * 4k would likely satisfy O_DIRECT alignment requirements, but
+     * use 1M to be conservative, as the migration code does.
+     */
+    len = 0x100000;
+    offset = 0x100000;
+
+    buf = aligned_alloc(len, len);
+    g_assert(buf);
+
+    ret = pwrite(fd, buf, len, offset);
+    unlink(filename);
+    g_free(buf);
+
+    if (ret < 0) {
+        return false;
+    }
+
+    return true;
+}
+#endif
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index 1339835698..d827e16145 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -54,5 +54,6 @@ char *find_common_machine_version(const char *mtype, const char *var1,
                                   const char *var2);
 char *resolve_machine_version(const char *alias, const char *var1,
                               const char *var2);
+bool probe_o_direct_support(const char *tmpfs);
 void migration_test_add(const char *path, void (*fn)(void));
 #endif /* MIGRATION_HELPERS_H */
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 7b177686b4..512b7ede8b 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2295,6 +2295,43 @@ static void test_multifd_file_mapped_ram(void)
     test_file_common(&args, true);
 }
 
+#ifdef O_DIRECT
+static void *migrate_mapped_ram_dio_start(QTestState *from,
+                                                 QTestState *to)
+{
+    migrate_mapped_ram_start(from, to);
+    migrate_set_parameter_bool(from, "direct-io", true);
+    migrate_set_parameter_bool(to, "direct-io", true);
+
+    return NULL;
+}
+
+static void *migrate_multifd_mapped_ram_dio_start(QTestState *from,
+                                                 QTestState *to)
+{
+    migrate_multifd_mapped_ram_start(from, to);
+    return migrate_mapped_ram_dio_start(from, to);
+}
+
+static void test_multifd_file_mapped_ram_dio(void)
+{
+    g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs,
+                                           FILE_TEST_FILENAME);
+    MigrateCommon args = {
+        .connect_uri = uri,
+        .listen_uri = "defer",
+        .start_hook = migrate_multifd_mapped_ram_dio_start,
+    };
+
+    if (!probe_o_direct_support(tmpfs)) {
+        g_test_skip("Filesystem does not support O_DIRECT");
+        return;
+    }
+
+    test_file_common(&args, true);
+}
+
+#endif /* O_DIRECT */
 
 static void test_precopy_tcp_plain(void)
 {
@@ -3719,6 +3756,11 @@ int main(int argc, char **argv)
     migration_test_add("/migration/multifd/file/mapped-ram/live",
                        test_multifd_file_mapped_ram_live);
 
+#ifdef O_DIRECT
+    migration_test_add("/migration/multifd/file/mapped-ram/dio",
+                       test_multifd_file_mapped_ram_dio);
+#endif
+
 #ifdef CONFIG_GNUTLS
     migration_test_add("/migration/precopy/unix/tls/psk",
                        test_precopy_unix_tls_psk);
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 7/9] monitor: fdset: Match against O_DIRECT
  2024-04-26 14:20 [PATCH 0/9] migration/mapped-ram: Add direct-io support Fabiano Rosas
                   ` (5 preceding siblings ...)
  2024-04-26 14:20 ` [PATCH 6/9] tests/qtest/migration: Add tests for file migration with direct-io Fabiano Rosas
@ 2024-04-26 14:20 ` Fabiano Rosas
  2024-05-03 18:53   ` Peter Xu
  2024-04-26 14:20 ` [PATCH 8/9] migration: Add support for fdset with multifd + file Fabiano Rosas
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 57+ messages in thread
From: Fabiano Rosas @ 2024-04-26 14:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana, Jim Fehlig

We're about to enable the use of O_DIRECT in the migration code and
due to the alignment restrictions imposed by filesystems we need to
make sure the flag is only used when doing aligned IO.

The migration will do parallel IO to different regions of a file, so
we need to use more than one file descriptor. Those cannot be obtained
by duplicating (dup()) since duplicated file descriptors share the
file status flags, including O_DIRECT. If one migration channel does
unaligned IO while another sets O_DIRECT to do aligned IO, the
filesystem would fail the unaligned operation.

The add-fd QMP command along with the fdset code are specifically
designed to allow the user to pass a set of file descriptors with
different access flags into QEMU to be later fetched by code that
needs to alternate between those flags when doing IO.

Extend the fdset matching to behave the same with the O_DIRECT flag.
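
The matching rule can be sketched as a standalone predicate. This is an illustration only: `fdset_flags_match()` is a hypothetical name, not a QEMU function. It restates the `(flags & mask) == (mon_fd_flags & mask)` comparison in monitor_fdset_dup_fd_add(), with `have` standing for the flags that function reads back from the candidate fd.

```c
#define _GNU_SOURCE /* exposes O_DIRECT in glibc's <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/*
 * Hypothetical helper: does a candidate fd whose status flags are 'have'
 * (as returned by fcntl(F_GETFL)) satisfy a request for 'wanted'?
 * Only the access mode and, where available, O_DIRECT are compared;
 * every other bit is ignored.
 */
static bool fdset_flags_match(int wanted, int have)
{
    int mask = O_ACCMODE;

#ifdef O_DIRECT
    mask |= O_DIRECT;
#endif

    return (wanted & mask) == (have & mask);
}
```

Under this rule a request for O_WRONLY | O_DIRECT is no longer satisfied by a plain O_WRONLY fd (and vice versa), while unrelated status bits such as O_APPEND are still ignored.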

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 monitor/fds.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/monitor/fds.c b/monitor/fds.c
index 4ec3b7eea9..62e324fcec 100644
--- a/monitor/fds.c
+++ b/monitor/fds.c
@@ -420,6 +420,11 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
         int fd = -1;
         int dup_fd;
         int mon_fd_flags;
+        int mask = O_ACCMODE;
+
+#ifdef O_DIRECT
+        mask |= O_DIRECT;
+#endif
 
         if (mon_fdset->id != fdset_id) {
             continue;
@@ -431,7 +436,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
                 return -1;
             }
 
-            if ((flags & O_ACCMODE) == (mon_fd_flags & O_ACCMODE)) {
+            if ((flags & mask) == (mon_fd_flags & mask)) {
                 fd = mon_fdset_fd->fd;
                 break;
             }
-- 
2.35.3




* [PATCH 8/9] migration: Add support for fdset with multifd + file
  2024-04-26 14:20 [PATCH 0/9] migration/mapped-ram: Add direct-io support Fabiano Rosas
                   ` (6 preceding siblings ...)
  2024-04-26 14:20 ` [PATCH 7/9] monitor: fdset: Match against O_DIRECT Fabiano Rosas
@ 2024-04-26 14:20 ` Fabiano Rosas
  2024-05-08  8:53   ` Daniel P. Berrangé
  2024-04-26 14:20 ` [PATCH 9/9] tests/qtest/migration: Add a test for mapped-ram with passing of fds Fabiano Rosas
  2024-05-02 20:01 ` [PATCH 0/9] migration/mapped-ram: Add direct-io support Peter Xu
  9 siblings, 1 reply; 57+ messages in thread
From: Fabiano Rosas @ 2024-04-26 14:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, armbru, Peter Xu, Claudio Fontana, Jim Fehlig

Allow multifd to use an fdset when migrating to a file. This is useful
for the scenario where the management layer wants to have control over
the migration file.

By receiving the file descriptors directly, QEMU can delegate some
high level operating system operations to the management layer (such
as mandatory access control). The management layer might also want to
add its own headers before the migration stream.

Enable the "file:/dev/fdset/#" syntax for the multifd migration with
mapped-ram. The requirements for the fdset mechanism are:

On the migration source side:

- the fdset must contain two fds that are not duplicates of each
  other;
- if direct-io is to be used, exactly one of the fds must have the
  O_DIRECT flag set;
- the file must be opened with O_WRONLY both times.

On the migration destination side:

- the fdset must contain one fd;
- the file must be opened with O_RDONLY.
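
A management application could sanity-check the source-side fds before passing them with add-fd. The sketch below is hypothetical (`src_fdset_ok()` is not part of QEMU or libvirt) and assumes POSIX fstat()/fcntl(). Flags alone cannot tell a dup() from an independent open(), so without direct-io the "not duplicates" rule is left unchecked; with direct-io, requiring exactly one O_DIRECT fd also rules out duplicates, because duplicated fds share status flags.

```c
#define _GNU_SOURCE /* exposes O_DIRECT in glibc's <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical pre-flight check for the two outgoing fds. */
static bool src_fdset_ok(int fd_a, int fd_b, bool direct_io)
{
    struct stat st_a, st_b;
    int fl_a = fcntl(fd_a, F_GETFL);
    int fl_b = fcntl(fd_b, F_GETFL);

    if (fl_a < 0 || fl_b < 0 || fstat(fd_a, &st_a) || fstat(fd_b, &st_b)) {
        return false;
    }

    /* Both fds must refer to the same migration file. */
    if (st_a.st_dev != st_b.st_dev || st_a.st_ino != st_b.st_ino) {
        return false;
    }

    /* The source side requires O_WRONLY on both fds. */
    if ((fl_a & O_ACCMODE) != O_WRONLY || (fl_b & O_ACCMODE) != O_WRONLY) {
        return false;
    }

    if (direct_io) {
#ifdef O_DIRECT
        /* Exactly one fd may carry O_DIRECT (this also rules out dup()). */
        return ((fl_a & O_DIRECT) != 0) != ((fl_b & O_DIRECT) != 0);
#else
        return false; /* no O_DIRECT on this platform */
#endif
    }

    return true;
}
```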

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 docs/devel/migration/main.rst       | 18 ++++++++++++++
 docs/devel/migration/mapped-ram.rst |  6 ++++-
 migration/file.c                    | 38 ++++++++++++++++++++++++++++-
 3 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst
index 54385a23e5..50f6096470 100644
--- a/docs/devel/migration/main.rst
+++ b/docs/devel/migration/main.rst
@@ -47,6 +47,24 @@ over any transport.
   QEMU interference. Note that QEMU does not flush cached file
   data/metadata at the end of migration.
 
+  The file migration also supports using a file that has already been
+  opened. A set of file descriptors is passed to QEMU via an "fdset"
+  (see add-fd QMP command documentation). This method allows a
+  management application to have control over the migration file
+  opening operation. There are, however, strict requirements for this
+  interface:
+
+  On the migration source side:
+    - if the multifd capability is to be used, the fdset must contain
+      two file descriptors that are not duplicates of each other;
+    - if the direct-io parameter is to be used, exactly one of the
+      file descriptors must have the O_DIRECT flag set;
+    - the file must be opened with O_WRONLY.
+
+  On the migration destination side:
+    - the fdset must contain one file descriptor;
+    - the file must be opened with O_RDONLY.
+
 In addition, support is included for migration using RDMA, which
 transports the page data using ``RDMA``, where the hardware takes care of
 transporting the pages, and the load on the CPU is much lower.  While the
diff --git a/docs/devel/migration/mapped-ram.rst b/docs/devel/migration/mapped-ram.rst
index fa4cefd9fc..e6505511f0 100644
--- a/docs/devel/migration/mapped-ram.rst
+++ b/docs/devel/migration/mapped-ram.rst
@@ -16,7 +16,7 @@ location in the file, rather than constantly being added to a
 sequential stream. Having the pages at fixed offsets also allows the
 usage of O_DIRECT for save/restore of the migration stream as the
 pages are ensured to be written respecting O_DIRECT alignment
-restrictions (direct-io support not yet implemented).
+restrictions.
 
 Usage
 -----
@@ -35,6 +35,10 @@ Use a ``file:`` URL for migration:
 Mapped-ram migration is best done non-live, i.e. by stopping the VM on
 the source side before migrating.
 
+For best performance enable the ``direct-io`` parameter as well:
+
+    ``migrate_set_parameter direct-io on``
+
 Use-cases
 ---------
 
diff --git a/migration/file.c b/migration/file.c
index b9265b14dd..3bc8bc7463 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -17,6 +17,7 @@
 #include "io/channel-file.h"
 #include "io/channel-socket.h"
 #include "io/channel-util.h"
+#include "monitor/monitor.h"
 #include "options.h"
 #include "trace.h"
 
@@ -54,10 +55,18 @@ static void file_remove_fdset(void)
     }
 }
 
+/*
+ * With multifd, due to the behavior of the dup() system call, we need
+ * the fdset to have two non-duplicate fds so we can enable direct IO
+ * in the secondary channels without affecting the main channel.
+ */
 static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
                              Error **errp)
 {
+    FdsetInfoList *fds_info;
+    FdsetFdInfoList *fd_info;
     const char *fdset_id_str;
+    int nfds = 0;
 
     *fdset_id = -1;
 
@@ -71,6 +80,32 @@ static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
         return false;
     }
 
+    if (!migrate_multifd() || !migrate_direct_io()) {
+        return true;
+    }
+
+    for (fds_info = qmp_query_fdsets(NULL); fds_info;
+         fds_info = fds_info->next) {
+
+        if (*fdset_id != fds_info->value->fdset_id) {
+            continue;
+        }
+
+        for (fd_info = fds_info->value->fds; fd_info; fd_info = fd_info->next) {
+            if (nfds++ > 2) {
+                break;
+            }
+        }
+    }
+
+    if (nfds != 2) {
+        error_setg(errp, "Outgoing migration needs two fds in the fdset, "
+                   "got %d", nfds);
+        qmp_remove_fd(*fdset_id, false, -1, NULL);
+        *fdset_id = -1;
+        return false;
+    }
+
     return true;
 }
 
@@ -209,10 +244,11 @@ void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp)
     g_autofree char *filename = g_strdup(file_args->filename);
     QIOChannelFile *fioc = NULL;
     uint64_t offset = file_args->offset;
+    int flags = O_RDONLY;
 
     trace_migration_file_incoming(filename);
 
-    fioc = qio_channel_file_new_path(filename, O_RDONLY, 0, errp);
+    fioc = qio_channel_file_new_path(filename, flags, 0, errp);
     if (!fioc) {
         return;
     }
-- 
2.35.3




* [PATCH 9/9] tests/qtest/migration: Add a test for mapped-ram with passing of fds
  2024-04-26 14:20 [PATCH 0/9] migration/mapped-ram: Add direct-io support Fabiano Rosas
                   ` (7 preceding siblings ...)
  2024-04-26 14:20 ` [PATCH 8/9] migration: Add support for fdset with multifd + file Fabiano Rosas
@ 2024-04-26 14:20 ` Fabiano Rosas
  2024-05-08  8:56   ` Daniel P. Berrangé
  2024-05-02 20:01 ` [PATCH 0/9] migration/mapped-ram: Add direct-io support Peter Xu
  9 siblings, 1 reply; 57+ messages in thread
From: Fabiano Rosas @ 2024-04-26 14:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: berrange, armbru, Peter Xu, Claudio Fontana, Jim Fehlig,
	Thomas Huth, Laurent Vivier, Paolo Bonzini

Add a multifd test for mapped-ram with passing of fds into QEMU. This
is how libvirt will consume the feature.

There are a couple of details to the fdset mechanism:

- multifd needs two distinct file descriptors (not duplicated with
  dup()) on the outgoing side so it can enable O_DIRECT only on the
  channels that write with alignment. The dup() system call creates
  file descriptors that share status flags, of which O_DIRECT is one.

  The incoming side doesn't set O_DIRECT, so it can dup() fds and
  therefore can receive only one in the fdset.

- the open() access mode flags used for the fds passed into QEMU need
  to match the flags QEMU uses to open the file. Currently O_WRONLY
  for src and O_RDONLY for dst.

O_DIRECT is not supported on all systems/filesystems, so run the fdset
test without O_DIRECT if that's the case. The migration code should
still work in that scenario.
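
One way to fall back to buffered IO when O_DIRECT is unavailable is to retry the open() without the flag. This is a hypothetical sketch (`open_maybe_direct()` is not a QEMU function) and differs from what the test does, which is to probe support up front with probe_o_direct_support(). It assumes Linux behavior, where open() fails with EINVAL if the filesystem does not support O_DIRECT.

```c
#define _GNU_SOURCE /* exposes O_DIRECT in glibc's <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: open 'path' with O_DIRECT if the platform and
 * filesystem accept it, otherwise fall back to buffered IO.
 * '*direct' reports which mode was obtained. 'flags' must not contain
 * O_CREAT, since no mode argument is passed to open().
 */
static int open_maybe_direct(const char *path, int flags, bool *direct)
{
    *direct = false;

#ifdef O_DIRECT
    int fd = open(path, flags | O_DIRECT);
    if (fd >= 0) {
        *direct = true;
        return fd;
    }
    if (errno != EINVAL) {
        /* A real error (ENOENT, EACCES, ...): don't mask it. */
        return -1;
    }
#endif

    /* No O_DIRECT on this platform or filesystem: use buffered IO. */
    return open(path, flags);
}
```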

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 tests/qtest/migration-test.c | 90 ++++++++++++++++++++++++++++++++++++
 1 file changed, 90 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 512b7ede8b..d83f1bdd4f 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2331,8 +2331,93 @@ static void test_multifd_file_mapped_ram_dio(void)
     test_file_common(&args, true);
 }
 
+static void migrate_multifd_mapped_ram_fdset_dio_end(QTestState *from,
+                                                    QTestState *to,
+                                                    void *opaque)
+{
+    QDict *resp;
+    QList *fdsets;
+
+    file_offset_finish_hook(from, to, opaque);
+
+    /*
+     * Check that we removed the fdsets after migration, otherwise a
+     * second migration would fail due to too many fdsets.
+     */
+
+    resp = qtest_qmp(from, "{'execute': 'query-fdsets', "
+                     "'arguments': {}}");
+    g_assert(qdict_haskey(resp, "return"));
+    fdsets = qdict_get_qlist(resp, "return");
+    g_assert(fdsets && qlist_empty(fdsets));
+}
 #endif /* O_DIRECT */
 
+#ifndef _WIN32
+static void *migrate_multifd_mapped_ram_fdset(QTestState *from, QTestState *to)
+{
+    g_autofree char *file = g_strdup_printf("%s/%s", tmpfs, FILE_TEST_FILENAME);
+    int fds[3];
+    int src_flags = O_WRONLY;
+
+    file_dirty_offset_region();
+
+    /* main outgoing channel: no O_DIRECT */
+    fds[0] = open(file, src_flags, 0660);
+    assert(fds[0] != -1);
+
+    qtest_qmp_fds_assert_success(from, &fds[0], 1, "{'execute': 'add-fd', "
+                                 "'arguments': {'fdset-id': 1}}");
+
+#ifdef O_DIRECT
+    src_flags |= O_DIRECT;
+
+    /* secondary outgoing channels */
+    fds[1] = open(file, src_flags, 0660);
+    assert(fds[1] != -1);
+
+    qtest_qmp_fds_assert_success(from, &fds[1], 1, "{'execute': 'add-fd', "
+                                 "'arguments': {'fdset-id': 1}}");
+
+    /* incoming channel */
+    fds[2] = open(file, O_CREAT | O_RDONLY, 0660);
+    assert(fds[2] != -1);
+
+    qtest_qmp_fds_assert_success(to, &fds[2], 1, "{'execute': 'add-fd', "
+                                 "'arguments': {'fdset-id': 1}}");
+
+    migrate_multifd_mapped_ram_dio_start(from, to);
+#else
+    migrate_multifd_mapped_ram_start(from, to);
+#endif
+
+    return NULL;
+}
+
+static void test_multifd_file_mapped_ram_fdset(void)
+{
+    g_autofree char *uri = g_strdup_printf("file:/dev/fdset/1,offset=%d",
+                                           FILE_TEST_OFFSET);
+    MigrateCommon args = {
+        .connect_uri = uri,
+        .listen_uri = "defer",
+        .start_hook = migrate_multifd_mapped_ram_fdset,
+#ifdef O_DIRECT
+        .finish_hook = migrate_multifd_mapped_ram_fdset_dio_end,
+#endif
+    };
+
+#ifdef O_DIRECT
+    if (!probe_o_direct_support(tmpfs)) {
+        g_test_skip("Filesystem does not support O_DIRECT");
+        return;
+    }
+#endif
+
+    test_file_common(&args, true);
+}
+#endif /* _WIN32 */
+
 static void test_precopy_tcp_plain(void)
 {
     MigrateCommon args = {
@@ -3761,6 +3846,11 @@ int main(int argc, char **argv)
                        test_multifd_file_mapped_ram_dio);
 #endif
 
+#ifndef _WIN32
+    qtest_add_func("/migration/multifd/file/mapped-ram/fdset",
+                   test_multifd_file_mapped_ram_fdset);
+#endif
+
 #ifdef CONFIG_GNUTLS
     migration_test_add("/migration/precopy/unix/tls/psk",
                        test_precopy_unix_tls_psk);
-- 
2.35.3




* Re: [PATCH 4/9] migration: Add direct-io parameter
  2024-04-26 14:20 ` [PATCH 4/9] migration: Add direct-io parameter Fabiano Rosas
@ 2024-04-26 14:33   ` Markus Armbruster
  2024-05-03 18:05   ` Peter Xu
  2024-05-08  8:25   ` Daniel P. Berrangé
  2 siblings, 0 replies; 57+ messages in thread
From: Markus Armbruster @ 2024-04-26 14:33 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, berrange, armbru, Peter Xu, Claudio Fontana,
	Jim Fehlig, Eric Blake

Fabiano Rosas <farosas@suse.de> writes:

> Add the direct-io migration parameter that tells the migration code to
> use O_DIRECT when opening the migration stream file whenever possible.
>
> This is currently only used with the mapped-ram migration that has a
> clear window guaranteed to perform aligned writes.
>
> Acked-by: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

[...]

> diff --git a/qapi/migration.json b/qapi/migration.json
> index 8c65b90328..1a8a4b114c 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -914,6 +914,9 @@
>  #     See description in @ZeroPageDetection.  Default is 'multifd'.
>  #     (since 9.0)
>  #
> +# @direct-io: Open migration files with O_DIRECT when possible. This
> +#     requires that the @mapped-ram capability is enabled. (since 9.1)
> +#

Two spaces between sentences for consistency, please.

>  # Features:
>  #
>  # @deprecated: Member @block-incremental is deprecated.  Use
> @@ -948,7 +951,8 @@
>             { 'name': 'x-vcpu-dirty-limit-period', 'features': ['unstable'] },
>             'vcpu-dirty-limit',
>             'mode',
> -           'zero-page-detection'] }
> +           'zero-page-detection',
> +           'direct-io'] }
>  
>  ##
>  # @MigrateSetParameters:

[...]




* Re: [PATCH 0/9] migration/mapped-ram: Add direct-io support
  2024-04-26 14:20 [PATCH 0/9] migration/mapped-ram: Add direct-io support Fabiano Rosas
                   ` (8 preceding siblings ...)
  2024-04-26 14:20 ` [PATCH 9/9] tests/qtest/migration: Add a test for mapped-ram with passing of fds Fabiano Rosas
@ 2024-05-02 20:01 ` Peter Xu
  2024-05-02 20:34   ` Fabiano Rosas
  9 siblings, 1 reply; 57+ messages in thread
From: Peter Xu @ 2024-05-02 20:01 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana, Jim Fehlig

On Fri, Apr 26, 2024 at 11:20:33AM -0300, Fabiano Rosas wrote:
> If the user is not passing in a file name which QEMU can open at will,
> we must then require that the user pass the two file descriptors with
> the flags already properly set. We'll use the already existing fdset +
> QMP add-fd infrastructure for this.

Yes I remember such requirement that one extra fd is needed for direct-io,
however today when I looked closer at the man page it looks like F_SETFL
works with O_DIRECT too?

       F_SETFL (int)
              Set the file status flags to the value specified by arg.
              File access mode (O_RDONLY, O_WRONLY, O_RDWR) and file
              creation flags (i.e., O_CREAT, O_EXCL, O_NOCTTY, O_TRUNC) in
              arg are ignored.  On Linux, this command can change only the
              O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME, and O_NONBLOCK flags.
              It is not possible to change the O_DSYNC and O_SYNC flags;
              see BUGS, below.

====8<====
$ cat fcntl.c
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <assert.h>
#include <unistd.h>

int main(void)
{
    int fd, newfd, ret, flags;

    fd = open("test.txt", O_RDWR | O_CREAT, 0660);
    assert(fd != -1);

    flags = fcntl(fd, F_GETFL);
    printf("old fd flags: 0x%x\n", flags);

    newfd = dup(fd);
    assert(newfd != -1);

    flags = fcntl(newfd, F_GETFL);
    printf("new fd flags: 0x%x\n", flags);

    flags |= O_DIRECT;
    ret = fcntl(newfd, F_SETFL, flags);

    flags = fcntl(fd, F_GETFL);
    printf("updated new flags: 0x%x\n", flags);
    
    return 0;
}
$ make fcntl
cc     fcntl.c   -o fcntl
$ ./fcntl 
old fd flags: 0x8002
new fd flags: 0x8002
updated new flags: 0xc002
====8<====

Perhaps I missed something important?

-- 
Peter Xu




* Re: [PATCH 0/9] migration/mapped-ram: Add direct-io support
  2024-05-02 20:01 ` [PATCH 0/9] migration/mapped-ram: Add direct-io support Peter Xu
@ 2024-05-02 20:34   ` Fabiano Rosas
  0 siblings, 0 replies; 57+ messages in thread
From: Fabiano Rosas @ 2024-05-02 20:34 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, berrange, armbru, Claudio Fontana, Jim Fehlig

Peter Xu <peterx@redhat.com> writes:

> On Fri, Apr 26, 2024 at 11:20:33AM -0300, Fabiano Rosas wrote:
>> If the user is not passing in a file name which QEMU can open at will,
>> we must then require that the user pass the two file descriptors with
>> the flags already properly set. We'll use the already existing fdset +
>> QMP add-fd infrastructure for this.
>
> Yes I remember such requirement that one extra fd is needed for direct-io,
> however today when I looked closer at the man page it looks like F_SETFL
> works with O_DIRECT too?
>
>        F_SETFL (int)
>               Set the file status flags to the value specified by arg.
>               File access mode (O_RDONLY, O_WRONLY, O_RDWR) and file
>               creation flags (i.e., O_CREAT, O_EXCL, O_NOCTTY, O_TRUNC) in
>               arg are ignored.  On Linux, this command can change only the
>               O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME, and O_NONBLOCK flags.
>               It is not possible to change the O_DSYNC and O_SYNC flags;
>               see BUGS, below.
>
> ====8<====
> $ cat fcntl.c
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <fcntl.h>
> #include <assert.h>
> #include <unistd.h>
>
> int main(void)
> {
>     int fd, newfd, ret, flags;
>
>     fd = open("test.txt", O_RDWR | O_CREAT, 0660);
>     assert(fd != -1);
>
>     flags = fcntl(fd, F_GETFL);
>     printf("old fd flags: 0x%x\n", flags);
>
>     newfd = dup(fd);
>     assert(newfd != -1);
>
>     flags = fcntl(newfd, F_GETFL);
>     printf("new fd flags: 0x%x\n", flags);
>
>     flags |= O_DIRECT;
>     ret = fcntl(newfd, F_SETFL, flags);
>
>     flags = fcntl(fd, F_GETFL);
>     printf("updated new flags: 0x%x\n", flags);
>     
>     return 0;
> }
> $ make fcntl
> cc     fcntl.c   -o fcntl
> $ ./fcntl 
> old fd flags: 0x8002
> new fd flags: 0x8002
> updated new flags: 0xc002
> ====8<====
>
> Perhaps I missed something important?

The dup()'ed file descriptor shares file status flags with the original
fd. Your code example proves just that. In the last two blocks you're
doing F_SETFL on the 'newfd' and then seeing the change take effect on
'fd'. That's what we don't want to happen.
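
The experiment can be adjusted to compare a dup()'ed fd with a second, independent open() of the same file. In this sketch `status_flag_leaks()` is a hypothetical helper, and O_APPEND stands in for O_DIRECT so the result does not depend on the filesystem supporting O_DIRECT; on Linux both are file status flags that F_SETFL can change, and both live in the open file description that dup() shares.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: set 'flag' on fd 'a' with F_SETFL and report
 * whether fd 'b' observes it. Returns 1 if it does, 0 if it does not,
 * -1 on error. The original flags of 'a' are restored before returning.
 */
static int status_flag_leaks(int a, int b, int flag)
{
    int fl_a = fcntl(a, F_GETFL);
    int fl_b;

    if (fl_a < 0 || fcntl(a, F_SETFL, fl_a | flag) < 0) {
        return -1;
    }

    /* Read the flags back through the *other* descriptor. */
    fl_b = fcntl(b, F_GETFL);

    /* Undo the change on 'a' (and on 'b', if they share it). */
    fcntl(a, F_SETFL, fl_a);

    if (fl_b < 0) {
        return -1;
    }
    return (fl_b & flag) != 0;
}
```

Replacing O_APPEND with O_DIRECT should give the same result on a filesystem that supports O_DIRECT, which is why the outgoing side needs two separately opened fds rather than one fd plus a dup().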



* Re: [PATCH 1/9] monitor: Honor QMP request for fd removal immediately
  2024-04-26 14:20 ` [PATCH 1/9] monitor: Honor QMP request for fd removal immediately Fabiano Rosas
@ 2024-05-03 16:02   ` Peter Xu
  2024-05-16 21:46     ` Fabiano Rosas
  2024-05-08  7:17   ` Daniel P. Berrangé
  1 sibling, 1 reply; 57+ messages in thread
From: Peter Xu @ 2024-05-03 16:02 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, berrange, armbru, Claudio Fontana, Jim Fehlig,
	Corey Bryant, Eric Blake, Kevin Wolf

On Fri, Apr 26, 2024 at 11:20:34AM -0300, Fabiano Rosas wrote:
> We're enabling using the fdset interface to pass file descriptors for
> use in the migration code. Since migrations can happen more than once
> during the VM's lifetime, we need a way to remove an fd from the fdset
> at the end of migration.
> 
> The current code only removes an fd from the fdset if the VM is
> running. This causes a QMP call to "remove-fd" to not actually remove
> the fd if the VM happens to be stopped.
> 
> While the fd would eventually be removed when monitor_fdset_cleanup()
> is called again, the user request should be honored and the fd
> actually removed. Calling remove-fd + query-fdset shows a recently
> removed fd still present.
> 
> The runstate_is_running() check was introduced by commit ebe52b592d
> ("monitor: Prevent removing fd from set during init"), which by the
> shortlog indicates that they were trying to avoid removing a
> yet-unduplicated fd too early.
> 
> I don't see why an fd explicitly removed with qmp_remove_fd() should
> be under runstate_is_running(). I'm assuming this was a mistake when
> adding the parentheses around the expression.
> 
> Move the runstate_is_running() check to apply only to the
> QLIST_EMPTY(dup_fds) side of the expression and ignore it when
> mon_fdset_fd->removed has been explicitly set.

I am confused about why the fdset removal is so complicated.  I'm also
wondering whether the fd got dropped because we check against
"mon_refcount == 0", and maybe monitor_fdset_cleanup() is simply called
_before_ a monitor is created?  Why do we need such a check in the first
place?

I'm thinking of a case where the only QMP monitor (for some reason) gets
disconnected and then reconnected while the VM is running.  Won't the
current code already lead to unwanted removal of almost all fds due to
mon_refcount==0?

I am also confused about why the ->removed flag is needed at all, and why
we can't remove the fdset's fds right away once a match is found.

Copying Corey, Eric and Kevin.
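
For reference, the condition before and after the quoted hunk can be compared as plain boolean functions. This is a sketch: the function names are hypothetical, and the parameters stand in for `mon_fdset_fd->removed`, `QLIST_EMPTY(&mon_fdset->dup_fds)`, `mon_refcount == 0` and `runstate_is_running()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Before the patch: nothing is closed while the VM is not running. */
static bool close_fd_old(bool removed, bool no_dups, bool no_monitors,
                         bool running)
{
    return (removed || (no_dups && no_monitors)) && running;
}

/* After the patch: an explicit remove-fd is honored immediately. */
static bool close_fd_new(bool removed, bool no_dups, bool no_monitors,
                         bool running)
{
    return removed || (no_dups && no_monitors && running);
}
```

The two differ only while the VM is stopped: there, the old condition ignored an explicit remove-fd, while the new one still postpones only the implicit cleanup.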

> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  monitor/fds.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/monitor/fds.c b/monitor/fds.c
> index d86c2c674c..4ec3b7eea9 100644
> --- a/monitor/fds.c
> +++ b/monitor/fds.c
> @@ -173,9 +173,9 @@ static void monitor_fdset_cleanup(MonFdset *mon_fdset)
>      MonFdsetFd *mon_fdset_fd_next;
>  
>      QLIST_FOREACH_SAFE(mon_fdset_fd, &mon_fdset->fds, next, mon_fdset_fd_next) {
> -        if ((mon_fdset_fd->removed ||
> -                (QLIST_EMPTY(&mon_fdset->dup_fds) && mon_refcount == 0)) &&
> -                runstate_is_running()) {
> +        if (mon_fdset_fd->removed ||
> +            (QLIST_EMPTY(&mon_fdset->dup_fds) && mon_refcount == 0 &&
> +             runstate_is_running())) {
>              close(mon_fdset_fd->fd);
>              g_free(mon_fdset_fd->opaque);
>              QLIST_REMOVE(mon_fdset_fd, next);
> -- 
> 2.35.3
> 

-- 
Peter Xu




* Re: [PATCH 2/9] migration: Fix file migration with fdset
  2024-04-26 14:20 ` [PATCH 2/9] migration: Fix file migration with fdset Fabiano Rosas
@ 2024-05-03 16:23   ` Peter Xu
  2024-05-03 19:56     ` Fabiano Rosas
  2024-05-08  8:02     ` Daniel P. Berrangé
  2024-05-08  8:00   ` Daniel P. Berrangé
  1 sibling, 2 replies; 57+ messages in thread
From: Peter Xu @ 2024-05-03 16:23 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana, Jim Fehlig

On Fri, Apr 26, 2024 at 11:20:35AM -0300, Fabiano Rosas wrote:
> When the migration using the "file:" URI was implemented, I don't
> think any of us noticed that if you pass in a file name with the
> format "/dev/fdset/N", this allows a file descriptor to be passed in
> to QEMU and that behaves just like the "fd:" URI. So the "file:"
> support has been added without regard for the fdset part and we got
> some things wrong.
> 
> The first issue is that we should not truncate the migration file if
> we're allowing an fd + offset. We need to leave the file contents
> untouched.

I'm wondering whether we can use fallocate() on the ranges instead, so
that we never open() with O_TRUNC.  Before that, could you remind me
why we need to truncate in the first place?  I definitely missed
something here too.

> 
> The second issue is that there's an expectation that QEMU removes the
> fd after the migration has finished. That's what the "fd:" code
> does. Otherwise a second migration on the same VM could attempt to
> provide an fdset with the same name and QEMU would reject it.

Let me check what we do with "fd:" when migration completes or is
cancelled.

IIUC it's qio_channel_file_close() that does the final cleanup work on
e.g. to_dst_file, right?  Then there's qemu_close(), and it has:

    /* Close fd that was dup'd from an fdset */
    fdset_id = monitor_fdset_dup_fd_find(fd);
    if (fdset_id != -1) {
        int ret;

        ret = close(fd);
        if (ret == 0) {
            monitor_fdset_dup_fd_remove(fd);
        }

        return ret;
    }

Shouldn't this have done the work already?

Off topic: I think this code is overcomplicated too.  Maybe I missed
something, but afaict we don't need monitor_fdset_dup_fd_find() at all;
we can simply walk the list and remove entries.  I've attached a patch at
the end that tries to clean that up, in case there are early comments.
But we can ignore that so we don't get side-tracked, and focus on the
direct-io issues.

Thanks,

=======

From 2f6b6d1224486d8ee830a7afe34738a07003b863 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Fri, 3 May 2024 11:27:20 -0400
Subject: [PATCH] monitor: Drop monitor_fdset_dup_fd_find()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This function is not needed; the remove function alone should already
work.  Clean it up.

The code no longer checks whether close() succeeded before removing the
dup_fd from the fdset: if close() fails, something has already gone very
wrong, and keeping the dup_fd in the fdset is unlikely to help.

Cc: Dr. David Alan Gilbert <dave@treblig.org>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/monitor/monitor.h |  1 -
 monitor/fds.c             | 27 +++++----------------------
 stubs/fdset.c             |  5 -----
 util/osdep.c              | 15 +--------------
 4 files changed, 6 insertions(+), 42 deletions(-)

diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index 965f5d5450..fd9b3f538c 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -53,7 +53,6 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool has_fdset_id, int64_t fdset_id,
                                 const char *opaque, Error **errp);
 int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags);
 void monitor_fdset_dup_fd_remove(int dup_fd);
-int64_t monitor_fdset_dup_fd_find(int dup_fd);
 
 void monitor_register_hmp(const char *name, bool info,
                           void (*cmd)(Monitor *mon, const QDict *qdict));
diff --git a/monitor/fds.c b/monitor/fds.c
index d86c2c674c..d5aecfb70e 100644
--- a/monitor/fds.c
+++ b/monitor/fds.c
@@ -458,7 +458,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
 #endif
 }
 
-static int64_t monitor_fdset_dup_fd_find_remove(int dup_fd, bool remove)
+void monitor_fdset_dup_fd_remove(int dup_fd)
 {
     MonFdset *mon_fdset;
     MonFdsetFd *mon_fdset_fd_dup;
@@ -467,31 +467,14 @@ static int64_t monitor_fdset_dup_fd_find_remove(int dup_fd, bool remove)
     QLIST_FOREACH(mon_fdset, &mon_fdsets, next) {
         QLIST_FOREACH(mon_fdset_fd_dup, &mon_fdset->dup_fds, next) {
             if (mon_fdset_fd_dup->fd == dup_fd) {
-                if (remove) {
-                    QLIST_REMOVE(mon_fdset_fd_dup, next);
-                    g_free(mon_fdset_fd_dup);
-                    if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
-                        monitor_fdset_cleanup(mon_fdset);
-                    }
-                    return -1;
-                } else {
-                    return mon_fdset->id;
+                QLIST_REMOVE(mon_fdset_fd_dup, next);
+                g_free(mon_fdset_fd_dup);
+                if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
+                    monitor_fdset_cleanup(mon_fdset);
                 }
             }
         }
     }
-
-    return -1;
-}
-
-int64_t monitor_fdset_dup_fd_find(int dup_fd)
-{
-    return monitor_fdset_dup_fd_find_remove(dup_fd, false);
-}
-
-void monitor_fdset_dup_fd_remove(int dup_fd)
-{
-    monitor_fdset_dup_fd_find_remove(dup_fd, true);
 }
 
 int monitor_fd_param(Monitor *mon, const char *fdname, Error **errp)
diff --git a/stubs/fdset.c b/stubs/fdset.c
index d7c39a28ac..389e368a29 100644
--- a/stubs/fdset.c
+++ b/stubs/fdset.c
@@ -9,11 +9,6 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
     return -1;
 }
 
-int64_t monitor_fdset_dup_fd_find(int dup_fd)
-{
-    return -1;
-}
-
 void monitor_fdset_dup_fd_remove(int dupfd)
 {
 }
diff --git a/util/osdep.c b/util/osdep.c
index e996c4744a..2d9749d060 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -393,21 +393,8 @@ int qemu_open_old(const char *name, int flags, ...)
 
 int qemu_close(int fd)
 {
-    int64_t fdset_id;
-
     /* Close fd that was dup'd from an fdset */
-    fdset_id = monitor_fdset_dup_fd_find(fd);
-    if (fdset_id != -1) {
-        int ret;
-
-        ret = close(fd);
-        if (ret == 0) {
-            monitor_fdset_dup_fd_remove(fd);
-        }
-
-        return ret;
-    }
-
+    monitor_fdset_dup_fd_remove(fd);
     return close(fd);
 }
 
-- 
2.44.0


-- 
Peter Xu



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH 3/9] tests/qtest/migration: Fix file migration offset check
  2024-04-26 14:20 ` [PATCH 3/9] tests/qtest/migration: Fix file migration offset check Fabiano Rosas
@ 2024-05-03 16:47   ` Peter Xu
  2024-05-03 20:36     ` Fabiano Rosas
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Xu @ 2024-05-03 16:47 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, berrange, armbru, Claudio Fontana, Jim Fehlig,
	Thomas Huth, Laurent Vivier, Paolo Bonzini

On Fri, Apr 26, 2024 at 11:20:36AM -0300, Fabiano Rosas wrote:
> When doing file migration, QEMU accepts an offset that should be
> skipped when writing the migration stream to the file. The purpose of
> the offset is to allow the management layer to put its own metadata at
> the start of the file.
> 
> We have tests for this in migration-test, but only testing that the
> migration stream starts at the correct offset and not that it actually
> leaves the data intact. Unsurprisingly, there's been a bug in that
> area that the tests didn't catch.
> 
> Fix the tests to write some data to the offset region and check that
> it's actually there after the migration.
> 
> Fixes: 3dc35470c8 ("tests/qtest: migration-test: Add tests for file-based migration")
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  tests/qtest/migration-test.c | 70 +++++++++++++++++++++++++++++++++---
>  1 file changed, 65 insertions(+), 5 deletions(-)
> 
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 5d6d8cd634..7b177686b4 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -2081,6 +2081,63 @@ static void test_precopy_file(void)
>      test_file_common(&args, true);
>  }
>  
> +#ifndef _WIN32
> +static void file_dirty_offset_region(void)
> +{
> +#if defined(__linux__)

Hmm, what's the case to cover when !_WIN32 && __linux__?  Can we remove one
layer of ifdef?

I'm also wondering why it can't work on win32.  I thought win32 has all
the stuff we use here, but I may be missing something.

> +    g_autofree char *path = g_strdup_printf("%s/%s", tmpfs, FILE_TEST_FILENAME);
> +    size_t size = FILE_TEST_OFFSET;
> +    uintptr_t *addr, *p;
> +    int fd;
> +
> +    fd = open(path, O_CREAT | O_RDWR, 0660);
> +    g_assert(fd != -1);
> +
> +    g_assert(!ftruncate(fd, size));
> +
> +    addr = mmap(NULL, size, PROT_WRITE, MAP_SHARED, fd, 0);
> +    g_assert(addr != MAP_FAILED);
> +
> +    /* ensure the skipped offset contains some data */
> +    p = addr;
> +    while (p < addr + FILE_TEST_OFFSET / sizeof(uintptr_t)) {
> +        *p = (unsigned long) FILE_TEST_FILENAME;

This is fine, but it's not very clear what gets assigned..  I think what
we assign here is a pointer into the binary's RO section (rather than the
chars themselves).  Maybe using some random numbers would be more
straightforward, but no strong opinions.
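Along the lines of that suggestion, here is a minimal standalone sketch (not from the series; the helper names and the 0xA5 marker byte are hypothetical). It fills the reserved region with a known byte pattern through pwrite() and later verifies it with pread(), so the check compares plain bytes rather than pointer values:

```c
#define _POSIX_C_SOURCE 200809L   /* pread()/pwrite()/mkstemp() */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical marker byte written into the reserved offset region. */
#define OFFSET_FILL_BYTE 0xA5

/* Write OFFSET_FILL_BYTE to bytes [0, size) of fd; false on error. */
static bool fill_offset_region(int fd, size_t size)
{
    unsigned char buf[4096];
    size_t done = 0;

    memset(buf, OFFSET_FILL_BYTE, sizeof(buf));
    while (done < size) {
        size_t chunk = size - done < sizeof(buf) ? size - done : sizeof(buf);
        ssize_t n = pwrite(fd, buf, chunk, (off_t)done);

        if (n <= 0) {
            return false;
        }
        done += (size_t)n;
    }
    return true;
}

/* Return true only if every byte in [0, size) still holds the marker. */
static bool offset_region_intact(int fd, size_t size)
{
    unsigned char buf[4096];
    size_t done = 0;

    while (done < size) {
        size_t chunk = size - done < sizeof(buf) ? size - done : sizeof(buf);
        ssize_t n = pread(fd, buf, chunk, (off_t)done);

        if (n <= 0) {
            return false;   /* read error or file shorter than the region */
        }
        for (ssize_t i = 0; i < n; i++) {
            if (buf[i] != OFFSET_FILL_BYTE) {
                return false;
            }
        }
        done += (size_t)n;
    }
    return true;
}
```

In a test like the one above, the fill would run in the start hook and the verification in the finish hook, before checking for the stream magic right after the region.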

> +        p++;
> +    }
> +
> +    munmap(addr, size);
> +    fsync(fd);
> +    close(fd);
> +#endif
> +}
> +
> +static void *file_offset_start_hook(QTestState *from, QTestState *to)
> +{
> +    g_autofree char *file = g_strdup_printf("%s/%s", tmpfs, FILE_TEST_FILENAME);
> +    int src_flags = O_WRONLY;
> +    int dst_flags = O_RDONLY;
> +    int fds[2];
> +
> +    file_dirty_offset_region();
> +
> +    fds[0] = open(file, src_flags, 0660);
> +    assert(fds[0] != -1);
> +
> +    fds[1] = open(file, dst_flags, 0660);
> +    assert(fds[1] != -1);
> +
> +    qtest_qmp_fds_assert_success(from, &fds[0], 1, "{'execute': 'add-fd', "
> +                                 "'arguments': {'fdset-id': 1}}");
> +
> +    qtest_qmp_fds_assert_success(to, &fds[1], 1, "{'execute': 'add-fd', "
> +                                 "'arguments': {'fdset-id': 1}}");
> +
> +    close(fds[0]);
> +    close(fds[1]);
> +
> +    return NULL;
> +}
> +
>  static void file_offset_finish_hook(QTestState *from, QTestState *to,
>                                      void *opaque)
>  {
> @@ -2096,12 +2153,12 @@ static void file_offset_finish_hook(QTestState *from, QTestState *to,
>      g_assert(addr != MAP_FAILED);
>  
>      /*
> -     * Ensure the skipped offset contains zeros and the migration
> -     * stream starts at the right place.
> +     * Ensure the skipped offset region's data has not been touched
> +     * and the migration stream starts at the right place.
>       */
>      p = addr;
>      while (p < addr + FILE_TEST_OFFSET / sizeof(uintptr_t)) {
> -        g_assert(*p == 0);
> +        g_assert_cmpstr((char *) *p, ==, FILE_TEST_FILENAME);
>          p++;
>      }
>      g_assert_cmpint(cpu_to_be64(*p) >> 32, ==, QEMU_VM_FILE_MAGIC);
> @@ -2113,17 +2170,18 @@ static void file_offset_finish_hook(QTestState *from, QTestState *to,
>  
>  static void test_precopy_file_offset(void)
>  {
> -    g_autofree char *uri = g_strdup_printf("file:%s/%s,offset=%d", tmpfs,
> -                                           FILE_TEST_FILENAME,
> +    g_autofree char *uri = g_strdup_printf("file:/dev/fdset/1,offset=%d",
>                                             FILE_TEST_OFFSET);

Do we want to keep both tests to cover both normal file and fdsets?

>      MigrateCommon args = {
>          .connect_uri = uri,
>          .listen_uri = "defer",
> +        .start_hook = file_offset_start_hook,
>          .finish_hook = file_offset_finish_hook,
>      };
>  
>      test_file_common(&args, false);
>  }
> +#endif
>  
>  static void test_precopy_file_offset_bad(void)
>  {
> @@ -3636,8 +3694,10 @@ int main(int argc, char **argv)
>  
>      migration_test_add("/migration/precopy/file",
>                         test_precopy_file);
> +#ifndef _WIN32
>      migration_test_add("/migration/precopy/file/offset",
>                         test_precopy_file_offset);
> +#endif
>      migration_test_add("/migration/precopy/file/offset/bad",
>                         test_precopy_file_offset_bad);
>  
> -- 
> 2.35.3
> 

-- 
Peter Xu




* Re: [PATCH 4/9] migration: Add direct-io parameter
  2024-04-26 14:20 ` [PATCH 4/9] migration: Add direct-io parameter Fabiano Rosas
  2024-04-26 14:33   ` Markus Armbruster
@ 2024-05-03 18:05   ` Peter Xu
  2024-05-03 20:49     ` Fabiano Rosas
  2024-05-08  8:25   ` Daniel P. Berrangé
  2 siblings, 1 reply; 57+ messages in thread
From: Peter Xu @ 2024-05-03 18:05 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, berrange, armbru, Claudio Fontana, Jim Fehlig, Eric Blake

On Fri, Apr 26, 2024 at 11:20:37AM -0300, Fabiano Rosas wrote:
> Add the direct-io migration parameter that tells the migration code to
> use O_DIRECT when opening the migration stream file whenever possible.
> 
> This is currently only used with the mapped-ram migration that has a
> clear window guaranteed to perform aligned writes.
> 
> Acked-by: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  include/qemu/osdep.h           |  2 ++
>  migration/migration-hmp-cmds.c | 11 +++++++++++
>  migration/options.c            | 30 ++++++++++++++++++++++++++++++
>  migration/options.h            |  1 +
>  qapi/migration.json            | 18 +++++++++++++++---
>  util/osdep.c                   |  9 +++++++++
>  6 files changed, 68 insertions(+), 3 deletions(-)
> 
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index c7053cdc2b..645c14a65d 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -612,6 +612,8 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive);
>  bool qemu_has_ofd_lock(void);
>  #endif
>  
> +bool qemu_has_direct_io(void);
> +
>  #if defined(__HAIKU__) && defined(__i386__)
>  #define FMT_pid "%ld"
>  #elif defined(WIN64)
> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
> index 7e96ae6ffd..8496a2b34e 100644
> --- a/migration/migration-hmp-cmds.c
> +++ b/migration/migration-hmp-cmds.c
> @@ -397,6 +397,13 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
>          monitor_printf(mon, "%s: %s\n",
>              MigrationParameter_str(MIGRATION_PARAMETER_MODE),
>              qapi_enum_lookup(&MigMode_lookup, params->mode));
> +
> +        if (params->has_direct_io) {
> +            monitor_printf(mon, "%s: %s\n",
> +                           MigrationParameter_str(
> +                               MIGRATION_PARAMETER_DIRECT_IO),
> +                           params->direct_io ? "on" : "off");
> +        }

This would be the first parameter that is only optionally displayed here.
I think that's a sign of misusing the has_direct_io field..

IMHO has_direct_io is best kept as "whether the direct_io field is valid",
and nothing more.  It shouldn't carry more information than that;
otherwise it'll be one more small obstacle to overcome when we eventually
remove all these has_* fields, and it can easily be overlooked.

IMHO what we should do is assert has_direct_io==true here too, meanwhile...

>      }
>  
>      qapi_free_MigrationParameters(params);
> @@ -690,6 +697,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
>          p->has_mode = true;
>          visit_type_MigMode(v, param, &p->mode, &err);
>          break;
> +    case MIGRATION_PARAMETER_DIRECT_IO:
> +        p->has_direct_io = true;
> +        visit_type_bool(v, param, &p->direct_io, &err);
> +        break;
>      default:
>          assert(0);
>      }
> diff --git a/migration/options.c b/migration/options.c
> index 239f5ecfb4..ae464aa4f2 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -826,6 +826,22 @@ int migrate_decompress_threads(void)
>      return s->parameters.decompress_threads;
>  }
>  
> +bool migrate_direct_io(void)
> +{
> +    MigrationState *s = migrate_get_current();
> +
> +    /* For now O_DIRECT is only supported with mapped-ram */
> +    if (!s->capabilities[MIGRATION_CAPABILITY_MAPPED_RAM]) {
> +        return false;
> +    }
> +
> +    if (s->parameters.has_direct_io) {
> +        return s->parameters.direct_io;
> +    }
> +
> +    return false;
> +}
> +
>  uint64_t migrate_downtime_limit(void)
>  {
>      MigrationState *s = migrate_get_current();
> @@ -1061,6 +1077,11 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
>      params->has_zero_page_detection = true;
>      params->zero_page_detection = s->parameters.zero_page_detection;
>  
> +    if (s->parameters.has_direct_io) {
> +        params->has_direct_io = true;
> +        params->direct_io = s->parameters.direct_io;
> +    }
> +
>      return params;
>  }
>  
> @@ -1097,6 +1118,7 @@ void migrate_params_init(MigrationParameters *params)
>      params->has_vcpu_dirty_limit = true;
>      params->has_mode = true;
>      params->has_zero_page_detection = true;
> +    params->has_direct_io = qemu_has_direct_io();
>  }
>  
>  /*
> @@ -1416,6 +1438,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
>      if (params->has_zero_page_detection) {
>          dest->zero_page_detection = params->zero_page_detection;
>      }
> +
> +    if (params->has_direct_io) {
> +        dest->direct_io = params->direct_io;

.. do a proper check here to make sure the current QEMU is built with
direct IO support, and fail QMP migrate-set-parameters when someone tries
to enable it on a QEMU that doesn't support it.

Always displaying the direct_io parameter also helps when we simply want
to check the QEMU version and whether it supports this feature in general.
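A rough sketch of the kind of check suggested above, in plain C, with a char buffer standing in for QEMU's Error **errp. has_direct_io() mirrors the patch's qemu_has_direct_io(), while check_direct_io_param() is a hypothetical name, not an existing QEMU function:

```c
#define _GNU_SOURCE          /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>

/* Same idea as the patch's qemu_has_direct_io(). */
static bool has_direct_io(void)
{
#ifdef O_DIRECT
    return true;
#else
    return false;
#endif
}

/*
 * Hypothetical validation for migrate-set-parameters: only reject the
 * request when direct-io is explicitly set to true on a host without
 * O_DIRECT.  Returns false and fills err on failure.
 */
static bool check_direct_io_param(bool has_param, bool value,
                                  char *err, size_t errlen)
{
    if (has_param && value && !has_direct_io()) {
        snprintf(err, errlen, "direct-io is not supported on this host");
        return false;
    }
    return true;
}
```

With the request rejected up front, has_direct_io can stay true in the stored parameters, so has_* keeps meaning only "this field is valid".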

> +    }
>  }
>  
>  static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
> @@ -1570,6 +1596,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
>      if (params->has_zero_page_detection) {
>          s->parameters.zero_page_detection = params->zero_page_detection;
>      }
> +
> +    if (params->has_direct_io) {
> +        s->parameters.direct_io = params->direct_io;
> +    }
>  }
>  
>  void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
> diff --git a/migration/options.h b/migration/options.h
> index ab8199e207..aa5509cd2a 100644
> --- a/migration/options.h
> +++ b/migration/options.h
> @@ -76,6 +76,7 @@ uint8_t migrate_cpu_throttle_increment(void);
>  uint8_t migrate_cpu_throttle_initial(void);
>  bool migrate_cpu_throttle_tailslow(void);
>  int migrate_decompress_threads(void);
> +bool migrate_direct_io(void);
>  uint64_t migrate_downtime_limit(void);
>  uint8_t migrate_max_cpu_throttle(void);
>  uint64_t migrate_max_bandwidth(void);
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 8c65b90328..1a8a4b114c 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -914,6 +914,9 @@
>  #     See description in @ZeroPageDetection.  Default is 'multifd'.
>  #     (since 9.0)
>  #
> +# @direct-io: Open migration files with O_DIRECT when possible. This
> +#     requires that the @mapped-ram capability is enabled. (since 9.1)

Here it seems to imply that setting direct-io=true will fail if mapped-ram
is not enabled, but in reality it's fine; it'll just be ignored.  I think
that's the right thing to do, to reduce coupling between params and caps
(otherwise, when the mapped-ram cap is unset, we'd need to remember to
unset direct-io too, which is just cumbersome).

I suggest we state the fact that this field is ignored when the mapped-ram
capability is not enabled, rather than "requires mapped-ram".  The same
applies to the other two places in the QAPI doc.

> +#
>  # Features:
>  #
>  # @deprecated: Member @block-incremental is deprecated.  Use
> @@ -948,7 +951,8 @@
>             { 'name': 'x-vcpu-dirty-limit-period', 'features': ['unstable'] },
>             'vcpu-dirty-limit',
>             'mode',
> -           'zero-page-detection'] }
> +           'zero-page-detection',
> +           'direct-io'] }
>  
>  ##
>  # @MigrateSetParameters:
> @@ -1122,6 +1126,9 @@
>  #     See description in @ZeroPageDetection.  Default is 'multifd'.
>  #     (since 9.0)
>  #
> +# @direct-io: Open migration files with O_DIRECT when possible. This
> +#     requires that the @mapped-ram capability is enabled. (since 9.1)
> +#
>  # Features:
>  #
>  # @deprecated: Member @block-incremental is deprecated.  Use
> @@ -1176,7 +1183,8 @@
>                                              'features': [ 'unstable' ] },
>              '*vcpu-dirty-limit': 'uint64',
>              '*mode': 'MigMode',
> -            '*zero-page-detection': 'ZeroPageDetection'} }
> +            '*zero-page-detection': 'ZeroPageDetection',
> +            '*direct-io': 'bool' } }
>  
>  ##
>  # @migrate-set-parameters:
> @@ -1354,6 +1362,9 @@
>  #     See description in @ZeroPageDetection.  Default is 'multifd'.
>  #     (since 9.0)
>  #
> +# @direct-io: Open migration files with O_DIRECT when possible. This
> +#     requires that the @mapped-ram capability is enabled. (since 9.1)
> +#
>  # Features:
>  #
>  # @deprecated: Member @block-incremental is deprecated.  Use
> @@ -1405,7 +1416,8 @@
>                                              'features': [ 'unstable' ] },
>              '*vcpu-dirty-limit': 'uint64',
>              '*mode': 'MigMode',
> -            '*zero-page-detection': 'ZeroPageDetection'} }
> +            '*zero-page-detection': 'ZeroPageDetection',
> +            '*direct-io': 'bool' } }
>  
>  ##
>  # @query-migrate-parameters:
> diff --git a/util/osdep.c b/util/osdep.c
> index e996c4744a..d0227a60ab 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -277,6 +277,15 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive)
>  }
>  #endif
>  
> +bool qemu_has_direct_io(void)
> +{
> +#ifdef O_DIRECT
> +    return true;
> +#else
> +    return false;
> +#endif
> +}
> +
>  static int qemu_open_cloexec(const char *name, int flags, mode_t mode)
>  {
>      int ret;
> -- 
> 2.35.3
> 

-- 
Peter Xu




* Re: [PATCH 5/9] migration/multifd: Add direct-io support
  2024-04-26 14:20 ` [PATCH 5/9] migration/multifd: Add direct-io support Fabiano Rosas
@ 2024-05-03 18:29   ` Peter Xu
  2024-05-03 20:54     ` Fabiano Rosas
  2024-05-08  8:27   ` Daniel P. Berrangé
  1 sibling, 1 reply; 57+ messages in thread
From: Peter Xu @ 2024-05-03 18:29 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana, Jim Fehlig

On Fri, Apr 26, 2024 at 11:20:38AM -0300, Fabiano Rosas wrote:
> When multifd is used along with mapped-ram, we can take advantage of a
> filesystem that supports the O_DIRECT flag and perform direct I/O in
> the multifd threads. This brings a significant performance improvement
> because direct-io writes bypass the page cache which would otherwise
> be thrashed by the multifd data which is unlikely to be needed again
> in a short period of time.
> 
> To be able to use a multifd channel opened with O_DIRECT, we must
> ensure that a certain alignment is used. Filesystems usually require a
> block-size alignment for direct I/O. The way to achieve this is by
> enabling the mapped-ram feature, which already aligns its I/O properly
> (see MAPPED_RAM_FILE_OFFSET_ALIGNMENT at ram.c).
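To make the alignment constraint concrete, here is a small illustrative checker (hypothetical, not QEMU code). It assumes a power-of-two alignment such as the filesystem block size. The exact O_DIRECT rules vary by filesystem and kernel, so this encodes the conservative rule that the buffer address, file offset and length must all be multiples of that alignment:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if a request meets the conservative O_DIRECT rule:
 * buffer address, file offset and length are all multiples of align.
 * align must be a power of two (e.g. the filesystem block size).
 */
static bool dio_request_aligned(const void *buf, uint64_t offset,
                                size_t len, size_t align)
{
    uint64_t mask = (uint64_t)align - 1;

    return ((uint64_t)(uintptr_t)buf & mask) == 0 &&
           (offset & mask) == 0 &&
           ((uint64_t)len & mask) == 0;
}
```

A small header or flag write from the main migration thread fails the length test, which is why it cannot go through the same O_DIRECT descriptor.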
> 
> By setting O_DIRECT on the multifd channels, all writes to the same
> file descriptor need to be aligned as well, even the ones that come
> from outside multifd, such as the QEMUFile I/O from the main migration
> code. This makes it impossible to use the same file descriptor for the
> QEMUFile and for the multifd channels. The various flags and metadata
> written by the main migration code will always be unaligned by virtue
> of their small size. To work around this issue, we'll require a second
> file descriptor to be used exclusively for direct I/O.
> 
> The second file descriptor can be obtained by QEMU by re-opening the
> migration file (already possible), or by being provided by the user or
> management application (support to be added in future patches).
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/file.c      | 22 +++++++++++++++++++---
>  migration/migration.c | 23 +++++++++++++++++++++++
>  2 files changed, 42 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/file.c b/migration/file.c
> index 8f30999400..b9265b14dd 100644
> --- a/migration/file.c
> +++ b/migration/file.c
> @@ -83,17 +83,33 @@ void file_cleanup_outgoing_migration(void)
>  
>  bool file_send_channel_create(gpointer opaque, Error **errp)
>  {
> -    QIOChannelFile *ioc;
> +    QIOChannelFile *ioc = NULL;
>      int flags = O_WRONLY;
> -    bool ret = true;
> +    bool ret = false;
> +
> +    if (migrate_direct_io()) {
> +#ifdef O_DIRECT
> +        /*
> +         * Enable O_DIRECT for the secondary channels. These are used
> +         * for sending ram pages and writes should be guaranteed to be
> +         * aligned to at least page size.
> +         */
> +        flags |= O_DIRECT;
> +#else
> +        error_setg(errp, "System does not support O_DIRECT");
> +        error_append_hint(errp,
> +                          "Try disabling direct-io migration capability\n");
> +        goto out;
> +#endif

Hopefully, if we can always fail migrate-set-parameters correctly, we will
never trigger this error.

I know Linux uses a trick like this to avoid such ifdefs entirely:

  if (qemu_has_direct_io() && migrate_direct_io()) {
      // reference O_DIRECT
  }

So as long as qemu_has_direct_io() can return a constant "false" when
O_DIRECT is not defined, the compiler is smart enough to ignore the O_DIRECT
inside the block.

Even if that doesn't work, we can still avoid that error (and rely on the
set-parameter failure):

#ifdef O_DIRECT
       if (migrate_direct_io()) {
           // reference O_DIRECT
       }
#endif

It should behave the same; the point is just to keep the ifdefs as light
as possible..
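One caveat on the first variant: a constant-false condition cannot hide an undefined O_DIRECT, because C requires every identifier to be declared even inside dead code. So only the second variant compiles on hosts without O_DIRECT. Here is a minimal sketch of that variant, using a hypothetical helper name:

```c
#define _GNU_SOURCE          /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/*
 * Compute open() flags for a multifd channel.  The only #ifdef wraps
 * the O_DIRECT reference itself; on hosts without O_DIRECT the request
 * is dropped here, relying on migrate-set-parameters having already
 * rejected direct-io=on.
 */
static int channel_open_flags(bool direct_io)
{
    int flags = O_WRONLY;

#ifdef O_DIRECT
    if (direct_io) {
        flags |= O_DIRECT;
    }
#else
    (void)direct_io;
#endif
    return flags;
}
```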

> +    }
>  
>      ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, errp);
>      if (!ioc) {
> -        ret = false;
>          goto out;
>      }
>  
>      multifd_channel_connect(opaque, QIO_CHANNEL(ioc));
> +    ret = true;
>  
>  out:
>      /*
> diff --git a/migration/migration.c b/migration/migration.c
> index b5af6b5105..cb923a3f62 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -155,6 +155,16 @@ static bool migration_needs_seekable_channel(void)
>      return migrate_mapped_ram();
>  }
>  
> +static bool migration_needs_multiple_fds(void)

If I suggested renaming this, would you agree? :)

I'd try "migrate_needs_extra_fd()" or "migrate_needs_two_fds()", or
anything else that avoids using "multi" and "fd" together, perhaps.

Other than that looks all good.

Thanks,

> +{
> +    /*
> +     * When doing direct-io, multifd requires two different,
> +     * non-duplicated file descriptors so we can use one of them for
> +     * unaligned IO.
> +     */
> +    return migrate_multifd() && migrate_direct_io();
> +}
> +
>  static bool transport_supports_seeking(MigrationAddress *addr)
>  {
>      if (addr->transport == MIGRATION_ADDRESS_TYPE_FILE) {
> @@ -164,6 +174,12 @@ static bool transport_supports_seeking(MigrationAddress *addr)
>      return false;
>  }
>  
> +static bool transport_supports_multiple_fds(MigrationAddress *addr)
> +{
> +    /* file: works because QEMU can open it multiple times */
> +    return addr->transport == MIGRATION_ADDRESS_TYPE_FILE;
> +}
> +
>  static bool
>  migration_channels_and_transport_compatible(MigrationAddress *addr,
>                                              Error **errp)
> @@ -180,6 +196,13 @@ migration_channels_and_transport_compatible(MigrationAddress *addr,
>          return false;
>      }
>  
> +    if (migration_needs_multiple_fds() &&
> +        !transport_supports_multiple_fds(addr)) {
> +        error_setg(errp,
> +                   "Migration requires a transport that allows for multiple fds (e.g. file)");
> +        return false;
> +    }
> +
>      return true;
>  }
>  
> -- 
> 2.35.3
> 

-- 
Peter Xu




* Re: [PATCH 6/9] tests/qtest/migration: Add tests for file migration with direct-io
  2024-04-26 14:20 ` [PATCH 6/9] tests/qtest/migration: Add tests for file migration with direct-io Fabiano Rosas
@ 2024-05-03 18:38   ` Peter Xu
  2024-05-03 21:05     ` Fabiano Rosas
  2024-05-08  8:34   ` Daniel P. Berrangé
  1 sibling, 1 reply; 57+ messages in thread
From: Peter Xu @ 2024-05-03 18:38 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, berrange, armbru, Claudio Fontana, Jim Fehlig,
	Thomas Huth, Laurent Vivier, Paolo Bonzini

On Fri, Apr 26, 2024 at 11:20:39AM -0300, Fabiano Rosas wrote:
> The tests are only allowed to run in systems that know about the
> O_DIRECT flag and in filesystems which support it.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Mostly:

Reviewed-by: Peter Xu <peterx@redhat.com>

Two trivial comments below.

> ---
>  tests/qtest/migration-helpers.c | 42 +++++++++++++++++++++++++++++++++
>  tests/qtest/migration-helpers.h |  1 +
>  tests/qtest/migration-test.c    | 42 +++++++++++++++++++++++++++++++++
>  3 files changed, 85 insertions(+)
> 
> diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
> index ce6d6615b5..356cd4fa8c 100644
> --- a/tests/qtest/migration-helpers.c
> +++ b/tests/qtest/migration-helpers.c
> @@ -473,3 +473,45 @@ void migration_test_add(const char *path, void (*fn)(void))
>      qtest_add_data_func_full(path, test, migration_test_wrapper,
>                               migration_test_destroy);
>  }
> +
> +#ifdef O_DIRECT
> +/*
> + * Probe for O_DIRECT support on the filesystem. Since this is used
> + * for tests, be conservative, if anything fails, assume it's
> + * unsupported.
> + */
> +bool probe_o_direct_support(const char *tmpfs)
> +{
> +    g_autofree char *filename = g_strdup_printf("%s/probe-o-direct", tmpfs);
> +    int fd, flags = O_CREAT | O_RDWR | O_TRUNC | O_DIRECT;
> +    void *buf;
> +    ssize_t ret, len;
> +    uint64_t offset;
> +
> +    fd = open(filename, flags, 0660);
> +    if (fd < 0) {
> +        unlink(filename);
> +        return false;
> +    }
> +
> +    /*
> +     * Assuming 4k should be enough to satisfy O_DIRECT alignment
> +     * requirements. The migration code uses 1M to be conservative.
> +     */
> +    len = 0x100000;
> +    offset = 0x100000;
> +
> +    buf = aligned_alloc(len, len);

This is the first usage of aligned_alloc() in qemu.  IIUC it's just a newer
posix_memalign(), which QEMU has one use of, and it's protected with:

#if defined(CONFIG_POSIX_MEMALIGN)
    int ret;
    ret = posix_memalign(&ptr, alignment, size);
    ...
#endif

I didn't check further.  Just keep this in mind if you see compilation
issues in future CI runs, or simply switch to a similar pattern.
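Here is a sketch of the probe that uses posix_memalign() instead of aligned_alloc(), assuming a POSIX host. It also closes the descriptor, which the quoted version does not appear to do. The file is unlinked right after open() on both paths, presumably for the same reason the original unlinks on failure: a failed O_CREAT|O_DIRECT open can leave the file behind. probe_o_direct() is a hypothetical name:

```c
#define _GNU_SOURCE          /* O_DIRECT and posix_memalign() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Probe whether the filesystem holding dir accepts an O_DIRECT write.
 * Conservative: any failure is reported as "unsupported".
 */
static bool probe_o_direct(const char *dir)
{
#ifdef O_DIRECT
    char path[4096];
    const size_t len = 0x100000;   /* 1 MiB, used for alignment and size */
    void *buf = NULL;
    ssize_t ret;
    int n, fd;

    n = snprintf(path, sizeof(path), "%s/probe-o-direct", dir);
    if (n < 0 || (size_t)n >= sizeof(path)) {
        return false;
    }

    fd = open(path, O_CREAT | O_RDWR | O_TRUNC | O_DIRECT, 0660);
    unlink(path);          /* drop the name whether or not open() worked */
    if (fd < 0) {
        return false;
    }

    if (posix_memalign(&buf, len, len) != 0) {
        close(fd);
        return false;
    }
    memset(buf, 0, len);

    /* One aligned write at an aligned offset, as in the quoted patch. */
    ret = pwrite(fd, buf, len, (off_t)len);
    free(buf);
    close(fd);
    return ret == (ssize_t)len;
#else
    (void)dir;
    return false;
#endif
}
```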

> +    g_assert(buf);
> +
> +    ret = pwrite(fd, buf, len, offset);
> +    unlink(filename);
> +    g_free(buf);
> +
> +    if (ret < 0) {
> +        return false;
> +    }
> +
> +    return true;
> +}
> +#endif
> diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
> index 1339835698..d827e16145 100644
> --- a/tests/qtest/migration-helpers.h
> +++ b/tests/qtest/migration-helpers.h
> @@ -54,5 +54,6 @@ char *find_common_machine_version(const char *mtype, const char *var1,
>                                    const char *var2);
>  char *resolve_machine_version(const char *alias, const char *var1,
>                                const char *var2);
> +bool probe_o_direct_support(const char *tmpfs);
>  void migration_test_add(const char *path, void (*fn)(void));
>  #endif /* MIGRATION_HELPERS_H */
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 7b177686b4..512b7ede8b 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -2295,6 +2295,43 @@ static void test_multifd_file_mapped_ram(void)
>      test_file_common(&args, true);
>  }
>  
> +#ifdef O_DIRECT
> +static void *migrate_mapped_ram_dio_start(QTestState *from,
> +                                                 QTestState *to)
> +{
> +    migrate_mapped_ram_start(from, to);

This line seems redundant; migrate_multifd_mapped_ram_start() should already cover that.

> +    migrate_set_parameter_bool(from, "direct-io", true);
> +    migrate_set_parameter_bool(to, "direct-io", true);
> +
> +    return NULL;
> +}
> +
> +static void *migrate_multifd_mapped_ram_dio_start(QTestState *from,
> +                                                 QTestState *to)
> +{
> +    migrate_multifd_mapped_ram_start(from, to);
> +    return migrate_mapped_ram_dio_start(from, to);
> +}
> +
> +static void test_multifd_file_mapped_ram_dio(void)
> +{
> +    g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs,
> +                                           FILE_TEST_FILENAME);
> +    MigrateCommon args = {
> +        .connect_uri = uri,
> +        .listen_uri = "defer",
> +        .start_hook = migrate_multifd_mapped_ram_dio_start,
> +    };
> +
> +    if (!probe_o_direct_support(tmpfs)) {
> +        g_test_skip("Filesystem does not support O_DIRECT");
> +        return;
> +    }
> +
> +    test_file_common(&args, true);
> +}
> +
> +#endif /* O_DIRECT */
>  
>  static void test_precopy_tcp_plain(void)
>  {
> @@ -3719,6 +3756,11 @@ int main(int argc, char **argv)
>      migration_test_add("/migration/multifd/file/mapped-ram/live",
>                         test_multifd_file_mapped_ram_live);
>  
> +#ifdef O_DIRECT
> +    migration_test_add("/migration/multifd/file/mapped-ram/dio",
> +                       test_multifd_file_mapped_ram_dio);
> +#endif
> +
>  #ifdef CONFIG_GNUTLS
>      migration_test_add("/migration/precopy/unix/tls/psk",
>                         test_precopy_unix_tls_psk);
> -- 
> 2.35.3
> 

-- 
Peter Xu




* Re: [PATCH 7/9] monitor: fdset: Match against O_DIRECT
  2024-04-26 14:20 ` [PATCH 7/9] monitor: fdset: Match against O_DIRECT Fabiano Rosas
@ 2024-05-03 18:53   ` Peter Xu
  2024-05-03 21:19     ` Fabiano Rosas
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Xu @ 2024-05-03 18:53 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana, Jim Fehlig

On Fri, Apr 26, 2024 at 11:20:40AM -0300, Fabiano Rosas wrote:
> We're about to enable the use of O_DIRECT in the migration code and
> due to the alignment restrictions imposed by filesystems we need to
> make sure the flag is only used when doing aligned IO.
> 
> The migration will do parallel IO to different regions of a file, so
> we need to use more than one file descriptor. Those cannot be obtained
> by duplicating (dup()) since duplicated file descriptors share the
> file status flags, including O_DIRECT. If one migration channel does
> unaligned IO while another sets O_DIRECT to do aligned IO, the
> filesystem would fail the unaligned operation.
> 
> The add-fd QMP command along with the fdset code are specifically
> designed to allow the user to pass a set of file descriptors with
> different access flags into QEMU to be later fetched by code that
> needs to alternate between those flags when doing IO.
> 
> Extend the fdset matching to behave the same with the O_DIRECT flag.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  monitor/fds.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/monitor/fds.c b/monitor/fds.c
> index 4ec3b7eea9..62e324fcec 100644
> --- a/monitor/fds.c
> +++ b/monitor/fds.c
> @@ -420,6 +420,11 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
>          int fd = -1;
>          int dup_fd;
>          int mon_fd_flags;
> +        int mask = O_ACCMODE;
> +
> +#ifdef O_DIRECT
> +        mask |= O_DIRECT;
> +#endif
>  
>          if (mon_fdset->id != fdset_id) {
>              continue;
> @@ -431,7 +436,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
>                  return -1;
>              }
>  
> -            if ((flags & O_ACCMODE) == (mon_fd_flags & O_ACCMODE)) {
> +            if ((flags & mask) == (mon_fd_flags & mask)) {
>                  fd = mon_fdset_fd->fd;
>                  break;
>              }

I think I see what you wanted to do: pick the right fd out of the two in
qemu_open_old(), which makes sense.
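Here is a standalone sketch of the matching rule in this patch, with a plain array standing in for QEMU's fdset list. struct fd_entry and fdset_pick() are hypothetical names:

```c
#define _GNU_SOURCE          /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>

/* Hypothetical stand-in for an fdset entry: an fd and its F_GETFL flags. */
struct fd_entry {
    int fd;
    int flags;
};

/*
 * Return the fd whose access mode (and O_DIRECT bit, when the host has
 * it) matches the requested open() flags, or -1 if none matches.
 */
static int fdset_pick(const struct fd_entry *set, size_t n, int flags)
{
    int mask = O_ACCMODE;

#ifdef O_DIRECT
    mask |= O_DIRECT;
#endif
    for (size_t i = 0; i < n; i++) {
        if ((set[i].flags & mask) == (flags & mask)) {
            return set[i].fd;
        }
    }
    return -1;
}
```

A lookup against a set that holds only the non-O_DIRECT fd finds no match for an O_DIRECT request. This is the single-fd case raised below, where the real code would fall back to dup().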

However what happens if the mgmt app only passes in 1 fd to the fdset?  The
issue is we have a "fallback dup()" plan right after this chunk of code:

        dup_fd = qemu_dup_flags(fd, flags);
        if (dup_fd == -1) {
            return -1;
        }

        mon_fdset_fd_dup = g_malloc0(sizeof(*mon_fdset_fd_dup));
        mon_fdset_fd_dup->fd = dup_fd;
        QLIST_INSERT_HEAD(&mon_fdset->dup_fds, mon_fdset_fd_dup, next);

I think it means that even if the mgmt app only passes in 1 fd (rather than
2, one with O_DIRECT and one without), QEMU can always successfully call
qemu_open_old() twice, once for each case, even though the two FDs will
silently affect each other.  This doesn't look ideal if it's true.

But I must also confess I don't really understand this code at all: we
dup(), then we try F_SETFL with all the flags that were passed in.
However, AFAICT, because dup()ed FDs share the same "struct file", almost
all flags are shared, except close-on-exec.  I don't see anything
restricting that F_SETFL to flags that are not shared, so I think it will
silently change the file status flags of the original fd we dup()ed from.
Does that mean we already have an issue with this dup() usage?
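The sharing described above can be checked with a small POSIX program. This is a sketch, not QEMU code: the helper names are hypothetical, and O_APPEND stands in for O_DIRECT because both are file status flags that F_SETFL can change, whereas O_DIRECT support depends on the filesystem (tmpfs, for example, may reject it).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/* True if 'flag' is set in fd's file status flags (shared per open file). */
static bool status_flag_set(int fd, int flag)
{
    int fl = fcntl(fd, F_GETFL);

    return fl != -1 && (fl & flag) != 0;
}

/* True if FD_CLOEXEC is set in fd's descriptor flags (per descriptor). */
static bool cloexec_set(int fd)
{
    int fl = fcntl(fd, F_GETFD);

    return fl != -1 && (fl & FD_CLOEXEC) != 0;
}

/* Add 'flag' to fd's file status flags with F_SETFL. */
static bool set_status_flag(int fd, int flag)
{
    int fl = fcntl(fd, F_GETFL);

    return fl != -1 && fcntl(fd, F_SETFL, fl | flag) == 0;
}
```

Setting O_APPEND through dup(fd) makes status_flag_set(fd, O_APPEND) true, while a descriptor from a second open() of the same path is unaffected; FD_CLOEXEC, set with F_SETFD, stays per descriptor.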

Thanks,

-- 
Peter Xu




* Re: [PATCH 2/9] migration: Fix file migration with fdset
  2024-05-03 16:23   ` Peter Xu
@ 2024-05-03 19:56     ` Fabiano Rosas
  2024-05-03 21:04       ` Peter Xu
  2024-05-08  8:02     ` Daniel P. Berrangé
  1 sibling, 1 reply; 57+ messages in thread
From: Fabiano Rosas @ 2024-05-03 19:56 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, berrange, armbru, Claudio Fontana, Jim Fehlig

Peter Xu <peterx@redhat.com> writes:

> On Fri, Apr 26, 2024 at 11:20:35AM -0300, Fabiano Rosas wrote:
>> When the migration using the "file:" URI was implemented, I don't
>> think any of us noticed that if you pass in a file name with the
>> format "/dev/fdset/N", this allows a file descriptor to be passed in
>> to QEMU and that behaves just like the "fd:" URI. So the "file:"
>> support has been added without regard for the fdset part and we got
>> some things wrong.
>> 
>> The first issue is that we should not truncate the migration file if
>> we're allowing an fd + offset. We need to leave the file contents
>> untouched.
>
> I'm wondering whether we can use fallocate() instead on the ranges so that
> we always don't open() with O_TRUNC.  Before that..  could you remind me
> why do we need to truncate in the first place?  I definitely missed
> something else here too.

AFAIK, just to avoid any issues if the file is pre-existing. I don't see
the difference between O_TRUNC and fallocate in this case.
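The first issue in the commit message (truncation destroying whatever precedes the offset) can be made concrete: reopening with O_TRUNC discards metadata the management layer wrote before the migration offset, while opening without O_TRUNC and writing at the offset leaves it intact. A minimal POSIX sketch, where `write_at_offset()` and `header_intact()` are hypothetical stand-ins for the outgoing writer and a management-side check:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Open 'path' (optionally with O_TRUNC) and write 'len' bytes at 'offset'. */
static bool write_at_offset(const char *path, bool truncate, off_t offset,
                            const void *data, size_t len)
{
    int flags = O_WRONLY | O_CREAT | (truncate ? O_TRUNC : 0);
    int fd = open(path, flags, 0600);
    bool ok;

    if (fd == -1) {
        return false;
    }
    ok = pwrite(fd, data, len, offset) == (ssize_t)len;
    return close(fd) == 0 && ok;
}

/* True if the first 'len' bytes of 'path' equal 'expected'. */
static bool header_intact(const char *path, const void *expected, size_t len)
{
    char buf[256];
    ssize_t n;
    int fd;

    if (len > sizeof(buf)) {
        return false;
    }
    fd = open(path, O_RDONLY);
    if (fd == -1) {
        return false;
    }
    n = pread(fd, buf, len, 0);
    close(fd);
    return n == (ssize_t)len && memcmp(buf, expected, len) == 0;
}
```

After writing a 16-byte header, write_at_offset(path, false, 16, ...) keeps header_intact() true, whereas write_at_offset(path, true, 16, ...) leaves a zero-filled hole where the header used to be.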

>
>> 
>> The second issue is that there's an expectation that QEMU removes the
>> fd after the migration has finished. That's what the "fd:" code
>> does. Otherwise a second migration on the same VM could attempt to
>> provide an fdset with the same name and QEMU would reject it.
>
> Let me check what we do with "fd:" when migration completes or is
> cancelled.
>
> IIUC it's qio_channel_file_close() that does the final cleanup work on
> e.g. to_dst_file, right?  Then there's qemu_close(), and it has:
>
>     /* Close fd that was dup'd from an fdset */
>     fdset_id = monitor_fdset_dup_fd_find(fd);
>     if (fdset_id != -1) {
>         int ret;
>
>         ret = close(fd);
>         if (ret == 0) {
>             monitor_fdset_dup_fd_remove(fd);
>         }
>
>         return ret;
>     }
>
> Shouldn't this have done the work already?

That removes mon_fdset_fd_dup->fd; we want to remove
mon_fdset_fd->fd.

>
> Off topic: I think this code is overcomplicated too.  Maybe I missed
> something, but AFAICT we don't need monitor_fdset_dup_fd_find at all: we
> can simply walk the list and remove entries.  I've attached a patch at the
> end that tries to clean it up, in case there are early comments.  But we
> can ignore it so we don't get side-tracked, and focus on the direct-io
> issues.

Well, I'm not confident touching this code. It is more than a decade
old and I have no idea what the original motivations were. The possible
interactions with the user via the command line (-add-fd), QMP (add-fd)
and the monitor lifetime confuse me. Not to mention the fdset part being
plumbed into the guts of the widely used qemu_open_internal(), which
misleadingly presents itself as just a wrapper for open().

>
> Thanks,
>
> =======
>
> From 2f6b6d1224486d8ee830a7afe34738a07003b863 Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@redhat.com>
> Date: Fri, 3 May 2024 11:27:20 -0400
> Subject: [PATCH] monitor: Drop monitor_fdset_dup_fd_find()
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> This function is not needed; the remove function alone should work.
> Clean it up.
>
> The code no longer keeps the dup_fd around when close() fails: if that
> happens something has gone very wrong, and keeping the dup_fd in the
> fdset is unlikely to help.
>
> Cc: Dr. David Alan Gilbert <dave@treblig.org>
> Cc: Markus Armbruster <armbru@redhat.com>
> Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Daniel P. Berrangé <berrange@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  include/monitor/monitor.h |  1 -
>  monitor/fds.c             | 27 +++++----------------------
>  stubs/fdset.c             |  5 -----
>  util/osdep.c              | 15 +--------------
>  4 files changed, 6 insertions(+), 42 deletions(-)
>
> diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
> index 965f5d5450..fd9b3f538c 100644
> --- a/include/monitor/monitor.h
> +++ b/include/monitor/monitor.h
> @@ -53,7 +53,6 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool has_fdset_id, int64_t fdset_id,
>                                  const char *opaque, Error **errp);
>  int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags);
>  void monitor_fdset_dup_fd_remove(int dup_fd);
> -int64_t monitor_fdset_dup_fd_find(int dup_fd);
>  
>  void monitor_register_hmp(const char *name, bool info,
>                            void (*cmd)(Monitor *mon, const QDict *qdict));
> diff --git a/monitor/fds.c b/monitor/fds.c
> index d86c2c674c..d5aecfb70e 100644
> --- a/monitor/fds.c
> +++ b/monitor/fds.c
> @@ -458,7 +458,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
>  #endif
>  }
>  
> -static int64_t monitor_fdset_dup_fd_find_remove(int dup_fd, bool remove)
> +void monitor_fdset_dup_fd_remove(int dup_fd)
>  {
>      MonFdset *mon_fdset;
>      MonFdsetFd *mon_fdset_fd_dup;
> @@ -467,31 +467,14 @@ static int64_t monitor_fdset_dup_fd_find_remove(int dup_fd, bool remove)
>      QLIST_FOREACH(mon_fdset, &mon_fdsets, next) {
>          QLIST_FOREACH(mon_fdset_fd_dup, &mon_fdset->dup_fds, next) {
>              if (mon_fdset_fd_dup->fd == dup_fd) {
> -                if (remove) {
> -                    QLIST_REMOVE(mon_fdset_fd_dup, next);
> -                    g_free(mon_fdset_fd_dup);
> -                    if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
> -                        monitor_fdset_cleanup(mon_fdset);
> -                    }
> -                    return -1;
> -                } else {
> -                    return mon_fdset->id;
> +                QLIST_REMOVE(mon_fdset_fd_dup, next);
> +                g_free(mon_fdset_fd_dup);
> +                if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
> +                    monitor_fdset_cleanup(mon_fdset);
>                  }
>              }
>          }
>      }
> -
> -    return -1;
> -}
> -
> -int64_t monitor_fdset_dup_fd_find(int dup_fd)
> -{
> -    return monitor_fdset_dup_fd_find_remove(dup_fd, false);
> -}
> -
> -void monitor_fdset_dup_fd_remove(int dup_fd)
> -{
> -    monitor_fdset_dup_fd_find_remove(dup_fd, true);
>  }
>  
>  int monitor_fd_param(Monitor *mon, const char *fdname, Error **errp)
> diff --git a/stubs/fdset.c b/stubs/fdset.c
> index d7c39a28ac..389e368a29 100644
> --- a/stubs/fdset.c
> +++ b/stubs/fdset.c
> @@ -9,11 +9,6 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
>      return -1;
>  }
>  
> -int64_t monitor_fdset_dup_fd_find(int dup_fd)
> -{
> -    return -1;
> -}
> -
>  void monitor_fdset_dup_fd_remove(int dupfd)
>  {
>  }
> diff --git a/util/osdep.c b/util/osdep.c
> index e996c4744a..2d9749d060 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -393,21 +393,8 @@ int qemu_open_old(const char *name, int flags, ...)
>  
>  int qemu_close(int fd)
>  {
> -    int64_t fdset_id;
> -
>      /* Close fd that was dup'd from an fdset */
> -    fdset_id = monitor_fdset_dup_fd_find(fd);
> -    if (fdset_id != -1) {
> -        int ret;
> -
> -        ret = close(fd);
> -        if (ret == 0) {
> -            monitor_fdset_dup_fd_remove(fd);
> -        }
> -
> -        return ret;
> -    }
> -
> +    monitor_fdset_dup_fd_remove(fd);
>      return close(fd);
>  }
>  
> -- 
> 2.44.0



* Re: [PATCH 3/9] tests/qtest/migration: Fix file migration offset check
  2024-05-03 16:47   ` Peter Xu
@ 2024-05-03 20:36     ` Fabiano Rosas
  2024-05-03 21:08       ` Peter Xu
  2024-05-08  8:10       ` Daniel P. Berrangé
  0 siblings, 2 replies; 57+ messages in thread
From: Fabiano Rosas @ 2024-05-03 20:36 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, berrange, armbru, Claudio Fontana, Jim Fehlig,
	Thomas Huth, Laurent Vivier, Paolo Bonzini

Peter Xu <peterx@redhat.com> writes:

> On Fri, Apr 26, 2024 at 11:20:36AM -0300, Fabiano Rosas wrote:
>> When doing file migration, QEMU accepts an offset that should be
>> skipped when writing the migration stream to the file. The purpose of
>> the offset is to allow the management layer to put its own metadata at
>> the start of the file.
>> 
>> We have tests for this in migration-test, but only testing that the
>> migration stream starts at the correct offset and not that it actually
>> leaves the data intact. Unsurprisingly, there's been a bug in that
>> area that the tests didn't catch.
>> 
>> Fix the tests to write some data to the offset region and check that
>> it's actually there after the migration.
>> 
>> Fixes: 3dc35470c8 ("tests/qtest: migration-test: Add tests for file-based migration")
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>>  tests/qtest/migration-test.c | 70 +++++++++++++++++++++++++++++++++---
>>  1 file changed, 65 insertions(+), 5 deletions(-)
>> 
>> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
>> index 5d6d8cd634..7b177686b4 100644
>> --- a/tests/qtest/migration-test.c
>> +++ b/tests/qtest/migration-test.c
>> @@ -2081,6 +2081,63 @@ static void test_precopy_file(void)
>>      test_file_common(&args, true);
>>  }
>>  
>> +#ifndef _WIN32
>> +static void file_dirty_offset_region(void)
>> +{
>> +#if defined(__linux__)
>
> Hmm, what's the case to cover when !_WIN32 && __linux__?  Can we remove one
> layer of ifdef?
>
> I'm also wondering why it can't work on win32.  I thought win32 has all
> of what we use here, but I may be missing something.
>

The __linux__ check is there because of mmap, and !_WIN32 because of
the fd passing. We might be able to keep only !_WIN32; I'll check.
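As a sketch of how the mmap could be avoided altogether, the offset region can be filled before migration and verified afterwards with plain pwrite()/pread(). The helper names and the 0xA5 fill byte are hypothetical; a fixed byte pattern also makes it obvious what is written, instead of a pointer value.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define OFFSET_FILL_BYTE 0xA5 /* hypothetical pattern, easy to spot in a hexdump */

/* Fill [0, size) of fd with OFFSET_FILL_BYTE using pwrite(), then fsync. */
static bool fill_offset_region(int fd, size_t size)
{
    char buf[4096];
    size_t done = 0;

    memset(buf, OFFSET_FILL_BYTE, sizeof(buf));
    while (done < size) {
        size_t chunk = size - done < sizeof(buf) ? size - done : sizeof(buf);
        ssize_t n = pwrite(fd, buf, chunk, (off_t)done);

        if (n <= 0) {
            return false;
        }
        done += (size_t)n;
    }
    return fsync(fd) == 0;
}

/* True if every byte in [0, size) of fd still equals OFFSET_FILL_BYTE. */
static bool offset_region_intact(int fd, size_t size)
{
    char buf[4096];
    size_t done = 0;

    while (done < size) {
        size_t chunk = size - done < sizeof(buf) ? size - done : sizeof(buf);
        ssize_t n = pread(fd, buf, chunk, (off_t)done);

        if (n <= 0) {
            return false;
        }
        for (ssize_t i = 0; i < n; i++) {
            if ((unsigned char)buf[i] != OFFSET_FILL_BYTE) {
                return false;
            }
        }
        done += (size_t)n;
    }
    return true;
}
```

Data written at or after the offset leaves offset_region_intact() true; any stray write inside the region makes it false.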

>> +    g_autofree char *path = g_strdup_printf("%s/%s", tmpfs, FILE_TEST_FILENAME);
>> +    size_t size = FILE_TEST_OFFSET;
>> +    uintptr_t *addr, *p;
>> +    int fd;
>> +
>> +    fd = open(path, O_CREAT | O_RDWR, 0660);
>> +    g_assert(fd != -1);
>> +
>> +    g_assert(!ftruncate(fd, size));
>> +
>> +    addr = mmap(NULL, size, PROT_WRITE, MAP_SHARED, fd, 0);
>> +    g_assert(addr != MAP_FAILED);
>> +
>> +    /* ensure the skipped offset contains some data */
>> +    p = addr;
>> +    while (p < addr + FILE_TEST_OFFSET / sizeof(uintptr_t)) {
>> +        *p = (unsigned long) FILE_TEST_FILENAME;
>
> This is fine, but it's not as clear what is assigned.  I think what we
> assign here is the pointer into the binary's RO section (rather than the chars).

Haha, you're right. I was previously assigning FILE_TEST_OFFSET and
switched to FILE_TEST_FILENAME without thinking. I'll fix it up.

> Maybe using some random numbers would be more straightforward, but no
> strong opinions.
>
>> +        p++;
>> +    }
>> +
>> +    munmap(addr, size);
>> +    fsync(fd);
>> +    close(fd);
>> +#endif
>> +}
>> +
>> +static void *file_offset_start_hook(QTestState *from, QTestState *to)
>> +{
>> +    g_autofree char *file = g_strdup_printf("%s/%s", tmpfs, FILE_TEST_FILENAME);
>> +    int src_flags = O_WRONLY;
>> +    int dst_flags = O_RDONLY;
>> +    int fds[2];
>> +
>> +    file_dirty_offset_region();
>> +
>> +    fds[0] = open(file, src_flags, 0660);
>> +    assert(fds[0] != -1);
>> +
>> +    fds[1] = open(file, dst_flags, 0660);
>> +    assert(fds[1] != -1);
>> +
>> +    qtest_qmp_fds_assert_success(from, &fds[0], 1, "{'execute': 'add-fd', "
>> +                                 "'arguments': {'fdset-id': 1}}");
>> +
>> +    qtest_qmp_fds_assert_success(to, &fds[1], 1, "{'execute': 'add-fd', "
>> +                                 "'arguments': {'fdset-id': 1}}");
>> +
>> +    close(fds[0]);
>> +    close(fds[1]);
>> +
>> +    return NULL;
>> +}
>> +
>>  static void file_offset_finish_hook(QTestState *from, QTestState *to,
>>                                      void *opaque)
>>  {
>> @@ -2096,12 +2153,12 @@ static void file_offset_finish_hook(QTestState *from, QTestState *to,
>>      g_assert(addr != MAP_FAILED);
>>  
>>      /*
>> -     * Ensure the skipped offset contains zeros and the migration
>> -     * stream starts at the right place.
>> +     * Ensure the skipped offset region's data has not been touched
>> +     * and the migration stream starts at the right place.
>>       */
>>      p = addr;
>>      while (p < addr + FILE_TEST_OFFSET / sizeof(uintptr_t)) {
>> -        g_assert(*p == 0);
>> +        g_assert_cmpstr((char *) *p, ==, FILE_TEST_FILENAME);
>>          p++;
>>      }
>>      g_assert_cmpint(cpu_to_be64(*p) >> 32, ==, QEMU_VM_FILE_MAGIC);
>> @@ -2113,17 +2170,18 @@ static void file_offset_finish_hook(QTestState *from, QTestState *to,
>>  
>>  static void test_precopy_file_offset(void)
>>  {
>> -    g_autofree char *uri = g_strdup_printf("file:%s/%s,offset=%d", tmpfs,
>> -                                           FILE_TEST_FILENAME,
>> +    g_autofree char *uri = g_strdup_printf("file:/dev/fdset/1,offset=%d",
>>                                             FILE_TEST_OFFSET);
>
> Do we want to keep both tests to cover both normal file and fdsets?
>

I think the fdset + offset is the most complex in terms of requirements,
so I don't think we need to test the other one.

I'm actually already a bit concerned about the number of tests we
have. I was even thinking of experimenting with some code coverage
tools and pruning some of the tests if possible.

>>      MigrateCommon args = {
>>          .connect_uri = uri,
>>          .listen_uri = "defer",
>> +        .start_hook = file_offset_start_hook,
>>          .finish_hook = file_offset_finish_hook,
>>      };
>>  
>>      test_file_common(&args, false);
>>  }
>> +#endif
>>  
>>  static void test_precopy_file_offset_bad(void)
>>  {
>> @@ -3636,8 +3694,10 @@ int main(int argc, char **argv)
>>  
>>      migration_test_add("/migration/precopy/file",
>>                         test_precopy_file);
>> +#ifndef _WIN32
>>      migration_test_add("/migration/precopy/file/offset",
>>                         test_precopy_file_offset);
>> +#endif
>>      migration_test_add("/migration/precopy/file/offset/bad",
>>                         test_precopy_file_offset_bad);
>>  
>> -- 
>> 2.35.3
>> 



* Re: [PATCH 4/9] migration: Add direct-io parameter
  2024-05-03 18:05   ` Peter Xu
@ 2024-05-03 20:49     ` Fabiano Rosas
  2024-05-03 21:16       ` Peter Xu
  0 siblings, 1 reply; 57+ messages in thread
From: Fabiano Rosas @ 2024-05-03 20:49 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, berrange, armbru, Claudio Fontana, Jim Fehlig, Eric Blake

Peter Xu <peterx@redhat.com> writes:

> On Fri, Apr 26, 2024 at 11:20:37AM -0300, Fabiano Rosas wrote:
>> Add the direct-io migration parameter that tells the migration code to
>> use O_DIRECT when opening the migration stream file whenever possible.
>> 
>> This is currently only used with the mapped-ram migration that has a
>> clear window guaranteed to perform aligned writes.
>> 
>> Acked-by: Markus Armbruster <armbru@redhat.com>
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>>  include/qemu/osdep.h           |  2 ++
>>  migration/migration-hmp-cmds.c | 11 +++++++++++
>>  migration/options.c            | 30 ++++++++++++++++++++++++++++++
>>  migration/options.h            |  1 +
>>  qapi/migration.json            | 18 +++++++++++++++---
>>  util/osdep.c                   |  9 +++++++++
>>  6 files changed, 68 insertions(+), 3 deletions(-)
>> 
>> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
>> index c7053cdc2b..645c14a65d 100644
>> --- a/include/qemu/osdep.h
>> +++ b/include/qemu/osdep.h
>> @@ -612,6 +612,8 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive);
>>  bool qemu_has_ofd_lock(void);
>>  #endif
>>  
>> +bool qemu_has_direct_io(void);
>> +
>>  #if defined(__HAIKU__) && defined(__i386__)
>>  #define FMT_pid "%ld"
>>  #elif defined(WIN64)
>> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
>> index 7e96ae6ffd..8496a2b34e 100644
>> --- a/migration/migration-hmp-cmds.c
>> +++ b/migration/migration-hmp-cmds.c
>> @@ -397,6 +397,13 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
>>          monitor_printf(mon, "%s: %s\n",
>>              MigrationParameter_str(MIGRATION_PARAMETER_MODE),
>>              qapi_enum_lookup(&MigMode_lookup, params->mode));
>> +
>> +        if (params->has_direct_io) {
>> +            monitor_printf(mon, "%s: %s\n",
>> +                           MigrationParameter_str(
>> +                               MIGRATION_PARAMETER_DIRECT_IO),
>> +                           params->direct_io ? "on" : "off");
>> +        }
>
> This will be the first parameter to optionally display here.  I think it's
> a sign of misuse of has_direct_io field..
>
> IMHO has_direct_io should be best to be kept as "whether direct_io field is
> valid" and that's all of it.  It hopefully shouldn't contain more
> information than that, or otherwise it'll be another small challenge we
> need to overcome when we can remove all these has_* fields, and can also be
> easily overlooked.

I don't think I understand why we have those has_* fields. I thought my
usage of 'params->has_direct_io = qemu_has_direct_io()' was the correct
one, i.e. checking whether QEMU has any support for that parameter. Can
you help me out here?

>
> IMHO what we should do is assert has_direct_io==true here too, meanwhile...
>
>>      }
>>  
>>      qapi_free_MigrationParameters(params);
>> @@ -690,6 +697,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
>>          p->has_mode = true;
>>          visit_type_MigMode(v, param, &p->mode, &err);
>>          break;
>> +    case MIGRATION_PARAMETER_DIRECT_IO:
>> +        p->has_direct_io = true;
>> +        visit_type_bool(v, param, &p->direct_io, &err);
>> +        break;
>>      default:
>>          assert(0);
>>      }
>> diff --git a/migration/options.c b/migration/options.c
>> index 239f5ecfb4..ae464aa4f2 100644
>> --- a/migration/options.c
>> +++ b/migration/options.c
>> @@ -826,6 +826,22 @@ int migrate_decompress_threads(void)
>>      return s->parameters.decompress_threads;
>>  }
>>  
>> +bool migrate_direct_io(void)
>> +{
>> +    MigrationState *s = migrate_get_current();
>> +
>> +    /* For now O_DIRECT is only supported with mapped-ram */
>> +    if (!s->capabilities[MIGRATION_CAPABILITY_MAPPED_RAM]) {
>> +        return false;
>> +    }
>> +
>> +    if (s->parameters.has_direct_io) {
>> +        return s->parameters.direct_io;
>> +    }
>> +
>> +    return false;
>> +}
>> +
>>  uint64_t migrate_downtime_limit(void)
>>  {
>>      MigrationState *s = migrate_get_current();
>> @@ -1061,6 +1077,11 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
>>      params->has_zero_page_detection = true;
>>      params->zero_page_detection = s->parameters.zero_page_detection;
>>  
>> +    if (s->parameters.has_direct_io) {
>> +        params->has_direct_io = true;
>> +        params->direct_io = s->parameters.direct_io;
>> +    }
>> +
>>      return params;
>>  }
>>  
>> @@ -1097,6 +1118,7 @@ void migrate_params_init(MigrationParameters *params)
>>      params->has_vcpu_dirty_limit = true;
>>      params->has_mode = true;
>>      params->has_zero_page_detection = true;
>> +    params->has_direct_io = qemu_has_direct_io();
>>  }
>>  
>>  /*
>> @@ -1416,6 +1438,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
>>      if (params->has_zero_page_detection) {
>>          dest->zero_page_detection = params->zero_page_detection;
>>      }
>> +
>> +    if (params->has_direct_io) {
>> +        dest->direct_io = params->direct_io;
>
> .. do proper check here to make sure the current QEMU is built with direct
> IO support, then fail QMP migrate-set-parameters otherwise when someone
> tries to enable it on a QEMU that doesn't support it.

I'm already checking at migrate_params_init() with
qemu_has_direct_io(). But ok, you want to move the check here... Is this
function the right place, rather than migrate_params_check()? I see
TODO comments mentioning QAPI_CLONE(); we can't clone the object if this
one parameter needs special treatment. I might be getting all this
wrong, bear with me.

>
> Always displaying direct_io parameter also helps when we simply want to
> check qemu version and whether it supports this feature in general.
>

Makes sense.
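On the has_* question: in QAPI-generated C, an optional scalar member such as direct-io gets a companion has_direct_io flag that only records whether the member is present. The approach suggested above (always report direct-io, and reject enabling it on a build without O_DIRECT) could look roughly like this. The struct and function names are simplified, hypothetical stand-ins, not QEMU's generated types or the actual migration/options.c code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified, hypothetical stand-in for the QAPI-generated struct. */
typedef struct SketchParams {
    bool has_direct_io; /* only means "direct_io is present/valid" */
    bool direct_io;
} SketchParams;

/* Compile-time capability, mirroring qemu_has_direct_io() in the patch. */
static bool sketch_has_direct_io(void)
{
#ifdef O_DIRECT
    return true;
#else
    return false;
#endif
}

/* Query side: always report the parameter, whatever the build supports. */
static void sketch_query(const SketchParams *cur, SketchParams *out)
{
    out->has_direct_io = true;
    out->direct_io = cur->direct_io;
}

/*
 * Set side: refuse to enable direct-io when the build lacks O_DIRECT.
 * 'have_odirect' is a parameter so both outcomes can be exercised.
 */
static bool sketch_check(const SketchParams *req, bool have_odirect,
                         char *err, size_t errlen)
{
    if (req->has_direct_io && req->direct_io && !have_odirect) {
        snprintf(err, errlen, "No O_DIRECT support on this system");
        return false;
    }
    return true;
}
```

A caller would pass sketch_has_direct_io() as have_odirect. With this split, has_direct_io keeps its plain "present" meaning, and the capability check happens where the user sets the parameter.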

>> +    }
>>  }
>>  
>>  static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
>> @@ -1570,6 +1596,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
>>      if (params->has_zero_page_detection) {
>>          s->parameters.zero_page_detection = params->zero_page_detection;
>>      }
>> +
>> +    if (params->has_direct_io) {
>> +        s->parameters.direct_io = params->direct_io;
>> +    }
>>  }
>>  
>>  void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
>> diff --git a/migration/options.h b/migration/options.h
>> index ab8199e207..aa5509cd2a 100644
>> --- a/migration/options.h
>> +++ b/migration/options.h
>> @@ -76,6 +76,7 @@ uint8_t migrate_cpu_throttle_increment(void);
>>  uint8_t migrate_cpu_throttle_initial(void);
>>  bool migrate_cpu_throttle_tailslow(void);
>>  int migrate_decompress_threads(void);
>> +bool migrate_direct_io(void);
>>  uint64_t migrate_downtime_limit(void);
>>  uint8_t migrate_max_cpu_throttle(void);
>>  uint64_t migrate_max_bandwidth(void);
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index 8c65b90328..1a8a4b114c 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -914,6 +914,9 @@
>>  #     See description in @ZeroPageDetection.  Default is 'multifd'.
>>  #     (since 9.0)
>>  #
>> +# @direct-io: Open migration files with O_DIRECT when possible. This
>> +#     requires that the @mapped-ram capability is enabled. (since 9.1)
>
> Here it seems to imply setting direct-io=true will fail if mapped-ram not
> enabled, but in reality it's fine, it'll just be ignored.  I think that's
> the right thing to do to reduce correlation effects between params/caps
> (otherwise, when unset mapped-ram cap, we'll need to double check again to
> unset direct-io too; just cumbersome).
>
> I suggest we state the fact that this field is ignored when the mapped-ram
> capability is not enabled, rather than "requires mapped-ram".  Same for
> the other two places in the QAPI doc.
>

Ok.

>> +#
>>  # Features:
>>  #
>>  # @deprecated: Member @block-incremental is deprecated.  Use
>> @@ -948,7 +951,8 @@
>>             { 'name': 'x-vcpu-dirty-limit-period', 'features': ['unstable'] },
>>             'vcpu-dirty-limit',
>>             'mode',
>> -           'zero-page-detection'] }
>> +           'zero-page-detection',
>> +           'direct-io'] }
>>  
>>  ##
>>  # @MigrateSetParameters:
>> @@ -1122,6 +1126,9 @@
>>  #     See description in @ZeroPageDetection.  Default is 'multifd'.
>>  #     (since 9.0)
>>  #
>> +# @direct-io: Open migration files with O_DIRECT when possible. This
>> +#     requires that the @mapped-ram capability is enabled. (since 9.1)
>> +#
>>  # Features:
>>  #
>>  # @deprecated: Member @block-incremental is deprecated.  Use
>> @@ -1176,7 +1183,8 @@
>>                                              'features': [ 'unstable' ] },
>>              '*vcpu-dirty-limit': 'uint64',
>>              '*mode': 'MigMode',
>> -            '*zero-page-detection': 'ZeroPageDetection'} }
>> +            '*zero-page-detection': 'ZeroPageDetection',
>> +            '*direct-io': 'bool' } }
>>  
>>  ##
>>  # @migrate-set-parameters:
>> @@ -1354,6 +1362,9 @@
>>  #     See description in @ZeroPageDetection.  Default is 'multifd'.
>>  #     (since 9.0)
>>  #
>> +# @direct-io: Open migration files with O_DIRECT when possible. This
>> +#     requires that the @mapped-ram capability is enabled. (since 9.1)
>> +#
>>  # Features:
>>  #
>>  # @deprecated: Member @block-incremental is deprecated.  Use
>> @@ -1405,7 +1416,8 @@
>>                                              'features': [ 'unstable' ] },
>>              '*vcpu-dirty-limit': 'uint64',
>>              '*mode': 'MigMode',
>> -            '*zero-page-detection': 'ZeroPageDetection'} }
>> +            '*zero-page-detection': 'ZeroPageDetection',
>> +            '*direct-io': 'bool' } }
>>  
>>  ##
>>  # @query-migrate-parameters:
>> diff --git a/util/osdep.c b/util/osdep.c
>> index e996c4744a..d0227a60ab 100644
>> --- a/util/osdep.c
>> +++ b/util/osdep.c
>> @@ -277,6 +277,15 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive)
>>  }
>>  #endif
>>  
>> +bool qemu_has_direct_io(void)
>> +{
>> +#ifdef O_DIRECT
>> +    return true;
>> +#else
>> +    return false;
>> +#endif
>> +}
>> +
>>  static int qemu_open_cloexec(const char *name, int flags, mode_t mode)
>>  {
>>      int ret;
>> -- 
>> 2.35.3
>> 



* Re: [PATCH 5/9] migration/multifd: Add direct-io support
  2024-05-03 18:29   ` Peter Xu
@ 2024-05-03 20:54     ` Fabiano Rosas
  2024-05-03 21:18       ` Peter Xu
  0 siblings, 1 reply; 57+ messages in thread
From: Fabiano Rosas @ 2024-05-03 20:54 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, berrange, armbru, Claudio Fontana, Jim Fehlig

Peter Xu <peterx@redhat.com> writes:

> On Fri, Apr 26, 2024 at 11:20:38AM -0300, Fabiano Rosas wrote:
>> When multifd is used along with mapped-ram, we can take advantage of a
>> filesystem that supports the O_DIRECT flag and perform direct I/O in
>> the multifd threads. This brings a significant performance improvement
>> because direct-io writes bypass the page cache which would otherwise
>> be thrashed by the multifd data which is unlikely to be needed again
>> in a short period of time.
>> 
>> To be able to use a multifd channel opened with O_DIRECT, we must
>> ensure that a certain alignment is used. Filesystems usually require a
>> block-size alignment for direct I/O. The way to achieve this is by
>> enabling the mapped-ram feature, which already aligns its I/O properly
>> (see MAPPED_RAM_FILE_OFFSET_ALIGNMENT at ram.c).
>> 
>> By setting O_DIRECT on the multifd channels, all writes to the same
>> file descriptor need to be aligned as well, even the ones that come
>> from outside multifd, such as the QEMUFile I/O from the main migration
>> code. This makes it impossible to use the same file descriptor for the
>> QEMUFile and for the multifd channels. The various flags and metadata
>> written by the main migration code will always be unaligned by virtue
>> of their small size. To work around this issue, we'll require a second
>> file descriptor to be used exclusively for direct I/O.
>> 
>> The second file descriptor can be obtained by QEMU by re-opening the
>> migration file (already possible), or by being provided by the user or
>> management application (support to be added in future patches).
>> 
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>>  migration/file.c      | 22 +++++++++++++++++++---
>>  migration/migration.c | 23 +++++++++++++++++++++++
>>  2 files changed, 42 insertions(+), 3 deletions(-)
>> 
>> diff --git a/migration/file.c b/migration/file.c
>> index 8f30999400..b9265b14dd 100644
>> --- a/migration/file.c
>> +++ b/migration/file.c
>> @@ -83,17 +83,33 @@ void file_cleanup_outgoing_migration(void)
>>  
>>  bool file_send_channel_create(gpointer opaque, Error **errp)
>>  {
>> -    QIOChannelFile *ioc;
>> +    QIOChannelFile *ioc = NULL;
>>      int flags = O_WRONLY;
>> -    bool ret = true;
>> +    bool ret = false;
>> +
>> +    if (migrate_direct_io()) {
>> +#ifdef O_DIRECT
>> +        /*
>> +         * Enable O_DIRECT for the secondary channels. These are used
>> +         * for sending ram pages and writes should be guaranteed to be
>> +         * aligned to at least page size.
>> +         */
>> +        flags |= O_DIRECT;
>> +#else
>> +        error_setg(errp, "System does not support O_DIRECT");
>> +        error_append_hint(errp,
>> +                          "Try disabling direct-io migration capability\n");
>> +        goto out;
>> +#endif
>
> Hopefully if we can fail migrate-set-parameters correctly always, we will
> never trigger this error.
>
> I know Linux used some trick like this to even avoid such ifdefs:
>
>   if (qemu_has_direct_io() && migrate_direct_io()) {
>       // reference O_DIRECT
>   }
>
> So as long as qemu_has_direct_io() can return a constant "false" when
> O_DIRECT not defined, the compiler is smart enough to ignore the O_DIRECT
> inside the block.
>
> Even if it won't work, we can still avoid that error (and rely on the
> set-parameter failure):
>
> #ifdef O_DIRECT
>        if (migrate_direct_io()) {
>            // reference O_DIRECT
>        }
> #endif
>
> Then it should run the same, just to try making ifdefs as light as
> possible..

Ok.

Just FYI, in v2 I'm adding direct-io to the incoming migration side as
well, so I put this logic into a helper:

static bool file_enable_direct_io(int *flags, Error **errp)
{
    if (migrate_direct_io()) {
#ifdef O_DIRECT
        *flags |= O_DIRECT;
#else
        error_setg(errp, "System does not support O_DIRECT");
        error_append_hint(errp,
                          "Try disabling direct-io migration capability\n");
        return false;
#endif
    }

    return true;
}

But I'll apply your suggestions nonetheless.
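The alignment rule from the commit message can be made concrete with a small check. This is a sketch under assumptions: dio_aligned() and dio_alloc() are hypothetical helpers, and 4096 bytes is only an example; the real requirement depends on the filesystem and device (recent Linux kernels can report it via statx() with STATX_DIOALIGN).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * True if writing 'len' bytes from 'buf' at file offset 'off' satisfies a
 * power-of-two O_DIRECT alignment 'align' for buffer, offset and length.
 */
static bool dio_aligned(const void *buf, off_t off, size_t len, size_t align)
{
    uint64_t mask = (uint64_t)align - 1;

    return ((uintptr_t)buf & mask) == 0 &&
           ((uint64_t)off & mask) == 0 &&
           ((uint64_t)len & mask) == 0;
}

/* Allocate 'len' bytes aligned to 'align', usable as an O_DIRECT buffer. */
static void *dio_alloc(size_t align, size_t len)
{
    void *p = NULL;

    return posix_memalign(&p, align, len) == 0 ? p : NULL;
}
```

A page-sized write from a dio_alloc() buffer at a page-aligned offset passes, while an 8-byte metadata write like those issued by the main migration thread fails, which is why the QEMUFile side keeps its own non-O_DIRECT descriptor.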

>
>> +    }
>>  
>>      ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, errp);
>>      if (!ioc) {
>> -        ret = false;
>>          goto out;
>>      }
>>  
>>      multifd_channel_connect(opaque, QIO_CHANNEL(ioc));
>> +    ret = true;
>>  
>>  out:
>>      /*
>> diff --git a/migration/migration.c b/migration/migration.c
>> index b5af6b5105..cb923a3f62 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -155,6 +155,16 @@ static bool migration_needs_seekable_channel(void)
>>      return migrate_mapped_ram();
>>  }
>>  
>> +static bool migration_needs_multiple_fds(void)
>
> If I suggest to rename this, would you agree? :)
>

Sure, although this is a more accurate use of the term than "multifd", hehe.

> I'd try with "migrate_needs_extra_fd()" or "migrate_needs_two_fds()",
> or... just to avoid "multi" + "fd" used altogether, perhaps.
>
> Other than that looks all good.
>
> Thanks,
>
>> +{
>> +    /*
>> +     * When doing direct-io, multifd requires two different,
>> +     * non-duplicated file descriptors so we can use one of them for
>> +     * unaligned IO.
>> +     */
>> +    return migrate_multifd() && migrate_direct_io();
>> +}
>> +
>>  static bool transport_supports_seeking(MigrationAddress *addr)
>>  {
>>      if (addr->transport == MIGRATION_ADDRESS_TYPE_FILE) {
>> @@ -164,6 +174,12 @@ static bool transport_supports_seeking(MigrationAddress *addr)
>>      return false;
>>  }
>>  
>> +static bool transport_supports_multiple_fds(MigrationAddress *addr)
>> +{
>> +    /* file: works because QEMU can open it multiple times */
>> +    return addr->transport == MIGRATION_ADDRESS_TYPE_FILE;
>> +}
>> +
>>  static bool
>>  migration_channels_and_transport_compatible(MigrationAddress *addr,
>>                                              Error **errp)
>> @@ -180,6 +196,13 @@ migration_channels_and_transport_compatible(MigrationAddress *addr,
>>          return false;
>>      }
>>  
>> +    if (migration_needs_multiple_fds() &&
>> +        !transport_supports_multiple_fds(addr)) {
>> +        error_setg(errp,
>> +                   "Migration requires a transport that allows for multiple fds (e.g. file)");
>> +        return false;
>> +    }
>> +
>>      return true;
>>  }
>>  
>> -- 
>> 2.35.3
>> 
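The two-descriptor requirement described in the comment above can be illustrated with a minimal standalone sketch. It assumes a 4096-byte alignment (SKETCH_ALIGN, an assumption; the real requirement depends on the filesystem and device) and uses hypothetical names throughout. One descriptor without O_DIRECT takes the small, unaligned writes; a second one, opened on the same file with O_DIRECT, only ever sees aligned buffers, offsets and lengths.

```c
#define _GNU_SOURCE     /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical sketch: the same file is opened twice. 'meta_fd' has no
 * O_DIRECT and takes small unaligned writes (header, flags); 'dio_fd'
 * has O_DIRECT and only receives block-aligned I/O.
 * SKETCH_ALIGN = 4096 is an assumption, not a universal value.
 */
#define SKETCH_ALIGN 4096

bool sketch_two_fd_write(const char *path)
{
    bool ok = false;
    void *page = NULL;
    int dio_fd = -1;
    int meta_fd = open(path, O_CREAT | O_WRONLY, 0660);

    if (meta_fd < 0) {
        return false;
    }
#ifdef O_DIRECT
    dio_fd = open(path, O_WRONLY | O_DIRECT);
#endif
    if (dio_fd < 0) {
        goto out;       /* no O_DIRECT, or the filesystem rejected it */
    }
    /* Unaligned 7-byte header through the buffered descriptor. */
    if (pwrite(meta_fd, "HEADER\n", 7, 0) != 7) {
        goto out;
    }
    /* Aligned buffer, aligned offset and aligned length for O_DIRECT. */
    if (posix_memalign(&page, SKETCH_ALIGN, SKETCH_ALIGN) != 0) {
        page = NULL;
        goto out;
    }
    memset(page, 0xAB, SKETCH_ALIGN);
    ok = pwrite(dio_fd, page, SKETCH_ALIGN, SKETCH_ALIGN) == SKETCH_ALIGN;
out:
    free(page);
    if (dio_fd >= 0) {
        close(dio_fd);
    }
    close(meta_fd);
    return ok;
}
```

Writing the header through dio_fd instead would typically fail with EINVAL, which is why a single descriptor cannot serve both the main stream and the multifd channels.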



* Re: [PATCH 2/9] migration: Fix file migration with fdset
  2024-05-03 19:56     ` Fabiano Rosas
@ 2024-05-03 21:04       ` Peter Xu
  2024-05-03 21:31         ` Fabiano Rosas
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Xu @ 2024-05-03 21:04 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana, Jim Fehlig

On Fri, May 03, 2024 at 04:56:08PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Fri, Apr 26, 2024 at 11:20:35AM -0300, Fabiano Rosas wrote:
> >> When the migration using the "file:" URI was implemented, I don't
> >> think any of us noticed that if you pass in a file name with the
> >> format "/dev/fdset/N", this allows a file descriptor to be passed in
> >> to QEMU and that behaves just like the "fd:" URI. So the "file:"
> >> support has been added without regard for the fdset part and we got
> >> some things wrong.
> >> 
> >> The first issue is that we should not truncate the migration file if
> >> we're allowing an fd + offset. We need to leave the file contents
> >> untouched.
> >
> > I'm wondering whether we can use fallocate() on the ranges instead, so
> > that we never open() with O_TRUNC.  Before that... could you remind me
> > why we need to truncate in the first place?  I definitely missed
> > something else here too.
> 
> AFAIK, just to avoid any issues if the file is pre-existing. I don't see
> the difference between O_TRUNC and fallocate in this case.

Then, shall we avoid truncation altogether, leaving that responsibility to
the user (though it's also error-prone for them)?
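The difference between opening with and without O_TRUNC when an offset is in use can be shown with a small standalone sketch (hypothetical helper, not QEMU code): a management layer writes its own header at the start of the file, then the file is reopened and a "stream" is written at an offset. With O_TRUNC the header region reads back as zeros; without it the header survives.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical sketch: reopen an existing file and write a "stream"
 * starting at 'offset'. With do_trunc the bytes in [0, offset) that the
 * management layer wrote are lost (a hole that reads as zeros);
 * without it they are left untouched.
 */
bool sketch_write_at_offset(const char *path, off_t offset, bool do_trunc)
{
    static const char stream[] = "STREAM";
    int flags = O_WRONLY | (do_trunc ? O_TRUNC : 0);
    int fd = open(path, flags);
    bool ok;

    if (fd < 0) {
        return false;
    }
    ok = pwrite(fd, stream, sizeof(stream) - 1, offset) ==
         (ssize_t)(sizeof(stream) - 1);
    close(fd);
    return ok;
}
```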

> 
> >
> >> 
> >> The second issue is that there's an expectation that QEMU removes the
> >> fd after the migration has finished. That's what the "fd:" code
> >> does. Otherwise a second migration on the same VM could attempt to
> >> provide an fdset with the same name and QEMU would reject it.
> >
> > Let me check what we do with "fd:" when migration completes or is
> > cancelled.
> >
> > IIUC it's qio_channel_file_close() that does the final cleanup work on
> > e.g. to_dst_file, right?  Then there's qemu_close(), and it has:
> >
> >     /* Close fd that was dup'd from an fdset */
> >     fdset_id = monitor_fdset_dup_fd_find(fd);
> >     if (fdset_id != -1) {
> >         int ret;
> >
> >         ret = close(fd);
> >         if (ret == 0) {
> >             monitor_fdset_dup_fd_remove(fd);
> >         }
> >
> >         return ret;
> >     }
> >
> > Shouldn't this already do the work?
> 
> That removes the mon_fdset_fd_dup->fd, we want to remove the
> mon_fdset_fd->fd.

From what I've read so far, when we remove the dup-fds we do one more
thing:

monitor_fdset_dup_fd_find_remove():
                    if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
                        monitor_fdset_cleanup(mon_fdset);
                    }

It means if we removed all the dup-fds correctly, we should also remove the
whole fdset, which includes the ->fds, IIUC.
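The cleanup rule described above (dropping the last dup'd fd triggers cleanup of the whole fdset) can be modelled in a few lines. This is a simplified, hypothetical model with invented names; the real monitor_fdset_cleanup() may apply further conditions that are not modelled here.

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical model: each fdset tracks the descriptors dup'd from it,
 * and removing the last dup marks the whole set as cleaned up.
 * None of these names are QEMU's.
 */
struct sketch_dup {
    int fd;
    struct sketch_dup *next;
};

struct sketch_fdset {
    struct sketch_dup *dup_fds;
    bool cleaned_up;
};

void sketch_dup_add(struct sketch_fdset *set, int fd)
{
    struct sketch_dup *d = malloc(sizeof(*d));

    if (!d) {
        abort();
    }
    d->fd = fd;
    d->next = set->dup_fds;
    set->dup_fds = d;
}

void sketch_dup_remove(struct sketch_fdset *set, int fd)
{
    struct sketch_dup **pp = &set->dup_fds;

    while (*pp) {
        if ((*pp)->fd == fd) {
            struct sketch_dup *victim = *pp;

            *pp = victim->next;
            free(victim);
            if (!set->dup_fds) {
                /* stands in for monitor_fdset_cleanup() */
                set->cleaned_up = true;
            }
            return;
        }
        pp = &(*pp)->next;
    }
}
```

Under this model, a single walk-and-remove function is enough, which matches the argument for not needing a separate lookup step before removal.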

> 
> >
> > Off topic: I think this code is overcomplicated too. Maybe I missed
> > something, but afaict we don't need monitor_fdset_dup_fd_find() at
> > all; we can simply walk the list and remove stuff.  I've attached a
> > patch at the end that tries to clean that up, in case there are early
> > comments.  But we can ignore that so we don't get side-tracked, and
> > focus on the direct-io issues.
> 
> Well, I'm not confident touching this code. This is more than a decade
> old, I have no idea what the original motivations were. The possible
> interactions with the user via command-line (-add-fd), QMP (add-fd) and
> the monitor lifetime confuse me. Not to mention the fdset part
> being plumbed into the guts of a widely used qemu_open_internal() that
> very misleadingly presents itself as just a wrapper for open().

If QEMU is to live long, we'll probably need to touch it at some point,
or at least discuss it and figure things out. We pay tech debt like this
when there are no good comments / docs to refer to, so the earlier we
take a stab at it, perhaps the better, imho.

Definitely not a request to clean everything up. :) Let's see whether
others can chime in with better knowledge of the history.

> 
> >
> > Thanks,
> >
> > =======
> >
> > From 2f6b6d1224486d8ee830a7afe34738a07003b863 Mon Sep 17 00:00:00 2001
> > From: Peter Xu <peterx@redhat.com>
> > Date: Fri, 3 May 2024 11:27:20 -0400
> > Subject: [PATCH] monitor: Drop monitor_fdset_dup_fd_add()
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=UTF-8
> > Content-Transfer-Encoding: 8bit
> >
> > This function is not needed, one remove function should already work.
> > Clean it up.
> >
> > Here the code doesn't really care whether we need to keep that dup_fd
> > around if close() failed: when that happens something has gone very
> > wrong, and keeping the dup_fd in the fdsets is unlikely to help.
> >
> > Cc: Dr. David Alan Gilbert <dave@treblig.org>
> > Cc: Markus Armbruster <armbru@redhat.com>
> > Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Daniel P. Berrangé <berrange@redhat.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  include/monitor/monitor.h |  1 -
> >  monitor/fds.c             | 27 +++++----------------------
> >  stubs/fdset.c             |  5 -----
> >  util/osdep.c              | 15 +--------------
> >  4 files changed, 6 insertions(+), 42 deletions(-)
> >
> > diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
> > index 965f5d5450..fd9b3f538c 100644
> > --- a/include/monitor/monitor.h
> > +++ b/include/monitor/monitor.h
> > @@ -53,7 +53,6 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool has_fdset_id, int64_t fdset_id,
> >                                  const char *opaque, Error **errp);
> >  int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags);
> >  void monitor_fdset_dup_fd_remove(int dup_fd);
> > -int64_t monitor_fdset_dup_fd_find(int dup_fd);
> >  
> >  void monitor_register_hmp(const char *name, bool info,
> >                            void (*cmd)(Monitor *mon, const QDict *qdict));
> > diff --git a/monitor/fds.c b/monitor/fds.c
> > index d86c2c674c..d5aecfb70e 100644
> > --- a/monitor/fds.c
> > +++ b/monitor/fds.c
> > @@ -458,7 +458,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
> >  #endif
> >  }
> >  
> > -static int64_t monitor_fdset_dup_fd_find_remove(int dup_fd, bool remove)
> > +void monitor_fdset_dup_fd_remove(int dup_fd)
> >  {
> >      MonFdset *mon_fdset;
> >      MonFdsetFd *mon_fdset_fd_dup;
> > @@ -467,31 +467,14 @@ static int64_t monitor_fdset_dup_fd_find_remove(int dup_fd, bool remove)
> >      QLIST_FOREACH(mon_fdset, &mon_fdsets, next) {
> >          QLIST_FOREACH(mon_fdset_fd_dup, &mon_fdset->dup_fds, next) {
> >              if (mon_fdset_fd_dup->fd == dup_fd) {
> > -                if (remove) {
> > -                    QLIST_REMOVE(mon_fdset_fd_dup, next);
> > -                    g_free(mon_fdset_fd_dup);
> > -                    if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
> > -                        monitor_fdset_cleanup(mon_fdset);
> > -                    }
> > -                    return -1;
> > -                } else {
> > -                    return mon_fdset->id;
> > +                QLIST_REMOVE(mon_fdset_fd_dup, next);
> > +                g_free(mon_fdset_fd_dup);
> > +                if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
> > +                    monitor_fdset_cleanup(mon_fdset);
> >                  }
> >              }
> >          }
> >      }
> > -
> > -    return -1;
> > -}
> > -
> > -int64_t monitor_fdset_dup_fd_find(int dup_fd)
> > -{
> > -    return monitor_fdset_dup_fd_find_remove(dup_fd, false);
> > -}
> > -
> > -void monitor_fdset_dup_fd_remove(int dup_fd)
> > -{
> > -    monitor_fdset_dup_fd_find_remove(dup_fd, true);
> >  }
> >  
> >  int monitor_fd_param(Monitor *mon, const char *fdname, Error **errp)
> > diff --git a/stubs/fdset.c b/stubs/fdset.c
> > index d7c39a28ac..389e368a29 100644
> > --- a/stubs/fdset.c
> > +++ b/stubs/fdset.c
> > @@ -9,11 +9,6 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
> >      return -1;
> >  }
> >  
> > -int64_t monitor_fdset_dup_fd_find(int dup_fd)
> > -{
> > -    return -1;
> > -}
> > -
> >  void monitor_fdset_dup_fd_remove(int dupfd)
> >  {
> >  }
> > diff --git a/util/osdep.c b/util/osdep.c
> > index e996c4744a..2d9749d060 100644
> > --- a/util/osdep.c
> > +++ b/util/osdep.c
> > @@ -393,21 +393,8 @@ int qemu_open_old(const char *name, int flags, ...)
> >  
> >  int qemu_close(int fd)
> >  {
> > -    int64_t fdset_id;
> > -
> >      /* Close fd that was dup'd from an fdset */
> > -    fdset_id = monitor_fdset_dup_fd_find(fd);
> > -    if (fdset_id != -1) {
> > -        int ret;
> > -
> > -        ret = close(fd);
> > -        if (ret == 0) {
> > -            monitor_fdset_dup_fd_remove(fd);
> > -        }
> > -
> > -        return ret;
> > -    }
> > -
> > +    monitor_fdset_dup_fd_remove(fd);
> >      return close(fd);
> >  }
> >  
> > -- 
> > 2.44.0
> 

-- 
Peter Xu




* Re: [PATCH 6/9] tests/qtest/migration: Add tests for file migration with direct-io
  2024-05-03 18:38   ` Peter Xu
@ 2024-05-03 21:05     ` Fabiano Rosas
  2024-05-03 21:25       ` Peter Xu
  0 siblings, 1 reply; 57+ messages in thread
From: Fabiano Rosas @ 2024-05-03 21:05 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, berrange, armbru, Claudio Fontana, Jim Fehlig,
	Thomas Huth, Laurent Vivier, Paolo Bonzini

Peter Xu <peterx@redhat.com> writes:

> On Fri, Apr 26, 2024 at 11:20:39AM -0300, Fabiano Rosas wrote:
>> The tests are only allowed to run in systems that know about the
>> O_DIRECT flag and in filesystems which support it.
>> 
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>
> Mostly:
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
>
> Two trivial comments below.
>
>> ---
>>  tests/qtest/migration-helpers.c | 42 +++++++++++++++++++++++++++++++++
>>  tests/qtest/migration-helpers.h |  1 +
>>  tests/qtest/migration-test.c    | 42 +++++++++++++++++++++++++++++++++
>>  3 files changed, 85 insertions(+)
>> 
>> diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
>> index ce6d6615b5..356cd4fa8c 100644
>> --- a/tests/qtest/migration-helpers.c
>> +++ b/tests/qtest/migration-helpers.c
>> @@ -473,3 +473,45 @@ void migration_test_add(const char *path, void (*fn)(void))
>>      qtest_add_data_func_full(path, test, migration_test_wrapper,
>>                               migration_test_destroy);
>>  }
>> +
>> +#ifdef O_DIRECT
>> +/*
>> + * Probe for O_DIRECT support on the filesystem. Since this is used
>> + * for tests, be conservative, if anything fails, assume it's
>> + * unsupported.
>> + */
>> +bool probe_o_direct_support(const char *tmpfs)
>> +{
>> +    g_autofree char *filename = g_strdup_printf("%s/probe-o-direct", tmpfs);
>> +    int fd, flags = O_CREAT | O_RDWR | O_TRUNC | O_DIRECT;
>> +    void *buf;
>> +    ssize_t ret, len;
>> +    uint64_t offset;
>> +
>> +    fd = open(filename, flags, 0660);
>> +    if (fd < 0) {
>> +        unlink(filename);
>> +        return false;
>> +    }
>> +
>> +    /*
>> +     * Assuming 4k should be enough to satisfy O_DIRECT alignment
>> +     * requirements. The migration code uses 1M to be conservative.
>> +     */
>> +    len = 0x100000;
>> +    offset = 0x100000;
>> +
>> +    buf = aligned_alloc(len, len);
>
> This is the first usage of aligned_alloc() in qemu.  IIUC it's just a newer
> posix_memalign(), which QEMU has one use of, and it's protected with:
>
> #if defined(CONFIG_POSIX_MEMALIGN)
>     int ret;
>     ret = posix_memalign(&ptr, alignment, size);
>     ...
> #endif
>
> I didn't check deeper.  Just keep this in mind if you see any compilation
> issues in future CI runs, or simply switch to a similar pattern.
>
>> +    g_assert(buf);
>> +
>> +    ret = pwrite(fd, buf, len, offset);
>> +    unlink(filename);
>> +    g_free(buf);
>> +
>> +    if (ret < 0) {
>> +        return false;
>> +    }
>> +
>> +    return true;
>> +}
>> +#endif
>> diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
>> index 1339835698..d827e16145 100644
>> --- a/tests/qtest/migration-helpers.h
>> +++ b/tests/qtest/migration-helpers.h
>> @@ -54,5 +54,6 @@ char *find_common_machine_version(const char *mtype, const char *var1,
>>                                    const char *var2);
>>  char *resolve_machine_version(const char *alias, const char *var1,
>>                                const char *var2);
>> +bool probe_o_direct_support(const char *tmpfs);
>>  void migration_test_add(const char *path, void (*fn)(void));
>>  #endif /* MIGRATION_HELPERS_H */
>> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
>> index 7b177686b4..512b7ede8b 100644
>> --- a/tests/qtest/migration-test.c
>> +++ b/tests/qtest/migration-test.c
>> @@ -2295,6 +2295,43 @@ static void test_multifd_file_mapped_ram(void)
>>      test_file_common(&args, true);
>>  }
>>  
>> +#ifdef O_DIRECT
>> +static void *migrate_mapped_ram_dio_start(QTestState *from,
>> +                                                 QTestState *to)
>> +{
>> +    migrate_mapped_ram_start(from, to);
>
> This line seems redundant, migrate_multifd_mapped_ram_start() should cover that.
>

This is an artifact of another patch that adds direct-io + mapped-ram
without multifd. I'm bringing that back in v2. We were having a
discussion[1] about it in the libvirt mailing list. Having direct-io
even without multifd might still be useful for libvirt.

1- https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/K4BDDJDMJ22XMJEFAUE323H5S5E47VQX/

>> +    migrate_set_parameter_bool(from, "direct-io", true);
>> +    migrate_set_parameter_bool(to, "direct-io", true);
>> +
>> +    return NULL;
>> +}
>> +
>> +static void *migrate_multifd_mapped_ram_dio_start(QTestState *from,
>> +                                                 QTestState *to)
>> +{
>> +    migrate_multifd_mapped_ram_start(from, to);
>> +    return migrate_mapped_ram_dio_start(from, to);
>> +}
>> +
>> +static void test_multifd_file_mapped_ram_dio(void)
>> +{
>> +    g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs,
>> +                                           FILE_TEST_FILENAME);
>> +    MigrateCommon args = {
>> +        .connect_uri = uri,
>> +        .listen_uri = "defer",
>> +        .start_hook = migrate_multifd_mapped_ram_dio_start,
>> +    };
>> +
>> +    if (!probe_o_direct_support(tmpfs)) {
>> +        g_test_skip("Filesystem does not support O_DIRECT");
>> +        return;
>> +    }
>> +
>> +    test_file_common(&args, true);
>> +}
>> +
>> +#endif /* O_DIRECT */
>>  
>>  static void test_precopy_tcp_plain(void)
>>  {
>> @@ -3719,6 +3756,11 @@ int main(int argc, char **argv)
>>      migration_test_add("/migration/multifd/file/mapped-ram/live",
>>                         test_multifd_file_mapped_ram_live);
>>  
>> +#ifdef O_DIRECT
>> +    migration_test_add("/migration/multifd/file/mapped-ram/dio",
>> +                       test_multifd_file_mapped_ram_dio);
>> +#endif
>> +
>>  #ifdef CONFIG_GNUTLS
>>      migration_test_add("/migration/precopy/unix/tls/psk",
>>                         test_precopy_unix_tls_psk);
>> -- 
>> 2.35.3
>> 
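Following the aligned_alloc() remark above, the probe can be sketched with posix_memalign() instead. This is a hypothetical variant, not the patch's code: it pairs posix_memalign() with free(), closes the descriptor on every path (the patch as quoted does not close fd), and uses a 4096-byte alignment, which is an assumption (the patch itself uses 1 MiB).

```c
#define _GNU_SOURCE     /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical probe: create a file with O_DIRECT in 'dir' and try one
 * aligned write. Any failure is treated as "unsupported".
 * PROBE_ALIGN = 4096 is an assumption.
 */
#define PROBE_ALIGN 4096

bool sketch_probe_o_direct(const char *dir)
{
#ifdef O_DIRECT
    char path[4096];
    void *buf = NULL;
    ssize_t ret = -1;
    int fd;

    snprintf(path, sizeof(path), "%s/probe-o-direct", dir);
    fd = open(path, O_CREAT | O_RDWR | O_TRUNC | O_DIRECT, 0660);
    if (fd < 0) {
        unlink(path);
        return false;
    }
    if (posix_memalign(&buf, PROBE_ALIGN, PROBE_ALIGN) == 0) {
        memset(buf, 0, PROBE_ALIGN);
        ret = pwrite(fd, buf, PROBE_ALIGN, PROBE_ALIGN);
    } else {
        buf = NULL;
    }
    free(buf);          /* pairs with posix_memalign() */
    close(fd);
    unlink(path);
    return ret == PROBE_ALIGN;
#else
    (void)dir;
    return false;
#endif
}
```

A test would call it as sketch_probe_o_direct(tmpfs) and skip when it returns false, as the patch does.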



* Re: [PATCH 3/9] tests/qtest/migration: Fix file migration offset check
  2024-05-03 20:36     ` Fabiano Rosas
@ 2024-05-03 21:08       ` Peter Xu
  2024-05-08  8:10       ` Daniel P. Berrangé
  1 sibling, 0 replies; 57+ messages in thread
From: Peter Xu @ 2024-05-03 21:08 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, berrange, armbru, Claudio Fontana, Jim Fehlig,
	Thomas Huth, Laurent Vivier, Paolo Bonzini

On Fri, May 03, 2024 at 05:36:59PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Fri, Apr 26, 2024 at 11:20:36AM -0300, Fabiano Rosas wrote:
> >> When doing file migration, QEMU accepts an offset that should be
> >> skipped when writing the migration stream to the file. The purpose of
> >> the offset is to allow the management layer to put its own metadata at
> >> the start of the file.
> >> 
> >> We have tests for this in migration-test, but only testing that the
> >> migration stream starts at the correct offset and not that it actually
> >> leaves the data intact. Unsurprisingly, there's been a bug in that
> >> area that the tests didn't catch.
> >> 
> >> Fix the tests to write some data to the offset region and check that
> >> it's actually there after the migration.
> >> 
> >> Fixes: 3dc35470c8 ("tests/qtest: migration-test: Add tests for file-based migration")
> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> >> ---
> >>  tests/qtest/migration-test.c | 70 +++++++++++++++++++++++++++++++++---
> >>  1 file changed, 65 insertions(+), 5 deletions(-)
> >> 
> >> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> >> index 5d6d8cd634..7b177686b4 100644
> >> --- a/tests/qtest/migration-test.c
> >> +++ b/tests/qtest/migration-test.c
> >> @@ -2081,6 +2081,63 @@ static void test_precopy_file(void)
> >>      test_file_common(&args, true);
> >>  }
> >>  
> >> +#ifndef _WIN32
> >> +static void file_dirty_offset_region(void)
> >> +{
> >> +#if defined(__linux__)
> >
> > Hmm, what's the case to cover when !_WIN32 && __linux__?  Can we remove one
> > layer of ifdef?
> >
> > I'm also wondering why it can't work on win32?  I thought win32 has all
> > the stuff we use here, but I may be missing something.
> >
> 
> __linux__ is because of mmap, !_WIN32 is because of the passing of
> fds. We might be able to keep !_WIN32 only, I'll check.

Thanks, or simply use __linux__; we don't lose much if we test less on
very special hosts.  Using two ifdefs for one such test just feels a bit
over-engineered.

> 
> >> +    g_autofree char *path = g_strdup_printf("%s/%s", tmpfs, FILE_TEST_FILENAME);
> >> +    size_t size = FILE_TEST_OFFSET;
> >> +    uintptr_t *addr, *p;
> >> +    int fd;
> >> +
> >> +    fd = open(path, O_CREAT | O_RDWR, 0660);
> >> +    g_assert(fd != -1);
> >> +
> >> +    g_assert(!ftruncate(fd, size));
> >> +
> >> +    addr = mmap(NULL, size, PROT_WRITE, MAP_SHARED, fd, 0);
> >> +    g_assert(addr != MAP_FAILED);
> >> +
> >> +    /* ensure the skipped offset contains some data */
> >> +    p = addr;
> >> +    while (p < addr + FILE_TEST_OFFSET / sizeof(uintptr_t)) {
> >> +        *p = (unsigned long) FILE_TEST_FILENAME;
> >
> > This is fine, but it's not as clear what is assigned.  I think what we
> > assign here is a pointer into the binary's RO section (rather than the chars).
> 
> Haha you're right, I was assigning the FILE_TEST_OFFSET previously and
> just switched to the FILENAME without thinking. I'll fix it up.

:)
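One way to follow the suggestion of writing plain data rather than a pointer value into the skipped offset region is a deterministic byte pattern that can be regenerated and compared after migration. The helpers below are hypothetical and the pattern formula is arbitrary; they show only the fill-and-verify idea, not the test's file handling.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers: fill the management-owned region with a
 * deterministic byte pattern and verify it byte by byte later.
 */
static uint8_t sketch_pattern_byte(size_t i)
{
    return (uint8_t)(i * 31u + 7u);
}

void sketch_fill_pattern(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] = sketch_pattern_byte(i);
    }
}

bool sketch_check_pattern(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != sketch_pattern_byte(i)) {
            return false;
        }
    }
    return true;
}
```

In the test, the fill would run over the mmap'ed [0, FILE_TEST_OFFSET) range before migration and the check over the same range in the finish hook.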

> 
> > Maybe using some random numbers would be more straightforward, but no
> > strong opinions.
> >
> >> +        p++;
> >> +    }
> >> +
> >> +    munmap(addr, size);
> >> +    fsync(fd);
> >> +    close(fd);
> >> +#endif
> >> +}
> >> +
> >> +static void *file_offset_start_hook(QTestState *from, QTestState *to)
> >> +{
> >> +    g_autofree char *file = g_strdup_printf("%s/%s", tmpfs, FILE_TEST_FILENAME);
> >> +    int src_flags = O_WRONLY;
> >> +    int dst_flags = O_RDONLY;
> >> +    int fds[2];
> >> +
> >> +    file_dirty_offset_region();
> >> +
> >> +    fds[0] = open(file, src_flags, 0660);
> >> +    assert(fds[0] != -1);
> >> +
> >> +    fds[1] = open(file, dst_flags, 0660);
> >> +    assert(fds[1] != -1);
> >> +
> >> +    qtest_qmp_fds_assert_success(from, &fds[0], 1, "{'execute': 'add-fd', "
> >> +                                 "'arguments': {'fdset-id': 1}}");
> >> +
> >> +    qtest_qmp_fds_assert_success(to, &fds[1], 1, "{'execute': 'add-fd', "
> >> +                                 "'arguments': {'fdset-id': 1}}");
> >> +
> >> +    close(fds[0]);
> >> +    close(fds[1]);
> >> +
> >> +    return NULL;
> >> +}
> >> +
> >>  static void file_offset_finish_hook(QTestState *from, QTestState *to,
> >>                                      void *opaque)
> >>  {
> >> @@ -2096,12 +2153,12 @@ static void file_offset_finish_hook(QTestState *from, QTestState *to,
> >>      g_assert(addr != MAP_FAILED);
> >>  
> >>      /*
> >> -     * Ensure the skipped offset contains zeros and the migration
> >> -     * stream starts at the right place.
> >> +     * Ensure the skipped offset region's data has not been touched
> >> +     * and the migration stream starts at the right place.
> >>       */
> >>      p = addr;
> >>      while (p < addr + FILE_TEST_OFFSET / sizeof(uintptr_t)) {
> >> -        g_assert(*p == 0);
> >> +        g_assert_cmpstr((char *) *p, ==, FILE_TEST_FILENAME);
> >>          p++;
> >>      }
> >>      g_assert_cmpint(cpu_to_be64(*p) >> 32, ==, QEMU_VM_FILE_MAGIC);
> >> @@ -2113,17 +2170,18 @@ static void file_offset_finish_hook(QTestState *from, QTestState *to,
> >>  
> >>  static void test_precopy_file_offset(void)
> >>  {
> >> -    g_autofree char *uri = g_strdup_printf("file:%s/%s,offset=%d", tmpfs,
> >> -                                           FILE_TEST_FILENAME,
> >> +    g_autofree char *uri = g_strdup_printf("file:/dev/fdset/1,offset=%d",
> >>                                             FILE_TEST_OFFSET);
> >
> > Do we want to keep both tests to cover both normal file and fdsets?
> >
> 
> I think the fdset + offset is the most complex in terms of requirements,
> so I don't think we need to test the other one.

They will still cover different qemu code paths, right?  Even if only
slightly different.

> 
> I'm actually already a bit concerned about the number of tests we
> have. I was even thinking of playing with some code coverage tools
> and pruning some of the tests if possible.

IMHO we don't need to drop any tests, but if / when we find they run too
slowly, we can either:

  - try to speed them up - I never tried, but I _feel_ like I can make
    them faster in some way, just like Dan did when reducing
    migration-test runtimes, perhaps from different angles, or

  - mark more tests as not run by default, then use getenv() to
    select those.

That said, what you're exploring sounds interesting regardless.
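The getenv() option for tests that don't run by default could look roughly like the sketch below. The environment variable name SKETCH_MIGRATION_FULL is invented for illustration, and the helper is hypothetical rather than anything migration-test provides.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical gate for optional tests: enabled when the (invented)
 * variable SKETCH_MIGRATION_FULL is set to a non-empty value other
 * than "0".
 */
bool sketch_run_optional_tests(void)
{
    const char *v = getenv("SKETCH_MIGRATION_FULL");

    return v && v[0] != '\0' && strcmp(v, "0") != 0;
}
```

In main(), the slower registrations would then sit inside if (sketch_run_optional_tests()) { ... } around the relevant migration_test_add() calls.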

> 
> >>      MigrateCommon args = {
> >>          .connect_uri = uri,
> >>          .listen_uri = "defer",
> >> +        .start_hook = file_offset_start_hook,
> >>          .finish_hook = file_offset_finish_hook,
> >>      };
> >>  
> >>      test_file_common(&args, false);
> >>  }
> >> +#endif
> >>  
> >>  static void test_precopy_file_offset_bad(void)
> >>  {
> >> @@ -3636,8 +3694,10 @@ int main(int argc, char **argv)
> >>  
> >>      migration_test_add("/migration/precopy/file",
> >>                         test_precopy_file);
> >> +#ifndef _WIN32
> >>      migration_test_add("/migration/precopy/file/offset",
> >>                         test_precopy_file_offset);
> >> +#endif
> >>      migration_test_add("/migration/precopy/file/offset/bad",
> >>                         test_precopy_file_offset_bad);
> >>  
> >> -- 
> >> 2.35.3
> >> 
> 

-- 
Peter Xu




* Re: [PATCH 4/9] migration: Add direct-io parameter
  2024-05-03 20:49     ` Fabiano Rosas
@ 2024-05-03 21:16       ` Peter Xu
  2024-05-14 14:10         ` Markus Armbruster
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Xu @ 2024-05-03 21:16 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, berrange, armbru, Claudio Fontana, Jim Fehlig, Eric Blake

On Fri, May 03, 2024 at 05:49:32PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Fri, Apr 26, 2024 at 11:20:37AM -0300, Fabiano Rosas wrote:
> >> Add the direct-io migration parameter that tells the migration code to
> >> use O_DIRECT when opening the migration stream file whenever possible.
> >> 
> >> This is currently only used with the mapped-ram migration that has a
> >> clear window guaranteed to perform aligned writes.
> >> 
> >> Acked-by: Markus Armbruster <armbru@redhat.com>
> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> >> ---
> >>  include/qemu/osdep.h           |  2 ++
> >>  migration/migration-hmp-cmds.c | 11 +++++++++++
> >>  migration/options.c            | 30 ++++++++++++++++++++++++++++++
> >>  migration/options.h            |  1 +
> >>  qapi/migration.json            | 18 +++++++++++++++---
> >>  util/osdep.c                   |  9 +++++++++
> >>  6 files changed, 68 insertions(+), 3 deletions(-)
> >> 
> >> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> >> index c7053cdc2b..645c14a65d 100644
> >> --- a/include/qemu/osdep.h
> >> +++ b/include/qemu/osdep.h
> >> @@ -612,6 +612,8 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive);
> >>  bool qemu_has_ofd_lock(void);
> >>  #endif
> >>  
> >> +bool qemu_has_direct_io(void);
> >> +
> >>  #if defined(__HAIKU__) && defined(__i386__)
> >>  #define FMT_pid "%ld"
> >>  #elif defined(WIN64)
> >> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
> >> index 7e96ae6ffd..8496a2b34e 100644
> >> --- a/migration/migration-hmp-cmds.c
> >> +++ b/migration/migration-hmp-cmds.c
> >> @@ -397,6 +397,13 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
> >>          monitor_printf(mon, "%s: %s\n",
> >>              MigrationParameter_str(MIGRATION_PARAMETER_MODE),
> >>              qapi_enum_lookup(&MigMode_lookup, params->mode));
> >> +
> >> +        if (params->has_direct_io) {
> >> +            monitor_printf(mon, "%s: %s\n",
> >> +                           MigrationParameter_str(
> >> +                               MIGRATION_PARAMETER_DIRECT_IO),
> >> +                           params->direct_io ? "on" : "off");
> >> +        }
> >
> > This will be the first parameter to be displayed only optionally here.
> > I think it's a sign of misuse of the has_direct_io field..
> >
> > IMHO has_direct_io is best kept as "whether the direct_io field is
> > valid" and nothing more.  It hopefully shouldn't carry more
> > information than that; otherwise it'll be another small challenge we
> > need to overcome when we remove all these has_* fields, and it can
> > also be easily overlooked.
> 
> I don't think I understand why we have those has_* fields. I thought my
> usage of 'params->has_direct_io = qemu_has_direct_io()' was the correct
> one, i.e. checking whether QEMU has any support for that parameter. Can
> you help me out here?

Here params is the pointer to "struct MigrationParameters", which is
defined in qapi/migration.json.  We have the "has_*" fields only because
we allow optional fields, marked with asterisks:

  { 'struct': 'MigrationParameters',
    'data': { '*announce-initial': 'size',
              ...
              } }

That's why it should only mean "whether this field exists", because
that's how it is defined.

IIRC we (or rather, Markus) made some attempts to deduplicate those
*MigrationParameter* things, and if that succeeds we'll have a chance to
drop the has_* fields (in which case we simply always have them; "has_"
only makes sense in a QMP command, to allow the user to specify one or
more parameters rather than all of them).
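The "has_ means present, nothing more" semantics can be shown with a hand-written stand-in for a generated struct. For an optional scalar member such as '*direct-io': 'bool', the QAPI generator emits a has_ flag next to the value; the struct and function below are hypothetical stand-ins, not the generated code or QEMU's migrate_params_apply().

```c
#include <stdbool.h>

/*
 * Hand-written stand-in for a QAPI-generated struct with an optional
 * bool member. has_direct_io only says "direct_io was supplied"; it
 * says nothing about whether the host supports O_DIRECT.
 */
struct SketchParams {
    bool has_direct_io;
    bool direct_io;
};

/* Copy only the members the caller actually supplied. */
void sketch_params_apply(struct SketchParams *dest,
                         const struct SketchParams *src)
{
    if (src->has_direct_io) {
        dest->has_direct_io = true;
        dest->direct_io = src->direct_io;
    }
}
```

A partial set-parameters request therefore leaves unspecified members untouched, which is the only job the has_ flag needs to do.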

> 
> >
> > IMHO what we should do is assert has_direct_io==true here too, meanwhile...
> >
> >>      }
> >>  
> >>      qapi_free_MigrationParameters(params);
> >> @@ -690,6 +697,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
> >>          p->has_mode = true;
> >>          visit_type_MigMode(v, param, &p->mode, &err);
> >>          break;
> >> +    case MIGRATION_PARAMETER_DIRECT_IO:
> >> +        p->has_direct_io = true;
> >> +        visit_type_bool(v, param, &p->direct_io, &err);
> >> +        break;
> >>      default:
> >>          assert(0);
> >>      }
> >> diff --git a/migration/options.c b/migration/options.c
> >> index 239f5ecfb4..ae464aa4f2 100644
> >> --- a/migration/options.c
> >> +++ b/migration/options.c
> >> @@ -826,6 +826,22 @@ int migrate_decompress_threads(void)
> >>      return s->parameters.decompress_threads;
> >>  }
> >>  
> >> +bool migrate_direct_io(void)
> >> +{
> >> +    MigrationState *s = migrate_get_current();
> >> +
> >> +    /* For now O_DIRECT is only supported with mapped-ram */
> >> +    if (!s->capabilities[MIGRATION_CAPABILITY_MAPPED_RAM]) {
> >> +        return false;
> >> +    }
> >> +
> >> +    if (s->parameters.has_direct_io) {
> >> +        return s->parameters.direct_io;
> >> +    }
> >> +
> >> +    return false;
> >> +}
> >> +
> >>  uint64_t migrate_downtime_limit(void)
> >>  {
> >>      MigrationState *s = migrate_get_current();
> >> @@ -1061,6 +1077,11 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
> >>      params->has_zero_page_detection = true;
> >>      params->zero_page_detection = s->parameters.zero_page_detection;
> >>  
> >> +    if (s->parameters.has_direct_io) {
> >> +        params->has_direct_io = true;
> >> +        params->direct_io = s->parameters.direct_io;
> >> +    }
> >> +
> >>      return params;
> >>  }
> >>  
> >> @@ -1097,6 +1118,7 @@ void migrate_params_init(MigrationParameters *params)
> >>      params->has_vcpu_dirty_limit = true;
> >>      params->has_mode = true;
> >>      params->has_zero_page_detection = true;
> >> +    params->has_direct_io = qemu_has_direct_io();
> >>  }
> >>  
> >>  /*
> >> @@ -1416,6 +1438,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
> >>      if (params->has_zero_page_detection) {
> >>          dest->zero_page_detection = params->zero_page_detection;
> >>      }
> >> +
> >> +    if (params->has_direct_io) {
> >> +        dest->direct_io = params->direct_io;
> >
> > .. do a proper check here to make sure the current QEMU is built with
> > direct-io support, then fail QMP migrate-set-parameters when someone
> > tries to enable it on a QEMU that doesn't support it.
> 
> I'm already checking in migrate_params_init() with
> qemu_has_direct_io(). But ok, you want to move it here... Is this
> function the correct one, rather than migrate_params_check()? I see these

Oh, I probably commented on the wrong line.  migrate_params_check() is
definitely the place where we should check for O_DIRECT and throw such an error.

> TODO comments mentioning QAPI_CLONE(), we can't clone the object if this
> one parameter needs special treatment. I might be getting all this
> wrong, bear with me.

Nah, I think I just wanted to comment inside migrate_params_check() but I
did it all wrong, sorry.
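The check suggested for migrate_params_check() can be sketched standalone: reject a request that enables direct-io on a build without O_DIRECT, instead of encoding support in has_direct_io. Names are hypothetical, and errors go to a plain buffer rather than QEMU's Error ** API.

```c
#define _GNU_SOURCE     /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for qemu_has_direct_io(). */
static bool sketch_has_direct_io(void)
{
#ifdef O_DIRECT
    return true;
#else
    return false;
#endif
}

/*
 * Hypothetical validation step: fail only when the caller actually
 * asks to enable direct-io and the host cannot do it.
 */
bool sketch_params_check(bool has_direct_io, bool direct_io,
                         char *err, size_t errlen)
{
    if (has_direct_io && direct_io && !sketch_has_direct_io()) {
        snprintf(err, errlen, "O_DIRECT is not supported on this host");
        return false;
    }
    return true;
}
```

With the check here, has_direct_io can be set unconditionally in migrate_params_init(), and query-migrate-parameters always reports the field.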

-- 
Peter Xu




* Re: [PATCH 5/9] migration/multifd: Add direct-io support
  2024-05-03 20:54     ` Fabiano Rosas
@ 2024-05-03 21:18       ` Peter Xu
  0 siblings, 0 replies; 57+ messages in thread
From: Peter Xu @ 2024-05-03 21:18 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana, Jim Fehlig

On Fri, May 03, 2024 at 05:54:28PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Fri, Apr 26, 2024 at 11:20:38AM -0300, Fabiano Rosas wrote:
> >> When multifd is used along with mapped-ram, we can take advantage of a
> >> filesystem that supports the O_DIRECT flag and perform direct I/O in
> >> the multifd threads. This brings a significant performance improvement
> >> because direct-io writes bypass the page cache which would otherwise
> >> be thrashed by the multifd data which is unlikely to be needed again
> >> in a short period of time.
> >> 
> >> To be able to use a multifd channel opened with O_DIRECT, we must
> >> ensure that a certain alignment is used. Filesystems usually require a
> >> block-size alignment for direct I/O. The way to achieve this is by
> >> enabling the mapped-ram feature, which already aligns its I/O properly
> >> (see MAPPED_RAM_FILE_OFFSET_ALIGNMENT at ram.c).
> >> 
> >> By setting O_DIRECT on the multifd channels, all writes to the same
> >> file descriptor need to be aligned as well, even the ones that come
> >> from outside multifd, such as the QEMUFile I/O from the main migration
> >> code. This makes it impossible to use the same file descriptor for the
> >> QEMUFile and for the multifd channels. The various flags and metadata
> >> written by the main migration code will always be unaligned by virtue
> >> of their small size. To work around this issue, we'll require a second
> >> file descriptor to be used exclusively for direct I/O.
> >> 
> >> The second file descriptor can be obtained by QEMU by re-opening the
> >> migration file (already possible), or by being provided by the user or
> >> management application (support to be added in future patches).
> >> 
> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> >> ---
> >>  migration/file.c      | 22 +++++++++++++++++++---
> >>  migration/migration.c | 23 +++++++++++++++++++++++
> >>  2 files changed, 42 insertions(+), 3 deletions(-)
> >> 
> >> diff --git a/migration/file.c b/migration/file.c
> >> index 8f30999400..b9265b14dd 100644
> >> --- a/migration/file.c
> >> +++ b/migration/file.c
> >> @@ -83,17 +83,33 @@ void file_cleanup_outgoing_migration(void)
> >>  
> >>  bool file_send_channel_create(gpointer opaque, Error **errp)
> >>  {
> >> -    QIOChannelFile *ioc;
> >> +    QIOChannelFile *ioc = NULL;
> >>      int flags = O_WRONLY;
> >> -    bool ret = true;
> >> +    bool ret = false;
> >> +
> >> +    if (migrate_direct_io()) {
> >> +#ifdef O_DIRECT
> >> +        /*
> >> +         * Enable O_DIRECT for the secondary channels. These are used
> >> +         * for sending ram pages and writes should be guaranteed to be
> >> +         * aligned to at least page size.
> >> +         */
> >> +        flags |= O_DIRECT;
> >> +#else
> >> +        error_setg(errp, "System does not support O_DIRECT");
> >> +        error_append_hint(errp,
> >> +                          "Try disabling direct-io migration capability\n");
> >> +        goto out;
> >> +#endif
> >
> > Hopefully if we can fail migrate-set-parameters correctly always, we will
> > never trigger this error.
> >
> > I know Linux used some trick like this to even avoid such ifdefs:
> >
> >   if (qemu_has_direct_io() && migrate_direct_io()) {
> >       // reference O_DIRECT
> >   }
> >
> > So as long as qemu_has_direct_io() can return a constant "false" when
> > O_DIRECT not defined, the compiler is smart enough to ignore the O_DIRECT
> > inside the block.
> >
> > Even if it won't work, we can still avoid that error (and rely on the
> > set-parameter failure):
> >
> > #ifdef O_DIRECT
> >        if (migrate_direct_io()) {
> >            // reference O_DIRECT
> >        }
> > #endif
> >
> > Then it should run the same, just to try making ifdefs as light as
> > possible..
> 
> Ok.
> 
> Just FYI, in v2 I'm adding direct-io to migration incoming side as well,
> so I put this logic into a helper:
> 
> static bool file_enable_direct_io(int *flags, Error **errp)
> {
>     if (migrate_direct_io()) {
> #ifdef O_DIRECT
>         *flags |= O_DIRECT;
> #else
>         error_setg(errp, "System does not support O_DIRECT");
>         error_append_hint(errp,
>                           "Try disabling direct-io migration capability\n");
>         return false;
> #endif
>     }
> 
>     return true;
> }
> 
> But I'll apply your suggestions nonetheless.

Thanks, please give it a shot; I hope it will work either way.

One thing to mention: if you want to try the qemu_has_direct_io() approach
without "#ifdefs", you can't keep qemu_has_direct_io() in osdep.c; you must
define it in osdep.h as a static inline function.  Otherwise I think it gets
compiled into osdep.o as a regular function, the compiler can't see its
constant return value at the call site, and the trick won't work.

Just try compiling without O_DIRECT and you should see.
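A minimal sketch of the header-based variant, with hypothetical `sketch_` names standing in for qemu_has_direct_io() and the file_enable_direct_io() helper quoted above (the Error ** argument is dropped for brevity). One caveat: C requires O_DIRECT to be declared even inside a branch the optimizer would later discard, so an unguarded `flags |= O_DIRECT` still fails to build where the macro is missing. The sketch therefore keeps a single `#ifdef` around that one line, in the spirit of the second suggestion, while the static inline predicate carries the decision.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc only exposes O_DIRECT with _GNU_SOURCE */
#endif
#include <fcntl.h>
#include <stdbool.h>

/*
 * Would live in a header (osdep.h in the discussion) as a static inline,
 * so every call site sees a compile-time constant.
 */
static inline bool sketch_has_direct_io(void)
{
#ifdef O_DIRECT
    return true;
#else
    return false;
#endif
}

/*
 * Hypothetical helper: add O_DIRECT to the open() flags if direct-io was
 * requested. Returns false when it was requested but the build lacks
 * O_DIRECT; if migrate-set-parameters already rejects that case, this
 * path should never be taken in practice.
 */
static inline bool sketch_enable_direct_io(bool want_direct_io, int *flags)
{
    if (!want_direct_io) {
        return true;
    }
    if (!sketch_has_direct_io()) {
        return false;
    }
#ifdef O_DIRECT
    *flags |= O_DIRECT;
#endif
    return true;
}
```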

-- 
Peter Xu




* Re: [PATCH 7/9] monitor: fdset: Match against O_DIRECT
  2024-05-03 18:53   ` Peter Xu
@ 2024-05-03 21:19     ` Fabiano Rosas
  2024-05-03 22:16       ` Peter Xu
  0 siblings, 1 reply; 57+ messages in thread
From: Fabiano Rosas @ 2024-05-03 21:19 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, berrange, armbru, Claudio Fontana, Jim Fehlig

Peter Xu <peterx@redhat.com> writes:

> On Fri, Apr 26, 2024 at 11:20:40AM -0300, Fabiano Rosas wrote:
>> We're about to enable the use of O_DIRECT in the migration code and
>> due to the alignment restrictions imposed by filesystems we need to
>> make sure the flag is only used when doing aligned IO.
>> 
>> The migration will do parallel IO to different regions of a file, so
>> we need to use more than one file descriptor. Those cannot be obtained
>> by duplicating (dup()) since duplicated file descriptors share the
>> file status flags, including O_DIRECT. If one migration channel does
>> unaligned IO while another sets O_DIRECT to do aligned IO, the
>> filesystem would fail the unaligned operation.
>> 
>> The add-fd QMP command along with the fdset code are specifically
>> designed to allow the user to pass a set of file descriptors with
>> different access flags into QEMU to be later fetched by code that
>> needs to alternate between those flags when doing IO.
>> 
>> Extend the fdset matching to behave the same with the O_DIRECT flag.
>> 
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>>  monitor/fds.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>> 
>> diff --git a/monitor/fds.c b/monitor/fds.c
>> index 4ec3b7eea9..62e324fcec 100644
>> --- a/monitor/fds.c
>> +++ b/monitor/fds.c
>> @@ -420,6 +420,11 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
>>          int fd = -1;
>>          int dup_fd;
>>          int mon_fd_flags;
>> +        int mask = O_ACCMODE;
>> +
>> +#ifdef O_DIRECT
>> +        mask |= O_DIRECT;
>> +#endif
>>  
>>          if (mon_fdset->id != fdset_id) {
>>              continue;
>> @@ -431,7 +436,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
>>                  return -1;
>>              }
>>  
>> -            if ((flags & O_ACCMODE) == (mon_fd_flags & O_ACCMODE)) {
>> +            if ((flags & mask) == (mon_fd_flags & mask)) {
>>                  fd = mon_fdset_fd->fd;
>>                  break;
>>              }
>
> I think I see what you wanted to do, picking out the right fd out of two
> when qemu_open_old(), which makes sense.
>
> However what happens if the mgmt app only passes in 1 fd to the fdset?  The
> issue is we have a "fallback dup()" plan right after this chunk of code:
>

I'm validating the fdset at file_parse_fdset() beforehand. If there's
anything other than 2 fds, we'll error out:

    if (nfds != 2) {
        error_setg(errp, "Outgoing migration needs two fds in the fdset, "
                   "got %d", nfds);
        qmp_remove_fd(*id, false, -1, NULL);
        *id = -1;
        return false;
    }
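The matching rule from this patch can be modeled in isolation. In this sketch, the hypothetical `sketch_fdset_pick()` works on plain flag values (what fcntl(fd, F_GETFL) would return for each fd) rather than on QEMU's MonFdset list. A request only matches an fd whose access mode *and* O_DIRECT bit agree with the requested flags, and a set with anything other than two entries is rejected up front, mirroring the file_parse_fdset() check quoted above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc only exposes O_DIRECT with _GNU_SOURCE */
#endif
#include <fcntl.h>
#include <stddef.h>

/*
 * Return the index of the entry in fd_flags[] whose access mode and
 * O_DIRECT bit match 'want', or -1 if the set is malformed or nothing
 * matches.
 */
static int sketch_fdset_pick(const int *fd_flags, size_t nfds, int want)
{
    int mask = O_ACCMODE;

#ifdef O_DIRECT
    mask |= O_DIRECT;
#endif

    /* Outgoing migration expects one O_DIRECT fd and one plain fd. */
    if (nfds != 2) {
        return -1;
    }

    for (size_t i = 0; i < nfds; i++) {
        if ((fd_flags[i] & mask) == (want & mask)) {
            return (int)i;
        }
    }
    return -1;
}
```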

>         dup_fd = qemu_dup_flags(fd, flags);
>         if (dup_fd == -1) {
>             return -1;
>         }
>
>         mon_fdset_fd_dup = g_malloc0(sizeof(*mon_fdset_fd_dup));
>         mon_fdset_fd_dup->fd = dup_fd;
>         QLIST_INSERT_HEAD(&mon_fdset->dup_fds, mon_fdset_fd_dup, next);
>
> I think it means even if the mgmt app only passes in 1 fd (rather than 2,
> one with O_DIRECT, one without), QEMU can always successfully call
> qemu_open_old() twice for each case, even though silently the two FDs will
> actually impact on each other.  This doesn't look ideal if it's true.
>
> But I also must confess I don't really understand this code at all: we
> dup(), then we try F_SETFL on all the possible flags got passed in.
> However AFAICT due to the fact that dup()ed FDs will share "struct file" it
> means mostly all flags will be shared, except close-on-exec.  I don't ever
> see anything protecting that F_SETFL to only touch close-on-exec, I think
> it means it'll silently change file status flags for the other fd which we
> dup()ed from.  Does it mean that we have issue already with such dup() usage?

I think you're right, but I also think there's a requirement even from
this code that the fds in the fdset cannot be dup()ed. I don't see it
enforced anywhere, but maybe that's a consequence of the larger use-case
for which this feature was introduced.
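The dup() concern is easy to check in isolation. File status flags such as O_NONBLOCK (and O_DIRECT) belong to the open file description, which dup() shares; only FD_CLOEXEC is per descriptor. This small sketch (hypothetical function name) uses a pipe and O_NONBLOCK rather than O_DIRECT, so it runs on any POSIX system without needing an O_DIRECT-capable filesystem.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Returns true if setting O_NONBLOCK through a dup()ed descriptor is
 * visible through the original one, i.e. the two share status flags.
 * Returns false on any setup error.
 */
static bool sketch_dup_shares_status_flags(void)
{
    int pipefd[2];
    bool shared = false;

    if (pipe(pipefd) != 0) {
        return false;
    }

    int copy = dup(pipefd[0]);
    if (copy >= 0) {
        int fl = fcntl(copy, F_GETFL);

        /* Change the status flags through the copy only... */
        if (fl != -1 && fcntl(copy, F_SETFL, fl | O_NONBLOCK) == 0) {
            /* ...then observe them through the original descriptor. */
            int orig_fl = fcntl(pipefd[0], F_GETFL);
            shared = orig_fl != -1 && (orig_fl & O_NONBLOCK);
        }
        close(copy);
    }

    close(pipefd[0]);
    close(pipefd[1]);
    return shared;
}
```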

For our scenario, the open() man page says one can use kcmp() to check
whether two fds refer to the same open file description, which is what
dup() produces. Maybe we should do that extra check? We're defining a
pretty rigid interface between QEMU and the management layer, so it's not
likely to break once it's written. I'm also not sure how bad it would be
to call syscall() directly from QEMU (kcmp has no libc wrapper).
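A sketch of the kcmp() check, assuming Linux. With no libc wrapper it goes through syscall(); `SKETCH_KCMP_FILE` mirrors the value of KCMP_FILE in <linux/kcmp.h> (0) and is defined locally only to avoid depending on kernel headers, and the function name is hypothetical. kcmp() returns 0 when both fds refer to the same open file description. It can also fail (for instance with EPERM under a seccomp filter, or ENOSYS on kernels built without it), so a caller has to decide what an unknown answer means.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for syscall() under strict -std modes */
#endif
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Value of KCMP_FILE in <linux/kcmp.h>, defined here to avoid that header. */
enum { SKETCH_KCMP_FILE = 0 };

/*
 * Returns 1 if fd1 and fd2 refer to the same open file description
 * (e.g. one is a dup() of the other), 0 if they do not, and -1 if
 * kcmp() is unavailable or fails (errno is left set).
 */
static int sketch_fds_share_description(int fd1, int fd2)
{
#ifdef SYS_kcmp
    pid_t pid = getpid();
    long ret = syscall(SYS_kcmp, pid, pid, SKETCH_KCMP_FILE,
                       (unsigned long)fd1, (unsigned long)fd2);

    if (ret < 0) {
        return -1;
    }
    return ret == 0 ? 1 : 0;
#else
    (void)fd1;
    (void)fd2;
    errno = ENOSYS;
    return -1;
#endif
}
```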

>
> Thanks,



* Re: [PATCH 6/9] tests/qtest/migration: Add tests for file migration with direct-io
  2024-05-03 21:05     ` Fabiano Rosas
@ 2024-05-03 21:25       ` Peter Xu
  0 siblings, 0 replies; 57+ messages in thread
From: Peter Xu @ 2024-05-03 21:25 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, berrange, armbru, Claudio Fontana, Jim Fehlig,
	Thomas Huth, Laurent Vivier, Paolo Bonzini

On Fri, May 03, 2024 at 06:05:19PM -0300, Fabiano Rosas wrote:
> >> +#ifdef O_DIRECT
> >> +static void *migrate_mapped_ram_dio_start(QTestState *from,
> >> +                                                 QTestState *to)
> >> +{
> >> +    migrate_mapped_ram_start(from, to);
> >
> > This line seems redundant, migrate_multifd_mapped_ram_start() should cover that.
> >
> 
> This is an artifact of another patch that adds direct-io + mapped-ram
> without multifd. I'm bringing that back on v2. We were having a
> discussion[1] about it in the libvirt mailing list. Having direct-io
> even without multifd might still be useful for libvirt.
> 
> 1- https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/K4BDDJDMJ22XMJEFAUE323H5S5E47VQX/

Ah that's fine then.  Maybe add a comment somewhere for future readers?  Or
a sentence in the commit log would work too.

-- 
Peter Xu




* Re: [PATCH 2/9] migration: Fix file migration with fdset
  2024-05-03 21:04       ` Peter Xu
@ 2024-05-03 21:31         ` Fabiano Rosas
  2024-05-03 21:56           ` Peter Xu
  0 siblings, 1 reply; 57+ messages in thread
From: Fabiano Rosas @ 2024-05-03 21:31 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, berrange, armbru, Claudio Fontana, Jim Fehlig

Peter Xu <peterx@redhat.com> writes:

> On Fri, May 03, 2024 at 04:56:08PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Fri, Apr 26, 2024 at 11:20:35AM -0300, Fabiano Rosas wrote:
>> >> When the migration using the "file:" URI was implemented, I don't
>> >> think any of us noticed that if you pass in a file name with the
>> >> format "/dev/fdset/N", this allows a file descriptor to be passed in
>> >> to QEMU and that behaves just like the "fd:" URI. So the "file:"
>> >> support has been added without regard for the fdset part and we got
>> >> some things wrong.
>> >> 
>> >> The first issue is that we should not truncate the migration file if
>> >> we're allowing an fd + offset. We need to leave the file contents
>> >> untouched.
>> >
>> > I'm wondering whether we can use fallocate() instead on the ranges so that
>> > we always don't open() with O_TRUNC.  Before that..  could you remind me
>> > why do we need to truncate in the first place?  I definitely missed
>> > something else here too.
>> 
>> AFAIK, just to avoid any issues if the file is pre-existing. I don't see
>> the difference between O_TRUNC and fallocate in this case.
>
> Then, shall we avoid truncations at all, leaving all the feasibility to
> user (also errors prone to make)?
>

Is this a big deal? I'd rather close that possible gap and avoid the bug
reports.
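The first issue in the commit message comes down to choosing open() flags by target. A sketch, assuming an fdset is recognized by the "/dev/fdset/" prefix described in the commit message and that a named file is otherwise opened with O_CREAT | O_TRUNC | O_WRONLY; both helper names are hypothetical. Truncation stays for files QEMU opens by name, which covers the leftover-image case, but is skipped for an fdset, where the fd may come with an offset into a file whose earlier contents must be preserved.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>

static bool sketch_is_fdset_path(const char *path)
{
    static const char prefix[] = "/dev/fdset/";

    return strncmp(path, prefix, sizeof(prefix) - 1) == 0;
}

/* Flags for opening the outgoing migration file (hypothetical helper). */
static int sketch_file_open_flags(const char *path)
{
    int flags = O_WRONLY;

    if (!sketch_is_fdset_path(path)) {
        /* QEMU owns this file: start from an empty one. */
        flags |= O_CREAT | O_TRUNC;
    }
    return flags;
}
```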

>> 
>> >
>> >> 
>> >> The second issue is that there's an expectation that QEMU removes the
>> >> fd after the migration has finished. That's what the "fd:" code
>> >> does. Otherwise a second migration on the same VM could attempt to
>> >> provide an fdset with the same name and QEMU would reject it.
>> >
>> > Let me check what we do when with "fd:" and when migration completes or
>> > cancels.
>> >
>> > IIUC it's qio_channel_file_close() that does the final cleanup work on
>> > e.g. to_dst_file, right?  Then there's qemu_close(), and it has:
>> >
>> >     /* Close fd that was dup'd from an fdset */
>> >     fdset_id = monitor_fdset_dup_fd_find(fd);
>> >     if (fdset_id != -1) {
>> >         int ret;
>> >
>> >         ret = close(fd);
>> >         if (ret == 0) {
>> >             monitor_fdset_dup_fd_remove(fd);
>> >         }
>> >
>> >         return ret;
>> >     }
>> >
>> > Shouldn't this done the work already?
>> 
>> That removes the mon_fdset_fd_dup->fd, we want to remove the
>> mon_fdset_fd->fd.
>
> What I read so far is when we are removing the dup-fds, we'll do one more
> thing:
>
> monitor_fdset_dup_fd_find_remove():
>                     if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
>                         monitor_fdset_cleanup(mon_fdset);
>                     }
>
> It means if we removed all the dup-fds correctly, we should also remove the
> whole fdset, which includes the ->fds, IIUC.
>

Since mon_fdset_fd->removed == false, we hit the runstate_is_running()
problem. I'm not sure, but probably mon_refcount > 0 as well. So the fd
would not be removed.

But I'll retest this on Monday just to be sure; it's been a while since I
wrote some parts of this.
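To make the condition concrete, here is a sketch that reduces monitor_fdset_cleanup()'s decision to booleans. The field names follow the thread (removed, dup_fds, mon_refcount, runstate_is_running()), but the exact shape of both expressions is an assumption based on patch 1's description: before, runstate_is_running() gates the whole condition; after, it only applies to the "no dups, no monitor references" side.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state monitor_fdset_cleanup() inspects. */
typedef struct {
    bool removed;       /* fd explicitly removed via remove-fd */
    bool dup_fds_empty; /* QLIST_EMPTY(&mon_fdset->dup_fds) */
    int mon_refcount;   /* monitors still holding references */
    bool running;       /* runstate_is_running() */
} sketch_fd_state;

/* Before patch 1: nothing is closed while the VM is stopped. */
static bool sketch_should_close_old(const sketch_fd_state *s)
{
    return (s->removed || (s->dup_fds_empty && s->mon_refcount == 0)) &&
           s->running;
}

/* After patch 1: an explicit removal is honored even while stopped. */
static bool sketch_should_close_new(const sketch_fd_state *s)
{
    return s->removed ||
           (s->dup_fds_empty && s->mon_refcount == 0 && s->running);
}
```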

>> 
>> >
>> > Off topic: I think this code is over complicated too, maybe I missed
>> > something, but afaict we don't need monitor_fdset_dup_fd_find at all.. we
>> > simply walk the list and remove stuff..  I attach a patch at the end that I
>> > tried to clean that up, just in case there's early comments.  But we can
>> > ignore that so we don't get side-tracked, and focus on the direct-io
>> > issues.
>> 
>> Well, I'm not confident touching this code. This is more than a decade
>> old, I have no idea what the original motivations were. The possible
>> interactions with the user via command-line (-add-fd), QMP (add-fd) and
>> the monitor lifetime make me confused. Not to mention the fdset part
>> being plumbed into the guts of a widely used qemu_open_internal() that
>> very misleadingly presents itself as just a wrapper for open().
>
> If to make QEMU long live, we'll probably need to touch it at some
> point.. or at least discuss about it and figure things out. We pay tech
> debts like this when there's no good comment / docs to refer in this case,
> then the earlier, perhaps also the better.. to try taking the stab, imho.
>
> Definitely not a request to clean everything up. :) Let's see whether
> others can chime in with better knowledge of the history.
>
>> 
>> >
>> > Thanks,
>> >
>> > =======
>> >
>> > From 2f6b6d1224486d8ee830a7afe34738a07003b863 Mon Sep 17 00:00:00 2001
>> > From: Peter Xu <peterx@redhat.com>
>> > Date: Fri, 3 May 2024 11:27:20 -0400
>> > Subject: [PATCH] monitor: Drop monitor_fdset_dup_fd_find()
>> > MIME-Version: 1.0
>> > Content-Type: text/plain; charset=UTF-8
>> > Content-Transfer-Encoding: 8bit
>> >
>> > This function is not needed, one remove function should already work.
>> > Clean it up.
>> >
>> > Here the code doesn't really care about whether we need to keep that dupfd
>> > around if close() failed: when that happens something got very wrong,
>> > keeping the dup_fd around the fdsets may not help that situation so far.
>> >
>> > Cc: Dr. David Alan Gilbert <dave@treblig.org>
>> > Cc: Markus Armbruster <armbru@redhat.com>
>> > Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
>> > Cc: Paolo Bonzini <pbonzini@redhat.com>
>> > Cc: Daniel P. Berrangé <berrange@redhat.com>
>> > Signed-off-by: Peter Xu <peterx@redhat.com>
>> > ---
>> >  include/monitor/monitor.h |  1 -
>> >  monitor/fds.c             | 27 +++++----------------------
>> >  stubs/fdset.c             |  5 -----
>> >  util/osdep.c              | 15 +--------------
>> >  4 files changed, 6 insertions(+), 42 deletions(-)
>> >
>> > diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
>> > index 965f5d5450..fd9b3f538c 100644
>> > --- a/include/monitor/monitor.h
>> > +++ b/include/monitor/monitor.h
>> > @@ -53,7 +53,6 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool has_fdset_id, int64_t fdset_id,
>> >                                  const char *opaque, Error **errp);
>> >  int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags);
>> >  void monitor_fdset_dup_fd_remove(int dup_fd);
>> > -int64_t monitor_fdset_dup_fd_find(int dup_fd);
>> >  
>> >  void monitor_register_hmp(const char *name, bool info,
>> >                            void (*cmd)(Monitor *mon, const QDict *qdict));
>> > diff --git a/monitor/fds.c b/monitor/fds.c
>> > index d86c2c674c..d5aecfb70e 100644
>> > --- a/monitor/fds.c
>> > +++ b/monitor/fds.c
>> > @@ -458,7 +458,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
>> >  #endif
>> >  }
>> >  
>> > -static int64_t monitor_fdset_dup_fd_find_remove(int dup_fd, bool remove)
>> > +void monitor_fdset_dup_fd_remove(int dup_fd)
>> >  {
>> >      MonFdset *mon_fdset;
>> >      MonFdsetFd *mon_fdset_fd_dup;
>> > @@ -467,31 +467,14 @@ static int64_t monitor_fdset_dup_fd_find_remove(int dup_fd, bool remove)
>> >      QLIST_FOREACH(mon_fdset, &mon_fdsets, next) {
>> >          QLIST_FOREACH(mon_fdset_fd_dup, &mon_fdset->dup_fds, next) {
>> >              if (mon_fdset_fd_dup->fd == dup_fd) {
>> > -                if (remove) {
>> > -                    QLIST_REMOVE(mon_fdset_fd_dup, next);
>> > -                    g_free(mon_fdset_fd_dup);
>> > -                    if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
>> > -                        monitor_fdset_cleanup(mon_fdset);
>> > -                    }
>> > -                    return -1;
>> > -                } else {
>> > -                    return mon_fdset->id;
>> > +                QLIST_REMOVE(mon_fdset_fd_dup, next);
>> > +                g_free(mon_fdset_fd_dup);
>> > +                if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
>> > +                    monitor_fdset_cleanup(mon_fdset);
>> >                  }
>> >              }
>> >          }
>> >      }
>> > -
>> > -    return -1;
>> > -}
>> > -
>> > -int64_t monitor_fdset_dup_fd_find(int dup_fd)
>> > -{
>> > -    return monitor_fdset_dup_fd_find_remove(dup_fd, false);
>> > -}
>> > -
>> > -void monitor_fdset_dup_fd_remove(int dup_fd)
>> > -{
>> > -    monitor_fdset_dup_fd_find_remove(dup_fd, true);
>> >  }
>> >  
>> >  int monitor_fd_param(Monitor *mon, const char *fdname, Error **errp)
>> > diff --git a/stubs/fdset.c b/stubs/fdset.c
>> > index d7c39a28ac..389e368a29 100644
>> > --- a/stubs/fdset.c
>> > +++ b/stubs/fdset.c
>> > @@ -9,11 +9,6 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
>> >      return -1;
>> >  }
>> >  
>> > -int64_t monitor_fdset_dup_fd_find(int dup_fd)
>> > -{
>> > -    return -1;
>> > -}
>> > -
>> >  void monitor_fdset_dup_fd_remove(int dupfd)
>> >  {
>> >  }
>> > diff --git a/util/osdep.c b/util/osdep.c
>> > index e996c4744a..2d9749d060 100644
>> > --- a/util/osdep.c
>> > +++ b/util/osdep.c
>> > @@ -393,21 +393,8 @@ int qemu_open_old(const char *name, int flags, ...)
>> >  
>> >  int qemu_close(int fd)
>> >  {
>> > -    int64_t fdset_id;
>> > -
>> >      /* Close fd that was dup'd from an fdset */
>> > -    fdset_id = monitor_fdset_dup_fd_find(fd);
>> > -    if (fdset_id != -1) {
>> > -        int ret;
>> > -
>> > -        ret = close(fd);
>> > -        if (ret == 0) {
>> > -            monitor_fdset_dup_fd_remove(fd);
>> > -        }
>> > -
>> > -        return ret;
>> > -    }
>> > -
>> > +    monitor_fdset_dup_fd_remove(fd);
>> >      return close(fd);
>> >  }
>> >  
>> > -- 
>> > 2.44.0
>> 
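One detail worth double-checking in the proposed monitor_fdset_dup_fd_remove(): with the old `return -1;` gone, QLIST_FOREACH goes on to read the next pointer of the entry that was just g_free()d, and the outer loop may touch a set that monitor_fdset_cleanup() has released. If that reading is right, an early return after the removal avoids both. A minimal sketch of the remove-then-return pattern on a hypothetical singly linked list (not QEMU's QLIST):

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the MonFdsetFd entries on a dup_fds list. */
struct sketch_dup {
    int fd;
    struct sketch_dup *next;
};

/*
 * Unlink and free the entry for dup_fd; returns true if one was found.
 * The loop only advances when nothing was removed, and returns right
 * after free(), so freed memory is never read.
 */
static bool sketch_dup_remove(struct sketch_dup **head, int dup_fd)
{
    for (struct sketch_dup **link = head; *link; link = &(*link)->next) {
        struct sketch_dup *cur = *link;

        if (cur->fd == dup_fd) {
            *link = cur->next;
            free(cur);
            return true; /* stop before touching freed memory */
        }
    }
    return false;
}
```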



* Re: [PATCH 2/9] migration: Fix file migration with fdset
  2024-05-03 21:31         ` Fabiano Rosas
@ 2024-05-03 21:56           ` Peter Xu
  0 siblings, 0 replies; 57+ messages in thread
From: Peter Xu @ 2024-05-03 21:56 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana, Jim Fehlig

On Fri, May 03, 2024 at 06:31:06PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Fri, May 03, 2024 at 04:56:08PM -0300, Fabiano Rosas wrote:
> >> Peter Xu <peterx@redhat.com> writes:
> >> 
> >> > On Fri, Apr 26, 2024 at 11:20:35AM -0300, Fabiano Rosas wrote:
> >> >> When the migration using the "file:" URI was implemented, I don't
> >> >> think any of us noticed that if you pass in a file name with the
> >> >> format "/dev/fdset/N", this allows a file descriptor to be passed in
> >> >> to QEMU and that behaves just like the "fd:" URI. So the "file:"
> >> >> support has been added without regard for the fdset part and we got
> >> >> some things wrong.
> >> >> 
> >> >> The first issue is that we should not truncate the migration file if
> >> >> we're allowing an fd + offset. We need to leave the file contents
> >> >> untouched.
> >> >
> >> > I'm wondering whether we can use fallocate() instead on the ranges so that
> >> > we always don't open() with O_TRUNC.  Before that..  could you remind me
> >> > why do we need to truncate in the first place?  I definitely missed
> >> > something else here too.
> >> 
> >> AFAIK, just to avoid any issues if the file is pre-existing. I don't see
> >> the difference between O_TRUNC and fallocate in this case.
> >
> > Then, shall we avoid truncations at all, leaving all the feasibility to
> > user (also errors prone to make)?
> >
> 
> Is this a big deal? I'd rather close that possible gap and avoid the bug
> reports.

Such reports shouldn't be possible if the user goes through Libvirt or
another virt stack, am I right?  This only affects whoever uses QEMU
directly, and only if they forgot to remove a leftover image file.

I'd not worry about the people who use QEMU directly - they aren't the
people we need to care too much about, imho (and I'm definitely one of
them..).  The problem is that introducing a migration global var just for
this purpose feels like overkill to me.

No strong opinion; if you feel strongly about it, I'm ok with it.  But if
one day we want to remove FileOutgoingArgs, I'll leave that to you as a
trade-off. :-)

> 
> >> 
> >> >
> >> >> 
> >> >> The second issue is that there's an expectation that QEMU removes the
> >> >> fd after the migration has finished. That's what the "fd:" code
> >> >> does. Otherwise a second migration on the same VM could attempt to
> >> >> provide an fdset with the same name and QEMU would reject it.
> >> >
> >> > Let me check what we do when with "fd:" and when migration completes or
> >> > cancels.
> >> >
> >> > IIUC it's qio_channel_file_close() that does the final cleanup work on
> >> > e.g. to_dst_file, right?  Then there's qemu_close(), and it has:
> >> >
> >> >     /* Close fd that was dup'd from an fdset */
> >> >     fdset_id = monitor_fdset_dup_fd_find(fd);
> >> >     if (fdset_id != -1) {
> >> >         int ret;
> >> >
> >> >         ret = close(fd);
> >> >         if (ret == 0) {
> >> >             monitor_fdset_dup_fd_remove(fd);
> >> >         }
> >> >
> >> >         return ret;
> >> >     }
> >> >
> >> > Shouldn't this done the work already?
> >> 
> >> That removes the mon_fdset_fd_dup->fd, we want to remove the
> >> mon_fdset_fd->fd.
> >
> > What I read so far is when we are removing the dup-fds, we'll do one more
> > thing:
> >
> > monitor_fdset_dup_fd_find_remove():
> >                     if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
> >                         monitor_fdset_cleanup(mon_fdset);
> >                     }
> >
> > It means if we removed all the dup-fds correctly, we should also remove the
> > whole fdset, which includes the ->fds, IIUC.
> >
> 
> Since mon_fdset_fd->removed == false, we hit the runstate_is_running()
> problem. I'm not sure, but probably mon_refcount > 0 as well. So the fd
> would not be removed.
> 
> But I'll retest this on Monday just be sure, it's been a while since I
> wrote some parts of this.

Thanks.  I hope we can also get some more clues when you dig further into
the whole add-fd API; I'd rather not pile more complicated logic on top of
a mystery.  I feel like this is the time to figure things out.

-- 
Peter Xu




* Re: [PATCH 7/9] monitor: fdset: Match against O_DIRECT
  2024-05-03 21:19     ` Fabiano Rosas
@ 2024-05-03 22:16       ` Peter Xu
  0 siblings, 0 replies; 57+ messages in thread
From: Peter Xu @ 2024-05-03 22:16 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, berrange, armbru, Claudio Fontana, Jim Fehlig

On Fri, May 03, 2024 at 06:19:30PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Fri, Apr 26, 2024 at 11:20:40AM -0300, Fabiano Rosas wrote:
> >> We're about to enable the use of O_DIRECT in the migration code and
> >> due to the alignment restrictions imposed by filesystems we need to
> >> make sure the flag is only used when doing aligned IO.
> >> 
> >> The migration will do parallel IO to different regions of a file, so
> >> we need to use more than one file descriptor. Those cannot be obtained
> >> by duplicating (dup()) since duplicated file descriptors share the
> >> file status flags, including O_DIRECT. If one migration channel does
> >> unaligned IO while another sets O_DIRECT to do aligned IO, the
> >> filesystem would fail the unaligned operation.
> >> 
> >> The add-fd QMP command along with the fdset code are specifically
> >> designed to allow the user to pass a set of file descriptors with
> >> different access flags into QEMU to be later fetched by code that
> >> needs to alternate between those flags when doing IO.
> >> 
> >> Extend the fdset matching to behave the same with the O_DIRECT flag.
> >> 
> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> >> ---
> >>  monitor/fds.c | 7 ++++++-
> >>  1 file changed, 6 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/monitor/fds.c b/monitor/fds.c
> >> index 4ec3b7eea9..62e324fcec 100644
> >> --- a/monitor/fds.c
> >> +++ b/monitor/fds.c
> >> @@ -420,6 +420,11 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
> >>          int fd = -1;
> >>          int dup_fd;
> >>          int mon_fd_flags;
> >> +        int mask = O_ACCMODE;
> >> +
> >> +#ifdef O_DIRECT
> >> +        mask |= O_DIRECT;
> >> +#endif
> >>  
> >>          if (mon_fdset->id != fdset_id) {
> >>              continue;
> >> @@ -431,7 +436,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
> >>                  return -1;
> >>              }
> >>  
> >> -            if ((flags & O_ACCMODE) == (mon_fd_flags & O_ACCMODE)) {
> >> +            if ((flags & mask) == (mon_fd_flags & mask)) {
> >>                  fd = mon_fdset_fd->fd;
> >>                  break;
> >>              }
> >
> > I think I see what you wanted to do, picking out the right fd out of two
> > when qemu_open_old(), which makes sense.
> >
> > However what happens if the mgmt app only passes in 1 fd to the fdset?  The
> > issue is we have a "fallback dup()" plan right after this chunk of code:
> >
> 
> I'm validating the fdset at file_parse_fdset() beforehand. If there's
> anything else than 2 fds then we'll error out:
> 
>     if (nfds != 2) {
>         error_setg(errp, "Outgoing migration needs two fds in the fdset, "
>                    "got %d", nfds);
>         qmp_remove_fd(*id, false, -1, NULL);
>         *id = -1;
>         return false;
>     }
> 
> >         dup_fd = qemu_dup_flags(fd, flags);
> >         if (dup_fd == -1) {
> >             return -1;
> >         }
> >
> >         mon_fdset_fd_dup = g_malloc0(sizeof(*mon_fdset_fd_dup));
> >         mon_fdset_fd_dup->fd = dup_fd;
> >         QLIST_INSERT_HEAD(&mon_fdset->dup_fds, mon_fdset_fd_dup, next);
> >
> > I think it means even if the mgmt app only passes in 1 fd (rather than 2,
> > one with O_DIRECT, one without), QEMU can always successfully call
> > qemu_open_old() twice for each case, even though silently the two FDs will
> > actually impact on each other.  This doesn't look ideal if it's true.
> >
> > But I also must confess I don't really understand this code at all: we
> > dup(), then we try F_SETFL on all the possible flags got passed in.
> > However AFAICT due to the fact that dup()ed FDs will share "struct file" it
> > means mostly all flags will be shared, except close-on-exec.  I don't ever
> > see anything protecting that F_SETFL to only touch close-on-exec, I think
> > it means it'll silently change file status flags for the other fd which we
> > dup()ed from.  Does it mean that we have issue already with such dup() usage?
> 
> I think you're right, but I also think there's a requirement even from
> this code that the fds in the fdset cannot be dup()ed. I don't see it
> enforced anywhere, but maybe that's a consequence of the larger use-case
> for which this feature was introduced.

I think that's the thing we need to figure out for add-fd usages.  The bad
thing is there are too many qemu_open_internal() users, so we can't easily
tell what we're looking for.  It may take some time reading the code or the
history.. pretty sad.  I hope someone can chime in.

> 
> For our scenario, the open() man page says one can use kcmp() to compare
> the fds and determine if they are a result of dup(). Maybe we should do
> that extra check? We're defining a pretty rigid interface between QEMU
> and the management layer, so not likely to break once it's written. I'm
> also not sure how bad would it be to call syscall() directly from QEMU
> (kcmp has no libc wrapper).

That should be all fine, see:

$ git grep " syscall(" | wc -l
28

And if we want, we can also call fcntl(F_GETFL) on both fds later to make
sure they have the proper flags (one must have O_DIRECT, the other must not).
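Combining the two-fd requirement with the fcntl(F_GETFL) idea, a validation pass might look like the sketch below. Both function names are hypothetical. `sketch_fdset_flags_ok()` works on flag values so it can be exercised without an O_DIRECT-capable filesystem; `sketch_fdset_fds_ok()` shows where the values would come from on real descriptors. The only rule checked is the one stated above: exactly one of the two fds has O_DIRECT.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc only exposes O_DIRECT with _GNU_SOURCE */
#endif
#include <fcntl.h>
#include <stdbool.h>

/* Accept a pair of F_GETFL results only if exactly one has O_DIRECT. */
static bool sketch_fdset_flags_ok(int fl_a, int fl_b)
{
#ifdef O_DIRECT
    bool dio_a = fl_a & O_DIRECT;
    bool dio_b = fl_b & O_DIRECT;

    return dio_a != dio_b;
#else
    (void)fl_a;
    (void)fl_b;
    return false; /* direct-io cannot be used on this build at all */
#endif
}

/* Same check on real descriptors; false if either fcntl() fails. */
static bool sketch_fdset_fds_ok(int fd_a, int fd_b)
{
    int fl_a = fcntl(fd_a, F_GETFL);
    int fl_b = fcntl(fd_b, F_GETFL);

    return fl_a != -1 && fl_b != -1 && sketch_fdset_flags_ok(fl_a, fl_b);
}
```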

-- 
Peter Xu




* Re: [PATCH 1/9] monitor: Honor QMP request for fd removal immediately
  2024-04-26 14:20 ` [PATCH 1/9] monitor: Honor QMP request for fd removal immediately Fabiano Rosas
  2024-05-03 16:02   ` Peter Xu
@ 2024-05-08  7:17   ` Daniel P. Berrangé
  2024-05-16 22:00     ` Fabiano Rosas
  1 sibling, 1 reply; 57+ messages in thread
From: Daniel P. Berrangé @ 2024-05-08  7:17 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, armbru, Peter Xu, Claudio Fontana, Jim Fehlig

On Fri, Apr 26, 2024 at 11:20:34AM -0300, Fabiano Rosas wrote:
> We're enabling using the fdset interface to pass file descriptors for
> use in the migration code. Since migrations can happen more than once
> during the VMs lifetime, we need a way to remove an fd from the fdset
> at the end of migration.
> 
> The current code only removes an fd from the fdset if the VM is
> running. This causes a QMP call to "remove-fd" to not actually remove
> the fd if the VM happens to be stopped.
> 
> While the fd would eventually be removed when monitor_fdset_cleanup()
> is called again, the user request should be honored and the fd
> actually removed. Calling remove-fd + query-fdset shows a recently
> removed fd still present.
> 
> The runstate_is_running() check was introduced by commit ebe52b592d
> ("monitor: Prevent removing fd from set during init"), which by the
> shortlog indicates that they were trying to avoid removing a
> yet-unduplicated fd too early.

IMHO that should be reverted. The justification says

  "If an fd is added to an fd set via the command line, and it is not
   referenced by another command line option (ie. -drive), then clean
   it up after QEMU initialization is complete"

which I think is pretty weak. Why should QEMU forcibly stop an app
from passing in an FD to be used by a QMP command issued just after
the VM starts running? While it could just use QMP to pass in the
FD set, the mgmt app might have its own reason for wanting QEMU to
own the passed FD from the very start of the process execve().

Implicitly this cleanup is attempting to "fix" a bug where the mgmt
app passes in an FD that it never needed. If any such bug were ever
found, then the mgmt app should just be fixed to not pass it in. I
don't think QEMU needs to be trying to fix mgmt app bugs.

IOW, this commit is imposing an arbitrary & unnecessary usage policy
on passed in FD sets, and as your commit explains has further
unhelpful (& undocumented) side effects on the 'remove-fd' QMP command.

Just revert it IMHO.

> 
> I don't see why an fd explicitly removed with qmp_remove_fd() should
> be under runstate_is_running(). I'm assuming this was a mistake when
> adding the parentheses around the expression.
> 
> Move the runstate_is_running() check to apply only to the
> QLIST_EMPTY(dup_fds) side of the expression and ignore it when
> mon_fdset_fd->removed has been explicitly set.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  monitor/fds.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/monitor/fds.c b/monitor/fds.c
> index d86c2c674c..4ec3b7eea9 100644
> --- a/monitor/fds.c
> +++ b/monitor/fds.c
> @@ -173,9 +173,9 @@ static void monitor_fdset_cleanup(MonFdset *mon_fdset)
>      MonFdsetFd *mon_fdset_fd_next;
>  
>      QLIST_FOREACH_SAFE(mon_fdset_fd, &mon_fdset->fds, next, mon_fdset_fd_next) {
> -        if ((mon_fdset_fd->removed ||
> -                (QLIST_EMPTY(&mon_fdset->dup_fds) && mon_refcount == 0)) &&
> -                runstate_is_running()) {
> +        if (mon_fdset_fd->removed ||
> +            (QLIST_EMPTY(&mon_fdset->dup_fds) && mon_refcount == 0 &&
> +             runstate_is_running())) {
>              close(mon_fdset_fd->fd);
>              g_free(mon_fdset_fd->opaque);
>              QLIST_REMOVE(mon_fdset_fd, next);
> -- 
> 2.35.3
> 
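To make the effect of moving the parentheses concrete, here is a sketch that models the two versions of the condition in monitor_fdset_cleanup() as plain boolean predicates. The struct and function names are hypothetical; the fields stand in for mon_fdset_fd->removed, QLIST_EMPTY(&mon_fdset->dup_fds), mon_refcount == 0 and runstate_is_running().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state monitor_fdset_cleanup() consults. */
struct fd_state {
    bool removed;          /* fd explicitly removed via remove-fd */
    bool no_dup_fds;       /* QLIST_EMPTY(&mon_fdset->dup_fds) */
    bool no_monitor_refs;  /* mon_refcount == 0 */
    bool vm_running;       /* runstate_is_running() */
};

/* Before the patch: the run state gates both branches. */
static bool should_close_old(struct fd_state s)
{
    return (s.removed || (s.no_dup_fds && s.no_monitor_refs)) &&
           s.vm_running;
}

/* After the patch: an explicit removal always closes the fd. */
static bool should_close_new(struct fd_state s)
{
    return s.removed ||
           (s.no_dup_fds && s.no_monitor_refs && s.vm_running);
}
```

The only input that changes outcome is "removed while the VM is stopped": the old predicate keeps the fd, which is the remove-fd + query-fdset symptom described in the commit message, while the new one closes it.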

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

* Re: [PATCH 2/9] migration: Fix file migration with fdset
  2024-04-26 14:20 ` [PATCH 2/9] migration: Fix file migration with fdset Fabiano Rosas
  2024-05-03 16:23   ` Peter Xu
@ 2024-05-08  8:00   ` Daniel P. Berrangé
  2024-05-08 20:45     ` Fabiano Rosas
  1 sibling, 1 reply; 57+ messages in thread
From: Daniel P. Berrangé @ 2024-05-08  8:00 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, armbru, Peter Xu, Claudio Fontana, Jim Fehlig

On Fri, Apr 26, 2024 at 11:20:35AM -0300, Fabiano Rosas wrote:
> When the migration using the "file:" URI was implemented, I don't
> think any of us noticed that if you pass in a file name with the
> format "/dev/fdset/N", this allows a file descriptor to be passed in
> to QEMU and that behaves just like the "fd:" URI. So the "file:"
> support has been added without regard for the fdset part and we got
> some things wrong.
> 
> The first issue is that we should not truncate the migration file if
> we're allowing an fd + offset. We need to leave the file contents
> untouched.
> 
> The second issue is that there's an expectation that QEMU removes the
> fd after the migration has finished. That's what the "fd:" code
> does. Otherwise a second migration on the same VM could attempt to
> provide an fdset with the same name and QEMU would reject it.
> 
> We can fix the first issue by detecting when we're using the fdset
> vs. the plain file name. This requires storing the fdset_id
> somewhere. We can then use this stored fdset_id to do cleanup at the
> end and also fix the second issue.

The use of /dev/fdset is supposed to be transparent to code in
QEMU, so modifying migration to learn about FD sets to do manual
cleanup is breaking that API facade.

IMHO the transparency of the design points towards the mgmt app
calling 'remove-fd' on the set after migration has started, so
that a later migration can use the same fdset name.

Ideally the truncation issue needs to be transparent too.

Rather than detecting use of fdset, we can avoid O_TRUNC
entirely. Instead we can call ftruncate(fd, offset), which
should work in both normal and fdset scenarios.
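A minimal sketch of the ftruncate() approach, assuming the caller already knows the offset parsed from the file: URI. The helper names prepare_migration_fd() and open_migration_file() are hypothetical. The same trimming step applies to an fd QEMU opened itself and to one taken from an fdset, so no fdset detection is needed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: make an open migration fd ready for writing the
 * stream at @offset. Bytes before @offset (e.g. a header written by the
 * management app) are kept and anything after it is discarded. If the
 * file is shorter than @offset, it is extended with a hole.
 */
static int prepare_migration_fd(int fd, off_t offset)
{
    return ftruncate(fd, offset);
}

/* Hypothetical helper: open @path without O_TRUNC, then trim it. */
static int open_migration_file(const char *path, off_t offset)
{
    int fd = open(path, O_CREAT | O_WRONLY, 0600);

    if (fd < 0) {
        return -1;
    }
    if (prepare_migration_fd(fd, offset) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Usage: with a file holding "HEADER" followed by stale stream data and an offset of 6, open_migration_file() leaves exactly the 6 header bytes in place.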

> 
> Fixes: 385f510df5 ("migration: file URI offset")
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/file.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 46 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/file.c b/migration/file.c
> index ab18ba505a..8f30999400 100644
> --- a/migration/file.c
> +++ b/migration/file.c
> @@ -10,6 +10,7 @@
>  #include "qemu/cutils.h"
>  #include "qemu/error-report.h"
>  #include "qapi/error.h"
> +#include "qapi/qapi-commands-misc.h"
>  #include "channel.h"
>  #include "file.h"
>  #include "migration.h"
> @@ -23,6 +24,7 @@
>  
>  static struct FileOutgoingArgs {
>      char *fname;
> +    int64_t fdset_id;
>  } outgoing_args;
>  
>  /* Remove the offset option from @filespec and return it in @offsetp. */
> @@ -44,10 +46,39 @@ int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp)
>      return 0;
>  }
>  
> +static void file_remove_fdset(void)
> +{
> +    if (outgoing_args.fdset_id != -1) {
> +        qmp_remove_fd(outgoing_args.fdset_id, false, -1, NULL);
> +        outgoing_args.fdset_id = -1;
> +    }
> +}
> +
> +static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
> +                             Error **errp)
> +{
> +    const char *fdset_id_str;
> +
> +    *fdset_id = -1;
> +
> +    if (!strstart(filename, "/dev/fdset/", &fdset_id_str)) {
> +        return true;
> +    }
> +
> +    *fdset_id = qemu_parse_fd(fdset_id_str);
> +    if (*fdset_id == -1) {
> +        error_setg_errno(errp, EINVAL, "Could not parse fdset %s", fdset_id_str);
> +        return false;
> +    }
> +
> +    return true;
> +}
> +
>  void file_cleanup_outgoing_migration(void)
>  {
>      g_free(outgoing_args.fname);
>      outgoing_args.fname = NULL;
> +    file_remove_fdset();
>  }
>  
>  bool file_send_channel_create(gpointer opaque, Error **errp)
> @@ -81,11 +112,24 @@ void file_start_outgoing_migration(MigrationState *s,
>      g_autofree char *filename = g_strdup(file_args->filename);
>      uint64_t offset = file_args->offset;
>      QIOChannel *ioc;
> +    int flags = O_CREAT | O_WRONLY;
>  
>      trace_migration_file_outgoing(filename);
>  
> -    fioc = qio_channel_file_new_path(filename, O_CREAT | O_WRONLY | O_TRUNC,
> -                                     0600, errp);
> +    if (!file_parse_fdset(filename, &outgoing_args.fdset_id, errp)) {
> +        return;
> +    }
> +
> +    /*
> +     * Only truncate if it's QEMU opening the file. If an fd has been
> +     * passed in the file will already contain data written by the
> +     * management layer.
> +     */
> +    if (outgoing_args.fdset_id == -1) {
> +        flags |= O_TRUNC;
> +    }
> +
> +    fioc = qio_channel_file_new_path(filename, flags, 0600, errp);
>      if (!fioc) {
>          return;
>      }
> -- 
> 2.35.3
> 

With regards,
Daniel

* Re: [PATCH 2/9] migration: Fix file migration with fdset
  2024-05-03 16:23   ` Peter Xu
  2024-05-03 19:56     ` Fabiano Rosas
@ 2024-05-08  8:02     ` Daniel P. Berrangé
  2024-05-08 12:49       ` Peter Xu
  1 sibling, 1 reply; 57+ messages in thread
From: Daniel P. Berrangé @ 2024-05-08  8:02 UTC (permalink / raw)
  To: Peter Xu; +Cc: Fabiano Rosas, qemu-devel, armbru, Claudio Fontana, Jim Fehlig

On Fri, May 03, 2024 at 12:23:51PM -0400, Peter Xu wrote:
> On Fri, Apr 26, 2024 at 11:20:35AM -0300, Fabiano Rosas wrote:
> > When the migration using the "file:" URI was implemented, I don't
> > think any of us noticed that if you pass in a file name with the
> > format "/dev/fdset/N", this allows a file descriptor to be passed in
> > to QEMU and that behaves just like the "fd:" URI. So the "file:"
> > support has been added without regard for the fdset part and we got
> > some things wrong.
> > 
> > The first issue is that we should not truncate the migration file if
> > we're allowing an fd + offset. We need to leave the file contents
> > untouched.
> 
> I'm wondering whether we can use fallocate() instead on the ranges so that
> we always don't open() with O_TRUNC.  Before that..  could you remind me
> why do we need to truncate in the first place?  I definitely missed
> something else here too.

You're mixing distinct concepts here. fallocate makes a file region
non-sparse, while O_TRUNC removes all existing allocation, making it
sparse if we write at non-contiguous offsets. I don't think we would
want to call fallocate, since we /want/ a sparse file so that we
don't needlessly store large regions of all-zeros as RAM maps.
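The difference between a truncated file written at sparse offsets and an fallocate()d one can be observed through st_blocks. This is a sketch assuming Linux, where st_blocks is counted in 512-byte units, and a filesystem that supports holes (ext4, xfs, btrfs and tmpfs do). The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Bytes actually allocated on disk (Linux: st_blocks is 512-byte units). */
static long long allocated_bytes(int fd)
{
    struct stat st;

    if (fstat(fd, &st) < 0) {
        return -1;
    }
    return (long long)st.st_blocks * 512;
}

/*
 * Start from an empty file, as O_TRUNC would, then write one 4 KiB
 * page at @page_offset. Everything before that page stays a hole.
 */
static int write_one_page_at(int fd, off_t page_offset)
{
    char page[4096];

    memset(page, 0xAA, sizeof(page));
    if (ftruncate(fd, 0) < 0) {
        return -1;
    }
    return pwrite(fd, page, sizeof(page), page_offset) ==
           (ssize_t)sizeof(page) ? 0 : -1;
}
```

Usage: after write_one_page_at(fd, size - 4096) the file reports st_size == size but only a few KiB allocated; a subsequent posix_fallocate(fd, 0, size) allocates the whole range, which is the non-sparse result mapped-ram wants to avoid.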


With regards,
Daniel

* Re: [PATCH 3/9] tests/qtest/migration: Fix file migration offset check
  2024-05-03 20:36     ` Fabiano Rosas
  2024-05-03 21:08       ` Peter Xu
@ 2024-05-08  8:10       ` Daniel P. Berrangé
  1 sibling, 0 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2024-05-08  8:10 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: Peter Xu, qemu-devel, armbru, Claudio Fontana, Jim Fehlig,
	Thomas Huth, Laurent Vivier, Paolo Bonzini

On Fri, May 03, 2024 at 05:36:59PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Fri, Apr 26, 2024 at 11:20:36AM -0300, Fabiano Rosas wrote:
> >> When doing file migration, QEMU accepts an offset that should be
> >> skipped when writing the migration stream to the file. The purpose of
> >> the offset is to allow the management layer to put its own metadata at
> >> the start of the file.
> >> 
> >> We have tests for this in migration-test, but only testing that the
> >> migration stream starts at the correct offset and not that it actually
> >> leaves the data intact. Unsurprisingly, there's been a bug in that
> >> area that the tests didn't catch.
> >> 
> >> Fix the tests to write some data to the offset region and check that
> >> it's actually there after the migration.
> >> 
> >> Fixes: 3dc35470c8 ("tests/qtest: migration-test: Add tests for file-based migration")
> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> >> ---
> >>  tests/qtest/migration-test.c | 70 +++++++++++++++++++++++++++++++++---
> >>  1 file changed, 65 insertions(+), 5 deletions(-)
> >> 
> >> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> >> index 5d6d8cd634..7b177686b4 100644
> >> --- a/tests/qtest/migration-test.c
> >> +++ b/tests/qtest/migration-test.c
> >> @@ -2081,6 +2081,63 @@ static void test_precopy_file(void)
> >>      test_file_common(&args, true);
> >>  }
> >>  
> >> +#ifndef _WIN32
> >> +static void file_dirty_offset_region(void)
> >> +{
> >> +#if defined(__linux__)
> >
> > Hmm, what's the case to cover when !_WIN32 && __linux__?  Can we remove one
> > layer of ifdef?
> >
> > I'm also wondering why it can't work on win32?  I thought win32 has all
> > these stuff we used here, but I may miss something.
> >
> 
> __linux__ is because of mmap, !_WIN32 is because of the passing of
> fds. We might be able to keep !_WIN32 only, I'll check.
> 
> >> +    g_autofree char *path = g_strdup_printf("%s/%s", tmpfs, FILE_TEST_FILENAME);
> >> +    size_t size = FILE_TEST_OFFSET;
> >> +    uintptr_t *addr, *p;
> >> +    int fd;
> >> +
> >> +    fd = open(path, O_CREAT | O_RDWR, 0660);
> >> +    g_assert(fd != -1);
> >> +
> >> +    g_assert(!ftruncate(fd, size));
> >> +
> >> +    addr = mmap(NULL, size, PROT_WRITE, MAP_SHARED, fd, 0);
> >> +    g_assert(addr != MAP_FAILED);
> >> +
> >> +    /* ensure the skipped offset contains some data */
> >> +    p = addr;
> >> +    while (p < addr + FILE_TEST_OFFSET / sizeof(uintptr_t)) {
> >> +        *p = (unsigned long) FILE_TEST_FILENAME;
> >
> > This is fine, but not as clear what is assigned..  I think here we assigned
> > is the pointer pointing to the binary's RO section (rather than the chars).
> 
> Haha you're right, I was assigning the FILE_TEST_OFFSET previously and
> just switched to the FILENAME without thinking. I'll fix it up.
> 
> > Maybe using some random numbers would be more straightforward, but no
> > strong opinions.
> >
> >> +        p++;
> >> +    }
> >> +
> >> +    munmap(addr, size);
> >> +    fsync(fd);
> >> +    close(fd);
> >> +#endif
> >> +}


Use of mmap and this loop looks like overkill to me, when we can do
it in a fully portable manner with:

   g_autofree char *data = g_new(char, offset);
   memset(data, 0x44, offset);
   g_file_set_contents(path, data, offset, NULL);

and I checked that g_file_set_contents' impl also takes care of fsync.
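For readers without GLib at hand, the same fill-then-verify idea can be sketched with plain POSIX calls. The helper names fill_offset_region() and offset_region_intact() are hypothetical and not part of migration-test.c; the sketch also shows why the check catches the O_TRUNC bug: truncation turns the header region into a hole of zeros.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: fill the first @offset bytes of @path with @pattern. */
static int fill_offset_region(const char *path, size_t offset,
                              unsigned char pattern)
{
    unsigned char *data = malloc(offset);
    int ret = -1;
    int fd;

    if (!data) {
        return -1;
    }
    memset(data, pattern, offset);
    fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
    if (fd >= 0) {
        if (write(fd, data, offset) == (ssize_t)offset && fsync(fd) == 0) {
            ret = 0;
        }
        close(fd);
    }
    free(data);
    return ret;
}

/* Hypothetical: true if the first @offset bytes still equal @pattern. */
static bool offset_region_intact(const char *path, size_t offset,
                                 unsigned char pattern)
{
    unsigned char *data = malloc(offset);
    int fd = open(path, O_RDONLY);
    bool ok = false;

    if (data && fd >= 0 && read(fd, data, offset) == (ssize_t)offset) {
        ok = true;
        for (size_t i = 0; i < offset; i++) {
            if (data[i] != pattern) {
                ok = false;
                break;
            }
        }
    }
    if (fd >= 0) {
        close(fd);
    }
    free(data);
    return ok;
}
```

Usage: fill the region, let the migration write its stream at the offset, then assert offset_region_intact(); a writer that reopens the file with O_TRUNC makes the check fail.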


With regards,
Daniel

* Re: [PATCH 4/9] migration: Add direct-io parameter
  2024-04-26 14:20 ` [PATCH 4/9] migration: Add direct-io parameter Fabiano Rosas
  2024-04-26 14:33   ` Markus Armbruster
  2024-05-03 18:05   ` Peter Xu
@ 2024-05-08  8:25   ` Daniel P. Berrangé
  2 siblings, 0 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2024-05-08  8:25 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, armbru, Peter Xu, Claudio Fontana, Jim Fehlig, Eric Blake

On Fri, Apr 26, 2024 at 11:20:37AM -0300, Fabiano Rosas wrote:
> Add the direct-io migration parameter that tells the migration code to
> use O_DIRECT when opening the migration stream file whenever possible.
> 
> This is currently only used with the mapped-ram migration that has a
> clear window guaranteed to perform aligned writes.
> 
> Acked-by: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  include/qemu/osdep.h           |  2 ++
>  migration/migration-hmp-cmds.c | 11 +++++++++++
>  migration/options.c            | 30 ++++++++++++++++++++++++++++++
>  migration/options.h            |  1 +
>  qapi/migration.json            | 18 +++++++++++++++---
>  util/osdep.c                   |  9 +++++++++
>  6 files changed, 68 insertions(+), 3 deletions(-)
> 
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index c7053cdc2b..645c14a65d 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -612,6 +612,8 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive);
>  bool qemu_has_ofd_lock(void);
>  #endif
>  
> +bool qemu_has_direct_io(void);
> +
>  #if defined(__HAIKU__) && defined(__i386__)
>  #define FMT_pid "%ld"
>  #elif defined(WIN64)
> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
> index 7e96ae6ffd..8496a2b34e 100644
> --- a/migration/migration-hmp-cmds.c
> +++ b/migration/migration-hmp-cmds.c
> @@ -397,6 +397,13 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
>          monitor_printf(mon, "%s: %s\n",
>              MigrationParameter_str(MIGRATION_PARAMETER_MODE),
>              qapi_enum_lookup(&MigMode_lookup, params->mode));
> +
> +        if (params->has_direct_io) {
> +            monitor_printf(mon, "%s: %s\n",
> +                           MigrationParameter_str(
> +                               MIGRATION_PARAMETER_DIRECT_IO),
> +                           params->direct_io ? "on" : "off");
> +        }
>      }
>  
>      qapi_free_MigrationParameters(params);
> @@ -690,6 +697,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
>          p->has_mode = true;
>          visit_type_MigMode(v, param, &p->mode, &err);
>          break;
> +    case MIGRATION_PARAMETER_DIRECT_IO:
> +        p->has_direct_io = true;
> +        visit_type_bool(v, param, &p->direct_io, &err);
> +        break;
>      default:
>          assert(0);
>      }
> diff --git a/migration/options.c b/migration/options.c
> index 239f5ecfb4..ae464aa4f2 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -826,6 +826,22 @@ int migrate_decompress_threads(void)
>      return s->parameters.decompress_threads;
>  }
>  
> +bool migrate_direct_io(void)
> +{
> +    MigrationState *s = migrate_get_current();
> +
> +    /* For now O_DIRECT is only supported with mapped-ram */
> +    if (!s->capabilities[MIGRATION_CAPABILITY_MAPPED_RAM]) {
> +        return false;
> +    }
> +
> +    if (s->parameters.has_direct_io) {
> +        return s->parameters.direct_io;
> +    }
> +
> +    return false;
> +}
> +
>  uint64_t migrate_downtime_limit(void)
>  {
>      MigrationState *s = migrate_get_current();
> @@ -1061,6 +1077,11 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
>      params->has_zero_page_detection = true;
>      params->zero_page_detection = s->parameters.zero_page_detection;
>  
> +    if (s->parameters.has_direct_io) {
> +        params->has_direct_io = true;
> +        params->direct_io = s->parameters.direct_io;
> +    }
> +
>      return params;
>  }
>  
> @@ -1097,6 +1118,7 @@ void migrate_params_init(MigrationParameters *params)
>      params->has_vcpu_dirty_limit = true;
>      params->has_mode = true;
>      params->has_zero_page_detection = true;
> +    params->has_direct_io = qemu_has_direct_io();
>  }
>  
>  /*
> @@ -1416,6 +1438,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
>      if (params->has_zero_page_detection) {
>          dest->zero_page_detection = params->zero_page_detection;
>      }
> +
> +    if (params->has_direct_io) {
> +        dest->direct_io = params->direct_io;
> +    }
>  }
>  
>  static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
> @@ -1570,6 +1596,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
>      if (params->has_zero_page_detection) {
>          s->parameters.zero_page_detection = params->zero_page_detection;
>      }
> +
> +    if (params->has_direct_io) {
> +        s->parameters.direct_io = params->direct_io;
> +    }
>  }

I would expect to see something added to migrate_params_check() that
calls qemu_has_direct_io() and reports an error if the platform
lacks O_DIRECT, so mgmt apps see when they're trying to use O_DIRECT
on a bad platform straightaway, rather than only when migration
starts later.

Alternatively, and perhaps better, would be for us to have a meson.build
check for O_DIRECT, and then make all the QAPI features have a condition
on CONFIG_O_DIRECT, so QAPI rejects any use of 'direct-io' feature at
input time.
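A simplified sketch of the first suggestion, rejecting direct-io at parameter-set time. The struct mig_params, platform_has_direct_io() and params_check() names are hypothetical stand-ins; real QEMU code would extend migrate_params_check() and report through Error **errp rather than a char buffer. The compile-time #ifdef O_DIRECT mirrors what a meson CONFIG_O_DIRECT check would decide.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for MigrationParameters. */
struct mig_params {
    bool has_direct_io;
    bool direct_io;
};

/* Compile-time answer to "does this platform define O_DIRECT?" */
static bool platform_has_direct_io(void)
{
#ifdef O_DIRECT
    return true;
#else
    return false;
#endif
}

/*
 * Validate when parameters are set, so a management app learns right
 * away that direct-io is unusable here instead of when migration
 * starts. Returns false and fills @err on failure.
 */
static bool params_check(const struct mig_params *p,
                         char *err, size_t errlen)
{
    if (p->has_direct_io && p->direct_io && !platform_has_direct_io()) {
        snprintf(err, errlen, "direct-io is not supported on this platform");
        return false;
    }
    return true;
}
```

With the QAPI-conditional alternative, the check above becomes unnecessary because the parameter would not exist in the schema on such platforms.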

With regards,
Daniel

* Re: [PATCH 5/9] migration/multifd: Add direct-io support
  2024-04-26 14:20 ` [PATCH 5/9] migration/multifd: Add direct-io support Fabiano Rosas
  2024-05-03 18:29   ` Peter Xu
@ 2024-05-08  8:27   ` Daniel P. Berrangé
  1 sibling, 0 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2024-05-08  8:27 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, armbru, Peter Xu, Claudio Fontana, Jim Fehlig

On Fri, Apr 26, 2024 at 11:20:38AM -0300, Fabiano Rosas wrote:
> When multifd is used along with mapped-ram, we can take advantage of a
> filesystem that supports the O_DIRECT flag and perform direct I/O in
> the multifd threads. This brings a significant performance improvement
> because direct-io writes bypass the page cache which would otherwise
> be thrashed by the multifd data which is unlikely to be needed again
> in a short period of time.
> 
> To be able to use a multifd channel opened with O_DIRECT, we must
> ensure that a certain alignment is used. Filesystems usually require a
> block-size alignment for direct I/O. The way to achieve this is by
> enabling the mapped-ram feature, which already aligns its I/O properly
> (see MAPPED_RAM_FILE_OFFSET_ALIGNMENT at ram.c).
> 
> By setting O_DIRECT on the multifd channels, all writes to the same
> file descriptor need to be aligned as well, even the ones that come
> from outside multifd, such as the QEMUFile I/O from the main migration
> code. This makes it impossible to use the same file descriptor for the
> QEMUFile and for the multifd channels. The various flags and metadata
> written by the main migration code will always be unaligned by virtue
> of their small size. To work around this issue, we'll require a second
> file descriptor to be used exclusively for direct I/O.
> 
> The second file descriptor can be obtained by QEMU by re-opening the
> migration file (already possible), or by being provided by the user or
> management application (support to be added in future patches).
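The alignment constraint that forces a second fd can be expressed as a predicate. This is a conservative sketch assuming a single power-of-two @align applied to the buffer address, length and file offset; the exact O_DIRECT requirements vary by filesystem and kernel version, and dio_write_is_aligned() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Conservative O_DIRECT precondition: buffer address, length and file
 * offset must all be multiples of @align (a power of two, such as the
 * filesystem's logical block size).
 */
static bool dio_write_is_aligned(const void *buf, size_t len,
                                 uint64_t offset, size_t align)
{
    uintptr_t mask = align - 1;

    return ((uintptr_t)buf & mask) == 0 &&
           (len & mask) == 0 &&
           (offset & mask) == 0;
}
```

A page-sized, page-aligned RAM write at an aligned file offset passes, while the small header and flag writes done through QEMUFile fail on length alone, which is why they need an fd without O_DIRECT.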
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/file.c      | 22 +++++++++++++++++++---
>  migration/migration.c | 23 +++++++++++++++++++++++
>  2 files changed, 42 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/file.c b/migration/file.c
> index 8f30999400..b9265b14dd 100644
> --- a/migration/file.c
> +++ b/migration/file.c
> @@ -83,17 +83,33 @@ void file_cleanup_outgoing_migration(void)
>  
>  bool file_send_channel_create(gpointer opaque, Error **errp)
>  {
> -    QIOChannelFile *ioc;
> +    QIOChannelFile *ioc = NULL;
>      int flags = O_WRONLY;
> -    bool ret = true;
> +    bool ret = false;
> +
> +    if (migrate_direct_io()) {
> +#ifdef O_DIRECT
> +        /*
> +         * Enable O_DIRECT for the secondary channels. These are used
> +         * for sending ram pages and writes should be guaranteed to be
> +         * aligned to at least page size.
> +         */
> +        flags |= O_DIRECT;
> +#else
> +        error_setg(errp, "System does not support O_DIRECT");
> +        error_append_hint(errp,
> +                          "Try disabling direct-io migration capability\n");
> +        goto out;
> +#endif

If we make the existence of the 'direct-io' feature in the QAPI
schema conditional, then the '#else' clause no longer needs to exist,
as it will be unreachable; it could be replaced by g_assert_not_reached().

> +    }
>  
>      ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, errp);
>      if (!ioc) {
> -        ret = false;
>          goto out;
>      }
>  
>      multifd_channel_connect(opaque, QIO_CHANNEL(ioc));
> +    ret = true;
>  
>  out:
>      /*
> diff --git a/migration/migration.c b/migration/migration.c
> index b5af6b5105..cb923a3f62 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -155,6 +155,16 @@ static bool migration_needs_seekable_channel(void)
>      return migrate_mapped_ram();
>  }
>  
> +static bool migration_needs_multiple_fds(void)
> +{
> +    /*
> +     * When doing direct-io, multifd requires two different,
> +     * non-duplicated file descriptors so we can use one of them for
> +     * unaligned IO.
> +     */
> +    return migrate_multifd() && migrate_direct_io();
> +}
> +
>  static bool transport_supports_seeking(MigrationAddress *addr)
>  {
>      if (addr->transport == MIGRATION_ADDRESS_TYPE_FILE) {
> @@ -164,6 +174,12 @@ static bool transport_supports_seeking(MigrationAddress *addr)
>      return false;
>  }
>  
> +static bool transport_supports_multiple_fds(MigrationAddress *addr)
> +{
> +    /* file: works because QEMU can open it multiple times */
> +    return addr->transport == MIGRATION_ADDRESS_TYPE_FILE;
> +}
> +
>  static bool
>  migration_channels_and_transport_compatible(MigrationAddress *addr,
>                                              Error **errp)
> @@ -180,6 +196,13 @@ migration_channels_and_transport_compatible(MigrationAddress *addr,
>          return false;
>      }
>  
> +    if (migration_needs_multiple_fds() &&
> +        !transport_supports_multiple_fds(addr)) {
> +        error_setg(errp,
> +                   "Migration requires a transport that allows for multiple fds (e.g. file)");
> +        return false;
> +    }
> +
>      return true;
>  }
>  
> -- 
> 2.35.3
> 

With regards,
Daniel

* Re: [PATCH 6/9] tests/qtest/migration: Add tests for file migration with direct-io
  2024-04-26 14:20 ` [PATCH 6/9] tests/qtest/migration: Add tests for file migration with direct-io Fabiano Rosas
  2024-05-03 18:38   ` Peter Xu
@ 2024-05-08  8:34   ` Daniel P. Berrangé
  1 sibling, 0 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2024-05-08  8:34 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, armbru, Peter Xu, Claudio Fontana, Jim Fehlig,
	Thomas Huth, Laurent Vivier, Paolo Bonzini

On Fri, Apr 26, 2024 at 11:20:39AM -0300, Fabiano Rosas wrote:
> The tests are only allowed to run in systems that know about the
> O_DIRECT flag and in filesystems which support it.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  tests/qtest/migration-helpers.c | 42 +++++++++++++++++++++++++++++++++
>  tests/qtest/migration-helpers.h |  1 +
>  tests/qtest/migration-test.c    | 42 +++++++++++++++++++++++++++++++++
>  3 files changed, 85 insertions(+)
> 
> diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
> index ce6d6615b5..356cd4fa8c 100644
> --- a/tests/qtest/migration-helpers.c
> +++ b/tests/qtest/migration-helpers.c
> @@ -473,3 +473,45 @@ void migration_test_add(const char *path, void (*fn)(void))
>      qtest_add_data_func_full(path, test, migration_test_wrapper,
>                               migration_test_destroy);
>  }
> +
> +#ifdef O_DIRECT
> +/*
> + * Probe for O_DIRECT support on the filesystem. Since this is used
> + * for tests, be conservative, if anything fails, assume it's
> + * unsupported.
> + */
> +bool probe_o_direct_support(const char *tmpfs)
> +{
> +    g_autofree char *filename = g_strdup_printf("%s/probe-o-direct", tmpfs);
> +    int fd, flags = O_CREAT | O_RDWR | O_TRUNC | O_DIRECT;
> +    void *buf;
> +    ssize_t ret, len;
> +    uint64_t offset;
> +
> +    fd = open(filename, flags, 0660);
> +    if (fd < 0) {
> +        unlink(filename);
> +        return false;
> +    }
> +
> +    /*
> +     * Assuming 4k should be enough to satisfy O_DIRECT alignment
> +     * requirements. The migration code uses 1M to be conservative.
> +     */
> +    len = 0x100000;
> +    offset = 0x100000;

4k is likely insufficient for architectures with a 64k small
page size, and filesystem constraints also play a part. Suggest
rewording to

  /*
   * Using 1MB alignment as a conservative choice to satisfy
   * any plausible architecture default page size, and/or
   * filesystem alignment restrictions.
   */
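A self-contained sketch of a filesystem probe along the same lines, assuming Linux and C11 aligned_alloc(). probe_o_direct() is a hypothetical name, not the helper from the patch. It closes the fd on every path, zeroes the buffer before writing it, and releases the aligned_alloc() memory with free().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return true if a 1 MiB O_DIRECT write at a 1 MiB offset succeeds in
 * @dir. Any failure counts as "unsupported", the conservative answer.
 */
static bool probe_o_direct(const char *dir)
{
    /* 1 MiB: above any plausible page size or filesystem block size. */
    const size_t align = 1024 * 1024;
    char path[4096];
    void *buf;
    ssize_t ret;
    int fd;

    if (snprintf(path, sizeof(path), "%s/probe-o-direct", dir) >=
        (int)sizeof(path)) {
        return false;
    }

    fd = open(path, O_CREAT | O_RDWR | O_TRUNC | O_DIRECT, 0600);
    if (fd < 0) {
        /* The file may have been created before O_DIRECT was rejected. */
        unlink(path);
        return false;
    }

    /* aligned_alloc() requires the size to be a multiple of the alignment. */
    buf = aligned_alloc(align, align);
    if (!buf) {
        close(fd);
        unlink(path);
        return false;
    }
    memset(buf, 0, align);

    ret = pwrite(fd, buf, align, (off_t)align);

    free(buf);
    close(fd);
    unlink(path);
    return ret == (ssize_t)align;
}
```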

> +
> +    buf = aligned_alloc(len, len);
> +    g_assert(buf);
> +
> +    ret = pwrite(fd, buf, len, offset);
> +    unlink(filename);
> +    g_free(buf);
> +
> +    if (ret < 0) {
> +        return false;
> +    }
> +
> +    return true;
> +}
> +#endif
> diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
> index 1339835698..d827e16145 100644
> --- a/tests/qtest/migration-helpers.h
> +++ b/tests/qtest/migration-helpers.h
> @@ -54,5 +54,6 @@ char *find_common_machine_version(const char *mtype, const char *var1,
>                                    const char *var2);
>  char *resolve_machine_version(const char *alias, const char *var1,
>                                const char *var2);
> +bool probe_o_direct_support(const char *tmpfs);
>  void migration_test_add(const char *path, void (*fn)(void));
>  #endif /* MIGRATION_HELPERS_H */
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 7b177686b4..512b7ede8b 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -2295,6 +2295,43 @@ static void test_multifd_file_mapped_ram(void)
>      test_file_common(&args, true);
>  }
>  
> +#ifdef O_DIRECT
> +static void *migrate_mapped_ram_dio_start(QTestState *from,
> +                                                 QTestState *to)
> +{
> +    migrate_mapped_ram_start(from, to);
> +    migrate_set_parameter_bool(from, "direct-io", true);
> +    migrate_set_parameter_bool(to, "direct-io", true);
> +
> +    return NULL;
> +}
> +
> +static void *migrate_multifd_mapped_ram_dio_start(QTestState *from,
> +                                                 QTestState *to)
> +{
> +    migrate_multifd_mapped_ram_start(from, to);
> +    return migrate_mapped_ram_dio_start(from, to);
> +}
> +
> +static void test_multifd_file_mapped_ram_dio(void)
> +{
> +    g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs,
> +                                           FILE_TEST_FILENAME);
> +    MigrateCommon args = {
> +        .connect_uri = uri,
> +        .listen_uri = "defer",
> +        .start_hook = migrate_multifd_mapped_ram_dio_start,
> +    };
> +
> +    if (!probe_o_direct_support(tmpfs)) {
> +        g_test_skip("Filesystem does not support O_DIRECT");
> +        return;
> +    }
> +
> +    test_file_common(&args, true);
> +}
> +
> +#endif /* O_DIRECT */
>  
>  static void test_precopy_tcp_plain(void)
>  {
> @@ -3719,6 +3756,11 @@ int main(int argc, char **argv)
>      migration_test_add("/migration/multifd/file/mapped-ram/live",
>                         test_multifd_file_mapped_ram_live);
>  
> +#ifdef O_DIRECT
> +    migration_test_add("/migration/multifd/file/mapped-ram/dio",
> +                       test_multifd_file_mapped_ram_dio);
> +#endif
> +
>  #ifdef CONFIG_GNUTLS
>      migration_test_add("/migration/precopy/unix/tls/psk",
>                         test_precopy_unix_tls_psk);
> -- 
> 2.35.3
> 

With regards,
Daniel

* Re: [PATCH 8/9] migration: Add support for fdset with multifd + file
  2024-04-26 14:20 ` [PATCH 8/9] migration: Add support for fdset with multifd + file Fabiano Rosas
@ 2024-05-08  8:53   ` Daniel P. Berrangé
  2024-05-08 18:23     ` Peter Xu
  0 siblings, 1 reply; 57+ messages in thread
From: Daniel P. Berrangé @ 2024-05-08  8:53 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, armbru, Peter Xu, Claudio Fontana, Jim Fehlig

On Fri, Apr 26, 2024 at 11:20:41AM -0300, Fabiano Rosas wrote:
> Allow multifd to use an fdset when migrating to a file. This is useful
> for the scenario where the management layer wants to have control over
> the migration file.
> 
> By receiving the file descriptors directly, QEMU can delegate some
> high level operating system operations to the management layer (such
> as mandatory access control). The management layer might also want to
> add its own headers before the migration stream.
> 
> Enable the "file:/dev/fdset/#" syntax for the multifd migration with
> mapped-ram. The requirements for the fdset mechanism are:
> 
> On the migration source side:
> 
> - the fdset must contain two fds that are not duplicates between
>   themselves;
> - if direct-io is to be used, exactly one of the fds must have the
>   O_DIRECT flag set;
> - the file must be opened with WRONLY both times.
> 
> On the migration destination side:
> 
> - the fdset must contain one fd;
> - the file must be opened with RDONLY.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  docs/devel/migration/main.rst       | 18 ++++++++++++++
>  docs/devel/migration/mapped-ram.rst |  6 ++++-
>  migration/file.c                    | 38 ++++++++++++++++++++++++++++-
>  3 files changed, 60 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst
> index 54385a23e5..50f6096470 100644
> --- a/docs/devel/migration/main.rst
> +++ b/docs/devel/migration/main.rst
> @@ -47,6 +47,24 @@ over any transport.
>    QEMU interference. Note that QEMU does not flush cached file
>    data/metadata at the end of migration.
>  
> +  The file migration also supports using a file that has already been
> +  opened. A set of file descriptors is passed to QEMU via an "fdset"
> +  (see add-fd QMP command documentation). This method allows a
> +  management application to have control over the migration file
> +  opening operation. There are, however, strict requirements to this
> +  interface:
> +
> +  On the migration source side:
> +    - if the multifd capability is to be used, the fdset must contain
> +      two file descriptors that are not duplicates between themselves;
> +    - if the direct-io capability is to be used, exactly one of the
> +      file descriptors must have the O_DIRECT flag set;
> +    - the file must be opened with WRONLY.
> +
> +  On the migration destination side:
> +    - the fdset must contain one file descriptor;
> +    - the file must be opened with RDONLY.
> +
>  In addition, support is included for migration using RDMA, which
>  transports the page data using ``RDMA``, where the hardware takes care of
>  transporting the pages, and the load on the CPU is much lower.  While the
> diff --git a/docs/devel/migration/mapped-ram.rst b/docs/devel/migration/mapped-ram.rst
> index fa4cefd9fc..e6505511f0 100644
> --- a/docs/devel/migration/mapped-ram.rst
> +++ b/docs/devel/migration/mapped-ram.rst
> @@ -16,7 +16,7 @@ location in the file, rather than constantly being added to a
>  sequential stream. Having the pages at fixed offsets also allows the
>  usage of O_DIRECT for save/restore of the migration stream as the
>  pages are ensured to be written respecting O_DIRECT alignment
> -restrictions (direct-io support not yet implemented).
> +restrictions.
>  
>  Usage
>  -----
> @@ -35,6 +35,10 @@ Use a ``file:`` URL for migration:
>  Mapped-ram migration is best done non-live, i.e. by stopping the VM on
>  the source side before migrating.
>  
> +For best performance enable the ``direct-io`` capability as well:
> +
> +    ``migrate_set_capability direct-io on``
> +
>  Use-cases
>  ---------
>  
> diff --git a/migration/file.c b/migration/file.c
> index b9265b14dd..3bc8bc7463 100644
> --- a/migration/file.c
> +++ b/migration/file.c
> @@ -17,6 +17,7 @@
>  #include "io/channel-file.h"
>  #include "io/channel-socket.h"
>  #include "io/channel-util.h"
> +#include "monitor/monitor.h"
>  #include "options.h"
>  #include "trace.h"
>  
> @@ -54,10 +55,18 @@ static void file_remove_fdset(void)
>      }
>  }
>  
> +/*
> + * With multifd, due to the behavior of the dup() system call, we need
> + * the fdset to have two non-duplicate fds so we can enable direct IO
> + * in the secondary channels without affecting the main channel.
> + */
>  static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
>                               Error **errp)
>  {
> +    FdsetInfoList *fds_info;
> +    FdsetFdInfoList *fd_info;
>      const char *fdset_id_str;
> +    int nfds = 0;
>  
>      *fdset_id = -1;
>  
> @@ -71,6 +80,32 @@ static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
>          return false;
>      }
>  
> +    if (!migrate_multifd() || !migrate_direct_io()) {
> +        return true;
> +    }
> +
> +    for (fds_info = qmp_query_fdsets(NULL); fds_info;
> +         fds_info = fds_info->next) {
> +
> +        if (*fdset_id != fds_info->value->fdset_id) {
> +            continue;
> +        }
> +
> +        for (fd_info = fds_info->value->fds; fd_info; fd_info = fd_info->next) {
> +            if (nfds++ > 2) {
> +                break;
> +            }
> +        }
> +    }
> +
> +    if (nfds != 2) {
> +        error_setg(errp, "Outgoing migration needs two fds in the fdset, "
> +                   "got %d", nfds);
> +        qmp_remove_fd(*fdset_id, false, -1, NULL);
> +        *fdset_id = -1;
> +        return false;
> +    }
> +
>      return true;
>  }

Related to my thoughts in an earlier patch, where I say that use of fdsets
ought to be transparent to QEMU code, I'm not a fan of having this logic
in migration code.

IIUC, the migration code will call  qio_channel_file_new_path twice,
once with O_DIRECT and once without. This should trigger two calls
into monitor_fdset_dup_fd_add with different flags. If we're matching
flags in that monitor_fdset_dup_fd_add(), then if only 1 FD was
provided, are we not able to report an error there ?
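Below is a minimal sketch of the flag matching Daniel describes. It assumes each fd in the set is inspected with fcntl(F_GETFL). pick_fd_by_flags() is a hypothetical helper, not QEMU's monitor_fdset_dup_fd_add(). It only illustrates one point: if the lookup compares both the access mode and the O_DIRECT bit, it can fail with a specific reason when the set holds just one fd.

```c
#define _GNU_SOURCE /* exposes O_DIRECT in glibc's <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0 /* no O_DIRECT on this platform: the bit never matters */
#endif

/*
 * Hypothetical helper: return the first fd in @fds whose access mode and
 * O_DIRECT bit both match @flags. On failure return -1 and write a
 * human-readable reason into @err.
 */
static int pick_fd_by_flags(const int *fds, size_t nfds, int flags,
                            char *err, size_t errlen)
{
    int want_mode = flags & O_ACCMODE;
    int want_direct = flags & O_DIRECT;
    int mode_matches = 0;

    for (size_t i = 0; i < nfds; i++) {
        int fd_flags = fcntl(fds[i], F_GETFL);

        if (fd_flags == -1) {
            continue; /* stale or invalid descriptor */
        }
        if ((fd_flags & O_ACCMODE) != want_mode) {
            continue;
        }
        mode_matches++;
        if ((fd_flags & O_DIRECT) == want_direct) {
            return fds[i];
        }
    }

    if (mode_matches == 0) {
        snprintf(err, errlen, "no fd in the set has the requested access mode");
    } else {
        snprintf(err, errlen, "no fd in the set has O_DIRECT %s",
                 want_direct ? "set" : "clear");
    }
    return -1;
}
```

Suppose the set holds only a plain O_WRONLY fd. A request for O_WRONLY | O_DIRECT then returns -1, and the message names O_DIRECT. That is roughly the error the user would need to see.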

>  
> @@ -209,10 +244,11 @@ void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp)
>      g_autofree char *filename = g_strdup(file_args->filename);
>      QIOChannelFile *fioc = NULL;
>      uint64_t offset = file_args->offset;
> +    int flags = O_RDONLY;
>  
>      trace_migration_file_incoming(filename);
>  
> -    fioc = qio_channel_file_new_path(filename, O_RDONLY, 0, errp);
> +    fioc = qio_channel_file_new_path(filename, flags, 0, errp);
>      if (!fioc) {
>          return;
>      }
> -- 
> 2.35.3
> 

With regards,
Daniel




* Re: [PATCH 9/9] tests/qtest/migration: Add a test for mapped-ram with passing of fds
  2024-04-26 14:20 ` [PATCH 9/9] tests/qtest/migration: Add a test for mapped-ram with passing of fds Fabiano Rosas
@ 2024-05-08  8:56   ` Daniel P. Berrangé
  0 siblings, 0 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2024-05-08  8:56 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, armbru, Peter Xu, Claudio Fontana, Jim Fehlig,
	Thomas Huth, Laurent Vivier, Paolo Bonzini

On Fri, Apr 26, 2024 at 11:20:42AM -0300, Fabiano Rosas wrote:
> Add a multifd test for mapped-ram with passing of fds into QEMU. This
> is how libvirt will consume the feature.
> 
> There are a couple of details to the fdset mechanism:
> 
> - multifd needs two distinct file descriptors (not duplicated with
>   dup()) on the outgoing side so it can enable O_DIRECT only on the
>   channels that write with alignment. The dup() system call creates
>   file descriptors that share status flags, of which O_DIRECT is one.
> 
>   the incoming side doesn't set O_DIRECT, so it can dup() fds and
>   therefore can receive only one in the fdset.
> 
> - the open() access mode flags used for the fds passed into QEMU need
>   to match the flags QEMU uses to open the file. Currently O_WRONLY
>   for src and O_RDONLY for dst.
> 
> O_DIRECT is not supported on all systems/filesystems, so run the fdset
> test without O_DIRECT if that's the case. The migration code should
> still work in that scenario.

If O_DIRECT is not supported, then we're not setting 'direct-io',
and thus isn't this test just duplicating coverage of existing
tests ?

If this test is specifically to cover O_DIRECT, then I'd just
#ifdef the entire thing with O_DIRECT.
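Suppose the test is kept under #ifdef O_DIRECT. The compile-time guard still needs a runtime check, because the filesystem holding the test directory may reject O_DIRECT. Below is a sketch of such a probe. It assumes a 4096-byte aligned buffer satisfies the device's alignment. dir_supports_o_direct() is a hypothetical name, and the series' probe_o_direct_support() may work differently.

```c
#define _GNU_SOURCE /* exposes O_DIRECT in glibc's <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical probe: report whether files in @dir accept an O_DIRECT
 * open followed by one block-aligned write. 4096 bytes is an assumed
 * alignment that covers common 512- and 4096-byte logical block sizes.
 */
static bool dir_supports_o_direct(const char *dir)
{
#ifdef O_DIRECT
    char path[4096];
    void *buf = NULL;
    bool ok = false;
    int fd;

    snprintf(path, sizeof(path), "%s/odirect-probe-XXXXXX", dir);
    fd = mkstemp(path);
    if (fd == -1) {
        return false; /* directory missing or not writable */
    }
    close(fd);

    /* Filesystems without O_DIRECT support typically fail here (EINVAL). */
    fd = open(path, O_WRONLY | O_DIRECT);
    if (fd != -1) {
        if (posix_memalign(&buf, 4096, 4096) == 0) {
            memset(buf, 0, 4096);
            ok = pwrite(fd, buf, 4096, 0) == 4096;
            free(buf);
        }
        close(fd);
    }
    unlink(path);
    return ok;
#else
    (void)dir;
    return false; /* O_DIRECT unknown at compile time */
#endif
}
```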

> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  tests/qtest/migration-test.c | 90 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 90 insertions(+)
> 
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 512b7ede8b..d83f1bdd4f 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -2331,8 +2331,93 @@ static void test_multifd_file_mapped_ram_dio(void)
>      test_file_common(&args, true);
>  }
>  
> +static void migrate_multifd_mapped_ram_fdset_dio_end(QTestState *from,
> +                                                    QTestState *to,
> +                                                    void *opaque)
> +{
> +    QDict *resp;
> +    QList *fdsets;
> +
> +    file_offset_finish_hook(from, to, opaque);
> +
> +    /*
> +     * Check that we removed the fdsets after migration, otherwise a
> +     * second migration would fail due to too many fdsets.
> +     */
> +
> +    resp = qtest_qmp(from, "{'execute': 'query-fdsets', "
> +                     "'arguments': {}}");
> +    g_assert(qdict_haskey(resp, "return"));
> +    fdsets = qdict_get_qlist(resp, "return");
> +    g_assert(fdsets && qlist_empty(fdsets));
> +}
>  #endif /* O_DIRECT */
>  
> +#ifndef _WIN32
> +static void *migrate_multifd_mapped_ram_fdset(QTestState *from, QTestState *to)
> +{
> +    g_autofree char *file = g_strdup_printf("%s/%s", tmpfs, FILE_TEST_FILENAME);
> +    int fds[3];
> +    int src_flags = O_WRONLY;
> +
> +    file_dirty_offset_region();
> +
> +    /* main outgoing channel: no O_DIRECT */
> +    fds[0] = open(file, src_flags, 0660);
> +    assert(fds[0] != -1);
> +
> +    qtest_qmp_fds_assert_success(from, &fds[0], 1, "{'execute': 'add-fd', "
> +                                 "'arguments': {'fdset-id': 1}}");
> +
> +#ifdef O_DIRECT
> +    src_flags |= O_DIRECT;
> +
> +    /* secondary outgoing channels */
> +    fds[1] = open(file, src_flags, 0660);
> +    assert(fds[1] != -1);
> +
> +    qtest_qmp_fds_assert_success(from, &fds[1], 1, "{'execute': 'add-fd', "
> +                                 "'arguments': {'fdset-id': 1}}");
> +
> +    /* incoming channel */
> +    fds[2] = open(file, O_CREAT | O_RDONLY, 0660);
> +    assert(fds[2] != -1);
> +
> +    qtest_qmp_fds_assert_success(to, &fds[2], 1, "{'execute': 'add-fd', "
> +                                 "'arguments': {'fdset-id': 1}}");
> +
> +    migrate_multifd_mapped_ram_dio_start(from, to);
> +#else
> +    migrate_multifd_mapped_ram_start(from, to);
> +#endif
> +
> +    return NULL;
> +}
> +
> +static void test_multifd_file_mapped_ram_fdset(void)
> +{
> +    g_autofree char *uri = g_strdup_printf("file:/dev/fdset/1,offset=%d",
> +                                           FILE_TEST_OFFSET);
> +    MigrateCommon args = {
> +        .connect_uri = uri,
> +        .listen_uri = "defer",
> +        .start_hook = migrate_multifd_mapped_ram_fdset,
> +#ifdef O_DIRECT
> +        .finish_hook = migrate_multifd_mapped_ram_fdset_dio_end,
> +#endif
> +    };
> +
> +#ifdef O_DIRECT
> +    if (!probe_o_direct_support(tmpfs)) {
> +        g_test_skip("Filesystem does not support O_DIRECT");
> +        return;
> +    }
> +#endif
> +
> +    test_file_common(&args, true);
> +}
> +#endif /* _WIN32 */
> +
>  static void test_precopy_tcp_plain(void)
>  {
>      MigrateCommon args = {
> @@ -3761,6 +3846,11 @@ int main(int argc, char **argv)
>                         test_multifd_file_mapped_ram_dio);
>  #endif
>  
> +#ifndef _WIN32
> +    qtest_add_func("/migration/multifd/file/mapped-ram/fdset",
> +                   test_multifd_file_mapped_ram_fdset);
> +#endif
> +
>  #ifdef CONFIG_GNUTLS
>      migration_test_add("/migration/precopy/unix/tls/psk",
>                         test_precopy_unix_tls_psk);
> -- 
> 2.35.3
> 

With regards,
Daniel




* Re: [PATCH 2/9] migration: Fix file migration with fdset
  2024-05-08  8:02     ` Daniel P. Berrangé
@ 2024-05-08 12:49       ` Peter Xu
  0 siblings, 0 replies; 57+ messages in thread
From: Peter Xu @ 2024-05-08 12:49 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Fabiano Rosas, qemu-devel, armbru, Claudio Fontana, Jim Fehlig

On Wed, May 08, 2024 at 09:02:16AM +0100, Daniel P. Berrangé wrote:
> On Fri, May 03, 2024 at 12:23:51PM -0400, Peter Xu wrote:
> > On Fri, Apr 26, 2024 at 11:20:35AM -0300, Fabiano Rosas wrote:
> > > When the migration using the "file:" URI was implemented, I don't
> > > think any of us noticed that if you pass in a file name with the
> > > format "/dev/fdset/N", this allows a file descriptor to be passed in
> > > to QEMU and that behaves just like the "fd:" URI. So the "file:"
> > > support has been added without regard for the fdset part and we got
> > > some things wrong.
> > > 
> > > The first issue is that we should not truncate the migration file if
> > > we're allowing an fd + offset. We need to leave the file contents
> > > untouched.
> > 
> > I'm wondering whether we can use fallocate() on the ranges instead, so
> > that we never open() with O_TRUNC.  Before that..  could you remind me
> > why we need to truncate in the first place?  I definitely missed
> > something else here too.
> 
> You're mixing distinct concepts here. fallocate makes a file region
> non-sparse, while O_TRUNC removes all existing allocation, making it
> sparse if we write at non-contiguous offsets. I don't think we would
> want to call fallocate, since we /want/ a sparse file so that we
> don't needlessly store large regions of all-zeros as RAM maps.

I meant fallocate() with FALLOC_FL_PUNCH_HOLE.  But now I think it would be
better to avoid both.

Thanks,

-- 
Peter Xu




* Re: [PATCH 8/9] migration: Add support for fdset with multifd + file
  2024-05-08  8:53   ` Daniel P. Berrangé
@ 2024-05-08 18:23     ` Peter Xu
  2024-05-08 20:39       ` Fabiano Rosas
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Xu @ 2024-05-08 18:23 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Fabiano Rosas, qemu-devel, armbru, Claudio Fontana, Jim Fehlig

On Wed, May 08, 2024 at 09:53:48AM +0100, Daniel P. Berrangé wrote:
> On Fri, Apr 26, 2024 at 11:20:41AM -0300, Fabiano Rosas wrote:
> > [... patch snipped ...]
> 
> Related to my thoughts in an earlier patch, where I say that use of fdsets
> ought to be transparent to QEMU code, I'm not a fan of having this logic
> in migration code.
> 
> IIUC, the migration code will call  qio_channel_file_new_path twice,
> once with O_DIRECT and once without. This should trigger two calls
> into monitor_fdset_dup_fd_add with different flags. If we're matching
> flags in that monitor_fdset_dup_fd_add(), then if only 1 FD was
> provided, are we not able to report an error there ?

Right, that sounds like it would work.

For a real sanity check, we may want to somehow check that the two fds
returned from qio_channel_file_new_path() point to the same file underneath.

What was mentioned in the other thread (kcmp() with KCMP_FILE) might not
work, as the whole purpose of having two fds is to make sure each one is
backed by a different struct file (and only one of them has O_DIRECT).
fstat() might work in this case by comparing the st_ino field, etc.; maybe
fstatfs() too, but perhaps that's overly cautious.  It's just a pain to use
two fds to begin with.
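Below is a sketch of the fstat() check Peter suggests. It assumes "same file" means the same device and inode number, since st_ino alone is only unique within one filesystem. fds_same_file() and fds_share_status_flags() are hypothetical helpers. The second one shows why two separate open() calls are needed: status flags live in the open file description. A change made through one fd is therefore visible through its dup(), but not through an independently opened fd. O_APPEND stands in for O_DIRECT so the demonstration does not depend on filesystem support.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* True if @a and @b refer to the same inode on the same device. */
static bool fds_same_file(int a, int b)
{
    struct stat sa, sb;

    if (fstat(a, &sa) == -1 || fstat(b, &sb) == -1) {
        return false;
    }
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/*
 * True if a status flag set through @a becomes visible through @b, which
 * happens when both fds share one open file description (e.g. after dup()).
 * O_APPEND is a stand-in for O_DIRECT; @a's original flags are restored.
 */
static bool fds_share_status_flags(int a, int b)
{
    int fa = fcntl(a, F_GETFL);
    int fb = fcntl(b, F_GETFL);
    bool shared;

    if (fa == -1 || fb == -1 || (fa & O_APPEND) || (fb & O_APPEND)) {
        return false; /* cannot probe reliably */
    }
    if (fcntl(a, F_SETFL, fa | O_APPEND) == -1) {
        return false;
    }
    shared = (fcntl(b, F_GETFL) & O_APPEND) != 0;
    fcntl(a, F_SETFL, fa); /* restore the original status flags */
    return shared;
}
```

In the migration case, the main and secondary channel fds would be expected to pass fds_same_file() while failing fds_share_status_flags(). Two distinct struct files are exactly what kcmp(KCMP_FILE) would report as different.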

Thanks,

-- 
Peter Xu




* Re: [PATCH 8/9] migration: Add support for fdset with multifd + file
  2024-05-08 18:23     ` Peter Xu
@ 2024-05-08 20:39       ` Fabiano Rosas
  2024-05-09  8:08         ` Daniel P. Berrangé
  0 siblings, 1 reply; 57+ messages in thread
From: Fabiano Rosas @ 2024-05-08 20:39 UTC (permalink / raw)
  To: Peter Xu, Daniel P. Berrangé
  Cc: qemu-devel, armbru, Claudio Fontana, Jim Fehlig

Peter Xu <peterx@redhat.com> writes:

> On Wed, May 08, 2024 at 09:53:48AM +0100, Daniel P. Berrangé wrote:
>> On Fri, Apr 26, 2024 at 11:20:41AM -0300, Fabiano Rosas wrote:
>> > [... patch snipped ...]
>> 
>> Related to my thoughts in an earlier patch, where I say that use of fdsets
>> ought to be transparent to QEMU code, I'm not a fan of having this logic
>> in migration code.
>> 
>> IIUC, the migration code will call  qio_channel_file_new_path twice,
>> once with O_DIRECT and once without. This should trigger two calls
>> into monitor_fdset_dup_fd_add with different flags. If we're matching
>> flags in that monitor_fdset_dup_fd_add(), then if only 1 FD was
>> provided, are we not able to report an error there ?
>
> Right, this sounds working.

It works, but because the fdset code is so low-level, it's difficult to map
its errors to anything meaningful we can report to the user. I'll have to
add an errp to monitor_fdset_dup_fd_add(); its return values are not very
useful:

- -1 with no errno
- -1 with EACCES (should actually be EBADF)
- -1 with ENOENT

There has been some discussion around this before, actually:

https://lists.gnu.org/archive/html/qemu-devel/2021-08/msg02544.html

Or, you know, let the management layer figure it out. We seem to be
heading in this direction already. I imagine once the code is written to
interact with QEMU, it would not have any reason to change, so it might
be ok to replace some of the code I'm adding in this series with
documentation and call it a day. I don't like this approach very much,
but it would definitely make this series way simpler.

>
> For a real sanity check, we may want to somehow check that the two fds
> returned from qio_channel_file_new_path() point to the same file underneath.
>
> What was mentioned in the other thread (kcmp() with KCMP_FILE) might not
> work, as the whole purpose of having two fds is to make sure each one is
> backed by a different struct file (and only one of them has O_DIRECT).
> fstat() might work in this case by comparing the st_ino field, etc.; maybe
> fstatfs() too, but perhaps that's overly cautious.  It's just a pain to use
> two fds to begin with.
>
> Thanks,



* Re: [PATCH 2/9] migration: Fix file migration with fdset
  2024-05-08  8:00   ` Daniel P. Berrangé
@ 2024-05-08 20:45     ` Fabiano Rosas
  0 siblings, 0 replies; 57+ messages in thread
From: Fabiano Rosas @ 2024-05-08 20:45 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, armbru, Peter Xu, Claudio Fontana, Jim Fehlig

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Fri, Apr 26, 2024 at 11:20:35AM -0300, Fabiano Rosas wrote:
>> When the migration using the "file:" URI was implemented, I don't
>> think any of us noticed that if you pass in a file name with the
>> format "/dev/fdset/N", this allows a file descriptor to be passed in
>> to QEMU and that behaves just like the "fd:" URI. So the "file:"
>> support has been added without regard for the fdset part and we got
>> some things wrong.
>> 
>> The first issue is that we should not truncate the migration file if
>> we're allowing an fd + offset. We need to leave the file contents
>> untouched.
>> 
>> The second issue is that there's an expectation that QEMU removes the
>> fd after the migration has finished. That's what the "fd:" code
>> does. Otherwise a second migration on the same VM could attempt to
>> provide an fdset with the same name and QEMU would reject it.
>> 
>> We can fix the first issue by detecting when we're using the fdset
>> vs. the plain file name. This requires storing the fdset_id
>> somewhere. We can then use this stored fdset_id to do cleanup at the
>> end and also fix the second issue.
>
> The use of /dev/fdset is supposed to be transparent to code in
> QEMU, so modifying migration to learn about FD sets to do manual
> cleanup is breaking that API facade.
>
> IMHO the transparency of the design points towards the mgmt app
> calling 'remove-fd' set after migration has started, in order
> that a later migration can use the same fdset name.

I got this slightly wrong: QEMU doesn't reject the creation of the
fdset, it just reuses the old one and adds the new fd to it. That is
somewhat worse because then we'd choose the wrong fd when migrating. But
I guess we could just require the management layer to do proper
management of the fds/fdset.

>
> Ideally the truncation issue needs to be transparent too.
>
> Rather than detecting use of fdset, we can not use O_TRUNC
> at all. Instead we can call ftruncate(fd, offset), which
> should work in both normal and fdset scenarios.
>

Good idea.
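A minimal sketch of Daniel's suggestion, assuming plain POSIX open() and ftruncate(). The file is opened without O_TRUNC and then cut back to the requested offset. Any header the management layer wrote before that offset survives, and stale data past it is discarded. The helper names are hypothetical. In QEMU the path would instead go through the fdset-aware qemu_open path, but the ftruncate() step is the same for an fd QEMU opened and an fd received through an fdset.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep the bytes before @offset, drop the rest.
 * It only needs an fd, so it does not care how the fd was obtained.
 */
static int migration_file_reset(int fd, off_t offset)
{
    assert(offset >= 0);
    return ftruncate(fd, offset);
}

/* Hypothetical: open WITHOUT O_TRUNC, then truncate only the tail. */
static int migration_file_open(const char *path, off_t offset)
{
    int fd = open(path, O_CREAT | O_WRONLY, 0600);

    if (fd < 0) {
        return -1;
    }
    if (migration_file_reset(fd, offset) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

If @offset is beyond the current end of file, ftruncate() extends the file and the gap reads back as zeros, which is harmless here because the migration stream is written starting at that offset.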

>> 
>> Fixes: 385f510df5 ("migration: file URI offset")
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>>  migration/file.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 46 insertions(+), 2 deletions(-)
>> 
>> diff --git a/migration/file.c b/migration/file.c
>> index ab18ba505a..8f30999400 100644
>> --- a/migration/file.c
>> +++ b/migration/file.c
>> @@ -10,6 +10,7 @@
>>  #include "qemu/cutils.h"
>>  #include "qemu/error-report.h"
>>  #include "qapi/error.h"
>> +#include "qapi/qapi-commands-misc.h"
>>  #include "channel.h"
>>  #include "file.h"
>>  #include "migration.h"
>> @@ -23,6 +24,7 @@
>>  
>>  static struct FileOutgoingArgs {
>>      char *fname;
>> +    int64_t fdset_id;
>>  } outgoing_args;
>>  
>>  /* Remove the offset option from @filespec and return it in @offsetp. */
>> @@ -44,10 +46,39 @@ int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp)
>>      return 0;
>>  }
>>  
>> +static void file_remove_fdset(void)
>> +{
>> +    if (outgoing_args.fdset_id != -1) {
>> +        qmp_remove_fd(outgoing_args.fdset_id, false, -1, NULL);
>> +        outgoing_args.fdset_id = -1;
>> +    }
>> +}
>> +
>> +static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
>> +                             Error **errp)
>> +{
>> +    const char *fdset_id_str;
>> +
>> +    *fdset_id = -1;
>> +
>> +    if (!strstart(filename, "/dev/fdset/", &fdset_id_str)) {
>> +        return true;
>> +    }
>> +
>> +    *fdset_id = qemu_parse_fd(fdset_id_str);
>> +    if (*fdset_id == -1) {
>> +        error_setg_errno(errp, EINVAL, "Could not parse fdset %s", fdset_id_str);
>> +        return false;
>> +    }
>> +
>> +    return true;
>> +}
>> +
>>  void file_cleanup_outgoing_migration(void)
>>  {
>>      g_free(outgoing_args.fname);
>>      outgoing_args.fname = NULL;
>> +    file_remove_fdset();
>>  }
>>  
>>  bool file_send_channel_create(gpointer opaque, Error **errp)
>> @@ -81,11 +112,24 @@ void file_start_outgoing_migration(MigrationState *s,
>>      g_autofree char *filename = g_strdup(file_args->filename);
>>      uint64_t offset = file_args->offset;
>>      QIOChannel *ioc;
>> +    int flags = O_CREAT | O_WRONLY;
>>  
>>      trace_migration_file_outgoing(filename);
>>  
>> -    fioc = qio_channel_file_new_path(filename, O_CREAT | O_WRONLY | O_TRUNC,
>> -                                     0600, errp);
>> +    if (!file_parse_fdset(filename, &outgoing_args.fdset_id, errp)) {
>> +        return;
>> +    }
>> +
>> +    /*
>> +     * Only truncate if it's QEMU opening the file. If an fd has been
>> +     * passed in the file will already contain data written by the
>> +     * management layer.
>> +     */
>> +    if (outgoing_args.fdset_id == -1) {
>> +        flags |= O_TRUNC;
>> +    }
>> +
>> +    fioc = qio_channel_file_new_path(filename, flags, 0600, errp);
>>      if (!fioc) {
>>          return;
>>      }
>> -- 
>> 2.35.3
>> 
>
> With regards,
> Daniel



* Re: [PATCH 8/9] migration: Add support for fdset with multifd + file
  2024-05-08 20:39       ` Fabiano Rosas
@ 2024-05-09  8:08         ` Daniel P. Berrangé
  2024-05-17 22:43           ` Fabiano Rosas
  0 siblings, 1 reply; 57+ messages in thread
From: Daniel P. Berrangé @ 2024-05-09  8:08 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: Peter Xu, qemu-devel, armbru, Claudio Fontana, Jim Fehlig

On Wed, May 08, 2024 at 05:39:53PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Wed, May 08, 2024 at 09:53:48AM +0100, Daniel P. Berrangé wrote:
> >> On Fri, Apr 26, 2024 at 11:20:41AM -0300, Fabiano Rosas wrote:
> >> > Allow multifd to use an fdset when migrating to a file. This is useful
> >> > for the scenario where the management layer wants to have control over
> >> > the migration file.
> >> > 
> >> > By receiving the file descriptors directly, QEMU can delegate some
> >> > high level operating system operations to the management layer (such
> >> > as mandatory access control). The management layer might also want to
> >> > add its own headers before the migration stream.
> >> > 
> >> > Enable the "file:/dev/fdset/#" syntax for the multifd migration with
> >> > mapped-ram. The requirements for the fdset mechanism are:
> >> > 
> >> > On the migration source side:
> >> > 
> >> > - the fdset must contain two fds that are not duplicates between
> >> >   themselves;
> >> > - if direct-io is to be used, exactly one of the fds must have the
> >> >   O_DIRECT flag set;
> >> > - the file must be opened with WRONLY both times.
> >> > 
> >> > On the migration destination side:
> >> > 
> >> > - the fdset must contain one fd;
> >> > - the file must be opened with RDONLY.
> >> > 
> >> > Signed-off-by: Fabiano Rosas <farosas@suse.de>
> >> > ---
> >> >  docs/devel/migration/main.rst       | 18 ++++++++++++++
> >> >  docs/devel/migration/mapped-ram.rst |  6 ++++-
> >> >  migration/file.c                    | 38 ++++++++++++++++++++++++++++-
> >> >  3 files changed, 60 insertions(+), 2 deletions(-)
> >> > 
> >> > diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst
> >> > index 54385a23e5..50f6096470 100644
> >> > --- a/docs/devel/migration/main.rst
> >> > +++ b/docs/devel/migration/main.rst
> >> > @@ -47,6 +47,24 @@ over any transport.
> >> >    QEMU interference. Note that QEMU does not flush cached file
> >> >    data/metadata at the end of migration.
> >> >  
> >> > +  The file migration also supports using a file that has already been
> >> > +  opened. A set of file descriptors is passed to QEMU via an "fdset"
> >> > +  (see add-fd QMP command documentation). This method allows a
> >> > +  management application to have control over the migration file
> >> > +  opening operation. There are, however, strict requirements to this
> >> > +  interface:
> >> > +
> >> > +  On the migration source side:
> >> > +    - if the multifd capability is to be used, the fdset must contain
> >> > +      two file descriptors that are not duplicates between themselves;
> >> > +    - if the direct-io parameter is to be used, exactly one of the
> >> > +      file descriptors must have the O_DIRECT flag set;
> >> > +    - the file must be opened with WRONLY.
> >> > +
> >> > +  On the migration destination side:
> >> > +    - the fdset must contain one file descriptor;
> >> > +    - the file must be opened with RDONLY.
> >> > +
> >> >  In addition, support is included for migration using RDMA, which
> >> >  transports the page data using ``RDMA``, where the hardware takes care of
> >> >  transporting the pages, and the load on the CPU is much lower.  While the
> >> > diff --git a/docs/devel/migration/mapped-ram.rst b/docs/devel/migration/mapped-ram.rst
> >> > index fa4cefd9fc..e6505511f0 100644
> >> > --- a/docs/devel/migration/mapped-ram.rst
> >> > +++ b/docs/devel/migration/mapped-ram.rst
> >> > @@ -16,7 +16,7 @@ location in the file, rather than constantly being added to a
> >> >  sequential stream. Having the pages at fixed offsets also allows the
> >> >  usage of O_DIRECT for save/restore of the migration stream as the
> >> >  pages are ensured to be written respecting O_DIRECT alignment
> >> > -restrictions (direct-io support not yet implemented).
> >> > +restrictions.
> >> >  
> >> >  Usage
> >> >  -----
> >> > @@ -35,6 +35,10 @@ Use a ``file:`` URL for migration:
> >> >  Mapped-ram migration is best done non-live, i.e. by stopping the VM on
> >> >  the source side before migrating.
> >> >  
> >> > +For best performance enable the ``direct-io`` parameter as well:
> >> > +
> >> > +    ``migrate_set_parameter direct-io on``
> >> > +
> >> >  Use-cases
> >> >  ---------
> >> >  
> >> > diff --git a/migration/file.c b/migration/file.c
> >> > index b9265b14dd..3bc8bc7463 100644
> >> > --- a/migration/file.c
> >> > +++ b/migration/file.c
> >> > @@ -17,6 +17,7 @@
> >> >  #include "io/channel-file.h"
> >> >  #include "io/channel-socket.h"
> >> >  #include "io/channel-util.h"
> >> > +#include "monitor/monitor.h"
> >> >  #include "options.h"
> >> >  #include "trace.h"
> >> >  
> >> > @@ -54,10 +55,18 @@ static void file_remove_fdset(void)
> >> >      }
> >> >  }
> >> >  
> >> > +/*
> >> > + * With multifd, due to the behavior of the dup() system call, we need
> >> > + * the fdset to have two non-duplicate fds so we can enable direct IO
> >> > + * in the secondary channels without affecting the main channel.
> >> > + */
> >> >  static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
> >> >                               Error **errp)
> >> >  {
> >> > +    FdsetInfoList *fds_info;
> >> > +    FdsetFdInfoList *fd_info;
> >> >      const char *fdset_id_str;
> >> > +    int nfds = 0;
> >> >  
> >> >      *fdset_id = -1;
> >> >  
> >> > @@ -71,6 +80,32 @@ static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
> >> >          return false;
> >> >      }
> >> >  
> >> > +    if (!migrate_multifd() || !migrate_direct_io()) {
> >> > +        return true;
> >> > +    }
> >> > +
> >> > +    for (fds_info = qmp_query_fdsets(NULL); fds_info;
> >> > +         fds_info = fds_info->next) {
> >> > +
> >> > +        if (*fdset_id != fds_info->value->fdset_id) {
> >> > +            continue;
> >> > +        }
> >> > +
> >> > +        for (fd_info = fds_info->value->fds; fd_info; fd_info = fd_info->next) {
> >> > +            if (nfds++ > 2) {
> >> > +                break;
> >> > +            }
> >> > +        }
> >> > +    }
> >> > +
> >> > +    if (nfds != 2) {
> >> > +        error_setg(errp, "Outgoing migration needs two fds in the fdset, "
> >> > +                   "got %d", nfds);
> >> > +        qmp_remove_fd(*fdset_id, false, -1, NULL);
> >> > +        *fdset_id = -1;
> >> > +        return false;
> >> > +    }
> >> > +
> >> >      return true;
> >> >  }
> >> 
> >> Related to my thoughts in an earlier patch, where I say that use of fdsets
> >> ought to be transparent to QEMU code, I'm not a fan of having this logic
> >> in migration code.
> >> 
> >> IIUC, the migration code will call  qio_channel_file_new_path twice,
> >> once with O_DIRECT and once without. This should trigger two calls
> >> into monitor_fdset_dup_fd_add with different flags. If we're matching
> >> flags in that monitor_fdset_dup_fd_add(), then if only 1 FD was
> >> provided, are we not able to report an error there ?
> >
> > Right, this sounds working.
> 
> It works, but due to how low-level fdset is, it's difficult to match the
> low level error to anything meaningful we can report to the user. I'll
> have to add an errp to monitor_fdset_dup_fd_add(). Its return values are
> not very useful:
> 
> -1 with no errno
> -1 with EACCES (should actually be EBADF)
> -1 with ENOENT
> 
> There has actually been some discussion around this before:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2021-08/msg02544.html

The only caller of monitor_fdset_dup_fd_add is qemu_open_internal,
and that has an "Error **errp" parameter.  We should rewrite
monitor_fdset_dup_fd_add to also take an "Error **errp", at which
point we can actually report useful, actionable error messages
from it. Errnos be gone!

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
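Daniel's proposal, matching flags and reporting a descriptive error from the fd-selection step, can be sketched as follows. Everything here is hypothetical. fdset_pick_fd() stands in for monitor_fdset_dup_fd_add(), and a char buffer stands in for QEMU's Error **errp. Matching on O_DIRECT as well as the access mode is assumed from this series rather than taken from existing QEMU code.

```c
#define _GNU_SOURCE /* O_DIRECT on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Human-readable name for an O_ACCMODE value. */
static const char *accmode_name(int mode)
{
    switch (mode) {
    case O_RDONLY: return "O_RDONLY";
    case O_WRONLY: return "O_WRONLY";
    default:       return "O_RDWR";
    }
}

/*
 * Hypothetical stand-in for monitor_fdset_dup_fd_add(): return the first
 * fd whose access mode and O_DIRECT bit both match @flags. On failure,
 * write an actionable message to @err (QEMU would use Error **errp).
 */
static int fdset_pick_fd(const int *fds, size_t nfds, int flags,
                         char *err, size_t errlen)
{
    int want_mode = flags & O_ACCMODE;
    int want_direct = flags & O_DIRECT;

    assert(err != NULL && errlen > 0);
    for (size_t i = 0; i < nfds; i++) {
        int fl = fcntl(fds[i], F_GETFL);

        if (fl < 0) {
            continue; /* not a valid open fd; skip it */
        }
        if ((fl & O_ACCMODE) == want_mode && (fl & O_DIRECT) == want_direct) {
            return fds[i];
        }
    }
    snprintf(err, errlen, "fdset has no fd opened with %s%s",
             accmode_name(want_mode), want_direct ? " | O_DIRECT" : "");
    return -1;
}
```

With this shape the caller can tell the user exactly which kind of fd is missing, for example "fdset has no fd opened with O_WRONLY | O_DIRECT", instead of decoding a bare -1 with EACCES or ENOENT.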




* Re: [PATCH 4/9] migration: Add direct-io parameter
  2024-05-03 21:16       ` Peter Xu
@ 2024-05-14 14:10         ` Markus Armbruster
  2024-05-14 17:57           ` Fabiano Rosas
  0 siblings, 1 reply; 57+ messages in thread
From: Markus Armbruster @ 2024-05-14 14:10 UTC (permalink / raw)
  To: Peter Xu
  Cc: Fabiano Rosas, qemu-devel, berrange, armbru, Claudio Fontana,
	Jim Fehlig, Eric Blake

Peter Xu <peterx@redhat.com> writes:

> On Fri, May 03, 2024 at 05:49:32PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Fri, Apr 26, 2024 at 11:20:37AM -0300, Fabiano Rosas wrote:
>> >> Add the direct-io migration parameter that tells the migration code to
>> >> use O_DIRECT when opening the migration stream file whenever possible.
>> >> 
>> >> This is currently only used with the mapped-ram migration that has a
>> >> clear window guaranteed to perform aligned writes.
>> >> 
>> >> Acked-by: Markus Armbruster <armbru@redhat.com>
>> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> >> ---
>> >>  include/qemu/osdep.h           |  2 ++
>> >>  migration/migration-hmp-cmds.c | 11 +++++++++++
>> >>  migration/options.c            | 30 ++++++++++++++++++++++++++++++
>> >>  migration/options.h            |  1 +
>> >>  qapi/migration.json            | 18 +++++++++++++++---
>> >>  util/osdep.c                   |  9 +++++++++
>> >>  6 files changed, 68 insertions(+), 3 deletions(-)
>> >> 
>> >> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
>> >> index c7053cdc2b..645c14a65d 100644
>> >> --- a/include/qemu/osdep.h
>> >> +++ b/include/qemu/osdep.h
>> >> @@ -612,6 +612,8 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive);
>> >>  bool qemu_has_ofd_lock(void);
>> >>  #endif
>> >>  
>> >> +bool qemu_has_direct_io(void);
>> >> +
>> >>  #if defined(__HAIKU__) && defined(__i386__)
>> >>  #define FMT_pid "%ld"
>> >>  #elif defined(WIN64)
>> >> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
>> >> index 7e96ae6ffd..8496a2b34e 100644
>> >> --- a/migration/migration-hmp-cmds.c
>> >> +++ b/migration/migration-hmp-cmds.c
>> >> @@ -397,6 +397,13 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
>> >>          monitor_printf(mon, "%s: %s\n",
>> >>              MigrationParameter_str(MIGRATION_PARAMETER_MODE),
>> >>              qapi_enum_lookup(&MigMode_lookup, params->mode));
>> >> +
>> >> +        if (params->has_direct_io) {
>> >> +            monitor_printf(mon, "%s: %s\n",
>> >> +                           MigrationParameter_str(
>> >> +                               MIGRATION_PARAMETER_DIRECT_IO),
>> >> +                           params->direct_io ? "on" : "off");
>> >> +        }
>> >
>> > This will be the first parameter to be displayed conditionally here.  I
>> > think it's a sign of misuse of the has_direct_io field.

Yes.  For similar existing parameters, we do

               assert(params->has_FOO);
               monitor_printf(mon, "%s: '%s'\n",
                              MigrationParameter_str(MIGRATION_PARAMETER_FOO),
                              ... params->FOO as string ...)

>> > IMHO has_direct_io is best kept as "whether the direct_io field is
>> > valid" and that's all of it.  It hopefully shouldn't contain more
>> > information than that, or otherwise it'll be another small challenge we
>> > need to overcome when we can remove all these has_* fields, and can also be
>> > easily overlooked.
>> 
>> I don't think I understand why we have those has_* fields. I thought my
>> usage of 'params->has_direct_io = qemu_has_direct_io()' was the correct
>> one, i.e. checking whether QEMU has any support for that parameter. Can
>> you help me out here?
>
> Here params is the pointer to "struct MigrationParameters", which is
> defined in qapi/migration.json.  And we have had "has_*" only because we
> allow optional fields with asterisks:
>
>   { 'struct': 'MigrationParameters',
>     'data': { '*announce-initial': 'size',
>               ...
>               } }
>
> So it had better mean only "whether this field exists", because that's
> how it is defined.
>
> IIRC we (or rather, Markus) made some attempts to deduplicate those
> *MigrationParameter* things, and if that succeeds we have a chance to
> drop the has_* fields (in which case the fields are simply always
> present; "has_" only makes sense in a QMP session, to let the user
> specify one or more parameters rather than all of them).

qapi/migration.json:

    ##
    # @MigrationParameters:
    #
--> # The optional members aren't actually optional.
    #

In other words, the has_FOO generated for the members of
MigrationParameters must all be true.

These members became optional when we attempted to de-duplicate
MigrationParameters and MigrateSetParameters, but failed (see commit
de63ab61241 "migrate: Share common MigrationParameters struct" and
commit 1bda8b3c695 "migration: Unshare MigrationParameters struct for
now").
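The rule Markus describes can be sketched with a hand-written mock of the generated struct. Everything here is hypothetical (MigrationParametersSketch, params_fill(), params_format_direct_io()). Real code uses the QAPI-generated MigrationParameters and monitor_printf(). The point is that has_direct_io is set unconditionally when filling in the current values, and the printer asserts it rather than branching on it. Under this reading, whether the host supports O_DIRECT is a separate question that would be checked elsewhere (for example when the parameter is set), not encoded in has_direct_io.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical mock of the QAPI-generated C for an optional bool member
 * '*direct-io': has_direct_io only records "member is present".
 */
typedef struct MigrationParametersSketch {
    bool has_direct_io;
    bool direct_io;
} MigrationParametersSketch;

/* Filling in current values: every member is present, so has_ is true. */
static void params_fill(MigrationParametersSketch *p, bool direct_io)
{
    p->has_direct_io = true;
    p->direct_io = direct_io;
}

/* Mirrors the HMP pattern: assert presence, then print unconditionally. */
static int params_format_direct_io(const MigrationParametersSketch *p,
                                   char *buf, size_t len)
{
    assert(p->has_direct_io);
    return snprintf(buf, len, "direct-io: %s\n", p->direct_io ? "on" : "off");
}
```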




* Re: [PATCH 4/9] migration: Add direct-io parameter
  2024-05-14 14:10         ` Markus Armbruster
@ 2024-05-14 17:57           ` Fabiano Rosas
  2024-05-15  7:17             ` Markus Armbruster
  0 siblings, 1 reply; 57+ messages in thread
From: Fabiano Rosas @ 2024-05-14 17:57 UTC (permalink / raw)
  To: Markus Armbruster, Peter Xu
  Cc: qemu-devel, berrange, armbru, Claudio Fontana, Jim Fehlig, Eric Blake

Markus Armbruster <armbru@redhat.com> writes:

> Peter Xu <peterx@redhat.com> writes:
>
>> On Fri, May 03, 2024 at 05:49:32PM -0300, Fabiano Rosas wrote:
>>> Peter Xu <peterx@redhat.com> writes:
>>> 
>>> > On Fri, Apr 26, 2024 at 11:20:37AM -0300, Fabiano Rosas wrote:
>>> >> Add the direct-io migration parameter that tells the migration code to
>>> >> use O_DIRECT when opening the migration stream file whenever possible.
>>> >> 
>>> >> This is currently only used with the mapped-ram migration that has a
>>> >> clear window guaranteed to perform aligned writes.
>>> >> 
>>> >> Acked-by: Markus Armbruster <armbru@redhat.com>
>>> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>>> >> ---
>>> >>  include/qemu/osdep.h           |  2 ++
>>> >>  migration/migration-hmp-cmds.c | 11 +++++++++++
>>> >>  migration/options.c            | 30 ++++++++++++++++++++++++++++++
>>> >>  migration/options.h            |  1 +
>>> >>  qapi/migration.json            | 18 +++++++++++++++---
>>> >>  util/osdep.c                   |  9 +++++++++
>>> >>  6 files changed, 68 insertions(+), 3 deletions(-)
>>> >> 
>>> >> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
>>> >> index c7053cdc2b..645c14a65d 100644
>>> >> --- a/include/qemu/osdep.h
>>> >> +++ b/include/qemu/osdep.h
>>> >> @@ -612,6 +612,8 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive);
>>> >>  bool qemu_has_ofd_lock(void);
>>> >>  #endif
>>> >>  
>>> >> +bool qemu_has_direct_io(void);
>>> >> +
>>> >>  #if defined(__HAIKU__) && defined(__i386__)
>>> >>  #define FMT_pid "%ld"
>>> >>  #elif defined(WIN64)
>>> >> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
>>> >> index 7e96ae6ffd..8496a2b34e 100644
>>> >> --- a/migration/migration-hmp-cmds.c
>>> >> +++ b/migration/migration-hmp-cmds.c
>>> >> @@ -397,6 +397,13 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
>>> >>          monitor_printf(mon, "%s: %s\n",
>>> >>              MigrationParameter_str(MIGRATION_PARAMETER_MODE),
>>> >>              qapi_enum_lookup(&MigMode_lookup, params->mode));
>>> >> +
>>> >> +        if (params->has_direct_io) {
>>> >> +            monitor_printf(mon, "%s: %s\n",
>>> >> +                           MigrationParameter_str(
>>> >> +                               MIGRATION_PARAMETER_DIRECT_IO),
>>> >> +                           params->direct_io ? "on" : "off");
>>> >> +        }
>>> >
>>> > This will be the first parameter to be displayed conditionally here.  I
>>> > think it's a sign of misuse of the has_direct_io field.
>
> Yes.  For similar existing parameters, we do
>
>                assert(params->has_FOO);
>                monitor_printf(mon, "%s: '%s'\n",
>                               MigrationParameter_str(MIGRATION_PARAMETER_FOO),
>                               ... params->FOO as string ...)
>
>>> > IMHO has_direct_io is best kept as "whether the direct_io field is
>>> > valid" and that's all of it.  It hopefully shouldn't contain more
>>> > information than that, or otherwise it'll be another small challenge we
>>> > need to overcome when we can remove all these has_* fields, and can also be
>>> > easily overlooked.
>>> 
>>> I don't think I understand why we have those has_* fields. I thought my
>>> usage of 'params->has_direct_io = qemu_has_direct_io()' was the correct
>>> one, i.e. checking whether QEMU has any support for that parameter. Can
>>> you help me out here?
>>
>> Here params is the pointer to "struct MigrationParameters", which is
>> defined in qapi/migration.json.  And we have had "has_*" only because we
>> allow optional fields with asterisks:
>>
>>   { 'struct': 'MigrationParameters',
>>     'data': { '*announce-initial': 'size',
>>               ...
>>               } }
>>
>> So it had better mean only "whether this field exists", because that's
>> how it is defined.
>>
>> IIRC we (or rather, Markus) made some attempts to deduplicate those
>> *MigrationParameter* things, and if that succeeds we have a chance to
>> drop the has_* fields (in which case the fields are simply always
>> present; "has_" only makes sense in a QMP session, to let the user
>> specify one or more parameters rather than all of them).
>
> qapi/migration.json:
>
>     ##
>     # @MigrationParameters:
>     #
> --> # The optional members aren't actually optional.
>     #
>
> In other words, the has_FOO generated for the members of
> MigrationParameters must all be true.
>
> These members became optional when we attempted to de-duplicate
> MigrationParameters and MigrateSetParameters, but failed (see commit
> de63ab61241 "migrate: Share common MigrationParameters struct" and
> commit 1bda8b3c695 "migration: Unshare MigrationParameters struct for
> now").

So what's the blocker for merging MigrateSetParameters and
MigrationParameters? Just the tls_* options?



* Re: [PATCH 4/9] migration: Add direct-io parameter
  2024-05-14 17:57           ` Fabiano Rosas
@ 2024-05-15  7:17             ` Markus Armbruster
  2024-05-15 12:51               ` Fabiano Rosas
  0 siblings, 1 reply; 57+ messages in thread
From: Markus Armbruster @ 2024-05-15  7:17 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: Peter Xu, qemu-devel, berrange, Claudio Fontana, Jim Fehlig, Eric Blake

Fabiano Rosas <farosas@suse.de> writes:

> Markus Armbruster <armbru@redhat.com> writes:
>
>> Peter Xu <peterx@redhat.com> writes:
>>
>>> On Fri, May 03, 2024 at 05:49:32PM -0300, Fabiano Rosas wrote:
>>>> Peter Xu <peterx@redhat.com> writes:
>>>> 
>>>> > On Fri, Apr 26, 2024 at 11:20:37AM -0300, Fabiano Rosas wrote:
>>>> >> Add the direct-io migration parameter that tells the migration code to
>>>> >> use O_DIRECT when opening the migration stream file whenever possible.
>>>> >> 
>>>> >> This is currently only used with the mapped-ram migration that has a
>>>> >> clear window guaranteed to perform aligned writes.
>>>> >> 
>>>> >> Acked-by: Markus Armbruster <armbru@redhat.com>
>>>> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>>>> >> ---
>>>> >>  include/qemu/osdep.h           |  2 ++
>>>> >>  migration/migration-hmp-cmds.c | 11 +++++++++++
>>>> >>  migration/options.c            | 30 ++++++++++++++++++++++++++++++
>>>> >>  migration/options.h            |  1 +
>>>> >>  qapi/migration.json            | 18 +++++++++++++++---
>>>> >>  util/osdep.c                   |  9 +++++++++
>>>> >>  6 files changed, 68 insertions(+), 3 deletions(-)
>>>> >> 
>>>> >> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
>>>> >> index c7053cdc2b..645c14a65d 100644
>>>> >> --- a/include/qemu/osdep.h
>>>> >> +++ b/include/qemu/osdep.h
>>>> >> @@ -612,6 +612,8 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive);
>>>> >>  bool qemu_has_ofd_lock(void);
>>>> >>  #endif
>>>> >>  
>>>> >> +bool qemu_has_direct_io(void);
>>>> >> +
>>>> >>  #if defined(__HAIKU__) && defined(__i386__)
>>>> >>  #define FMT_pid "%ld"
>>>> >>  #elif defined(WIN64)
>>>> >> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
>>>> >> index 7e96ae6ffd..8496a2b34e 100644
>>>> >> --- a/migration/migration-hmp-cmds.c
>>>> >> +++ b/migration/migration-hmp-cmds.c
>>>> >> @@ -397,6 +397,13 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
>>>> >>          monitor_printf(mon, "%s: %s\n",
>>>> >>              MigrationParameter_str(MIGRATION_PARAMETER_MODE),
>>>> >>              qapi_enum_lookup(&MigMode_lookup, params->mode));
>>>> >> +
>>>> >> +        if (params->has_direct_io) {
>>>> >> +            monitor_printf(mon, "%s: %s\n",
>>>> >> +                           MigrationParameter_str(
>>>> >> +                               MIGRATION_PARAMETER_DIRECT_IO),
>>>> >> +                           params->direct_io ? "on" : "off");
>>>> >> +        }
>>>> >
>>>> > This will be the first parameter to be displayed conditionally here.  I
>>>> > think it's a sign of misuse of the has_direct_io field.
>>
>> Yes.  For similar existing parameters, we do
>>
>>                assert(params->has_FOO);
>>                monitor_printf(mon, "%s: '%s'\n",
>>                               MigrationParameter_str(MIGRATION_PARAMETER_FOO),
>>                               ... params->FOO as string ...)
>>
>>>> > IMHO has_direct_io is best kept as "whether the direct_io field is
>>>> > valid" and that's all of it.  It hopefully shouldn't contain more
>>>> > information than that, or otherwise it'll be another small challenge we
>>>> > need to overcome when we can remove all these has_* fields, and can also be
>>>> > easily overlooked.
>>>> 
>>>> I don't think I understand why we have those has_* fields. I thought my
>>>> usage of 'params->has_direct_io = qemu_has_direct_io()' was the correct
>>>> one, i.e. checking whether QEMU has any support for that parameter. Can
>>>> you help me out here?
>>>
>>> Here params is the pointer to "struct MigrationParameters", which is
>>> defined in qapi/migration.json.  And we have had "has_*" only because we
>>> allow optional fields with asterisks:
>>>
>>>   { 'struct': 'MigrationParameters',
>>>     'data': { '*announce-initial': 'size',
>>>               ...
>>>               } }
>>>
>>> So it had better mean only "whether this field exists", because that's
>>> how it is defined.
>>>
>>> IIRC we (or rather, Markus) made some attempts to deduplicate those
>>> *MigrationParameter* things, and if that succeeds we have a chance to
>>> drop the has_* fields (in which case the fields are simply always
>>> present; "has_" only makes sense in a QMP session, to let the user
>>> specify one or more parameters rather than all of them).
>>
>> qapi/migration.json:
>>
>>     ##
>>     # @MigrationParameters:
>>     #
>> --> # The optional members aren't actually optional.
>>     #
>>
>> In other words, the has_FOO generated for the members of
>> MigrationParameters must all be true.
>>
>> These members became optional when we attempted to de-duplicate
>> MigrationParameters and MigrateSetParameters, but failed (see commit
>> de63ab61241 "migrate: Share common MigrationParameters struct" and
>> commit 1bda8b3c695 "migration: Unshare MigrationParameters struct for
>> now").
>
> So what's the blocker for merging MigrateSetParameters and
> MigrationParameters? Just the tls_* options?

I believe the latest attempt was Peter's "[PATCH v3 0/4] qapi/migration:
Dedup migration parameter objects and fix tls-authz crash" last
September.

I can't recall offhand what exactly blocked its merge.  Suggest you read
the review thread[*], and if that leads to questions, ask them either in
replies to the old thread, or right here.

One additional issue hasn't been discussed much so far, I think: merging
the two copies of the doc comments.  They are big and quite similar
(that's why we want to deduplicate), but they're not identical.  In
particular, MigrateSetParameters' doc comment talks more about
defaults.  That's because talking about defaults makes no sense
whatsoever for MigrationParameters (we do it anyway, of course, just
less).  Merging the two will require some careful word-smithing, and
perhaps some compromises on doc quality.



[*] https://lore.kernel.org/qemu-devel/20230905162335.235619-1-peterx@redhat.com/




* Re: [PATCH 4/9] migration: Add direct-io parameter
  2024-05-15  7:17             ` Markus Armbruster
@ 2024-05-15 12:51               ` Fabiano Rosas
  0 siblings, 0 replies; 57+ messages in thread
From: Fabiano Rosas @ 2024-05-15 12:51 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Peter Xu, qemu-devel, berrange, Claudio Fontana, Jim Fehlig, Eric Blake

Markus Armbruster <armbru@redhat.com> writes:

> Fabiano Rosas <farosas@suse.de> writes:
>
>> Markus Armbruster <armbru@redhat.com> writes:
>>
>>> Peter Xu <peterx@redhat.com> writes:
>>>
>>>> On Fri, May 03, 2024 at 05:49:32PM -0300, Fabiano Rosas wrote:
>>>>> Peter Xu <peterx@redhat.com> writes:
>>>>> 
>>>>> > On Fri, Apr 26, 2024 at 11:20:37AM -0300, Fabiano Rosas wrote:
>>>>> >> Add the direct-io migration parameter that tells the migration code to
>>>>> >> use O_DIRECT when opening the migration stream file whenever possible.
>>>>> >> 
>>>>> >> This is currently only used with the mapped-ram migration that has a
>>>>> >> clear window guaranteed to perform aligned writes.
>>>>> >> 
>>>>> >> Acked-by: Markus Armbruster <armbru@redhat.com>
>>>>> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>>>>> >> ---
>>>>> >>  include/qemu/osdep.h           |  2 ++
>>>>> >>  migration/migration-hmp-cmds.c | 11 +++++++++++
>>>>> >>  migration/options.c            | 30 ++++++++++++++++++++++++++++++
>>>>> >>  migration/options.h            |  1 +
>>>>> >>  qapi/migration.json            | 18 +++++++++++++++---
>>>>> >>  util/osdep.c                   |  9 +++++++++
>>>>> >>  6 files changed, 68 insertions(+), 3 deletions(-)
>>>>> >> 
>>>>> >> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
>>>>> >> index c7053cdc2b..645c14a65d 100644
>>>>> >> --- a/include/qemu/osdep.h
>>>>> >> +++ b/include/qemu/osdep.h
>>>>> >> @@ -612,6 +612,8 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive);
>>>>> >>  bool qemu_has_ofd_lock(void);
>>>>> >>  #endif
>>>>> >>  
>>>>> >> +bool qemu_has_direct_io(void);
>>>>> >> +
>>>>> >>  #if defined(__HAIKU__) && defined(__i386__)
>>>>> >>  #define FMT_pid "%ld"
>>>>> >>  #elif defined(WIN64)
>>>>> >> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
>>>>> >> index 7e96ae6ffd..8496a2b34e 100644
>>>>> >> --- a/migration/migration-hmp-cmds.c
>>>>> >> +++ b/migration/migration-hmp-cmds.c
>>>>> >> @@ -397,6 +397,13 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
>>>>> >>          monitor_printf(mon, "%s: %s\n",
>>>>> >>              MigrationParameter_str(MIGRATION_PARAMETER_MODE),
>>>>> >>              qapi_enum_lookup(&MigMode_lookup, params->mode));
>>>>> >> +
>>>>> >> +        if (params->has_direct_io) {
>>>>> >> +            monitor_printf(mon, "%s: %s\n",
>>>>> >> +                           MigrationParameter_str(
>>>>> >> +                               MIGRATION_PARAMETER_DIRECT_IO),
>>>>> >> +                           params->direct_io ? "on" : "off");
>>>>> >> +        }
>>>>> >
>>>>> > This will be the first parameter to be displayed conditionally here.  I
>>>>> > think it's a sign of misuse of the has_direct_io field.
>>>
>>> Yes.  For similar existing parameters, we do
>>>
>>>                assert(params->has_FOO);
>>>                monitor_printf(mon, "%s: '%s'\n",
>>>                               MigrationParameter_str(MIGRATION_PARAMETER_FOO),
>>>                               ... params->FOO as string ...)
>>>
>>>>> > IMHO has_direct_io is best kept as "whether the direct_io field is
>>>>> > valid" and nothing more.  It shouldn't carry more information than
>>>>> > that; otherwise it'll be another small obstacle to overcome when we
>>>>> > can remove all these has_* fields, and one that is easily overlooked.
>>>>> 
>>>>> I don't think I understand why we have those has_* fields. I thought my
>>>>> usage of 'params->has_direct_io = qemu_has_direct_io()' was the correct
>>>>> one, i.e. checking whether QEMU has any support for that parameter. Can
>>>>> you help me out here?
>>>>
>>>> Here params is the pointer to "struct MigrationParameters", which is
>>>> defined in qapi/migration.json.  And we have had "has_*" only because we
>>>> allow optional fields with asterisks:
>>>>
>>>>   { 'struct': 'MigrationParameters',
>>>>     'data': { '*announce-initial': 'size',
>>>>               ...
>>>>               } }
>>>>
>>>> That's why it should only mean "whether this field is present", because
>>>> that's how it is defined.
>>>>
>>>> IIRC we (or rather, Markus) made some attempts to deduplicate those
>>>> *MigrationParameter* structs; if that succeeds, we get a chance to drop
>>>> the has_* fields (the members would then simply always be present; "has_"
>>>> only makes sense in a QMP session, to let the user specify one or more
>>>> parameters rather than all of them).
>>>
>>> qapi/migration.json:
>>>
>>>     ##
>>>     # @MigrationParameters:
>>>     #
>>> --> # The optional members aren't actually optional.
>>>     #
>>>
>>> In other words, the has_FOO generated for the members of
>>> MigrationParameters must all be true.
>>>
>>> These members became optional when we attempted to de-duplicate
>>> MigrationParameters and MigrateSetParameters, but failed (see commit
>>> de63ab61241 "migrate: Share common MigrationParameters struct" and
>>> commit 1bda8b3c695 "migration: Unshare MigrationParameters struct for
>>> now").
>>
>> So what's the blocker for merging MigrateSetParameters and
>> MigrationParameters? Just the tls_* options?
>
> I believe the latest attempt was Peter's "[PATCH v3 0/4] qapi/migration:
> Dedup migration parameter objects and fix tls-authz crash" last
> September.

I had a vague memory of this, thanks for bringing it up. I'll go over
that series and see what can be done.

>
> I can't recall offhand what exactly blocked its merge.  Suggest you read
> the review thread[*], and if that leads to questions, ask them either in
> replies to the old thread, or right here.
>
> One additional issue hasn't been discussed much so far, I think: merging
> the two copies of the doc comments.  They are big and quite similar
> (that's why we want to deduplicate), but they're not identical.  In
> particular, MigrateSetParameters' doc comment talks more about
> defaults.  That's because talking about defaults makes no sense
> whatsoever for MigrationParameters (we do it anyway, of course, just
> less).  Merging the two will require some careful word-smithing, and
> perhaps some compromises on doc quality.
>

I was playing with this yesterday and it occurred to me as well that the
docs are not quite the same. Maybe we can have a general description
common for both use-cases and a few extra words for what happens
differently when writing vs. reading.

>
>
> [*] https://lore.kernel.org/qemu-devel/20230905162335.235619-1-peterx@redhat.com/
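A minimal standalone sketch of the convention Peter and Markus describe, assuming a simplified stand-in for the QAPI-generated struct (MigParamsSketch, host_has_direct_io(), fill_params() and format_direct_io() are hypothetical names, not QEMU's generated code). The has_ flag records only presence, so the query side always sets it and the printing side asserts it; whether the host supports O_DIRECT is a separate, compile-time question.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for a QAPI-generated struct: an optional
 * '*direct-io' member gets a companion has_direct_io flag whose only
 * meaning is "direct_io is present".
 */
typedef struct {
    bool has_direct_io;
    bool direct_io;
} MigParamsSketch;

/* Host support is answered separately, here at compile time. */
static bool host_has_direct_io(void)
{
#ifdef O_DIRECT
    return true;
#else
    return false;
#endif
}

/* Query side: the member is always filled in, hence always present. */
static void fill_params(MigParamsSketch *p, bool direct_io)
{
    p->has_direct_io = true;
    p->direct_io = direct_io;
}

/* HMP-style printing: assert presence instead of branching on it. */
static int format_direct_io(const MigParamsSketch *p, char *buf, size_t len)
{
    assert(p->has_direct_io);
    return snprintf(buf, len, "%s: %s\n", "direct-io",
                    p->direct_io ? "on" : "off");
}
```

Under this split, one option is to consult host support only when the parameter is set (rejecting direct-io=on where O_DIRECT is unavailable), leaving has_direct_io to mean presence alone.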



* Re: [PATCH 1/9] monitor: Honor QMP request for fd removal immediately
  2024-05-03 16:02   ` Peter Xu
@ 2024-05-16 21:46     ` Fabiano Rosas
  0 siblings, 0 replies; 57+ messages in thread
From: Fabiano Rosas @ 2024-05-16 21:46 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, berrange, armbru, Claudio Fontana, Jim Fehlig,
	Corey Bryant, Eric Blake, Kevin Wolf

Hi all, sorry to have been away from this thread for a while; I was
trying to catch up on my review queue.

Peter Xu <peterx@redhat.com> writes:

> On Fri, Apr 26, 2024 at 11:20:34AM -0300, Fabiano Rosas wrote:
>> We're enabling using the fdset interface to pass file descriptors for
>> use in the migration code. Since migrations can happen more than once
>> during the VM's lifetime, we need a way to remove an fd from the fdset
>> at the end of migration.
>> 
>> The current code only removes an fd from the fdset if the VM is
>> running. This causes a QMP call to "remove-fd" to not actually remove
>> the fd if the VM happens to be stopped.
>> 
>> While the fd would eventually be removed when monitor_fdset_cleanup()
>> is called again, the user request should be honored and the fd
>> actually removed. Calling remove-fd + query-fdset shows a recently
>> removed fd still present.
>> 
>> The runstate_is_running() check was introduced by commit ebe52b592d
>> ("monitor: Prevent removing fd from set during init"), which by the
>> shortlog indicates that they were trying to avoid removing a
>> yet-unduplicated fd too early.
>> 
>> I don't see why an fd explicitly removed with qmp_remove_fd() should
>> be under runstate_is_running(). I'm assuming this was a mistake when
>> adding the parentheses around the expression.
>> 
>> Move the runstate_is_running() check to apply only to the
>> QLIST_EMPTY(dup_fds) side of the expression and ignore it when
>> mon_fdset_fd->removed has been explicitly set.
>
> I am confused about why fdset removal is so complicated.  I'm also
> wondering whether the fd gets dropped because we check against
> "mon_refcount == 0", and maybe monitor_fdset_cleanup() is simply called
> _before_ a monitor is created?  Why do we need such a check in the first
> place?
>

It seems the intention was to reuse monitor_fdset_cleanup() to do
cleanup when all monitor connections are closed:

efb87c1697 ("monitor: Clean up fd sets on monitor disconnect")
Author: Corey Bryant <coreyb@linux.vnet.ibm.com>
Date:   Tue Aug 14 16:43:48 2012 -0400

    monitor: Clean up fd sets on monitor disconnect
    
    Fd sets are shared by all monitor connections.  Fd sets are considered
    to be in use while at least one monitor is connected.  When the last
    monitor disconnects, all fds that are members of an fd set with no
    outstanding dup references are closed.  This prevents any fd leakage
    associated with a client disconnect prior to using a passed fd.
    
    Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>

This could have been done differently at monitor_qmp_event():

    case CHR_EVENT_CLOSED:
        ...
        mon_refcount--;
        if (mon_refcount == 0) {
            monitor_fdsets_cleanup();
        }

But maybe there was a concern about making sure the empty fdsets (last
block in monitor_fdset_cleanup) were removed at every refcount decrement
and not only when mon_refcount == 0 for some reason.

> I'm thinking of a case where the only QMP monitor gets disconnected (for
> some reason) and reconnects while the VM is running.  Won't the current
> code already lead to unwanted removal of almost all fds due to
> mon_refcount == 0?

I think that's the case that the patch in question was trying to
avoid. If the last monitor connects and disconnects while fds have not
been dup'ed yet, the mon_fdset->dup_fds list will be empty and what you
say will happen. There seems to be an assumption that after the guest
starts running all fds that need to be dup'ed will already have been
dup'ed.

So I think we cannot simply revert the patch as Daniel suggested,
because that might regress the original block layer use-case if a
monitor open->close causes the removal of all the yet undup'ed fds[1].

For the migration use-case, the dup() only happens after the migrate
command has been issued, so the runstate_is_running() check doesn't help
us. But it also doesn't hurt. However, we're still exposed to a monitor
disconnection removing the undup'ed fds. So AFAICS, we either stop
calling monitor_fdset_cleanup() at monitor close or declare that this
issue is unlikely to occur (because it is) and leave a comment in the
code.

===========
1- I ran a quick test:

connect()         // monitor opened: refcnt: 1

{"execute": "add-fd", "arguments": {"fdset-id": 1}}
{"return": {"fd": 9, "fdset-id": 1}}

{"execute": "add-fd", "arguments": {"fdset-id": 1}}
{"return": {"fd": 21, "fdset-id": 1}}

close()           // monitor closed: refcnt: 0

connect()         // monitor opened: refcnt: 1

{"execute": "migrate", "arguments": {"uri": "file:/dev/fdset/1,offset=4096"}}
{
    "error": {
        "class": "GenericError",
        "desc": "Outgoing migration needs two fds in the fdset, got 0"
    }
}

>
> I'm also confused about why the ->removed flag is needed at all, and why
> we can't remove the fdset's fds right away when a match is found.
>

Prior to commit efb87c1697 ("monitor: Clean up fd sets on monitor
disconnect") we only called monitor_fdset_cleanup() from
qmp_remove_fd(), so we effectively always removed immediately after
setting ->removed = true. I don't see a reason to have the flag either.
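The footnote's sequence can be reproduced with a small standalone model of the implicit cleanup path. This is a sketch under simplifying assumptions: FdsetModel and its helpers are hypothetical, a counter stands in for the dup_fds list, and cleanup runs only on monitor disconnect.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

#define MAX_FDS 8

/* Hypothetical model of one fdset; ndups stands in for the dup_fds list. */
typedef struct {
    int fds[MAX_FDS];
    int nfds;
    int ndups;
} FdsetModel;

static int mon_refcount;    /* number of connected monitors in the model */

static void fdset_add(FdsetModel *s, int fd)
{
    assert(s->nfds < MAX_FDS);
    s->fds[s->nfds++] = fd;
}

/* Implicit path: drop every fd once no monitor is left and nothing was dup'ed. */
static void fdset_cleanup(FdsetModel *s, bool running)
{
    if (s->ndups == 0 && mon_refcount == 0 && running) {
        for (int i = 0; i < s->nfds; i++) {
            close(s->fds[i]);
        }
        s->nfds = 0;
    }
}

static void monitor_connect(void)
{
    mon_refcount++;
}

static void monitor_disconnect(FdsetModel *s, bool running)
{
    mon_refcount--;
    fdset_cleanup(s, running);
}
```

In the model, two fds added and never dup'ed are closed as soon as the only monitor disconnects, so a later migrate command finds nothing to use; a single outstanding dup keeps the whole set alive.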




* Re: [PATCH 1/9] monitor: Honor QMP request for fd removal immediately
  2024-05-08  7:17   ` Daniel P. Berrangé
@ 2024-05-16 22:00     ` Fabiano Rosas
  2024-05-17  7:33       ` Daniel P. Berrangé
  0 siblings, 1 reply; 57+ messages in thread
From: Fabiano Rosas @ 2024-05-16 22:00 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, armbru, Peter Xu, Claudio Fontana, Jim Fehlig

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Fri, Apr 26, 2024 at 11:20:34AM -0300, Fabiano Rosas wrote:
>> We're enabling using the fdset interface to pass file descriptors for
>> use in the migration code. Since migrations can happen more than once
>> during the VM's lifetime, we need a way to remove an fd from the fdset
>> at the end of migration.
>> 
>> The current code only removes an fd from the fdset if the VM is
>> running. This causes a QMP call to "remove-fd" to not actually remove
>> the fd if the VM happens to be stopped.
>> 
>> While the fd would eventually be removed when monitor_fdset_cleanup()
>> is called again, the user request should be honored and the fd
>> actually removed. Calling remove-fd + query-fdset shows a recently
>> removed fd still present.
>> 
>> The runstate_is_running() check was introduced by commit ebe52b592d
>> ("monitor: Prevent removing fd from set during init"), which by the
>> shortlog indicates that they were trying to avoid removing a
>> yet-unduplicated fd too early.
>
> IMHO that should be reverted. The justification says
>
>   "If an fd is added to an fd set via the command line, and it is not
>    referenced by another command line option (ie. -drive), then clean
>    it up after QEMU initialization is complete"
>
> which I think is pretty weak. Why should QEMU forcibly stop an app
> from passing in an FD to be used by a QMP command issued just after
> the VM starts running ?  While it could just use QMP to pass in the
> FD set, the mgmt app might have its own reason for wanting QEMU to
> own the passed FD from the very start of the process execve().

I don't think that's what that patch does. That description is
misleading. I read it as:

   "If an fd is added to an fd set via the command line, and it is not
    referenced by another command line option (ie. -drive), then clean
    it up ONLY after QEMU initialization is complete"
          ^

By the subject ("monitor: Prevent removing fd from set during init") and
the fact that this function is only called when the monitor connection
closes, I believe the idea was to *save* the fds until after the VM
starts running, i.e. some fd was being lost because
monitor_fdset_cleanup() was being called before the dup().

See my reply to Peter in this same patch (PATCH 1/9).

>
> Implicitly this cleanup is attempting to "fix" a bug where the mgmt
> app passes in an FD that it never needed. If any such bug were ever
> found, then the mgmt app should just be fixed to not pass it in. I
> don't think QEMU needs to be trying to fix mgmt app bugs.
>
> IOW, this commit is imposing an arbitrary & unnecessary usage policy
> on passed in FD sets, and as your commit explains has further
> unhelpful (& undocumented) side effects on the 'remove-fd' QMP command.
>
> Just revert it IMHO.
>
>> 
>> I don't see why an fd explicitly removed with qmp_remove_fd() should
>> be under runstate_is_running(). I'm assuming this was a mistake when
>> adding the parentheses around the expression.
>> 
>> Move the runstate_is_running() check to apply only to the
>> QLIST_EMPTY(dup_fds) side of the expression and ignore it when
>> mon_fdset_fd->removed has been explicitly set.
>> 
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>>  monitor/fds.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>> 
>> diff --git a/monitor/fds.c b/monitor/fds.c
>> index d86c2c674c..4ec3b7eea9 100644
>> --- a/monitor/fds.c
>> +++ b/monitor/fds.c
>> @@ -173,9 +173,9 @@ static void monitor_fdset_cleanup(MonFdset *mon_fdset)
>>      MonFdsetFd *mon_fdset_fd_next;
>>  
>>      QLIST_FOREACH_SAFE(mon_fdset_fd, &mon_fdset->fds, next, mon_fdset_fd_next) {
>> -        if ((mon_fdset_fd->removed ||
>> -                (QLIST_EMPTY(&mon_fdset->dup_fds) && mon_refcount == 0)) &&
>> -                runstate_is_running()) {
>> +        if (mon_fdset_fd->removed ||
>> +            (QLIST_EMPTY(&mon_fdset->dup_fds) && mon_refcount == 0 &&
>> +             runstate_is_running())) {
>>              close(mon_fdset_fd->fd);
>>              g_free(mon_fdset_fd->opaque);
>>              QLIST_REMOVE(mon_fdset_fd, next);
>> -- 
>> 2.35.3
>> 
>
> With regards,
> Daniel
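To make the behavioural change in the quoted hunk concrete, the old and new conditions can be lifted into pure functions. This is a sketch; the parameters are hypothetical stand-ins for mon_fdset_fd->removed, QLIST_EMPTY(&mon_fdset->dup_fds), mon_refcount and runstate_is_running().

```c
#include <assert.h>
#include <stdbool.h>

/* Before the patch: runstate gates both the explicit and the implicit path. */
static bool old_should_close(bool removed, bool dup_fds_empty,
                             int mon_refcount, bool running)
{
    return (removed || (dup_fds_empty && mon_refcount == 0)) && running;
}

/* After the patch: an explicit remove-fd is honored regardless of runstate. */
static bool new_should_close(bool removed, bool dup_fds_empty,
                             int mon_refcount, bool running)
{
    return removed || (dup_fds_empty && mon_refcount == 0 && running);
}
```

When the VM is running both expressions reduce to the same thing; they differ only when removed is true and the VM is stopped, which is exactly the remove-fd case the commit message describes.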



* Re: [PATCH 1/9] monitor: Honor QMP request for fd removal immediately
  2024-05-16 22:00     ` Fabiano Rosas
@ 2024-05-17  7:33       ` Daniel P. Berrangé
  0 siblings, 0 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2024-05-17  7:33 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, armbru, Peter Xu, Claudio Fontana, Jim Fehlig

On Thu, May 16, 2024 at 07:00:11PM -0300, Fabiano Rosas wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Fri, Apr 26, 2024 at 11:20:34AM -0300, Fabiano Rosas wrote:
> >> We're enabling using the fdset interface to pass file descriptors for
> >> use in the migration code. Since migrations can happen more than once
> >> during the VM's lifetime, we need a way to remove an fd from the fdset
> >> at the end of migration.
> >> 
> >> The current code only removes an fd from the fdset if the VM is
> >> running. This causes a QMP call to "remove-fd" to not actually remove
> >> the fd if the VM happens to be stopped.
> >> 
> >> While the fd would eventually be removed when monitor_fdset_cleanup()
> >> is called again, the user request should be honored and the fd
> >> actually removed. Calling remove-fd + query-fdset shows a recently
> >> removed fd still present.
> >> 
> >> The runstate_is_running() check was introduced by commit ebe52b592d
> >> ("monitor: Prevent removing fd from set during init"), which by the
> >> shortlog indicates that they were trying to avoid removing a
> >> yet-unduplicated fd too early.
> >
> > IMHO that should be reverted. The justification says
> >
> >   "If an fd is added to an fd set via the command line, and it is not
> >    referenced by another command line option (ie. -drive), then clean
> >    it up after QEMU initialization is complete"
> >
> > which I think is pretty weak. Why should QEMU forcibly stop an app
> > from passing in an FD to be used by a QMP command issued just after
> > the VM starts running ?  While it could just use QMP to pass in the
> > FD set, the mgmt app might have its own reason for wanting QEMU to
> > own the passed FD from the very start of the process execve().
> 
> I don't think that's what that patch does. That description is
> misleading. I read it as:
> 
>    "If an fd is added to an fd set via the command line, and it is not
>     referenced by another command line option (ie. -drive), then clean
>     it up ONLY after QEMU initialization is complete"
>           ^
> 
> By the subject ("monitor: Prevent removing fd from set during init") and
> the fact that this function is only called when the monitor connection
> closes, I believe the idea was to *save* the fds until after the VM
> starts running, i.e. some fd was being lost because
> monitor_fdset_cleanup() was being called before the dup().

I know that, but I'm saying QEMU should not be doing *any* generic cleanup
of passed in FDs at any point. 

A passed in FD should be taken by whatever part of the QEMU configuration
is told to use it when needed, and this takes responsibility for closing
it. If nothing is told to use the fdset /yet/, then it should stay in the
fdset untouched for later use.

If an application accidentally passes in an FD that it doesn't reference
in any configuration, that's simply an application bug to fix. QEMU does
not need to second-guess the app's intent and decide to arbitrarily close
it.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 8/9] migration: Add support for fdset with multifd + file
  2024-05-09  8:08         ` Daniel P. Berrangé
@ 2024-05-17 22:43           ` Fabiano Rosas
  2024-05-18  8:36             ` Daniel P. Berrangé
  0 siblings, 1 reply; 57+ messages in thread
From: Fabiano Rosas @ 2024-05-17 22:43 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Peter Xu, qemu-devel, armbru, Claudio Fontana, Jim Fehlig

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Wed, May 08, 2024 at 05:39:53PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Wed, May 08, 2024 at 09:53:48AM +0100, Daniel P. Berrangé wrote:
>> >> On Fri, Apr 26, 2024 at 11:20:41AM -0300, Fabiano Rosas wrote:
>> >> > Allow multifd to use an fdset when migrating to a file. This is useful
>> >> > for the scenario where the management layer wants to have control over
>> >> > the migration file.
>> >> > 
>> >> > By receiving the file descriptors directly, QEMU can delegate some
>> >> > high level operating system operations to the management layer (such
>> >> > as mandatory access control). The management layer might also want to
>> >> > add its own headers before the migration stream.
>> >> > 
>> >> > Enable the "file:/dev/fdset/#" syntax for the multifd migration with
>> >> > mapped-ram. The requirements for the fdset mechanism are:
>> >> > 
>> >> > On the migration source side:
>> >> > 
>> >> > - the fdset must contain two fds that are not duplicates between
>> >> >   themselves;
>> >> > - if direct-io is to be used, exactly one of the fds must have the
>> >> >   O_DIRECT flag set;
>> >> > - the file must be opened with WRONLY both times.
>> >> > 
>> >> > On the migration destination side:
>> >> > 
>> >> > - the fdset must contain one fd;
>> >> > - the file must be opened with RDONLY.
>> >> > 
>> >> > Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> >> > ---
>> >> >  docs/devel/migration/main.rst       | 18 ++++++++++++++
>> >> >  docs/devel/migration/mapped-ram.rst |  6 ++++-
>> >> >  migration/file.c                    | 38 ++++++++++++++++++++++++++++-
>> >> >  3 files changed, 60 insertions(+), 2 deletions(-)
>> >> > 
>> >> > diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst
>> >> > index 54385a23e5..50f6096470 100644
>> >> > --- a/docs/devel/migration/main.rst
>> >> > +++ b/docs/devel/migration/main.rst
>> >> > @@ -47,6 +47,24 @@ over any transport.
>> >> >    QEMU interference. Note that QEMU does not flush cached file
>> >> >    data/metadata at the end of migration.
>> >> >  
>> >> > +  The file migration also supports using a file that has already been
>> >> > +  opened. A set of file descriptors is passed to QEMU via an "fdset"
>> >> > +  (see add-fd QMP command documentation). This method allows a
>> >> > +  management application to have control over the migration file
>> >> > +  opening operation. There are, however, strict requirements to this
>> >> > +  interface:
>> >> > +
>> >> > +  On the migration source side:
>> >> > +    - if the multifd capability is to be used, the fdset must contain
>> >> > +      two file descriptors that are not duplicates between themselves;
>> >> > +    - if the direct-io parameter is to be used, exactly one of the
>> >> > +      file descriptors must have the O_DIRECT flag set;
>> >> > +    - the file must be opened with WRONLY.
>> >> > +
>> >> > +  On the migration destination side:
>> >> > +    - the fdset must contain one file descriptor;
>> >> > +    - the file must be opened with RDONLY.
>> >> > +
>> >> >  In addition, support is included for migration using RDMA, which
>> >> >  transports the page data using ``RDMA``, where the hardware takes care of
>> >> >  transporting the pages, and the load on the CPU is much lower.  While the
>> >> > diff --git a/docs/devel/migration/mapped-ram.rst b/docs/devel/migration/mapped-ram.rst
>> >> > index fa4cefd9fc..e6505511f0 100644
>> >> > --- a/docs/devel/migration/mapped-ram.rst
>> >> > +++ b/docs/devel/migration/mapped-ram.rst
>> >> > @@ -16,7 +16,7 @@ location in the file, rather than constantly being added to a
>> >> >  sequential stream. Having the pages at fixed offsets also allows the
>> >> >  usage of O_DIRECT for save/restore of the migration stream as the
>> >> >  pages are ensured to be written respecting O_DIRECT alignment
>> >> > -restrictions (direct-io support not yet implemented).
>> >> > +restrictions.
>> >> >  
>> >> >  Usage
>> >> >  -----
>> >> > @@ -35,6 +35,10 @@ Use a ``file:`` URL for migration:
>> >> >  Mapped-ram migration is best done non-live, i.e. by stopping the VM on
>> >> >  the source side before migrating.
>> >> >  
>> >> > +For best performance enable the ``direct-io`` parameter as well:
>> >> > +
>> >> > +    ``migrate_set_parameter direct-io on``
>> >> > +
>> >> >  Use-cases
>> >> >  ---------
>> >> >  
>> >> > diff --git a/migration/file.c b/migration/file.c
>> >> > index b9265b14dd..3bc8bc7463 100644
>> >> > --- a/migration/file.c
>> >> > +++ b/migration/file.c
>> >> > @@ -17,6 +17,7 @@
>> >> >  #include "io/channel-file.h"
>> >> >  #include "io/channel-socket.h"
>> >> >  #include "io/channel-util.h"
>> >> > +#include "monitor/monitor.h"
>> >> >  #include "options.h"
>> >> >  #include "trace.h"
>> >> >  
>> >> > @@ -54,10 +55,18 @@ static void file_remove_fdset(void)
>> >> >      }
>> >> >  }
>> >> >  
>> >> > +/*
>> >> > + * With multifd, due to the behavior of the dup() system call, we need
>> >> > + * the fdset to have two non-duplicate fds so we can enable direct IO
>> >> > + * in the secondary channels without affecting the main channel.
>> >> > + */
>> >> >  static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
>> >> >                               Error **errp)
>> >> >  {
>> >> > +    FdsetInfoList *fds_info;
>> >> > +    FdsetFdInfoList *fd_info;
>> >> >      const char *fdset_id_str;
>> >> > +    int nfds = 0;
>> >> >  
>> >> >      *fdset_id = -1;
>> >> >  
>> >> > @@ -71,6 +80,32 @@ static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
>> >> >          return false;
>> >> >      }
>> >> >  
>> >> > +    if (!migrate_multifd() || !migrate_direct_io()) {
>> >> > +        return true;
>> >> > +    }
>> >> > +
>> >> > +    for (fds_info = qmp_query_fdsets(NULL); fds_info;
>> >> > +         fds_info = fds_info->next) {
>> >> > +
>> >> > +        if (*fdset_id != fds_info->value->fdset_id) {
>> >> > +            continue;
>> >> > +        }
>> >> > +
>> >> > +        for (fd_info = fds_info->value->fds; fd_info; fd_info = fd_info->next) {
>> >> > +            if (nfds++ > 2) {
>> >> > +                break;
>> >> > +            }
>> >> > +        }
>> >> > +    }
>> >> > +
>> >> > +    if (nfds != 2) {
>> >> > +        error_setg(errp, "Outgoing migration needs two fds in the fdset, "
>> >> > +                   "got %d", nfds);
>> >> > +        qmp_remove_fd(*fdset_id, false, -1, NULL);
>> >> > +        *fdset_id = -1;
>> >> > +        return false;
>> >> > +    }
>> >> > +
>> >> >      return true;
>> >> >  }
>> >> 
>> >> Related to my thoughts in an earlier patch, where I say that use of fdsets
>> >> ought to be transparent to QEMU code, I'm not a fan of having this logic
>> >> in migration code.
>> >> 
>> >> IIUC, the migration code will call  qio_channel_file_new_path twice,
>> >> once with O_DIRECT and once without. This should trigger two calls
>> >> into monitor_fdset_dup_fd_add with different flags. If we're matching
>> >> flags in that monitor_fdset_dup_fd_add(), then if only 1 FD was
>> >> provided, are we not able to report an error there ?
>> >
>> > Right, this sounds like it would work.
>> 
>> It works, but because the fdset code is so low-level, it's difficult to
>> map the low-level error to anything meaningful we can report to the
>> user. I'll have to add an errp to monitor_fdset_dup_fd_add(). Its return
>> values are not very useful:
>> 
>> -1 with no errno
>> -1 with EACCES (should actually be EBADF)
>> -1 with ENOENT
>> 
>> There has been some discussion around this before, actually:
>> 
>> https://lists.gnu.org/archive/html/qemu-devel/2021-08/msg02544.html
>
> The only caller of monitor_fdset_dup_fd_add is qemu_open_internal
> and that has a "Error **errp" parameter.  We should rewrite
> monitor_fdset_dup_fd_add to also have an "Error **errp" at which
> point we can actually report useful, actionable error messages
> from it. Errnos be gone !
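Daniel's suggestion (select the fd by matching flags and report a readable error when nothing matches) could look roughly like the sketch below. It is not QEMU's monitor_fdset_dup_fd_add(): fdset_pick_fd() and accmode_name() are hypothetical, they operate on a plain array of fds, and a caller-supplied buffer stands in for Error **errp.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>

static const char *accmode_name(int mode)
{
    switch (mode) {
    case O_RDONLY: return "O_RDONLY";
    case O_WRONLY: return "O_WRONLY";
    default:       return "O_RDWR";
    }
}

/*
 * Return the first fd whose access mode and O_DIRECT bit both match
 * 'flags'; otherwise write a human-readable reason into 'err' and
 * return -1.
 */
static int fdset_pick_fd(const int *fds, size_t nfds, int flags,
                         char *err, size_t errlen)
{
    int want_mode = flags & O_ACCMODE;
    int want_direct = flags & O_DIRECT;

    for (size_t i = 0; i < nfds; i++) {
        int fl = fcntl(fds[i], F_GETFL);

        if (fl < 0) {
            continue;               /* stale or invalid fd: skip it */
        }
        if ((fl & O_ACCMODE) == want_mode && (fl & O_DIRECT) == want_direct) {
            return fds[i];
        }
    }
    snprintf(err, errlen, "No fd in the set was opened with %s%s",
             accmode_name(want_mode), want_direct ? " and O_DIRECT" : "");
    return -1;
}
```

With this shape, the main channel would ask for O_WRONLY and each multifd channel for O_WRONLY | O_DIRECT, so an fdset missing the O_DIRECT fd would surface as a specific message at open time instead of a bare -1 and errno.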

I can do that, but qemu_open_old() does not pass the error in. Please
see if this works for you:

-->8--
From 16e333cc5aeca1fab3f75f79048c0ab0d62d5b08 Mon Sep 17 00:00:00 2001
From: Fabiano Rosas <farosas@suse.de>
Date: Fri, 17 May 2024 19:30:39 -0300
Subject: [PATCH] io: Stop using qemu_open_old in channel-file

We want to make use of the Error object to report fdset errors from
qemu_open_internal() and passing the error pointer to qemu_open_old()
would require changing all callers. Move the file channel to the new
API instead.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 io/channel-file.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/io/channel-file.c b/io/channel-file.c
index 6436cfb6ae..2ea8d08360 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -68,11 +68,13 @@ qio_channel_file_new_path(const char *path,
 
     ioc = QIO_CHANNEL_FILE(object_new(TYPE_QIO_CHANNEL_FILE));
 
-    ioc->fd = qemu_open_old(path, flags, mode);
+    if (flags & O_CREAT) {
+        ioc->fd = qemu_create(path, flags & ~O_CREAT, mode, errp);
+    } else {
+        ioc->fd = qemu_open(path, flags, errp);
+    }
     if (ioc->fd < 0) {
         object_unref(OBJECT(ioc));
-        error_setg_errno(errp, errno,
-                         "Unable to open %s", path);
         return NULL;
     }
 
-- 
2.35.3
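The comment in the quoted file_parse_fdset() hunk about dup() is the crux of the two-fd requirement: descriptors created with dup() share one open file description, and therefore its file status flags, while two separate open() calls do not. A minimal POSIX sketch, using O_APPEND as a stand-in for O_DIRECT because O_DIRECT is not supported on every filesystem; flags_are_shared() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Set O_APPEND through 'other' and report whether it became visible
 * through 'fd'. Returns 1 if the descriptors share file status flags
 * (same open file description), 0 if not, -1 on error.
 * Assumes 'fd' does not have O_APPEND set on entry.
 */
static int flags_are_shared(int fd, int other)
{
    int fd_flags = fcntl(fd, F_GETFL);
    int other_flags = fcntl(other, F_GETFL);
    int after;

    if (fd_flags < 0 || other_flags < 0 || (fd_flags & O_APPEND)) {
        return -1;
    }
    if (fcntl(other, F_SETFL, other_flags | O_APPEND) < 0) {
        return -1;
    }
    after = fcntl(fd, F_GETFL);
    fcntl(other, F_SETFL, other_flags);     /* undo the change */
    if (after < 0) {
        return -1;
    }
    return (after & O_APPEND) != 0;
}
```

By the same reasoning, if both fdset entries were dup()s of each other, enabling O_DIRECT for a multifd channel would also enable it for the main channel, whose unaligned writes would then be expected to fail.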




* Re: [PATCH 8/9] migration: Add support for fdset with multifd + file
  2024-05-17 22:43           ` Fabiano Rosas
@ 2024-05-18  8:36             ` Daniel P. Berrangé
  0 siblings, 0 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2024-05-18  8:36 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: Peter Xu, qemu-devel, armbru, Claudio Fontana, Jim Fehlig

On Fri, May 17, 2024 at 07:43:35PM -0300, Fabiano Rosas wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> >
> > The only caller of monitor_fdset_dup_fd_add is qemu_open_internal
> > and that has a "Error **errp" parameter.  We should rewrite
> > monitor_fdset_dup_fd_add to also have an "Error **errp" at which
> > point we can actually report useful, actionable error messages
> > from it. Errnos be gone !
> 
> I can do that, but qemu_open_old() does not pass the error in. Please
> see if this works for you:
> 
> -->8--
> From 16e333cc5aeca1fab3f75f79048c0ab0d62d5b08 Mon Sep 17 00:00:00 2001
> From: Fabiano Rosas <farosas@suse.de>
> Date: Fri, 17 May 2024 19:30:39 -0300
> Subject: [PATCH] io: Stop using qemu_open_old in channel-file
> 
> We want to make use of the Error object to report fdset errors from
> qemu_open_internal() and passing the error pointer to qemu_open_old()
> would require changing all callers. Move the file channel to the new
> API instead.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  io/channel-file.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/io/channel-file.c b/io/channel-file.c
> index 6436cfb6ae..2ea8d08360 100644
> --- a/io/channel-file.c
> +++ b/io/channel-file.c
> @@ -68,11 +68,13 @@ qio_channel_file_new_path(const char *path,
>  
>      ioc = QIO_CHANNEL_FILE(object_new(TYPE_QIO_CHANNEL_FILE));
>  
> -    ioc->fd = qemu_open_old(path, flags, mode);
> +    if (flags & O_CREAT) {
> +        ioc->fd = qemu_create(path, flags & ~O_CREAT, mode, errp);
> +    } else {
> +        ioc->fd = qemu_open(path, flags, errp);
> +    }
>      if (ioc->fd < 0) {
>          object_unref(OBJECT(ioc));
> -        error_setg_errno(errp, errno,
> -                         "Unable to open %s", path);
>          return NULL;
>      }

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


With regards,
Daniel
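The reviewed patch works because qemu_open() and qemu_create() fill in errp themselves, so the channel no longer needs its own error_setg_errno(). A standalone sketch of that division of labour, assuming a hypothetical OpenError buffer in place of QEMU's Error and plain open() in place of qemu_open_internal(); the message format is illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for QEMU's Error object. */
typedef struct {
    char msg[256];
} OpenError;

/* Low-level helper: owns the error message, like qemu_open()/qemu_create(). */
static int open_or_create(const char *path, int flags, mode_t mode,
                          OpenError *err)
{
    int fd = (flags & O_CREAT) ? open(path, flags, mode) : open(path, flags);

    if (fd < 0 && err) {
        snprintf(err->msg, sizeof(err->msg), "Could not %s '%s': %s",
                 (flags & O_CREAT) ? "create" : "open", path, strerror(errno));
    }
    return fd;
}

/* Caller (cf. qio_channel_file_new_path): propagates, adds no second message. */
static int channel_open_sketch(const char *path, int flags, mode_t mode,
                               OpenError *err)
{
    int fd = open_or_create(path, flags, mode, err);

    if (fd < 0) {
        return -1;      /* err was already filled in by the helper */
    }
    return fd;
}
```

In the quoted patch the O_CREAT bit is masked off before calling qemu_create(), presumably because qemu_create() adds it itself; the sketch keeps the bit and dispatches on it instead.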




end of thread

Thread overview: 57+ messages
2024-04-26 14:20 [PATCH 0/9] migration/mapped-ram: Add direct-io support Fabiano Rosas
2024-04-26 14:20 ` [PATCH 1/9] monitor: Honor QMP request for fd removal immediately Fabiano Rosas
2024-05-03 16:02   ` Peter Xu
2024-05-16 21:46     ` Fabiano Rosas
2024-05-08  7:17   ` Daniel P. Berrangé
2024-05-16 22:00     ` Fabiano Rosas
2024-05-17  7:33       ` Daniel P. Berrangé
2024-04-26 14:20 ` [PATCH 2/9] migration: Fix file migration with fdset Fabiano Rosas
2024-05-03 16:23   ` Peter Xu
2024-05-03 19:56     ` Fabiano Rosas
2024-05-03 21:04       ` Peter Xu
2024-05-03 21:31         ` Fabiano Rosas
2024-05-03 21:56           ` Peter Xu
2024-05-08  8:02     ` Daniel P. Berrangé
2024-05-08 12:49       ` Peter Xu
2024-05-08  8:00   ` Daniel P. Berrangé
2024-05-08 20:45     ` Fabiano Rosas
2024-04-26 14:20 ` [PATCH 3/9] tests/qtest/migration: Fix file migration offset check Fabiano Rosas
2024-05-03 16:47   ` Peter Xu
2024-05-03 20:36     ` Fabiano Rosas
2024-05-03 21:08       ` Peter Xu
2024-05-08  8:10       ` Daniel P. Berrangé
2024-04-26 14:20 ` [PATCH 4/9] migration: Add direct-io parameter Fabiano Rosas
2024-04-26 14:33   ` Markus Armbruster
2024-05-03 18:05   ` Peter Xu
2024-05-03 20:49     ` Fabiano Rosas
2024-05-03 21:16       ` Peter Xu
2024-05-14 14:10         ` Markus Armbruster
2024-05-14 17:57           ` Fabiano Rosas
2024-05-15  7:17             ` Markus Armbruster
2024-05-15 12:51               ` Fabiano Rosas
2024-05-08  8:25   ` Daniel P. Berrangé
2024-04-26 14:20 ` [PATCH 5/9] migration/multifd: Add direct-io support Fabiano Rosas
2024-05-03 18:29   ` Peter Xu
2024-05-03 20:54     ` Fabiano Rosas
2024-05-03 21:18       ` Peter Xu
2024-05-08  8:27   ` Daniel P. Berrangé
2024-04-26 14:20 ` [PATCH 6/9] tests/qtest/migration: Add tests for file migration with direct-io Fabiano Rosas
2024-05-03 18:38   ` Peter Xu
2024-05-03 21:05     ` Fabiano Rosas
2024-05-03 21:25       ` Peter Xu
2024-05-08  8:34   ` Daniel P. Berrangé
2024-04-26 14:20 ` [PATCH 7/9] monitor: fdset: Match against O_DIRECT Fabiano Rosas
2024-05-03 18:53   ` Peter Xu
2024-05-03 21:19     ` Fabiano Rosas
2024-05-03 22:16       ` Peter Xu
2024-04-26 14:20 ` [PATCH 8/9] migration: Add support for fdset with multifd + file Fabiano Rosas
2024-05-08  8:53   ` Daniel P. Berrangé
2024-05-08 18:23     ` Peter Xu
2024-05-08 20:39       ` Fabiano Rosas
2024-05-09  8:08         ` Daniel P. Berrangé
2024-05-17 22:43           ` Fabiano Rosas
2024-05-18  8:36             ` Daniel P. Berrangé
2024-04-26 14:20 ` [PATCH 9/9] tests/qtest/migration: Add a test for mapped-ram with passing of fds Fabiano Rosas
2024-05-08  8:56   ` Daniel P. Berrangé
2024-05-02 20:01 ` [PATCH 0/9] migration/mapped-ram: Add direct-io support Peter Xu
2024-05-02 20:34   ` Fabiano Rosas
