* [PULL 0/7] Migration.next patches
@ 2021-10-19  9:29 Juan Quintela
  2021-10-19  9:29 ` [PULL 1/7] multifd: Implement yank for multifd send side Juan Quintela
                   ` (7 more replies)
  0 siblings, 8 replies; 10+ messages in thread
From: Juan Quintela @ 2021-10-19  9:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr. David Alan Gilbert, Juan Quintela

The following changes since commit 362534a643b4a34bcb223996538ce9de5cdab946:

  Merge remote-tracking branch 'remotes/bsdimp/tags/pull-bsd-user-20211018-pull-request' into staging (2021-10-18 12:17:24 -0700)

are available in the Git repository at:

  https://github.com/juanquintela/qemu.git tags/migration.next-pull-request

for you to fetch changes up to 911965ace9386e35ca022a65bb45a32fd421af3e:

  migration/rdma: advise prefetch write for ODP region (2021-10-19 08:39:04 +0200)

----------------------------------------------------------------
Migration Pull request (3rd try)

Hi

This should fix all the FreeBSD problems.

Please apply,

----------------------------------------------------------------

David Hildenbrand (1):
  migration/ram: Don't passs RAMState to
    migration_clear_memory_region_dirty_bitmap_*()

Li Zhijian (4):
  migration: allow multifd for socket protocol only
  migration: allow enabling mutilfd for specific protocol only
  migration/rdma: Try to register On-Demand Paging memory region
  migration/rdma: advise prefetch write for ODP region

Lukas Straub (2):
  multifd: Implement yank for multifd send side
  multifd: Unconditionally unregister yank function

 meson.build            |   6 +++
 migration/multifd.h    |   4 ++
 migration/migration.c  |  12 +++++
 migration/multifd.c    |  35 ++++++++++---
 migration/ram.c        |  13 ++---
 migration/rdma.c       | 113 ++++++++++++++++++++++++++++++++++-------
 migration/trace-events |   2 +
 7 files changed, 151 insertions(+), 34 deletions(-)

-- 
2.31.1





* [PULL 1/7] multifd: Implement yank for multifd send side
  2021-10-19  9:29 [PULL 0/7] Migration.next patches Juan Quintela
@ 2021-10-19  9:29 ` Juan Quintela
  2021-10-19  9:29 ` [PULL 2/7] multifd: Unconditionally unregister yank function Juan Quintela
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Juan Quintela @ 2021-10-19  9:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Lukas Straub, Dr. David Alan Gilbert, Leonardo Bras, Juan Quintela

From: Lukas Straub <lukasstraub2@web.de>

To: qemu-devel <qemu-devel@nongnu.org>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Juan Quintela
 <quintela@redhat.com>, Peter Xu <peterx@redhat.com>, Leonardo Bras Soares
 Passos <lsoaresp@redhat.com>
Date: Wed, 1 Sep 2021 17:58:57 +0200

When introducing yank functionality in the migration code I forgot
to cover the multifd send side.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/multifd.h | 2 ++
 migration/multifd.c | 6 +++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 8d6751f5ed..16c4d112d1 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -85,6 +85,8 @@ typedef struct {
     bool running;
     /* should this thread finish */
     bool quit;
+    /* is the yank function registered */
+    bool registered_yank;
     /* thread has work to do */
     int pending_job;
     /* array of pages to sent */
diff --git a/migration/multifd.c b/migration/multifd.c
index 377da78f5b..5a4f158f3c 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -546,6 +546,9 @@ void multifd_save_cleanup(void)
         MultiFDSendParams *p = &multifd_send_state->params[i];
         Error *local_err = NULL;
 
+        if (p->registered_yank) {
+            migration_ioc_unregister_yank(p->c);
+        }
         socket_send_channel_destroy(p->c);
         p->c = NULL;
         qemu_mutex_destroy(&p->mutex);
@@ -813,7 +816,8 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
                 return false;
             }
         } else {
-            /* update for tls qio channel */
+            migration_ioc_register_yank(ioc);
+            p->registered_yank = true;
             p->c = ioc;
             qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
                                    QEMU_THREAD_JOINABLE);
-- 
2.31.1
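
The pairing this patch introduces (register on a successful connect, unregister in cleanup only if registration happened) can be illustrated with a minimal sketch. Everything prefixed `demo_`/`Demo` is hypothetical and stands in for the real yank registry and MultiFDSendParams; it is not the QEMU API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the yank registry: it only counts live entries. */
static int demo_yank_entries;

static void demo_register_yank(void)
{
    demo_yank_entries++;
}

static void demo_unregister_yank(void)
{
    /* Unregistering something that was never registered is a bug. */
    assert(demo_yank_entries > 0);
    demo_yank_entries--;
}

/* Hypothetical, trimmed-down analogue of MultiFDSendParams. */
typedef struct {
    bool connected;
    bool registered_yank;
} DemoSendParams;

/* Connect path: register only once the channel is actually usable. */
static void demo_channel_connect(DemoSendParams *p, bool connect_ok)
{
    if (!connect_ok) {
        return;
    }
    demo_register_yank();
    p->registered_yank = true;
    p->connected = true;
}

/* Cleanup path: undo exactly what the connect path did. */
static void demo_channel_cleanup(DemoSendParams *p)
{
    if (p->registered_yank) {
        demo_unregister_yank();
        p->registered_yank = false;
    }
    p->connected = false;
}

static int demo_live_yank_entries(void)
{
    return demo_yank_entries;
}
```

The flag matters because cleanup also runs for channels whose connect never completed; without it, cleanup would try to unregister an entry that does not exist.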




* [PULL 2/7] multifd: Unconditionally unregister yank function
  2021-10-19  9:29 [PULL 0/7] Migration.next patches Juan Quintela
  2021-10-19  9:29 ` [PULL 1/7] multifd: Implement yank for multifd send side Juan Quintela
@ 2021-10-19  9:29 ` Juan Quintela
  2021-10-19  9:29 ` [PULL 3/7] migration/ram: Don't passs RAMState to migration_clear_memory_region_dirty_bitmap_*() Juan Quintela
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Juan Quintela @ 2021-10-19  9:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Lukas Straub, Dr. David Alan Gilbert, Juan Quintela

From: Lukas Straub <lukasstraub2@web.de>

To: qemu-devel <qemu-devel@nongnu.org>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Juan Quintela
 <quintela@redhat.com>, Peter Xu <peterx@redhat.com>, Leonardo Bras Soares
 Passos <lsoaresp@redhat.com>
Date: Wed, 4 Aug 2021 21:26:32 +0200

Unconditionally unregister the yank function in multifd_load_cleanup().
If it is not unregistered here, it will leak and cause a crash
in yank_unregister_instance(). If the ioc is still in use
afterwards, the only consequence is that QEMU cannot recover
from a hang related to that ioc.

After checking the code, I am pretty sure that ref is always 1
when arriving here, so all this change currently does is remove the
unneeded check.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/multifd.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 5a4f158f3c..efd424bc97 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -991,10 +991,7 @@ int multifd_load_cleanup(Error **errp)
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDRecvParams *p = &multifd_recv_state->params[i];
 
-        if (OBJECT(p->c)->ref == 1) {
-            migration_ioc_unregister_yank(p->c);
-        }
-
+        migration_ioc_unregister_yank(p->c);
         object_unref(OBJECT(p->c));
         p->c = NULL;
         qemu_mutex_destroy(&p->mutex);
-- 
2.31.1
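
To make the leak described in the commit message concrete, here is a small sketch contrasting the old conditional unregister with the new unconditional one. The `DemoChannel` type and `demo_cleanup()` are hypothetical simplifications; the `ref == 2` case used to trigger the leak is an assumption for illustration, since the commit message states that ref is believed to always be 1 at this point.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical channel: a reference count plus "has a yank entry" state. */
typedef struct {
    int ref;
    bool yank_registered;
} DemoChannel;

/*
 * Cleanup as in the diff above. With only_if_last_ref set, this mimics the
 * old code (unregister only when ref == 1); otherwise it mimics the new
 * unconditional behaviour. Returns the number of leaked yank entries.
 */
static int demo_cleanup(DemoChannel *c, bool only_if_last_ref)
{
    assert(c->ref > 0);

    if (!only_if_last_ref || c->ref == 1) {
        c->yank_registered = false;     /* entry removed from the registry */
    }
    c->ref--;                           /* analogue of object_unref() */

    return c->yank_registered ? 1 : 0;
}
```

In the old variant, any extra reference at cleanup time leaves the entry behind, which is the situation the commit message says later crashes in yank_unregister_instance().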




* [PULL 3/7] migration/ram: Don't passs RAMState to migration_clear_memory_region_dirty_bitmap_*()
  2021-10-19  9:29 [PULL 0/7] Migration.next patches Juan Quintela
  2021-10-19  9:29 ` [PULL 1/7] multifd: Implement yank for multifd send side Juan Quintela
  2021-10-19  9:29 ` [PULL 2/7] multifd: Unconditionally unregister yank function Juan Quintela
@ 2021-10-19  9:29 ` Juan Quintela
  2021-10-19  9:29 ` [PULL 4/7] migration: allow multifd for socket protocol only Juan Quintela
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Juan Quintela @ 2021-10-19  9:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: David Hildenbrand, Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert, Peter Xu, Juan Quintela

From: David Hildenbrand <david@redhat.com>

The parameter is unused, let's drop it.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/ram.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 7a43bfd7af..bb908822d5 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -789,8 +789,7 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
     return find_next_bit(bitmap, size, start);
 }
 
-static void migration_clear_memory_region_dirty_bitmap(RAMState *rs,
-                                                       RAMBlock *rb,
+static void migration_clear_memory_region_dirty_bitmap(RAMBlock *rb,
                                                        unsigned long page)
 {
     uint8_t shift;
@@ -818,8 +817,7 @@ static void migration_clear_memory_region_dirty_bitmap(RAMState *rs,
 }
 
 static void
-migration_clear_memory_region_dirty_bitmap_range(RAMState *rs,
-                                                 RAMBlock *rb,
+migration_clear_memory_region_dirty_bitmap_range(RAMBlock *rb,
                                                  unsigned long start,
                                                  unsigned long npages)
 {
@@ -832,7 +830,7 @@ migration_clear_memory_region_dirty_bitmap_range(RAMState *rs,
      * exclusive.
      */
     for (i = chunk_start; i < chunk_end; i += chunk_pages) {
-        migration_clear_memory_region_dirty_bitmap(rs, rb, i);
+        migration_clear_memory_region_dirty_bitmap(rb, i);
     }
 }
 
@@ -850,7 +848,7 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
      * the page in the chunk we clear the remote dirty bitmap for all.
      * Clearing it earlier won't be a problem, but too late will.
      */
-    migration_clear_memory_region_dirty_bitmap(rs, rb, page);
+    migration_clear_memory_region_dirty_bitmap(rb, page);
 
     ret = test_and_clear_bit(page, rb->bmap);
     if (ret) {
@@ -2777,8 +2775,7 @@ void qemu_guest_free_page_hint(void *addr, size_t len)
          * are initially set. Otherwise those skipped pages will be sent in
          * the next round after syncing from the memory region bitmap.
          */
-        migration_clear_memory_region_dirty_bitmap_range(ram_state, block,
-                                                         start, npages);
+        migration_clear_memory_region_dirty_bitmap_range(block, start, npages);
         ram_state->migration_dirty_pages -=
                       bitmap_count_one_with_offset(block->bmap, start, npages);
         bitmap_clear(block->bmap, start, npages);
-- 
2.31.1




* [PULL 4/7] migration: allow multifd for socket protocol only
  2021-10-19  9:29 [PULL 0/7] Migration.next patches Juan Quintela
                   ` (2 preceding siblings ...)
  2021-10-19  9:29 ` [PULL 3/7] migration/ram: Don't passs RAMState to migration_clear_memory_region_dirty_bitmap_*() Juan Quintela
@ 2021-10-19  9:29 ` Juan Quintela
  2021-10-19  9:29 ` [PULL 5/7] migration: allow enabling mutilfd for specific " Juan Quintela
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Juan Quintela @ 2021-10-19  9:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr. David Alan Gilbert, Li Zhijian, Juan Quintela

From: Li Zhijian <lizhijian@cn.fujitsu.com>

To: <quintela@redhat.com>, <dgilbert@redhat.com>, <qemu-devel@nongnu.org>
CC: Li Zhijian <lizhijian@cn.fujitsu.com>
Date: Sat, 31 Jul 2021 22:05:51 +0800

Using multifd with an unsupported protocol causes a segmentation fault:
(gdb) bt
 #0  0x0000563b4a93faf8 in socket_connect (addr=0x0, errp=0x7f7f02675410) at ../util/qemu-sockets.c:1190
 #1  0x0000563b4a797a03 in qio_channel_socket_connect_sync (ioc=0x563b4d16e8c0, addr=0x0, errp=0x7f7f02675410) at ../io/channel-socket.c:145
 #2  0x0000563b4a797abf in qio_channel_socket_connect_worker (task=0x563b4cd86c30, opaque=0x0) at ../io/channel-socket.c:168
 #3  0x0000563b4a792631 in qio_task_thread_worker (opaque=0x563b4cd86c30) at ../io/task.c:124
 #4  0x0000563b4a91da69 in qemu_thread_start (args=0x563b4c44bb80) at ../util/qemu-thread-posix.c:541
 #5  0x00007f7fe9b5b3f9 in ?? ()
 #6  0x0000000000000000 in ?? ()

It is enough to check migrate_multifd_is_allowed() in the multifd setup and
cleanup functions, even though many other places call migrate_use_multifd().

Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/multifd.h   |  2 ++
 migration/migration.c |  4 ++++
 migration/multifd.c   | 24 ++++++++++++++++++++++--
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 16c4d112d1..15c50ca0b2 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -13,6 +13,8 @@
 #ifndef QEMU_MIGRATION_MULTIFD_H
 #define QEMU_MIGRATION_MULTIFD_H
 
+bool migrate_multifd_is_allowed(void);
+void migrate_protocol_allow_multifd(bool allow);
 int multifd_save_setup(Error **errp);
 void multifd_save_cleanup(void);
 int multifd_load_setup(Error **errp);
diff --git a/migration/migration.c b/migration/migration.c
index 6ac807ef3d..f13b07c638 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -453,10 +453,12 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p = NULL;
 
+    migrate_protocol_allow_multifd(false); /* reset it anyway */
     qapi_event_send_migration(MIGRATION_STATUS_SETUP);
     if (strstart(uri, "tcp:", &p) ||
         strstart(uri, "unix:", NULL) ||
         strstart(uri, "vsock:", NULL)) {
+        migrate_protocol_allow_multifd(true);
         socket_start_incoming_migration(p ? p : uri, errp);
 #ifdef CONFIG_RDMA
     } else if (strstart(uri, "rdma:", &p)) {
@@ -2280,9 +2282,11 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
         }
     }
 
+    migrate_protocol_allow_multifd(false);
     if (strstart(uri, "tcp:", &p) ||
         strstart(uri, "unix:", NULL) ||
         strstart(uri, "vsock:", NULL)) {
+        migrate_protocol_allow_multifd(true);
         socket_start_outgoing_migration(s, p ? p : uri, &local_err);
 #ifdef CONFIG_RDMA
     } else if (strstart(uri, "rdma:", &p)) {
diff --git a/migration/multifd.c b/migration/multifd.c
index efd424bc97..283f672bf0 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -531,7 +531,7 @@ void multifd_save_cleanup(void)
 {
     int i;
 
-    if (!migrate_use_multifd()) {
+    if (!migrate_use_multifd() || !migrate_multifd_is_allowed()) {
         return;
     }
     multifd_send_terminate_threads(NULL);
@@ -868,6 +868,17 @@ cleanup:
     multifd_new_send_channel_cleanup(p, sioc, local_err);
 }
 
+static bool migrate_allow_multifd;
+void migrate_protocol_allow_multifd(bool allow)
+{
+    migrate_allow_multifd = allow;
+}
+
+bool migrate_multifd_is_allowed(void)
+{
+    return migrate_allow_multifd;
+}
+
 int multifd_save_setup(Error **errp)
 {
     int thread_count;
@@ -878,6 +889,11 @@ int multifd_save_setup(Error **errp)
     if (!migrate_use_multifd()) {
         return 0;
     }
+    if (!migrate_multifd_is_allowed()) {
+        error_setg(errp, "multifd is not supported by current protocol");
+        return -1;
+    }
+
     s = migrate_get_current();
     thread_count = migrate_multifd_channels();
     multifd_send_state = g_malloc0(sizeof(*multifd_send_state));
@@ -971,7 +987,7 @@ int multifd_load_cleanup(Error **errp)
 {
     int i;
 
-    if (!migrate_use_multifd()) {
+    if (!migrate_use_multifd() || !migrate_multifd_is_allowed()) {
         return 0;
     }
     multifd_recv_terminate_threads(NULL);
@@ -1120,6 +1136,10 @@ int multifd_load_setup(Error **errp)
     if (!migrate_use_multifd()) {
         return 0;
     }
+    if (!migrate_multifd_is_allowed()) {
+        error_setg(errp, "multifd is not supported by current protocol");
+        return -1;
+    }
     thread_count = migrate_multifd_channels();
     multifd_recv_state = g_malloc0(sizeof(*multifd_recv_state));
     multifd_recv_state->params = g_new0(MultiFDRecvParams, thread_count);
-- 
2.31.1
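
The gating logic in this patch can be sketched on its own: reset the flag, enable it only for socket-style URIs, and have setup fail cleanly instead of crashing. The `demo_` names are hypothetical; `demo_has_prefix()` is a simplified stand-in for QEMU's strstart(), and the error is returned as a plain string rather than through an Error object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Analogue of the static migrate_allow_multifd flag. */
static bool demo_allow_multifd;

/* Simplified stand-in for strstart(): true if s begins with prefix. */
static bool demo_has_prefix(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Called when a migration URI is parsed, for incoming or outgoing. */
static void demo_select_protocol(const char *uri)
{
    assert(uri != NULL);

    demo_allow_multifd = false;         /* reset it anyway */
    if (demo_has_prefix(uri, "tcp:") ||
        demo_has_prefix(uri, "unix:") ||
        demo_has_prefix(uri, "vsock:")) {
        demo_allow_multifd = true;      /* socket protocols only */
    }
}

/* Analogue of multifd_save_setup()/multifd_load_setup() entry checks. */
static int demo_multifd_setup(bool use_multifd, const char **errp)
{
    *errp = NULL;
    if (!use_multifd) {
        return 0;                       /* capability off: nothing to do */
    }
    if (!demo_allow_multifd) {
        *errp = "multifd is not supported by current protocol";
        return -1;
    }
    return 0;
}
```

Resetting the flag before the prefix checks means a previous socket migration cannot leave multifd enabled for a later rdma: or exec: attempt.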




* [PULL 5/7] migration: allow enabling mutilfd for specific protocol only
  2021-10-19  9:29 [PULL 0/7] Migration.next patches Juan Quintela
                   ` (3 preceding siblings ...)
  2021-10-19  9:29 ` [PULL 4/7] migration: allow multifd for socket protocol only Juan Quintela
@ 2021-10-19  9:29 ` Juan Quintela
  2021-10-19  9:29 ` [PULL 6/7] migration/rdma: Try to register On-Demand Paging memory region Juan Quintela
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Juan Quintela @ 2021-10-19  9:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr. David Alan Gilbert, Li Zhijian, Juan Quintela

From: Li Zhijian <lizhijian@cn.fujitsu.com>

To: <quintela@redhat.com>, <dgilbert@redhat.com>, <qemu-devel@nongnu.org>
CC: Li Zhijian <lizhijian@cn.fujitsu.com>
Date: Sat, 31 Jul 2021 22:05:52 +0800

Also change the default of migrate_allow_multifd to true so that, in the
'-incoming defer' case, the user is still able to change the multifd capability.

Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/migration.c | 8 ++++++++
 migration/multifd.c   | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index f13b07c638..9172686b89 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1237,6 +1237,14 @@ static bool migrate_caps_check(bool *cap_list,
         }
     }
 
+    /* incoming side only */
+    if (runstate_check(RUN_STATE_INMIGRATE) &&
+        !migrate_multifd_is_allowed() &&
+        cap_list[MIGRATION_CAPABILITY_MULTIFD]) {
+        error_setg(errp, "multifd is not supported by current protocol");
+        return false;
+    }
+
     return true;
 }
 
diff --git a/migration/multifd.c b/migration/multifd.c
index 283f672bf0..7c9deb1921 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -868,7 +868,7 @@ cleanup:
     multifd_new_send_channel_cleanup(p, sioc, local_err);
 }
 
-static bool migrate_allow_multifd;
+static bool migrate_allow_multifd = true;
 void migrate_protocol_allow_multifd(bool allow)
 {
     migrate_allow_multifd = allow;
-- 
2.31.1




* [PULL 6/7] migration/rdma: Try to register On-Demand Paging memory region
  2021-10-19  9:29 [PULL 0/7] Migration.next patches Juan Quintela
                   ` (4 preceding siblings ...)
  2021-10-19  9:29 ` [PULL 5/7] migration: allow enabling mutilfd for specific " Juan Quintela
@ 2021-10-19  9:29 ` Juan Quintela
  2021-10-19  9:29 ` [PULL 7/7] migration/rdma: advise prefetch write for ODP region Juan Quintela
  2021-10-19 16:55 ` [PULL 0/7] Migration.next patches Richard Henderson
  7 siblings, 0 replies; 10+ messages in thread
From: Juan Quintela @ 2021-10-19  9:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr. David Alan Gilbert, Li Zhijian, Juan Quintela

From: Li Zhijian <lizhijian@cn.fujitsu.com>

Previously, registering an fsdax memory-backend-file failed with
"Operation not supported". In this case, we can try to register it with
On-Demand Paging [1], as rpma_mr_reg() does in rpma [2].

[1]: https://community.mellanox.com/s/article/understanding-on-demand-paging--odp-x
[2]: http://pmem.io/rpma/manpages/v0.9.0/rpma_mr_reg.3

CC: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/rdma.c       | 73 ++++++++++++++++++++++++++++++------------
 migration/trace-events |  1 +
 2 files changed, 54 insertions(+), 20 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 5c2d113aa9..eb80431aae 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1117,19 +1117,47 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma)
     return 0;
 }
 
+/* Check whether On-Demand Paging is supported by RDAM device */
+static bool rdma_support_odp(struct ibv_context *dev)
+{
+    struct ibv_device_attr_ex attr = {0};
+    int ret = ibv_query_device_ex(dev, NULL, &attr);
+    if (ret) {
+        return false;
+    }
+
+    if (attr.odp_caps.general_caps & IBV_ODP_SUPPORT) {
+        return true;
+    }
+
+    return false;
+}
+
 static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
 {
     int i;
     RDMALocalBlocks *local = &rdma->local_ram_blocks;
 
     for (i = 0; i < local->nb_blocks; i++) {
+        int access = IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE;
+
         local->block[i].mr =
             ibv_reg_mr(rdma->pd,
                     local->block[i].local_host_addr,
-                    local->block[i].length,
-                    IBV_ACCESS_LOCAL_WRITE |
-                    IBV_ACCESS_REMOTE_WRITE
+                    local->block[i].length, access
                     );
+
+        if (!local->block[i].mr &&
+            errno == ENOTSUP && rdma_support_odp(rdma->verbs)) {
+                access |= IBV_ACCESS_ON_DEMAND;
+                /* register ODP mr */
+                local->block[i].mr =
+                    ibv_reg_mr(rdma->pd,
+                               local->block[i].local_host_addr,
+                               local->block[i].length, access);
+                trace_qemu_rdma_register_odp_mr(local->block[i].block_name);
+        }
+
         if (!local->block[i].mr) {
             perror("Failed to register local dest ram block!");
             break;
@@ -1215,28 +1243,33 @@ static int qemu_rdma_register_and_get_keys(RDMAContext *rdma,
      */
     if (!block->pmr[chunk]) {
         uint64_t len = chunk_end - chunk_start;
+        int access = rkey ? IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE :
+                     0;
 
         trace_qemu_rdma_register_and_get_keys(len, chunk_start);
 
-        block->pmr[chunk] = ibv_reg_mr(rdma->pd,
-                chunk_start, len,
-                (rkey ? (IBV_ACCESS_LOCAL_WRITE |
-                        IBV_ACCESS_REMOTE_WRITE) : 0));
-
-        if (!block->pmr[chunk]) {
-            perror("Failed to register chunk!");
-            fprintf(stderr, "Chunk details: block: %d chunk index %d"
-                            " start %" PRIuPTR " end %" PRIuPTR
-                            " host %" PRIuPTR
-                            " local %" PRIuPTR " registrations: %d\n",
-                            block->index, chunk, (uintptr_t)chunk_start,
-                            (uintptr_t)chunk_end, host_addr,
-                            (uintptr_t)block->local_host_addr,
-                            rdma->total_registrations);
-            return -1;
+        block->pmr[chunk] = ibv_reg_mr(rdma->pd, chunk_start, len, access);
+        if (!block->pmr[chunk] &&
+            errno == ENOTSUP && rdma_support_odp(rdma->verbs)) {
+            access |= IBV_ACCESS_ON_DEMAND;
+            /* register ODP mr */
+            block->pmr[chunk] = ibv_reg_mr(rdma->pd, chunk_start, len, access);
+            trace_qemu_rdma_register_odp_mr(block->block_name);
         }
-        rdma->total_registrations++;
     }
+    if (!block->pmr[chunk]) {
+        perror("Failed to register chunk!");
+        fprintf(stderr, "Chunk details: block: %d chunk index %d"
+                        " start %" PRIuPTR " end %" PRIuPTR
+                        " host %" PRIuPTR
+                        " local %" PRIuPTR " registrations: %d\n",
+                        block->index, chunk, (uintptr_t)chunk_start,
+                        (uintptr_t)chunk_end, host_addr,
+                        (uintptr_t)block->local_host_addr,
+                        rdma->total_registrations);
+        return -1;
+    }
+    rdma->total_registrations++;
 
     if (lkey) {
         *lkey = block->pmr[chunk]->lkey;
diff --git a/migration/trace-events b/migration/trace-events
index a1c0f034ab..5f6aa580de 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -212,6 +212,7 @@ qemu_rdma_poll_write(const char *compstr, int64_t comp, int left, uint64_t block
 qemu_rdma_poll_other(const char *compstr, int64_t comp, int left) "other completion %s (%" PRId64 ") received left %d"
 qemu_rdma_post_send_control(const char *desc) "CONTROL: sending %s.."
 qemu_rdma_register_and_get_keys(uint64_t len, void *start) "Registering %" PRIu64 " bytes @ %p"
+qemu_rdma_register_odp_mr(const char *name) "Try to register On-Demand Paging memory region: %s"
 qemu_rdma_registration_handle_compress(int64_t length, int index, int64_t offset) "Zapping zero chunk: %" PRId64 " bytes, index %d, offset %" PRId64
 qemu_rdma_registration_handle_finished(void) ""
 qemu_rdma_registration_handle_ram_blocks(void) ""
-- 
2.31.1
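
The retry pattern used in both registration paths above (try a normal registration, and on ENOTSUP retry with the on-demand flag if the device supports ODP) can be sketched without libibverbs. All `DEMO_`/`demo_` names are hypothetical mocks: `demo_reg_mr()` merely imitates a region that can only be registered on demand, and is not ibv_reg_mr().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical access flags, standing in for the IBV_ACCESS_* bits. */
#define DEMO_ACCESS_LOCAL_WRITE  (1 << 0)
#define DEMO_ACCESS_REMOTE_WRITE (1 << 1)
#define DEMO_ACCESS_ON_DEMAND    (1 << 2)

typedef struct {
    int access;
} DemoMr;

/*
 * Mock registration for a region that cannot be pinned (like the fsdax
 * case in the commit message): fails with ENOTSUP unless the caller asks
 * for on-demand access.
 */
static DemoMr *demo_reg_mr(DemoMr *storage, int access)
{
    assert(storage != NULL);

    if (!(access & DEMO_ACCESS_ON_DEMAND)) {
        errno = ENOTSUP;
        return NULL;
    }
    storage->access = access;
    return storage;
}

/* Try a regular registration first, then fall back to ODP if possible. */
static DemoMr *demo_reg_with_odp_fallback(DemoMr *storage, bool dev_has_odp)
{
    int access = DEMO_ACCESS_LOCAL_WRITE | DEMO_ACCESS_REMOTE_WRITE;
    DemoMr *mr = demo_reg_mr(storage, access);

    if (!mr && errno == ENOTSUP && dev_has_odp) {
        access |= DEMO_ACCESS_ON_DEMAND;
        mr = demo_reg_mr(storage, access);
    }
    return mr;      /* still NULL if the fallback was unavailable or failed */
}
```

As in the patch, the caller still has to check for NULL after the fallback, because the second registration can fail as well.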




* [PULL 7/7] migration/rdma: advise prefetch write for ODP region
  2021-10-19  9:29 [PULL 0/7] Migration.next patches Juan Quintela
                   ` (5 preceding siblings ...)
  2021-10-19  9:29 ` [PULL 6/7] migration/rdma: Try to register On-Demand Paging memory region Juan Quintela
@ 2021-10-19  9:29 ` Juan Quintela
  2021-10-19 16:55 ` [PULL 0/7] Migration.next patches Richard Henderson
  7 siblings, 0 replies; 10+ messages in thread
From: Juan Quintela @ 2021-10-19  9:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr. David Alan Gilbert, Li Zhijian, Juan Quintela

From: Li Zhijian <lizhijian@cn.fujitsu.com>

A responder MR registered with ODP will send an RNR NAK back to
the requester when it hits a page fault:
---------
ibv_poll_cq wc.status=13 RNR retry counter exceeded!
ibv_poll_cq wrid=WRITE RDMA!
---------
ibv_advise_mr(3) helps to make pages present before the actual I/O is
conducted, so that the responder page-faults as little as possible.
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 meson.build            |  6 ++++++
 migration/rdma.c       | 42 ++++++++++++++++++++++++++++++++++++++++++
 migration/trace-events |  1 +
 3 files changed, 49 insertions(+)

diff --git a/meson.build b/meson.build
index 5e7946776d..9ed9a993e2 100644
--- a/meson.build
+++ b/meson.build
@@ -1530,6 +1530,12 @@ config_host_data.set('HAVE_COPY_FILE_RANGE', cc.has_function('copy_file_range'))
 config_host_data.set('HAVE_OPENPTY', cc.has_function('openpty', dependencies: util))
 config_host_data.set('HAVE_STRCHRNUL', cc.has_function('strchrnul'))
 config_host_data.set('HAVE_SYSTEM_FUNCTION', cc.has_function('system', prefix: '#include <stdlib.h>'))
+if rdma.found()
+  config_host_data.set('HAVE_IBV_ADVISE_MR',
+                       cc.has_function('ibv_advise_mr',
+                                       args: config_host['RDMA_LIBS'].split(),
+                                       prefix: '#include <infiniband/verbs.h>'))
+endif
 
 # has_header_symbol
 config_host_data.set('CONFIG_BYTESWAP_H',
diff --git a/migration/rdma.c b/migration/rdma.c
index eb80431aae..2a3c7889b9 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1133,6 +1133,32 @@ static bool rdma_support_odp(struct ibv_context *dev)
     return false;
 }
 
+/*
+ * ibv_advise_mr to avoid RNR NAK error as far as possible.
+ * The responder mr registering with ODP will sent RNR NAK back to
+ * the requester in the face of the page fault.
+ */
+static void qemu_rdma_advise_prefetch_mr(struct ibv_pd *pd, uint64_t addr,
+                                         uint32_t len,  uint32_t lkey,
+                                         const char *name, bool wr)
+{
+#ifdef HAVE_IBV_ADVISE_MR
+    int ret;
+    int advice = wr ? IBV_ADVISE_MR_ADVICE_PREFETCH_WRITE :
+                 IBV_ADVISE_MR_ADVICE_PREFETCH;
+    struct ibv_sge sg_list = {.lkey = lkey, .addr = addr, .length = len};
+
+    ret = ibv_advise_mr(pd, advice,
+                        IBV_ADVISE_MR_FLAG_FLUSH, &sg_list, 1);
+    /* ignore the error */
+    if (ret) {
+        trace_qemu_rdma_advise_mr(name, len, addr, strerror(errno));
+    } else {
+        trace_qemu_rdma_advise_mr(name, len, addr, "successed");
+    }
+#endif
+}
+
 static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
 {
     int i;
@@ -1156,6 +1182,15 @@ static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
                                local->block[i].local_host_addr,
                                local->block[i].length, access);
                 trace_qemu_rdma_register_odp_mr(local->block[i].block_name);
+
+                if (local->block[i].mr) {
+                    qemu_rdma_advise_prefetch_mr(rdma->pd,
+                                    (uintptr_t)local->block[i].local_host_addr,
+                                    local->block[i].length,
+                                    local->block[i].mr->lkey,
+                                    local->block[i].block_name,
+                                    true);
+                }
         }
 
         if (!local->block[i].mr) {
@@ -1255,6 +1290,13 @@ static int qemu_rdma_register_and_get_keys(RDMAContext *rdma,
             /* register ODP mr */
             block->pmr[chunk] = ibv_reg_mr(rdma->pd, chunk_start, len, access);
             trace_qemu_rdma_register_odp_mr(block->block_name);
+
+            if (block->pmr[chunk]) {
+                qemu_rdma_advise_prefetch_mr(rdma->pd, (uintptr_t)chunk_start,
+                                            len, block->pmr[chunk]->lkey,
+                                            block->block_name, rkey);
+
+            }
         }
     }
     if (!block->pmr[chunk]) {
diff --git a/migration/trace-events b/migration/trace-events
index 5f6aa580de..a8ae163707 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -213,6 +213,7 @@ qemu_rdma_poll_other(const char *compstr, int64_t comp, int left) "other complet
 qemu_rdma_post_send_control(const char *desc) "CONTROL: sending %s.."
 qemu_rdma_register_and_get_keys(uint64_t len, void *start) "Registering %" PRIu64 " bytes @ %p"
 qemu_rdma_register_odp_mr(const char *name) "Try to register On-Demand Paging memory region: %s"
+qemu_rdma_advise_mr(const char *name, uint32_t len, uint64_t addr, const char *res) "Try to advise block %s prefetch at %" PRIu32 "@0x%" PRIx64 ": %s"
 qemu_rdma_registration_handle_compress(int64_t length, int index, int64_t offset) "Zapping zero chunk: %" PRId64 " bytes, index %d, offset %" PRId64
 qemu_rdma_registration_handle_finished(void) ""
 qemu_rdma_registration_handle_ram_blocks(void) ""
-- 
2.31.1
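
The prefetch helper in this patch is best-effort: it picks the advice from the `wr` flag, does nothing when the advise call is unavailable, and never propagates a failure. A minimal sketch of that shape follows. The `Demo`/`demo_` names are hypothetical, and a NULL function pointer is used here as a runtime stand-in for the compile-time HAVE_IBV_ADVISE_MR guard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { DEMO_PREFETCH, DEMO_PREFETCH_WRITE } DemoAdvice;

typedef enum {
    DEMO_ADVISE_SKIPPED,            /* no advise backend available */
    DEMO_ADVISE_OK,
    DEMO_ADVISE_FAILED_IGNORED      /* backend failed; caller carries on */
} DemoResult;

/* Hypothetical backend: returns 0 on success, nonzero on failure. */
typedef int (*DemoAdviseFn)(DemoAdvice advice, uint64_t addr, uint32_t len);

static DemoAdvice demo_last_advice;

/* Mock backend that always succeeds and records the advice it was given. */
static int demo_advise_ok(DemoAdvice advice, uint64_t addr, uint32_t len)
{
    (void)addr;
    (void)len;
    demo_last_advice = advice;
    return 0;
}

/* Mock backend that always fails. */
static int demo_advise_fail(DemoAdvice advice, uint64_t addr, uint32_t len)
{
    (void)advice;
    (void)addr;
    (void)len;
    return -1;
}

/* Best-effort prefetch: the result is informational only (e.g. for tracing). */
static DemoResult demo_advise_prefetch(DemoAdviseFn advise, uint64_t addr,
                                       uint32_t len, bool wr)
{
    DemoAdvice advice = wr ? DEMO_PREFETCH_WRITE : DEMO_PREFETCH;

    if (advise == NULL) {
        return DEMO_ADVISE_SKIPPED;
    }
    assert(len > 0);
    return advise(advice, addr, len) == 0 ? DEMO_ADVISE_OK
                                          : DEMO_ADVISE_FAILED_IGNORED;
}
```

Treating a failure as non-fatal fits the commit message's intent: the advice only reduces page faults on the responder, so registration itself remains valid without it.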




* Re: [PULL 0/7] Migration.next patches
  2021-10-19  9:29 [PULL 0/7] Migration.next patches Juan Quintela
                   ` (6 preceding siblings ...)
  2021-10-19  9:29 ` [PULL 7/7] migration/rdma: advise prefetch write for ODP region Juan Quintela
@ 2021-10-19 16:55 ` Richard Henderson
  7 siblings, 0 replies; 10+ messages in thread
From: Richard Henderson @ 2021-10-19 16:55 UTC (permalink / raw)
  To: Juan Quintela, qemu-devel; +Cc: Dr. David Alan Gilbert

On 10/19/21 2:29 AM, Juan Quintela wrote:
> The following changes since commit 362534a643b4a34bcb223996538ce9de5cdab946:
> 
>    Merge remote-tracking branch 'remotes/bsdimp/tags/pull-bsd-user-20211018-pull-request' into staging (2021-10-18 12:17:24 -0700)
> 
> are available in the Git repository at:
> 
>    https://github.com/juanquintela/qemu.git tags/migration.next-pull-request
> 
> for you to fetch changes up to 911965ace9386e35ca022a65bb45a32fd421af3e:
> 
>    migration/rdma: advise prefetch write for ODP region (2021-10-19 08:39:04 +0200)
> 
> ----------------------------------------------------------------
> Migration Pull request (3rd try)
> 
> Hi
> 
> This should fix all the freebsd problems.
> 
> Please apply,
> 
> ----------------------------------------------------------------
> 
> David Hildenbrand (1):
>    migration/ram: Don't passs RAMState to
>      migration_clear_memory_region_dirty_bitmap_*()
> 
> Li Zhijian (4):
>    migration: allow multifd for socket protocol only
>    migration: allow enabling mutilfd for specific protocol only
>    migration/rdma: Try to register On-Demand Paging memory region
>    migration/rdma: advise prefetch write for ODP region
> 
> Lukas Straub (2):
>    multifd: Implement yank for multifd send side
>    multifd: Unconditionally unregister yank function
> 
>   meson.build            |   6 +++
>   migration/multifd.h    |   4 ++
>   migration/migration.c  |  12 +++++
>   migration/multifd.c    |  35 ++++++++++---
>   migration/ram.c        |  13 ++---
>   migration/rdma.c       | 113 ++++++++++++++++++++++++++++++++++-------
>   migration/trace-events |   2 +
>   7 files changed, 151 insertions(+), 34 deletions(-)

Applied, thanks.

r~




* [PULL 1/7] multifd: Implement yank for multifd send side
  2021-09-09 10:33 Juan Quintela
@ 2021-09-09 10:33 ` Juan Quintela
  0 siblings, 0 replies; 10+ messages in thread
From: Juan Quintela @ 2021-09-09 10:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Lukas Straub, Dr. David Alan Gilbert, Leonardo Bras, Juan Quintela

From: Lukas Straub <lukasstraub2@web.de>

To: qemu-devel <qemu-devel@nongnu.org>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Juan Quintela
 <quintela@redhat.com>, Peter Xu <peterx@redhat.com>, Leonardo Bras Soares
 Passos <lsoaresp@redhat.com>
Date: Wed, 1 Sep 2021 17:58:57 +0200 (1 week, 15 hours, 17 minutes ago)

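When introducing yank functionality in the migration code, I forgot
to cover the multifd send side. Register the yank function when a
send channel connects, and unregister it in multifd_save_cleanup().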

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/multifd.h | 2 ++
 migration/multifd.c | 6 +++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 8d6751f5ed..16c4d112d1 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -85,6 +85,8 @@ typedef struct {
     bool running;
     /* should this thread finish */
     bool quit;
+    /* is the yank function registered */
+    bool registered_yank;
     /* thread has work to do */
     int pending_job;
     /* array of pages to sent */
diff --git a/migration/multifd.c b/migration/multifd.c
index 377da78f5b..5a4f158f3c 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -546,6 +546,9 @@ void multifd_save_cleanup(void)
         MultiFDSendParams *p = &multifd_send_state->params[i];
         Error *local_err = NULL;
 
+        if (p->registered_yank) {
+            migration_ioc_unregister_yank(p->c);
+        }
         socket_send_channel_destroy(p->c);
         p->c = NULL;
         qemu_mutex_destroy(&p->mutex);
@@ -813,7 +816,8 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
                 return false;
             }
         } else {
-            /* update for tls qio channel */
+            migration_ioc_register_yank(ioc);
+            p->registered_yank = true;
             p->c = ioc;
             qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
                                    QEMU_THREAD_JOINABLE);
-- 
2.31.1




Thread overview: 10+ messages
2021-10-19  9:29 [PULL 0/7] Migration.next patches Juan Quintela
2021-10-19  9:29 ` [PULL 1/7] multifd: Implement yank for multifd send side Juan Quintela
2021-10-19  9:29 ` [PULL 2/7] multifd: Unconditionally unregister yank function Juan Quintela
2021-10-19  9:29 ` [PULL 3/7] migration/ram: Don't passs RAMState to migration_clear_memory_region_dirty_bitmap_*() Juan Quintela
2021-10-19  9:29 ` [PULL 4/7] migration: allow multifd for socket protocol only Juan Quintela
2021-10-19  9:29 ` [PULL 5/7] migration: allow enabling mutilfd for specific " Juan Quintela
2021-10-19  9:29 ` [PULL 6/7] migration/rdma: Try to register On-Demand Paging memory region Juan Quintela
2021-10-19  9:29 ` [PULL 7/7] migration/rdma: advise prefetch write for ODP region Juan Quintela
2021-10-19 16:55 ` [PULL 0/7] Migration.next patches Richard Henderson
  -- strict thread matches above, loose matches on Subject: below --
2021-09-09 10:33 Juan Quintela
2021-09-09 10:33 ` [PULL 1/7] multifd: Implement yank for multifd send side Juan Quintela
