All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 00/12] Remove RDMA migration dependence on RAMBlock offset
@ 2015-06-11 17:17 Dr. David Alan Gilbert (git)
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 01/12] Rename RDMA structures to make destination clear Dr. David Alan Gilbert (git)
                   ` (11 more replies)
  0 siblings, 12 replies; 36+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-11 17:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, arei.gonglei, mrhines, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

RDMA migration currently relies on the source and destination RAMBlocks
having the same offsets within ram_addr_t space;  unfortunately that's
just not true when:
   a) You hotplug on the source but then create the device on the command line
     on the destination.

   b) Across two versions of qemu

Thus there are migrations that work with TCP that don't with RDMA.

The changes keep stream compatibility with existing RDMA migration,
so cases that already work (i.e. no hotplug) will keep working.

The new requirements do rely on the block indexes being the same on
both sides (hence we sort to ensure that), my reading of the existing
code is that it also relies on that in various places but doesn't ensure
it's true.

With some light testing this seems to work; hopefully I've got all the
cases that pass offsets back and forward.

v2:
  Keep rdma_delete_block's ability to delete entries from the hash
     (on the source side)
  Fix ram-control-hook modification so they don't break non-rdma migrate
  Added a fix that makes the RDMA migration exit in the case of some mismatched
    configs instead of hanging.
  Clarified comments/commit messages
  Added a pair of typo fixes

Dr. David Alan Gilbert (12):
  Rename RDMA structures to make destination clear
  qemu_ram_foreach_block: pass up error value, and down the ramblock
    name
  Remove unneeded memset
  rdma typos
  Store block name in local blocks structure
  Translate offsets to destination address space
  Rework ram_control_load_hook to hook during block load
  Allow rdma_delete_block to work without the hash
  Rework ram block hash
  Sort destination RAMBlocks to be the same as the source
  Sanity check RDMA remote data
  Fail more cleanly in mismatched RAM cases

 arch_init.c                   |   4 +-
 exec.c                        |  10 +-
 include/exec/cpu-common.h     |   4 +-
 include/migration/migration.h |   2 +-
 include/migration/qemu-file.h |  14 +-
 migration/qemu-file.c         |  16 +-
 migration/rdma.c              | 347 +++++++++++++++++++++++++++++-------------
 trace-events                  |  12 +-
 8 files changed, 279 insertions(+), 130 deletions(-)

-- 
2.4.2

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v2 01/12] Rename RDMA structures to make destination clear
  2015-06-11 17:17 [Qemu-devel] [PATCH v2 00/12] Remove RDMA migration dependence on RAMBlock offset Dr. David Alan Gilbert (git)
@ 2015-06-11 17:17 ` Dr. David Alan Gilbert (git)
  2015-07-01  8:36   ` Juan Quintela
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 02/12] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-11 17:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, arei.gonglei, mrhines, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

RDMA has two data types that are named confusingly;
   RDMALocalBlock (pointed to indirectly by local_ram_blocks)
   RDMARemoteBlock (pointed to by block in RDMAContext)

RDMALocalBlocks, as the name suggests is a data strucuture that
represents the RDMAable RAM Blocks on the current side of the migration
whichever that is.

RDMARemoteBlocks is always the shape of the RAMBlocks on the
destination, even on the destination.

Rename:
     RDMARemoteBlock -> RDMADestBlock
     context->'block' -> context->dest_blocks

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael R. Hines <mrhines@us.ibm.com>
---
 migration/rdma.c | 66 ++++++++++++++++++++++++++++----------------------------
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 77e3444..089adcf 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -236,13 +236,13 @@ typedef struct RDMALocalBlock {
  * corresponding RDMALocalBlock with
  * the information needed to perform the actual RDMA.
  */
-typedef struct QEMU_PACKED RDMARemoteBlock {
+typedef struct QEMU_PACKED RDMADestBlock {
     uint64_t remote_host_addr;
     uint64_t offset;
     uint64_t length;
     uint32_t remote_rkey;
     uint32_t padding;
-} RDMARemoteBlock;
+} RDMADestBlock;
 
 static uint64_t htonll(uint64_t v)
 {
@@ -258,20 +258,20 @@ static uint64_t ntohll(uint64_t v) {
     return ((uint64_t)ntohl(u.lv[0]) << 32) | (uint64_t) ntohl(u.lv[1]);
 }
 
-static void remote_block_to_network(RDMARemoteBlock *rb)
+static void dest_block_to_network(RDMADestBlock *db)
 {
-    rb->remote_host_addr = htonll(rb->remote_host_addr);
-    rb->offset = htonll(rb->offset);
-    rb->length = htonll(rb->length);
-    rb->remote_rkey = htonl(rb->remote_rkey);
+    db->remote_host_addr = htonll(db->remote_host_addr);
+    db->offset = htonll(db->offset);
+    db->length = htonll(db->length);
+    db->remote_rkey = htonl(db->remote_rkey);
 }
 
-static void network_to_remote_block(RDMARemoteBlock *rb)
+static void network_to_dest_block(RDMADestBlock *db)
 {
-    rb->remote_host_addr = ntohll(rb->remote_host_addr);
-    rb->offset = ntohll(rb->offset);
-    rb->length = ntohll(rb->length);
-    rb->remote_rkey = ntohl(rb->remote_rkey);
+    db->remote_host_addr = ntohll(db->remote_host_addr);
+    db->offset = ntohll(db->offset);
+    db->length = ntohll(db->length);
+    db->remote_rkey = ntohl(db->remote_rkey);
 }
 
 /*
@@ -350,7 +350,7 @@ typedef struct RDMAContext {
      * Description of ram blocks used throughout the code.
      */
     RDMALocalBlocks local_ram_blocks;
-    RDMARemoteBlock *block;
+    RDMADestBlock  *dest_blocks;
 
     /*
      * Migration on *destination* started.
@@ -590,7 +590,7 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
     memset(local, 0, sizeof *local);
     qemu_ram_foreach_block(qemu_rdma_init_one_block, rdma);
     trace_qemu_rdma_init_ram_blocks(local->nb_blocks);
-    rdma->block = (RDMARemoteBlock *) g_malloc0(sizeof(RDMARemoteBlock) *
+    rdma->dest_blocks = (RDMADestBlock *) g_malloc0(sizeof(RDMADestBlock) *
                         rdma->local_ram_blocks.nb_blocks);
     local->init = true;
     return 0;
@@ -2177,8 +2177,8 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
         rdma->connected = false;
     }
 
-    g_free(rdma->block);
-    rdma->block = NULL;
+    g_free(rdma->dest_blocks);
+    rdma->dest_blocks = NULL;
 
     for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
         if (rdma->wr_data[idx].control_mr) {
@@ -2967,25 +2967,25 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque,
              * their "local" descriptions with what was sent.
              */
             for (i = 0; i < local->nb_blocks; i++) {
-                rdma->block[i].remote_host_addr =
+                rdma->dest_blocks[i].remote_host_addr =
                     (uintptr_t)(local->block[i].local_host_addr);
 
                 if (rdma->pin_all) {
-                    rdma->block[i].remote_rkey = local->block[i].mr->rkey;
+                    rdma->dest_blocks[i].remote_rkey = local->block[i].mr->rkey;
                 }
 
-                rdma->block[i].offset = local->block[i].offset;
-                rdma->block[i].length = local->block[i].length;
+                rdma->dest_blocks[i].offset = local->block[i].offset;
+                rdma->dest_blocks[i].length = local->block[i].length;
 
-                remote_block_to_network(&rdma->block[i]);
+                dest_block_to_network(&rdma->dest_blocks[i]);
             }
 
             blocks.len = rdma->local_ram_blocks.nb_blocks
-                                                * sizeof(RDMARemoteBlock);
+                                                * sizeof(RDMADestBlock);
 
 
             ret = qemu_rdma_post_send_control(rdma,
-                                        (uint8_t *) rdma->block, &blocks);
+                                        (uint8_t *) rdma->dest_blocks, &blocks);
 
             if (ret < 0) {
                 error_report("rdma migration: error sending remote info");
@@ -3141,7 +3141,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
     if (flags == RAM_CONTROL_SETUP) {
         RDMAControlHeader resp = {.type = RDMA_CONTROL_RAM_BLOCKS_RESULT };
         RDMALocalBlocks *local = &rdma->local_ram_blocks;
-        int reg_result_idx, i, j, nb_remote_blocks;
+        int reg_result_idx, i, j, nb_dest_blocks;
 
         head.type = RDMA_CONTROL_RAM_BLOCKS_REQUEST;
         trace_qemu_rdma_registration_stop_ram();
@@ -3162,7 +3162,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
             return ret;
         }
 
-        nb_remote_blocks = resp.len / sizeof(RDMARemoteBlock);
+        nb_dest_blocks = resp.len / sizeof(RDMADestBlock);
 
         /*
          * The protocol uses two different sets of rkeys (mutually exclusive):
@@ -3176,7 +3176,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
          * and then propagates the remote ram block descriptions to his local copy.
          */
 
-        if (local->nb_blocks != nb_remote_blocks) {
+        if (local->nb_blocks != nb_dest_blocks) {
             ERROR(errp, "ram blocks mismatch #1! "
                         "Your QEMU command line parameters are probably "
                         "not identical on both the source and destination.");
@@ -3184,26 +3184,26 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
         }
 
         qemu_rdma_move_header(rdma, reg_result_idx, &resp);
-        memcpy(rdma->block,
+        memcpy(rdma->dest_blocks,
             rdma->wr_data[reg_result_idx].control_curr, resp.len);
-        for (i = 0; i < nb_remote_blocks; i++) {
-            network_to_remote_block(&rdma->block[i]);
+        for (i = 0; i < nb_dest_blocks; i++) {
+            network_to_dest_block(&rdma->dest_blocks[i]);
 
             /* search local ram blocks */
             for (j = 0; j < local->nb_blocks; j++) {
-                if (rdma->block[i].offset != local->block[j].offset) {
+                if (rdma->dest_blocks[i].offset != local->block[j].offset) {
                     continue;
                 }
 
-                if (rdma->block[i].length != local->block[j].length) {
+                if (rdma->dest_blocks[i].length != local->block[j].length) {
                     ERROR(errp, "ram blocks mismatch #2! "
                         "Your QEMU command line parameters are probably "
                         "not identical on both the source and destination.");
                     return -EINVAL;
                 }
                 local->block[j].remote_host_addr =
-                        rdma->block[i].remote_host_addr;
-                local->block[j].remote_rkey = rdma->block[i].remote_rkey;
+                        rdma->dest_blocks[i].remote_host_addr;
+                local->block[j].remote_rkey = rdma->dest_blocks[i].remote_rkey;
                 break;
             }
 
-- 
2.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v2 02/12] qemu_ram_foreach_block: pass up error value, and down the ramblock name
  2015-06-11 17:17 [Qemu-devel] [PATCH v2 00/12] Remove RDMA migration dependence on RAMBlock offset Dr. David Alan Gilbert (git)
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 01/12] Rename RDMA structures to make destination clear Dr. David Alan Gilbert (git)
@ 2015-06-11 17:17 ` Dr. David Alan Gilbert (git)
  2015-07-01  8:36   ` Juan Quintela
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 03/12] Remove unneeded memset Dr. David Alan Gilbert (git)
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-11 17:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, arei.gonglei, mrhines, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

check the return value of the function it calls and error if it's non-0
Fixup qemu_rdma_init_one_block that is the only current caller,
  and rdma_add_block the only function it calls using it.

Pass the name of the ramblock to the function; helps in debugging.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Michael R. Hines <mrhines@us.ibm.com>
---
 exec.c                    | 10 ++++++++--
 include/exec/cpu-common.h |  4 ++--
 migration/rdma.c          |  4 ++--
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/exec.c b/exec.c
index 487583b..c4df2a4 100644
--- a/exec.c
+++ b/exec.c
@@ -3348,14 +3348,20 @@ bool cpu_physical_memory_is_io(hwaddr phys_addr)
     return res;
 }
 
-void qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
+int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
 {
     RAMBlock *block;
+    int ret = 0;
 
     rcu_read_lock();
     QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
-        func(block->host, block->offset, block->used_length, opaque);
+        ret = func(block->idstr, block->host, block->offset,
+                   block->used_length, opaque);
+        if (ret) {
+            break;
+        }
     }
     rcu_read_unlock();
+    return ret;
 }
 #endif
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 43428bd..de8a720 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -126,10 +126,10 @@ void cpu_flush_icache_range(hwaddr start, int len);
 extern struct MemoryRegion io_mem_rom;
 extern struct MemoryRegion io_mem_notdirty;
 
-typedef void (RAMBlockIterFunc)(void *host_addr,
+typedef int (RAMBlockIterFunc)(const char *block_name, void *host_addr,
     ram_addr_t offset, ram_addr_t length, void *opaque);
 
-void qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
+int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
 
 #endif
 
diff --git a/migration/rdma.c b/migration/rdma.c
index 089adcf..38e5f44 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -570,10 +570,10 @@ static int rdma_add_block(RDMAContext *rdma, void *host_addr,
  * in advanced before the migration starts. This tells us where the RAM blocks
  * are so that we can register them individually.
  */
-static void qemu_rdma_init_one_block(void *host_addr,
+static int qemu_rdma_init_one_block(const char *block_name, void *host_addr,
     ram_addr_t block_offset, ram_addr_t length, void *opaque)
 {
-    rdma_add_block(opaque, host_addr, block_offset, length);
+    return rdma_add_block(opaque, host_addr, block_offset, length);
 }
 
 /*
-- 
2.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v2 03/12] Remove unneeded memset
  2015-06-11 17:17 [Qemu-devel] [PATCH v2 00/12] Remove RDMA migration dependence on RAMBlock offset Dr. David Alan Gilbert (git)
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 01/12] Rename RDMA structures to make destination clear Dr. David Alan Gilbert (git)
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 02/12] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
@ 2015-06-11 17:17 ` Dr. David Alan Gilbert (git)
  2015-07-01  8:37   ` Juan Quintela
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 04/12] rdma typos Dr. David Alan Gilbert (git)
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-11 17:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, arei.gonglei, mrhines, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael R. Hines <mrhines@us.ibm.com>
---
 migration/rdma.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 38e5f44..bc73ff8 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2445,7 +2445,6 @@ static void *qemu_rdma_data_init(const char *host_port, Error **errp)
 
     if (host_port) {
         rdma = g_malloc0(sizeof(RDMAContext));
-        memset(rdma, 0, sizeof(RDMAContext));
         rdma->current_index = -1;
         rdma->current_chunk = -1;
 
-- 
2.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v2 04/12] rdma typos
  2015-06-11 17:17 [Qemu-devel] [PATCH v2 00/12] Remove RDMA migration dependence on RAMBlock offset Dr. David Alan Gilbert (git)
                   ` (2 preceding siblings ...)
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 03/12] Remove unneeded memset Dr. David Alan Gilbert (git)
@ 2015-06-11 17:17 ` Dr. David Alan Gilbert (git)
  2015-06-11 17:56   ` Michael R. Hines
  2015-07-01  8:37   ` Juan Quintela
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 05/12] Store block name in local blocks structure Dr. David Alan Gilbert (git)
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 36+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-11 17:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, arei.gonglei, mrhines, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

A couple of typo fixes.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 6 +++---
 trace-events     | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index bc73ff8..44ed996 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1215,7 +1215,7 @@ const char *print_wrid(int wrid)
 
 /*
  * Perform a non-optimized memory unregistration after every transfer
- * for demonsration purposes, only if pin-all is not requested.
+ * for demonstration purposes, only if pin-all is not requested.
  *
  * Potential optimizations:
  * 1. Start a new thread to run this function continuously
@@ -3279,7 +3279,7 @@ static void rdma_accept_incoming_migration(void *opaque)
     QEMUFile *f;
     Error *local_err = NULL, **errp = &local_err;
 
-    trace_qemu_dma_accept_incoming_migration();
+    trace_qemu_rdma_accept_incoming_migration();
     ret = qemu_rdma_accept(rdma);
 
     if (ret) {
@@ -3287,7 +3287,7 @@ static void rdma_accept_incoming_migration(void *opaque)
         return;
     }
 
-    trace_qemu_dma_accept_incoming_migration_accepted();
+    trace_qemu_rdma_accept_incoming_migration_accepted();
 
     f = qemu_fopen_rdma(rdma, "rb");
     if (f == NULL) {
diff --git a/trace-events b/trace-events
index 2662ffa..8b468fe 100644
--- a/trace-events
+++ b/trace-events
@@ -1398,8 +1398,8 @@ migrate_pending(uint64_t size, uint64_t max) "pending size %" PRIu64 " max %" PR
 migrate_transferred(uint64_t tranferred, uint64_t time_spent, double bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %g max_size %" PRId64
 
 # migration/rdma.c
-qemu_dma_accept_incoming_migration(void) ""
-qemu_dma_accept_incoming_migration_accepted(void) ""
+qemu_rdma_accept_incoming_migration(void) ""
+qemu_rdma_accept_incoming_migration_accepted(void) ""
 qemu_rdma_accept_pin_state(bool pin) "%d"
 qemu_rdma_accept_pin_verbsc(void *verbs) "Verbs context after listen: %p"
 qemu_rdma_block_for_wrid_miss(const char *wcompstr, int wcomp, const char *gcompstr, uint64_t req) "A Wanted wrid %s (%d) but got %s (%" PRIu64 ")"
-- 
2.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v2 05/12] Store block name in local blocks structure
  2015-06-11 17:17 [Qemu-devel] [PATCH v2 00/12] Remove RDMA migration dependence on RAMBlock offset Dr. David Alan Gilbert (git)
                   ` (3 preceding siblings ...)
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 04/12] rdma typos Dr. David Alan Gilbert (git)
@ 2015-06-11 17:17 ` Dr. David Alan Gilbert (git)
  2015-07-01  8:38   ` Juan Quintela
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 06/12] Translate offsets to destination address space Dr. David Alan Gilbert (git)
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-11 17:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, arei.gonglei, mrhines, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

In a later patch the block name will be used to match up two views
of the block list.  Keep a copy of the block name with the local block
list.

(At some point it could be argued that it would be best just to let
migration see the innards of RAMBlock and avoid the need to use
foreach).

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael R. Hines <mrhines@us.ibm.com>
---
 migration/rdma.c | 35 +++++++++++++++++++++--------------
 trace-events     |  2 +-
 2 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 44ed996..9532461 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -214,17 +214,18 @@ static void network_to_caps(RDMACapabilities *cap)
  * the information. It's small anyway, so a list is overkill.
  */
 typedef struct RDMALocalBlock {
-    uint8_t  *local_host_addr; /* local virtual address */
-    uint64_t remote_host_addr; /* remote virtual address */
-    uint64_t offset;
-    uint64_t length;
-    struct   ibv_mr **pmr;     /* MRs for chunk-level registration */
-    struct   ibv_mr *mr;       /* MR for non-chunk-level registration */
-    uint32_t *remote_keys;     /* rkeys for chunk-level registration */
-    uint32_t remote_rkey;      /* rkeys for non-chunk-level registration */
-    int      index;            /* which block are we */
-    bool     is_ram_block;
-    int      nb_chunks;
+    char          *block_name;
+    uint8_t       *local_host_addr; /* local virtual address */
+    uint64_t       remote_host_addr; /* remote virtual address */
+    uint64_t       offset;
+    uint64_t       length;
+    struct         ibv_mr **pmr;    /* MRs for chunk-level registration */
+    struct         ibv_mr *mr;      /* MR for non-chunk-level registration */
+    uint32_t      *remote_keys;     /* rkeys for chunk-level registration */
+    uint32_t       remote_rkey;     /* rkeys for non-chunk-level registration */
+    int            index;           /* which block are we */
+    bool           is_ram_block;
+    int            nb_chunks;
     unsigned long *transit_bitmap;
     unsigned long *unregister_bitmap;
 } RDMALocalBlock;
@@ -510,7 +511,8 @@ static inline uint8_t *ram_chunk_end(const RDMALocalBlock *rdma_ram_block,
     return result;
 }
 
-static int rdma_add_block(RDMAContext *rdma, void *host_addr,
+static int rdma_add_block(RDMAContext *rdma, const char *block_name,
+                         void *host_addr,
                          ram_addr_t block_offset, uint64_t length)
 {
     RDMALocalBlocks *local = &rdma->local_ram_blocks;
@@ -538,6 +540,7 @@ static int rdma_add_block(RDMAContext *rdma, void *host_addr,
 
     block = &local->block[local->nb_blocks];
 
+    block->block_name = g_strdup(block_name);
     block->local_host_addr = host_addr;
     block->offset = block_offset;
     block->length = length;
@@ -553,7 +556,8 @@ static int rdma_add_block(RDMAContext *rdma, void *host_addr,
 
     g_hash_table_insert(rdma->blockmap, (void *) block_offset, block);
 
-    trace_rdma_add_block(local->nb_blocks, (uintptr_t) block->local_host_addr,
+    trace_rdma_add_block(block_name, local->nb_blocks,
+                         (uintptr_t) block->local_host_addr,
                          block->offset, block->length,
                          (uintptr_t) (block->local_host_addr + block->length),
                          BITS_TO_LONGS(block->nb_chunks) *
@@ -573,7 +577,7 @@ static int rdma_add_block(RDMAContext *rdma, void *host_addr,
 static int qemu_rdma_init_one_block(const char *block_name, void *host_addr,
     ram_addr_t block_offset, ram_addr_t length, void *opaque)
 {
-    return rdma_add_block(opaque, host_addr, block_offset, length);
+    return rdma_add_block(opaque, block_name, host_addr, block_offset, length);
 }
 
 /*
@@ -635,6 +639,9 @@ static int rdma_delete_block(RDMAContext *rdma, ram_addr_t block_offset)
     g_free(block->remote_keys);
     block->remote_keys = NULL;
 
+    g_free(block->block_name);
+    block->block_name = NULL;
+
     for (x = 0; x < local->nb_blocks; x++) {
         g_hash_table_remove(rdma->blockmap, (void *)(uintptr_t)old[x].offset);
     }
diff --git a/trace-events b/trace-events
index 8b468fe..557770c 100644
--- a/trace-events
+++ b/trace-events
@@ -1451,7 +1451,7 @@ qemu_rdma_write_one_recvregres(int mykey, int theirkey, uint64_t chunk) "Receive
 qemu_rdma_write_one_sendreg(uint64_t chunk, int len, int index, int64_t offset) "Sending registration request chunk %" PRIu64 " for %d bytes, index: %d, offset: %" PRId64
 qemu_rdma_write_one_top(uint64_t chunks, uint64_t size) "Writing %" PRIu64 " chunks, (%" PRIu64 " MB)"
 qemu_rdma_write_one_zero(uint64_t chunk, int len, int index, int64_t offset) "Entire chunk is zero, sending compress: %" PRIu64 " for %d bytes, index: %d, offset: %" PRId64
-rdma_add_block(int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Added Block: %d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
+rdma_add_block(const char *block_name, int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Added Block: '%s':%d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
 rdma_delete_block(int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Deleted Block: %d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
 rdma_start_incoming_migration(void) ""
 rdma_start_incoming_migration_after_dest_init(void) ""
-- 
2.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v2 06/12] Translate offsets to destination address space
  2015-06-11 17:17 [Qemu-devel] [PATCH v2 00/12] Remove RDMA migration dependence on RAMBlock offset Dr. David Alan Gilbert (git)
                   ` (4 preceding siblings ...)
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 05/12] Store block name in local blocks structure Dr. David Alan Gilbert (git)
@ 2015-06-11 17:17 ` Dr. David Alan Gilbert (git)
  2015-06-11 18:12   ` Michael R. Hines
  2015-07-01  8:43   ` Juan Quintela
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 07/12] Rework ram_control_load_hook to hook during block load Dr. David Alan Gilbert (git)
                   ` (5 subsequent siblings)
  11 siblings, 2 replies; 36+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-11 17:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, arei.gonglei, mrhines, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The 'offset' field in RDMACompress and 'current_addr' field
in RDMARegister are commented as being offsets within a particular
RAMBlock, however they appear to actually be offsets within the
ram_addr_t space.

The code currently assumes that the offsets on the source/destination
match, this change removes the need for the assumption for these
structures by translating the addresses into the ram_addr_t space of
the destination host.

Note: An alternative would be to change the fields to actually
take the data they're commented for; this would potentially be
simpler but would break stream compatibility for those cases
that currently work.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 9532461..cb66721 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -411,7 +411,7 @@ static void network_to_control(RDMAControlHeader *control)
  */
 typedef struct QEMU_PACKED {
     union QEMU_PACKED {
-        uint64_t current_addr;  /* offset into the ramblock of the chunk */
+        uint64_t current_addr;  /* offset into the ram_addr_t space */
         uint64_t chunk;         /* chunk to lookup if unregistering */
     } key;
     uint32_t current_index; /* which ramblock the chunk belongs to */
@@ -419,8 +419,19 @@ typedef struct QEMU_PACKED {
     uint64_t chunks;            /* how many sequential chunks to register */
 } RDMARegister;
 
-static void register_to_network(RDMARegister *reg)
+static void register_to_network(RDMAContext *rdma, RDMARegister *reg)
 {
+    RDMALocalBlock *local_block;
+    local_block  = &rdma->local_ram_blocks.block[reg->current_index];
+
+    if (local_block->is_ram_block) {
+        /*
+         * current_addr as passed in is an address in the local ram_addr_t
+         * space, we need to translate this for the destination
+         */
+        reg->key.current_addr -= local_block->offset;
+        reg->key.current_addr += rdma->dest_blocks[reg->current_index].offset;
+    }
     reg->key.current_addr = htonll(reg->key.current_addr);
     reg->current_index = htonl(reg->current_index);
     reg->chunks = htonll(reg->chunks);
@@ -436,13 +447,19 @@ static void network_to_register(RDMARegister *reg)
 typedef struct QEMU_PACKED {
     uint32_t value;     /* if zero, we will madvise() */
     uint32_t block_idx; /* which ram block index */
-    uint64_t offset;    /* where in the remote ramblock this chunk */
+    uint64_t offset;    /* Address in remote ram_addr_t space */
     uint64_t length;    /* length of the chunk */
 } RDMACompress;
 
-static void compress_to_network(RDMACompress *comp)
+static void compress_to_network(RDMAContext *rdma, RDMACompress *comp)
 {
     comp->value = htonl(comp->value);
+    /*
+     * comp->offset as passed in is an address in the local ram_addr_t
+     * space, we need to translate this for the destination
+     */
+    comp->offset -= rdma->local_ram_blocks.block[comp->block_idx].offset;
+    comp->offset += rdma->dest_blocks[comp->block_idx].offset;
     comp->block_idx = htonl(comp->block_idx);
     comp->offset = htonll(comp->offset);
     comp->length = htonll(comp->length);
@@ -1288,7 +1305,7 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
         rdma->total_registrations--;
 
         reg.key.chunk = chunk;
-        register_to_network(&reg);
+        register_to_network(rdma, &reg);
         ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
                                 &resp, NULL, NULL);
         if (ret < 0) {
@@ -1909,7 +1926,7 @@ retry:
                 trace_qemu_rdma_write_one_zero(chunk, sge.length,
                                                current_index, current_addr);
 
-                compress_to_network(&comp);
+                compress_to_network(rdma, &comp);
                 ret = qemu_rdma_exchange_send(rdma, &head,
                                 (uint8_t *) &comp, NULL, NULL, NULL);
 
@@ -1936,7 +1953,7 @@ retry:
             trace_qemu_rdma_write_one_sendreg(chunk, sge.length, current_index,
                                               current_addr);
 
-            register_to_network(&reg);
+            register_to_network(rdma, &reg);
             ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
                                     &resp, &reg_result_idx, NULL);
             if (ret < 0) {
-- 
2.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v2 07/12] Rework ram_control_load_hook to hook during block load
  2015-06-11 17:17 [Qemu-devel] [PATCH v2 00/12] Remove RDMA migration dependence on RAMBlock offset Dr. David Alan Gilbert (git)
                   ` (5 preceding siblings ...)
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 06/12] Translate offsets to destination address space Dr. David Alan Gilbert (git)
@ 2015-06-11 17:17 ` Dr. David Alan Gilbert (git)
  2015-06-11 18:20   ` Michael R. Hines
  2015-07-01  8:49   ` Juan Quintela
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 08/12] Allow rdma_delete_block to work without the hash Dr. David Alan Gilbert (git)
                   ` (4 subsequent siblings)
  11 siblings, 2 replies; 36+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-11 17:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, arei.gonglei, mrhines, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

We need the names of RAMBlocks as they're loaded for RDMA,
reuse a slightly modified ram_control_load_hook:
  a) Pass a 'data' parameter to use for the name in the block-reg
     case
  b) Only some hook types now require the presence of a hook function.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c                   |  4 +++-
 include/migration/migration.h |  2 +-
 include/migration/qemu-file.h | 14 +++++++++-----
 migration/qemu-file.c         | 16 +++++++++++-----
 migration/rdma.c              | 28 ++++++++++++++++++++++------
 trace-events                  |  2 +-
 6 files changed, 47 insertions(+), 19 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index d294474..dc9cc7e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1569,6 +1569,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
                                 error_report_err(local_err);
                             }
                         }
+                        ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG,
+                                              block->idstr);
                         break;
                     }
                 }
@@ -1637,7 +1639,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             break;
         default:
             if (flags & RAM_SAVE_FLAG_HOOK) {
-                ram_control_load_hook(f, flags);
+                ram_control_load_hook(f, RAM_CONTROL_HOOK, NULL);
             } else {
                 error_report("Unknown combination of migration flags: %#x",
                              flags);
diff --git a/include/migration/migration.h b/include/migration/migration.h
index a6e025a..096e1ea 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -164,7 +164,7 @@ int migrate_decompress_threads(void);
 
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
-void ram_control_load_hook(QEMUFile *f, uint64_t flags);
+void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data);
 
 /* Whenever this is found in the data stream, the flags
  * will be passed to ram_control_load_hook in the incoming-migration
diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index a01c5b8..7aafe19 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -63,16 +63,20 @@ typedef ssize_t (QEMUFileWritevBufferFunc)(void *opaque, struct iovec *iov,
 /*
  * This function provides hooks around different
  * stages of RAM migration.
+ * 'opaque' is the backend specific data in QEMUFile
+ * 'data' is call specific data associated with the 'flags' value
  */
-typedef int (QEMURamHookFunc)(QEMUFile *f, void *opaque, uint64_t flags);
+typedef int (QEMURamHookFunc)(QEMUFile *f, void *opaque, uint64_t flags,
+                              void *data);
 
 /*
  * Constants used by ram_control_* hooks
  */
-#define RAM_CONTROL_SETUP    0
-#define RAM_CONTROL_ROUND    1
-#define RAM_CONTROL_HOOK     2
-#define RAM_CONTROL_FINISH   3
+#define RAM_CONTROL_SETUP     0
+#define RAM_CONTROL_ROUND     1
+#define RAM_CONTROL_HOOK      2
+#define RAM_CONTROL_FINISH    3
+#define RAM_CONTROL_BLOCK_REG 4
 
 /*
  * This function allows override of where the RAM page
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 2750365..5493977 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -128,7 +128,7 @@ void ram_control_before_iterate(QEMUFile *f, uint64_t flags)
     int ret = 0;
 
     if (f->ops->before_ram_iterate) {
-        ret = f->ops->before_ram_iterate(f, f->opaque, flags);
+        ret = f->ops->before_ram_iterate(f, f->opaque, flags, NULL);
         if (ret < 0) {
             qemu_file_set_error(f, ret);
         }
@@ -140,24 +140,30 @@ void ram_control_after_iterate(QEMUFile *f, uint64_t flags)
     int ret = 0;
 
     if (f->ops->after_ram_iterate) {
-        ret = f->ops->after_ram_iterate(f, f->opaque, flags);
+        ret = f->ops->after_ram_iterate(f, f->opaque, flags, NULL);
         if (ret < 0) {
             qemu_file_set_error(f, ret);
         }
     }
 }
 
-void ram_control_load_hook(QEMUFile *f, uint64_t flags)
+void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data)
 {
     int ret = -EINVAL;
 
     if (f->ops->hook_ram_load) {
-        ret = f->ops->hook_ram_load(f, f->opaque, flags);
+        ret = f->ops->hook_ram_load(f, f->opaque, flags, data);
         if (ret < 0) {
             qemu_file_set_error(f, ret);
         }
     } else {
-        qemu_file_set_error(f, ret);
+        /*
+         * Hook is a hook specifically requested by the source sending a flag
+         * that expects there to be a hook on the destination.
+         */
+        if (flags == RAM_CONTROL_HOOK) {
+            qemu_file_set_error(f, ret);
+        }
     }
 }
 
diff --git a/migration/rdma.c b/migration/rdma.c
index cb66721..396329c 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2905,8 +2905,7 @@ err_rdma_dest_wait:
  *
  * Keep doing this until the source tells us to stop.
  */
-static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque,
-                                         uint64_t flags)
+static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
 {
     RDMAControlHeader reg_resp = { .len = sizeof(RDMARegisterResult),
                                .type = RDMA_CONTROL_REGISTER_RESULT,
@@ -2936,7 +2935,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque,
     CHECK_ERROR_STATE();
 
     do {
-        trace_qemu_rdma_registration_handle_wait(flags);
+        trace_qemu_rdma_registration_handle_wait();
 
         ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_NONE);
 
@@ -3124,8 +3123,25 @@ out:
     return ret;
 }
 
+static int rdma_load_hook(QEMUFile *f, void *opaque, uint64_t flags, void *data)
+{
+    switch (flags) {
+    case RAM_CONTROL_BLOCK_REG:
+        /* TODO A later patch */
+        return 0;
+        break;
+
+    case RAM_CONTROL_HOOK:
+        return qemu_rdma_registration_handle(f, opaque);
+
+    default:
+        /* Shouldn't be called with any other values */
+        abort();
+    }
+}
+
 static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
-                                        uint64_t flags)
+                                        uint64_t flags, void *data)
 {
     QEMUFileRDMA *rfile = opaque;
     RDMAContext *rdma = rfile->rdma;
@@ -3144,7 +3160,7 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
  * First, flush writes, if any.
  */
 static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
-                                       uint64_t flags)
+                                       uint64_t flags, void *data)
 {
     Error *local_err = NULL, **errp = &local_err;
     QEMUFileRDMA *rfile = opaque;
@@ -3266,7 +3282,7 @@ static const QEMUFileOps rdma_read_ops = {
     .get_buffer    = qemu_rdma_get_buffer,
     .get_fd        = qemu_rdma_get_fd,
     .close         = qemu_rdma_close,
-    .hook_ram_load = qemu_rdma_registration_handle,
+    .hook_ram_load = rdma_load_hook,
 };
 
 static const QEMUFileOps rdma_write_ops = {
diff --git a/trace-events b/trace-events
index 557770c..0f37a4b 100644
--- a/trace-events
+++ b/trace-events
@@ -1432,7 +1432,7 @@ qemu_rdma_registration_handle_register_rkey(int rkey) "%x"
 qemu_rdma_registration_handle_unregister(int requests) "%d requests"
 qemu_rdma_registration_handle_unregister_loop(int count, int index, uint64_t chunk) "Unregistration request (%d): index %d, chunk %" PRIu64
 qemu_rdma_registration_handle_unregister_success(uint64_t chunk) "%" PRIu64
-qemu_rdma_registration_handle_wait(uint64_t flags) "Waiting for next request %" PRIu64
+qemu_rdma_registration_handle_wait(void) ""
 qemu_rdma_registration_start(uint64_t flags) "%" PRIu64
 qemu_rdma_registration_stop(uint64_t flags) "%" PRIu64
 qemu_rdma_registration_stop_ram(void) ""
-- 
2.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v2 08/12] Allow rdma_delete_block to work without the hash
  2015-06-11 17:17 [Qemu-devel] [PATCH v2 00/12] Remove RDMA migration dependence on RAMBlock offset Dr. David Alan Gilbert (git)
                   ` (6 preceding siblings ...)
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 07/12] Rework ram_control_load_hook to hook during block load Dr. David Alan Gilbert (git)
@ 2015-06-11 17:17 ` Dr. David Alan Gilbert (git)
  2015-06-11 18:36   ` Michael R. Hines
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 09/12] Rework ram block hash Dr. David Alan Gilbert (git)
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-11 17:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, arei.gonglei, mrhines, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

In the next patch we remove the hash on the destination,
rdma_delete_block does two things with the hash which can be avoided:
  a) The caller passes the offset and rdma_delete_block looks it up
     in the hash; fixed by getting the caller to pass the block
  b) The hash gets recreated after deletion; fixed by making that
     conditional on the hash being initialised.

While this function is currently only used during cleanup, Michael
asked that we keep it general for future dynamic block registration
work.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 27 ++++++++++++++++-----------
 trace-events     |  2 +-
 2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 396329c..8d99378 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -617,16 +617,19 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
     return 0;
 }
 
-static int rdma_delete_block(RDMAContext *rdma, ram_addr_t block_offset)
+/*
+ * Note: If used outside of cleanup, the caller must ensure that the destination
+ * block structures are also updated
+ */
+static int rdma_delete_block(RDMAContext *rdma, RDMALocalBlock *block)
 {
     RDMALocalBlocks *local = &rdma->local_ram_blocks;
-    RDMALocalBlock *block = g_hash_table_lookup(rdma->blockmap,
-        (void *) block_offset);
     RDMALocalBlock *old = local->block;
     int x;
 
-    assert(block);
-
+    if (rdma->blockmap) {
+        g_hash_table_remove(rdma->blockmap, (void *)(uintptr_t)block->offset);
+    }
     if (block->pmr) {
         int j;
 
@@ -659,8 +662,11 @@ static int rdma_delete_block(RDMAContext *rdma, ram_addr_t block_offset)
     g_free(block->block_name);
     block->block_name = NULL;
 
-    for (x = 0; x < local->nb_blocks; x++) {
-        g_hash_table_remove(rdma->blockmap, (void *)(uintptr_t)old[x].offset);
+    if (rdma->blockmap) {
+        for (x = 0; x < local->nb_blocks; x++) {
+            g_hash_table_remove(rdma->blockmap,
+                                (void *)(uintptr_t)old[x].offset);
+        }
     }
 
     if (local->nb_blocks > 1) {
@@ -682,8 +688,7 @@ static int rdma_delete_block(RDMAContext *rdma, ram_addr_t block_offset)
         local->block = NULL;
     }
 
-    trace_rdma_delete_block(local->nb_blocks,
-                           (uintptr_t)block->local_host_addr,
+    trace_rdma_delete_block(block, (uintptr_t)block->local_host_addr,
                            block->offset, block->length,
                             (uintptr_t)(block->local_host_addr + block->length),
                            BITS_TO_LONGS(block->nb_chunks) *
@@ -693,7 +698,7 @@ static int rdma_delete_block(RDMAContext *rdma, ram_addr_t block_offset)
 
     local->nb_blocks--;
 
-    if (local->nb_blocks) {
+    if (local->nb_blocks && rdma->blockmap) {
         for (x = 0; x < local->nb_blocks; x++) {
             g_hash_table_insert(rdma->blockmap,
                                 (void *)(uintptr_t)local->block[x].offset,
@@ -2214,7 +2219,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
 
     if (rdma->local_ram_blocks.block) {
         while (rdma->local_ram_blocks.nb_blocks) {
-            rdma_delete_block(rdma, rdma->local_ram_blocks.block->offset);
+            rdma_delete_block(rdma, &rdma->local_ram_blocks.block[0]);
         }
     }
 
diff --git a/trace-events b/trace-events
index 0f37a4b..7dff362 100644
--- a/trace-events
+++ b/trace-events
@@ -1452,7 +1452,7 @@ qemu_rdma_write_one_sendreg(uint64_t chunk, int len, int index, int64_t offset)
 qemu_rdma_write_one_top(uint64_t chunks, uint64_t size) "Writing %" PRIu64 " chunks, (%" PRIu64 " MB)"
 qemu_rdma_write_one_zero(uint64_t chunk, int len, int index, int64_t offset) "Entire chunk is zero, sending compress: %" PRIu64 " for %d bytes, index: %d, offset: %" PRId64
 rdma_add_block(const char *block_name, int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Added Block: '%s':%d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
-rdma_delete_block(int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Deleted Block: %d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
+rdma_delete_block(void *block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Deleted Block: %p, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
 rdma_start_incoming_migration(void) ""
 rdma_start_incoming_migration_after_dest_init(void) ""
 rdma_start_incoming_migration_after_rdma_listen(void) ""
-- 
2.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v2 09/12] Rework ram block hash
  2015-06-11 17:17 [Qemu-devel] [PATCH v2 00/12] Remove RDMA migration dependence on RAMBlock offset Dr. David Alan Gilbert (git)
                   ` (7 preceding siblings ...)
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 08/12] Allow rdma_delete_block to work without the hash Dr. David Alan Gilbert (git)
@ 2015-06-11 17:17 ` Dr. David Alan Gilbert (git)
  2015-06-11 18:40   ` Michael R. Hines
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 10/12] Sort destination RAMBlocks to be the same as the source Dr. David Alan Gilbert (git)
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-11 17:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, arei.gonglei, mrhines, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

RDMA uses a hash from block offset->RAM Block; this isn't needed
on the destination, and it becomes harder to maintain after the next
patch in the series that sorts the block list.

Split the hash so that it's only generated on the source.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 8d99378..f541586 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -533,23 +533,22 @@ static int rdma_add_block(RDMAContext *rdma, const char *block_name,
                          ram_addr_t block_offset, uint64_t length)
 {
     RDMALocalBlocks *local = &rdma->local_ram_blocks;
-    RDMALocalBlock *block = g_hash_table_lookup(rdma->blockmap,
-        (void *)(uintptr_t)block_offset);
+    RDMALocalBlock *block;
     RDMALocalBlock *old = local->block;
 
-    assert(block == NULL);
-
     local->block = g_malloc0(sizeof(RDMALocalBlock) * (local->nb_blocks + 1));
 
     if (local->nb_blocks) {
         int x;
 
-        for (x = 0; x < local->nb_blocks; x++) {
-            g_hash_table_remove(rdma->blockmap,
-                                (void *)(uintptr_t)old[x].offset);
-            g_hash_table_insert(rdma->blockmap,
-                                (void *)(uintptr_t)old[x].offset,
-                                &local->block[x]);
+        if (rdma->blockmap) {
+            for (x = 0; x < local->nb_blocks; x++) {
+                g_hash_table_remove(rdma->blockmap,
+                                    (void *)(uintptr_t)old[x].offset);
+                g_hash_table_insert(rdma->blockmap,
+                                    (void *)(uintptr_t)old[x].offset,
+                                    &local->block[x]);
+            }
         }
         memcpy(local->block, old, sizeof(RDMALocalBlock) * local->nb_blocks);
         g_free(old);
@@ -571,7 +570,9 @@ static int rdma_add_block(RDMAContext *rdma, const char *block_name,
 
     block->is_ram_block = local->init ? false : true;
 
-    g_hash_table_insert(rdma->blockmap, (void *) block_offset, block);
+    if (rdma->blockmap) {
+        g_hash_table_insert(rdma->blockmap, (void *) block_offset, block);
+    }
 
     trace_rdma_add_block(block_name, local->nb_blocks,
                          (uintptr_t) block->local_host_addr,
@@ -607,7 +608,6 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
     RDMALocalBlocks *local = &rdma->local_ram_blocks;
 
     assert(rdma->blockmap == NULL);
-    rdma->blockmap = g_hash_table_new(g_direct_hash, g_direct_equal);
     memset(local, 0, sizeof *local);
     qemu_ram_foreach_block(qemu_rdma_init_one_block, rdma);
     trace_qemu_rdma_init_ram_blocks(local->nb_blocks);
@@ -2292,6 +2292,14 @@ static int qemu_rdma_source_init(RDMAContext *rdma, Error **errp, bool pin_all)
         goto err_rdma_source_init;
     }
 
+    /* Build the hash that maps from offset to RAMBlock */
+    rdma->blockmap = g_hash_table_new(g_direct_hash, g_direct_equal);
+    for (idx = 0; idx < rdma->local_ram_blocks.nb_blocks; idx++) {
+        g_hash_table_insert(rdma->blockmap,
+                (void *)(uintptr_t)rdma->local_ram_blocks.block[idx].offset,
+                &rdma->local_ram_blocks.block[idx]);
+    }
+
     for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
         ret = qemu_rdma_reg_control(rdma, idx);
         if (ret) {
-- 
2.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v2 10/12] Sort destination RAMBlocks to be the same as the source
  2015-06-11 17:17 [Qemu-devel] [PATCH v2 00/12] Remove RDMA migration dependence on RAMBlock offset Dr. David Alan Gilbert (git)
                   ` (8 preceding siblings ...)
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 09/12] Rework ram block hash Dr. David Alan Gilbert (git)
@ 2015-06-11 17:17 ` Dr. David Alan Gilbert (git)
  2015-06-11 18:55   ` Michael R. Hines
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 11/12] Sanity check RDMA remote data Dr. David Alan Gilbert (git)
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 12/12] Fail more cleanly in mismatched RAM cases Dr. David Alan Gilbert (git)
  11 siblings, 1 reply; 36+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-11 17:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, arei.gonglei, mrhines, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Use the order of incoming RAMBlocks from the source to record
an index number; that then allows us to sort the destination
local RAMBlock list to match the source.

Now that the RAMBlocks are known to be in the same order, this
simplifies the RDMA Registration step which previously tried to
match RAMBlocks based on offset (which isn't guaranteed to match).

Looking at the existing compress code, I think it was erroneously
relying on an assumption of matching ordering, which this fixes.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 101 ++++++++++++++++++++++++++++++++++++++++---------------
 trace-events     |   2 ++
 2 files changed, 75 insertions(+), 28 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index f541586..92dc5c1 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -224,6 +224,7 @@ typedef struct RDMALocalBlock {
     uint32_t      *remote_keys;     /* rkeys for chunk-level registration */
     uint32_t       remote_rkey;     /* rkeys for non-chunk-level registration */
     int            index;           /* which block are we */
+    unsigned int   src_index;       /* (Only used on dest) */
     bool           is_ram_block;
     int            nb_chunks;
     unsigned long *transit_bitmap;
@@ -353,6 +354,9 @@ typedef struct RDMAContext {
     RDMALocalBlocks local_ram_blocks;
     RDMADestBlock  *dest_blocks;
 
+    /* Index of the next RAMBlock received during block registration */
+    unsigned int    next_src_index;
+
     /*
      * Migration on *destination* started.
      * Then use coroutine yield function.
@@ -561,6 +565,7 @@ static int rdma_add_block(RDMAContext *rdma, const char *block_name,
     block->offset = block_offset;
     block->length = length;
     block->index = local->nb_blocks;
+    block->src_index = ~0U; /* Filled in by the receipt of the block list */
     block->nb_chunks = ram_chunk_index(host_addr, host_addr + length) + 1UL;
     block->transit_bitmap = bitmap_new(block->nb_chunks);
     bitmap_clear(block->transit_bitmap, 0, block->nb_chunks);
@@ -2909,6 +2914,14 @@ err_rdma_dest_wait:
     return ret;
 }
 
+static int dest_ram_sort_func(const void *a, const void *b)
+{
+    unsigned int a_index = ((const RDMALocalBlock *)a)->src_index;
+    unsigned int b_index = ((const RDMALocalBlock *)b)->src_index;
+
+    return (a_index < b_index) ? -1 : (a_index != b_index);
+}
+
 /*
  * During each iteration of the migration, we listen for instructions
  * by the source VM to perform dynamic page registrations before they
@@ -2986,6 +2999,13 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
         case RDMA_CONTROL_RAM_BLOCKS_REQUEST:
             trace_qemu_rdma_registration_handle_ram_blocks();
 
+            /* Sort our local RAM Block list so it's the same as the source,
+             * we can do this since we've filled in a src_index in the list
+             * as we received the RAMBlock list earlier.
+             */
+            qsort(rdma->local_ram_blocks.block,
+                  rdma->local_ram_blocks.nb_blocks,
+                  sizeof(RDMALocalBlock), dest_ram_sort_func);
             if (rdma->pin_all) {
                 ret = qemu_rdma_reg_whole_ram_blocks(rdma);
                 if (ret) {
@@ -3013,6 +3033,12 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
                 rdma->dest_blocks[i].length = local->block[i].length;
 
                 dest_block_to_network(&rdma->dest_blocks[i]);
+                trace_qemu_rdma_registration_handle_ram_blocks_loop(
+                    local->block[i].block_name,
+                    local->block[i].offset,
+                    local->block[i].length,
+                    local->block[i].local_host_addr,
+                    local->block[i].src_index);
             }
 
             blocks.len = rdma->local_ram_blocks.nb_blocks
@@ -3136,13 +3162,44 @@ out:
     return ret;
 }
 
+/* Destination:
+ * Called via a ram_control_load_hook during the initial RAM load section which
+ * lists the RAMBlocks by name.  This lets us know the order of the RAMBlocks
+ * on the source.
+ * We've already built our local RAMBlock list, but not yet sent the list to
+ * the source.
+ */
+static int rdma_block_notification_handle(QEMUFileRDMA *rfile, const char *name)
+{
+    RDMAContext *rdma = rfile->rdma;
+    int curr;
+    int found = -1;
+
+    /* Find the matching RAMBlock in our local list */
+    for (curr = 0; curr < rdma->local_ram_blocks.nb_blocks; curr++) {
+        if (!strcmp(rdma->local_ram_blocks.block[curr].block_name, name)) {
+            found = curr;
+            break;
+        }
+    }
+
+    if (found == -1) {
+        error_report("RAMBlock '%s' not found on destination", name);
+        return -ENOENT;
+    }
+
+    rdma->local_ram_blocks.block[curr].src_index = rdma->next_src_index;
+    trace_rdma_block_notification_handle(name, rdma->next_src_index);
+    rdma->next_src_index++;
+
+    return 0;
+}
+
 static int rdma_load_hook(QEMUFile *f, void *opaque, uint64_t flags, void *data)
 {
     switch (flags) {
     case RAM_CONTROL_BLOCK_REG:
-        /* TODO A later patch */
-        return 0;
-        break;
+        return rdma_block_notification_handle(opaque, data);
 
     case RAM_CONTROL_HOOK:
         return qemu_rdma_registration_handle(f, opaque);
@@ -3193,7 +3250,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
     if (flags == RAM_CONTROL_SETUP) {
         RDMAControlHeader resp = {.type = RDMA_CONTROL_RAM_BLOCKS_RESULT };
         RDMALocalBlocks *local = &rdma->local_ram_blocks;
-        int reg_result_idx, i, j, nb_dest_blocks;
+        int reg_result_idx, i, nb_dest_blocks;
 
         head.type = RDMA_CONTROL_RAM_BLOCKS_REQUEST;
         trace_qemu_rdma_registration_stop_ram();
@@ -3229,9 +3286,10 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
          */
 
         if (local->nb_blocks != nb_dest_blocks) {
-            ERROR(errp, "ram blocks mismatch #1! "
+            ERROR(errp, "ram blocks mismatch (Number of blocks %d vs %d) "
                         "Your QEMU command line parameters are probably "
-                        "not identical on both the source and destination.");
+                        "not identical on both the source and destination.",
+                        local->nb_blocks, nb_dest_blocks);
             return -EINVAL;
         }
 
@@ -3241,30 +3299,17 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
         for (i = 0; i < nb_dest_blocks; i++) {
             network_to_dest_block(&rdma->dest_blocks[i]);
 
-            /* search local ram blocks */
-            for (j = 0; j < local->nb_blocks; j++) {
-                if (rdma->dest_blocks[i].offset != local->block[j].offset) {
-                    continue;
-                }
-
-                if (rdma->dest_blocks[i].length != local->block[j].length) {
-                    ERROR(errp, "ram blocks mismatch #2! "
-                        "Your QEMU command line parameters are probably "
-                        "not identical on both the source and destination.");
-                    return -EINVAL;
-                }
-                local->block[j].remote_host_addr =
-                        rdma->dest_blocks[i].remote_host_addr;
-                local->block[j].remote_rkey = rdma->dest_blocks[i].remote_rkey;
-                break;
-            }
-
-            if (j >= local->nb_blocks) {
-                ERROR(errp, "ram blocks mismatch #3! "
-                        "Your QEMU command line parameters are probably "
-                        "not identical on both the source and destination.");
+            /* We require that the blocks are in the same order */
+            if (rdma->dest_blocks[i].length != local->block[i].length) {
+                ERROR(errp, "Block %s/%d has a different length %" PRIu64
+                            "vs %" PRIu64, local->block[i].block_name, i,
+                            local->block[i].length,
+                            rdma->dest_blocks[i].length);
                 return -EINVAL;
             }
+            local->block[i].remote_host_addr =
+                    rdma->dest_blocks[i].remote_host_addr;
+            local->block[i].remote_rkey = rdma->dest_blocks[i].remote_rkey;
         }
     }
 
diff --git a/trace-events b/trace-events
index 7dff362..b2bf8ea 100644
--- a/trace-events
+++ b/trace-events
@@ -1426,6 +1426,7 @@ qemu_rdma_register_and_get_keys(uint64_t len, void *start) "Registering %" PRIu6
 qemu_rdma_registration_handle_compress(int64_t length, int index, int64_t offset) "Zapping zero chunk: %" PRId64 " bytes, index %d, offset %" PRId64
 qemu_rdma_registration_handle_finished(void) ""
 qemu_rdma_registration_handle_ram_blocks(void) ""
+qemu_rdma_registration_handle_ram_blocks_loop(const char *name, uint64_t offset, uint64_t length, void *local_host_addr, unsigned int src_index) "%s: @%" PRIx64 "/%" PRIu64 " host:@%p src_index: %u"
 qemu_rdma_registration_handle_register(int requests) "%d requests"
 qemu_rdma_registration_handle_register_loop(int req, int index, uint64_t addr, uint64_t chunks) "Registration request (%d): index %d, current_addr %" PRIu64 " chunks: %" PRIu64
 qemu_rdma_registration_handle_register_rkey(int rkey) "%x"
@@ -1452,6 +1453,7 @@ qemu_rdma_write_one_sendreg(uint64_t chunk, int len, int index, int64_t offset)
 qemu_rdma_write_one_top(uint64_t chunks, uint64_t size) "Writing %" PRIu64 " chunks, (%" PRIu64 " MB)"
 qemu_rdma_write_one_zero(uint64_t chunk, int len, int index, int64_t offset) "Entire chunk is zero, sending compress: %" PRIu64 " for %d bytes, index: %d, offset: %" PRId64
 rdma_add_block(const char *block_name, int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Added Block: '%s':%d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
+rdma_block_notification_handle(const char *name, int index) "%s at %d"
 rdma_delete_block(void *block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Deleted Block: %p, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
 rdma_start_incoming_migration(void) ""
 rdma_start_incoming_migration_after_dest_init(void) ""
-- 
2.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v2 11/12] Sanity check RDMA remote data
  2015-06-11 17:17 [Qemu-devel] [PATCH v2 00/12] Remove RDMA migration dependence on RAMBlock offset Dr. David Alan Gilbert (git)
                   ` (9 preceding siblings ...)
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 10/12] Sort destination RAMBlocks to be the same as the source Dr. David Alan Gilbert (git)
@ 2015-06-11 17:17 ` Dr. David Alan Gilbert (git)
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 12/12] Fail more cleanly in mismatched RAM cases Dr. David Alan Gilbert (git)
  11 siblings, 0 replies; 36+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-11 17:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, arei.gonglei, mrhines, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Perform some basic (but probably not complete) sanity checking on
requests from the RDMA source.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael R. Hines <mrhines@us.ibm.com>
---
 migration/rdma.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/migration/rdma.c b/migration/rdma.c
index 92dc5c1..1f3a9fb 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2984,6 +2984,13 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
             trace_qemu_rdma_registration_handle_compress(comp->length,
                                                          comp->block_idx,
                                                          comp->offset);
+            if (comp->block_idx >= rdma->local_ram_blocks.nb_blocks) {
+                error_report("rdma: 'compress' bad block index %u (vs %d)",
+                             (unsigned int)comp->block_idx,
+                             rdma->local_ram_blocks.nb_blocks);
+                ret = -EIO;
+                break;
+            }
             block = &(rdma->local_ram_blocks.block[comp->block_idx]);
 
             host_addr = block->local_host_addr +
@@ -3072,8 +3079,23 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
                 trace_qemu_rdma_registration_handle_register_loop(count,
                          reg->current_index, reg->key.current_addr, reg->chunks);
 
+                if (reg->current_index >= rdma->local_ram_blocks.nb_blocks) {
+                    error_report("rdma: 'register' bad block index %u (vs %d)",
+                                 (unsigned int)reg->current_index,
+                                 rdma->local_ram_blocks.nb_blocks);
+                    ret = -ENOENT;
+                    break;
+                }
                 block = &(rdma->local_ram_blocks.block[reg->current_index]);
                 if (block->is_ram_block) {
+                    if (block->offset > reg->key.current_addr) {
+                        error_report("rdma: bad register address for block %s"
+                            " offset: %" PRIx64 " current_addr: %" PRIx64,
+                            block->block_name, block->offset,
+                            reg->key.current_addr);
+                        ret = -ERANGE;
+                        break;
+                    }
                     host_addr = (block->local_host_addr +
                                 (reg->key.current_addr - block->offset));
                     chunk = ram_chunk_index(block->local_host_addr,
@@ -3082,6 +3104,14 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
                     chunk = reg->key.chunk;
                     host_addr = block->local_host_addr +
                         (reg->key.chunk * (1UL << RDMA_REG_CHUNK_SHIFT));
+                    /* Check for particularly bad chunk value */
+                    if (host_addr < (void *)block->local_host_addr) {
+                        error_report("rdma: bad chunk for block %s"
+                            " chunk: %" PRIx64,
+                            block->block_name, reg->key.chunk);
+                        ret = -ERANGE;
+                        break;
+                    }
                 }
                 chunk_start = ram_chunk_start(block, chunk);
                 chunk_end = ram_chunk_end(block, chunk + reg->chunks);
-- 
2.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v2 12/12] Fail more cleanly in mismatched RAM cases
  2015-06-11 17:17 [Qemu-devel] [PATCH v2 00/12] Remove RDMA migration dependence on RAMBlock offset Dr. David Alan Gilbert (git)
                   ` (10 preceding siblings ...)
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 11/12] Sanity check RDMA remote data Dr. David Alan Gilbert (git)
@ 2015-06-11 17:17 ` Dr. David Alan Gilbert (git)
  11 siblings, 0 replies; 36+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-11 17:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, arei.gonglei, mrhines, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

If the number of RAMBlocks was different on the source from the
destination, QEMU would hang waiting for a disconnect on the source
and wouldn't release from that hang until the destination was manually
killed.

Mark the stream as being in error, this causes the destination to die
and the source to carry on.

(It still gets a whole bunch of warnings on the destination, and I've
not managed to complete another migration after the 1st one, still
progress).

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/migration/rdma.c b/migration/rdma.c
index 1f3a9fb..9156308 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3320,6 +3320,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
                         "Your QEMU command line parameters are probably "
                         "not identical on both the source and destination.",
                         local->nb_blocks, nb_dest_blocks);
+            rdma->error_state = -EINVAL;
             return -EINVAL;
         }
 
@@ -3335,6 +3336,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
                             "vs %" PRIu64, local->block[i].block_name, i,
                             local->block[i].length,
                             rdma->dest_blocks[i].length);
+                rdma->error_state = -EINVAL;
                 return -EINVAL;
             }
             local->block[i].remote_host_addr =
-- 
2.4.2

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v2 04/12] rdma typos
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 04/12] rdma typos Dr. David Alan Gilbert (git)
@ 2015-06-11 17:56   ` Michael R. Hines
  2015-06-11 18:37     ` Dr. David Alan Gilbert
  2015-07-01  8:37   ` Juan Quintela
  1 sibling, 1 reply; 36+ messages in thread
From: Michael R. Hines @ 2015-06-11 17:56 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: amit.shah, arei.gonglei, mrhines, quintela

On 06/11/2015 12:17 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> A couple of typo fixes.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   migration/rdma.c | 6 +++---
>   trace-events     | 4 ++--
>   2 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index bc73ff8..44ed996 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -1215,7 +1215,7 @@ const char *print_wrid(int wrid)
>
>   /*
>    * Perform a non-optimized memory unregistration after every transfer
> - * for demonsration purposes, only if pin-all is not requested.
> + * for demonstration purposes, only if pin-all is not requested.
>    *
>    * Potential optimizations:
>    * 1. Start a new thread to run this function continuously
> @@ -3279,7 +3279,7 @@ static void rdma_accept_incoming_migration(void *opaque)
>       QEMUFile *f;
>       Error *local_err = NULL, **errp = &local_err;
>
> -    trace_qemu_dma_accept_incoming_migration();
> +    trace_qemu_rdma_accept_incoming_migration();
>       ret = qemu_rdma_accept(rdma);
>
>       if (ret) {
> @@ -3287,7 +3287,7 @@ static void rdma_accept_incoming_migration(void *opaque)
>           return;
>       }
>
> -    trace_qemu_dma_accept_incoming_migration_accepted();
> +    trace_qemu_rdma_accept_incoming_migration_accepted();
>
>       f = qemu_fopen_rdma(rdma, "rb");
>       if (f == NULL) {
> diff --git a/trace-events b/trace-events
> index 2662ffa..8b468fe 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1398,8 +1398,8 @@ migrate_pending(uint64_t size, uint64_t max) "pending size %" PRIu64 " max %" PR
>   migrate_transferred(uint64_t tranferred, uint64_t time_spent, double bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %g max_size %" PRId64
>
>   # migration/rdma.c
> -qemu_dma_accept_incoming_migration(void) ""
> -qemu_dma_accept_incoming_migration_accepted(void) ""
> +qemu_rdma_accept_incoming_migration(void) ""
> +qemu_rdma_accept_incoming_migration_accepted(void) ""

What happened to the actual message inside the quotes? =)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/12] Translate offsets to destination address space
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 06/12] Translate offsets to destination address space Dr. David Alan Gilbert (git)
@ 2015-06-11 18:12   ` Michael R. Hines
  2015-06-11 18:58     ` Dr. David Alan Gilbert
  2015-07-01  8:43   ` Juan Quintela
  1 sibling, 1 reply; 36+ messages in thread
From: Michael R. Hines @ 2015-06-11 18:12 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: amit.shah, arei.gonglei, mrhines, quintela

On 06/11/2015 12:17 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> The 'offset' field in RDMACompress and 'current_addr' field
> in RDMARegister are commented as being offsets within a particular
> RAMBlock, however they appear to actually be offsets within the
> ram_addr_t space.
>
> The code currently assumes that the offsets on the source/destination
> match, this change removes the need for the assumption for these
> structures by translating the addresses into the ram_addr_t space of
> the destination host.
>
> Note: An alternative would be to change the fields to actually
> take the data they're commented for; this would potentially be
> simpler but would break stream compatibility for those cases
> that currently work.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   migration/rdma.c | 31 ++++++++++++++++++++++++-------
>   1 file changed, 24 insertions(+), 7 deletions(-)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 9532461..cb66721 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -411,7 +411,7 @@ static void network_to_control(RDMAControlHeader *control)
>    */
>   typedef struct QEMU_PACKED {
>       union QEMU_PACKED {
> -        uint64_t current_addr;  /* offset into the ramblock of the chunk */
> +        uint64_t current_addr;  /* offset into the ram_addr_t space */
>           uint64_t chunk;         /* chunk to lookup if unregistering */
>       } key;
>       uint32_t current_index; /* which ramblock the chunk belongs to */
> @@ -419,8 +419,19 @@ typedef struct QEMU_PACKED {
>       uint64_t chunks;            /* how many sequential chunks to register */
>   } RDMARegister;
>
> -static void register_to_network(RDMARegister *reg)
> +static void register_to_network(RDMAContext *rdma, RDMARegister *reg)
>   {
> +    RDMALocalBlock *local_block;
> +    local_block  = &rdma->local_ram_blocks.block[reg->current_index];
> +
> +    if (local_block->is_ram_block) {
> +        /*
> +         * current_addr as passed in is an address in the local ram_addr_t
> +         * space, we need to translate this for the destination
> +         */
> +        reg->key.current_addr -= local_block->offset;
> +        reg->key.current_addr += rdma->dest_blocks[reg->current_index].offset;
> +    }
>       reg->key.current_addr = htonll(reg->key.current_addr);
>       reg->current_index = htonl(reg->current_index);
>       reg->chunks = htonll(reg->chunks);
> @@ -436,13 +447,19 @@ static void network_to_register(RDMARegister *reg)
>   typedef struct QEMU_PACKED {
>       uint32_t value;     /* if zero, we will madvise() */
>       uint32_t block_idx; /* which ram block index */
> -    uint64_t offset;    /* where in the remote ramblock this chunk */
> +    uint64_t offset;    /* Address in remote ram_addr_t space */
>       uint64_t length;    /* length of the chunk */
>   } RDMACompress;
>
> -static void compress_to_network(RDMACompress *comp)
> +static void compress_to_network(RDMAContext *rdma, RDMACompress *comp)
>   {
>       comp->value = htonl(comp->value);
> +    /*
> +     * comp->offset as passed in is an address in the local ram_addr_t
> +     * space, we need to translate this for the destination
> +     */
> +    comp->offset -= rdma->local_ram_blocks.block[comp->block_idx].offset;
> +    comp->offset += rdma->dest_blocks[comp->block_idx].offset;
>       comp->block_idx = htonl(comp->block_idx);
>       comp->offset = htonll(comp->offset);
>       comp->length = htonll(comp->length);

So, why add the destination block's offset on the source side
just for it to be re-adjusted again when it gets to the destination side?

Can you just stop at this:

+        reg->key.current_addr -= local_block->offset;

Without this:

+        reg->key.current_addr += 
rdma->dest_blocks[reg->current_index].offset;

... on the source, followed by this on the destionation:

+    comp->offset -= rdma->local_ram_blocks.block[comp->block_idx].offset;

Without this:

+    comp->offset += rdma->dest_blocks[comp->block_idx].offset;

Did I follow correctly?

> @@ -1288,7 +1305,7 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
>           rdma->total_registrations--;
>
>           reg.key.chunk = chunk;
> -        register_to_network(&reg);
> +        register_to_network(rdma, &reg);
>           ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
>                                   &resp, NULL, NULL);
>           if (ret < 0) {
> @@ -1909,7 +1926,7 @@ retry:
>                   trace_qemu_rdma_write_one_zero(chunk, sge.length,
>                                                  current_index, current_addr);
>
> -                compress_to_network(&comp);
> +                compress_to_network(rdma, &comp);
>                   ret = qemu_rdma_exchange_send(rdma, &head,
>                                   (uint8_t *) &comp, NULL, NULL, NULL);
>
> @@ -1936,7 +1953,7 @@ retry:
>               trace_qemu_rdma_write_one_sendreg(chunk, sge.length, current_index,
>                                                 current_addr);
>
> -            register_to_network(&reg);
> +            register_to_network(rdma, &reg);
>               ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
>                                       &resp, &reg_result_idx, NULL);
>               if (ret < 0) {

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v2 07/12] Rework ram_control_load_hook to hook during block load
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 07/12] Rework ram_control_load_hook to hook during block load Dr. David Alan Gilbert (git)
@ 2015-06-11 18:20   ` Michael R. Hines
  2015-06-11 18:44     ` Dr. David Alan Gilbert
  2015-07-01  8:49   ` Juan Quintela
  1 sibling, 1 reply; 36+ messages in thread
From: Michael R. Hines @ 2015-06-11 18:20 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: amit.shah, arei.gonglei, mrhines, quintela

On 06/11/2015 12:17 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> We need the names of RAMBlocks as they're loaded for RDMA,
> reuse a slightly modified ram_control_load_hook:
>    a) Pass a 'data' parameter to use for the name in the block-reg
>       case
>    b) Only some hook types now require the presence of a hook function.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   arch_init.c                   |  4 +++-
>   include/migration/migration.h |  2 +-
>   include/migration/qemu-file.h | 14 +++++++++-----
>   migration/qemu-file.c         | 16 +++++++++++-----
>   migration/rdma.c              | 28 ++++++++++++++++++++++------
>   trace-events                  |  2 +-
>   6 files changed, 47 insertions(+), 19 deletions(-)
>
> diff --git a/arch_init.c b/arch_init.c
> index d294474..dc9cc7e 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -1569,6 +1569,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>                                   error_report_err(local_err);
>                               }
>                           }
> +                        ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG,
> +                                              block->idstr);
>                           break;
>                       }
>                   }
> @@ -1637,7 +1639,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>               break;
>           default:
>               if (flags & RAM_SAVE_FLAG_HOOK) {
> -                ram_control_load_hook(f, flags);
> +                ram_control_load_hook(f, RAM_CONTROL_HOOK, NULL);
>               } else {
>                   error_report("Unknown combination of migration flags: %#x",
>                                flags);
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index a6e025a..096e1ea 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -164,7 +164,7 @@ int migrate_decompress_threads(void);
>
>   void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
>   void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
> -void ram_control_load_hook(QEMUFile *f, uint64_t flags);
> +void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data);
>
>   /* Whenever this is found in the data stream, the flags
>    * will be passed to ram_control_load_hook in the incoming-migration
> diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
> index a01c5b8..7aafe19 100644
> --- a/include/migration/qemu-file.h
> +++ b/include/migration/qemu-file.h
> @@ -63,16 +63,20 @@ typedef ssize_t (QEMUFileWritevBufferFunc)(void *opaque, struct iovec *iov,
>   /*
>    * This function provides hooks around different
>    * stages of RAM migration.
> + * 'opaque' is the backend specific data in QEMUFile
> + * 'data' is call specific data associated with the 'flags' value
>    */
> -typedef int (QEMURamHookFunc)(QEMUFile *f, void *opaque, uint64_t flags);
> +typedef int (QEMURamHookFunc)(QEMUFile *f, void *opaque, uint64_t flags,
> +                              void *data);
>
>   /*
>    * Constants used by ram_control_* hooks
>    */
> -#define RAM_CONTROL_SETUP    0
> -#define RAM_CONTROL_ROUND    1
> -#define RAM_CONTROL_HOOK     2
> -#define RAM_CONTROL_FINISH   3
> +#define RAM_CONTROL_SETUP     0
> +#define RAM_CONTROL_ROUND     1
> +#define RAM_CONTROL_HOOK      2
> +#define RAM_CONTROL_FINISH    3
> +#define RAM_CONTROL_BLOCK_REG 4
>
>   /*
>    * This function allows override of where the RAM page
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index 2750365..5493977 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -128,7 +128,7 @@ void ram_control_before_iterate(QEMUFile *f, uint64_t flags)
>       int ret = 0;
>
>       if (f->ops->before_ram_iterate) {
> -        ret = f->ops->before_ram_iterate(f, f->opaque, flags);
> +        ret = f->ops->before_ram_iterate(f, f->opaque, flags, NULL);
>           if (ret < 0) {
>               qemu_file_set_error(f, ret);
>           }
> @@ -140,24 +140,30 @@ void ram_control_after_iterate(QEMUFile *f, uint64_t flags)
>       int ret = 0;
>
>       if (f->ops->after_ram_iterate) {
> -        ret = f->ops->after_ram_iterate(f, f->opaque, flags);
> +        ret = f->ops->after_ram_iterate(f, f->opaque, flags, NULL);
>           if (ret < 0) {
>               qemu_file_set_error(f, ret);
>           }
>       }
>   }
>
> -void ram_control_load_hook(QEMUFile *f, uint64_t flags)
> +void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data)
>   {
>       int ret = -EINVAL;
>
>       if (f->ops->hook_ram_load) {
> -        ret = f->ops->hook_ram_load(f, f->opaque, flags);
> +        ret = f->ops->hook_ram_load(f, f->opaque, flags, data);
>           if (ret < 0) {
>               qemu_file_set_error(f, ret);
>           }
>       } else {
> -        qemu_file_set_error(f, ret);
> +        /*
> +         * Hook is a hook specifically requested by the source sending a flag
> +         * that expects there to be a hook on the destination.
> +         */
> +        if (flags == RAM_CONTROL_HOOK) {
> +            qemu_file_set_error(f, ret);
> +        }
>       }
>   }
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index cb66721..396329c 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2905,8 +2905,7 @@ err_rdma_dest_wait:
>    *
>    * Keep doing this until the source tells us to stop.
>    */
> -static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque,
> -                                         uint64_t flags)
> +static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
>   {
>       RDMAControlHeader reg_resp = { .len = sizeof(RDMARegisterResult),
>                                  .type = RDMA_CONTROL_REGISTER_RESULT,
> @@ -2936,7 +2935,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque,
>       CHECK_ERROR_STATE();
>
>       do {
> -        trace_qemu_rdma_registration_handle_wait(flags);
> +        trace_qemu_rdma_registration_handle_wait();
>
>           ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_NONE);
>
> @@ -3124,8 +3123,25 @@ out:
>       return ret;
>   }
>
> +static int rdma_load_hook(QEMUFile *f, void *opaque, uint64_t flags, void *data)
> +{
> +    switch (flags) {
> +    case RAM_CONTROL_BLOCK_REG:
> +        /* TODO A later patch */
> +        return 0;
> +        break;
> +
> +    case RAM_CONTROL_HOOK:
> +        return qemu_rdma_registration_handle(f, opaque);
> +
> +    default:
> +        /* Shouldn't be called with any other values */
> +        abort();
> +    }
> +}
> +
>   static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
> -                                        uint64_t flags)
> +                                        uint64_t flags, void *data)
>   {
>       QEMUFileRDMA *rfile = opaque;
>       RDMAContext *rdma = rfile->rdma;
> @@ -3144,7 +3160,7 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
>    * First, flush writes, if any.
>    */
>   static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
> -                                       uint64_t flags)
> +                                       uint64_t flags, void *data)
>   {
>       Error *local_err = NULL, **errp = &local_err;
>       QEMUFileRDMA *rfile = opaque;
> @@ -3266,7 +3282,7 @@ static const QEMUFileOps rdma_read_ops = {
>       .get_buffer    = qemu_rdma_get_buffer,
>       .get_fd        = qemu_rdma_get_fd,
>       .close         = qemu_rdma_close,
> -    .hook_ram_load = qemu_rdma_registration_handle,
> +    .hook_ram_load = rdma_load_hook,
>   };
>
>   static const QEMUFileOps rdma_write_ops = {
> diff --git a/trace-events b/trace-events
> index 557770c..0f37a4b 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1432,7 +1432,7 @@ qemu_rdma_registration_handle_register_rkey(int rkey) "%x"
>   qemu_rdma_registration_handle_unregister(int requests) "%d requests"
>   qemu_rdma_registration_handle_unregister_loop(int count, int index, uint64_t chunk) "Unregistration request (%d): index %d, chunk %" PRIu64
>   qemu_rdma_registration_handle_unregister_success(uint64_t chunk) "%" PRIu64
> -qemu_rdma_registration_handle_wait(uint64_t flags) "Waiting for next request %" PRIu64
> +qemu_rdma_registration_handle_wait(void) ""

Are you using some kind of script to generate these trace prototypes?
What happened to the message? =)

>   qemu_rdma_registration_start(uint64_t flags) "%" PRIu64
>   qemu_rdma_registration_stop(uint64_t flags) "%" PRIu64
>   qemu_rdma_registration_stop_ram(void) ""

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v2 08/12] Allow rdma_delete_block to work without the hash
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 08/12] Allow rdma_delete_block to work without the hash Dr. David Alan Gilbert (git)
@ 2015-06-11 18:36   ` Michael R. Hines
  2015-06-11 18:39     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 36+ messages in thread
From: Michael R. Hines @ 2015-06-11 18:36 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: amit.shah, arei.gonglei, mrhines, quintela

On 06/11/2015 12:17 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> In the next patch we remove the hash on the destination,
> rdma_delete_block does two things with the hash which can be avoided:
>    a) The caller passes the offset and rdma_delete_block looks it up
>       in the hash; fixed by getting the caller to pass the block
>    b) The hash gets recreated after deletion; fixed by making that
>       conditional on the hash being initialised.
>
> While this function is currently only used during cleanup, Michael
> asked that we keep it general for future dynamic block registration
> work.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   migration/rdma.c | 27 ++++++++++++++++-----------
>   trace-events     |  2 +-
>   2 files changed, 17 insertions(+), 12 deletions(-)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 396329c..8d99378 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -617,16 +617,19 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
>       return 0;
>   }
>
> -static int rdma_delete_block(RDMAContext *rdma, ram_addr_t block_offset)
> +/*
> + * Note: If used outside of cleanup, the caller must ensure that the destination
> + * block structures are also updated
> + */
> +static int rdma_delete_block(RDMAContext *rdma, RDMALocalBlock *block)
>   {
>       RDMALocalBlocks *local = &rdma->local_ram_blocks;
> -    RDMALocalBlock *block = g_hash_table_lookup(rdma->blockmap,
> -        (void *) block_offset);
>       RDMALocalBlock *old = local->block;
>       int x;
>
> -    assert(block);
> -
> +    if (rdma->blockmap) {
> +        g_hash_table_remove(rdma->blockmap, (void *)(uintptr_t)block->offset);
> +    }
>       if (block->pmr) {
>           int j;
>
> @@ -659,8 +662,11 @@ static int rdma_delete_block(RDMAContext *rdma, ram_addr_t block_offset)
>       g_free(block->block_name);
>       block->block_name = NULL;
>
> -    for (x = 0; x < local->nb_blocks; x++) {
> -        g_hash_table_remove(rdma->blockmap, (void *)(uintptr_t)old[x].offset);
> +    if (rdma->blockmap) {
> +        for (x = 0; x < local->nb_blocks; x++) {
> +            g_hash_table_remove(rdma->blockmap,
> +                                (void *)(uintptr_t)old[x].offset);
> +        }
>       }
>
>       if (local->nb_blocks > 1) {
> @@ -682,8 +688,7 @@ static int rdma_delete_block(RDMAContext *rdma, ram_addr_t block_offset)
>           local->block = NULL;
>       }
>
> -    trace_rdma_delete_block(local->nb_blocks,
> -                           (uintptr_t)block->local_host_addr,
> +    trace_rdma_delete_block(block, (uintptr_t)block->local_host_addr,
>                              block->offset, block->length,
>                               (uintptr_t)(block->local_host_addr + block->length),
>                              BITS_TO_LONGS(block->nb_chunks) *
> @@ -693,7 +698,7 @@ static int rdma_delete_block(RDMAContext *rdma, ram_addr_t block_offset)
>
>       local->nb_blocks--;
>
> -    if (local->nb_blocks) {
> +    if (local->nb_blocks && rdma->blockmap) {
>           for (x = 0; x < local->nb_blocks; x++) {
>               g_hash_table_insert(rdma->blockmap,
>                                   (void *)(uintptr_t)local->block[x].offset,
> @@ -2214,7 +2219,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>
>       if (rdma->local_ram_blocks.block) {
>           while (rdma->local_ram_blocks.nb_blocks) {
> -            rdma_delete_block(rdma, rdma->local_ram_blocks.block->offset);
> +            rdma_delete_block(rdma, &rdma->local_ram_blocks.block[0]);
>           }
>       }

Looks good overall. Maybe this is a silly question, but have you done
a few migrations over actual RDMA hardware yet?

Reviewed-by: Michael R. Hines <mrhines@us.ibm.com>

> diff --git a/trace-events b/trace-events
> index 0f37a4b..7dff362 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1452,7 +1452,7 @@ qemu_rdma_write_one_sendreg(uint64_t chunk, int len, int index, int64_t offset)
>   qemu_rdma_write_one_top(uint64_t chunks, uint64_t size) "Writing %" PRIu64 " chunks, (%" PRIu64 " MB)"
>   qemu_rdma_write_one_zero(uint64_t chunk, int len, int index, int64_t offset) "Entire chunk is zero, sending compress: %" PRIu64 " for %d bytes, index: %d, offset: %" PRId64
>   rdma_add_block(const char *block_name, int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Added Block: '%s':%d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
> -rdma_delete_block(int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Deleted Block: %d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
> +rdma_delete_block(void *block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Deleted Block: %p, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
>   rdma_start_incoming_migration(void) ""
>   rdma_start_incoming_migration_after_dest_init(void) ""
>   rdma_start_incoming_migration_after_rdma_listen(void) ""
These messages are also empty? What happened to them? =)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v2 04/12] rdma typos
  2015-06-11 17:56   ` Michael R. Hines
@ 2015-06-11 18:37     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 36+ messages in thread
From: Dr. David Alan Gilbert @ 2015-06-11 18:37 UTC (permalink / raw)
  To: Michael R. Hines; +Cc: amit.shah, quintela, arei.gonglei, qemu-devel, mrhines

* Michael R. Hines (mrhines@linux.vnet.ibm.com) wrote:
> On 06/11/2015 12:17 PM, Dr. David Alan Gilbert (git) wrote:
> >From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> >A couple of typo fixes.
> >
> >Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >---
> >  migration/rdma.c | 6 +++---
> >  trace-events     | 4 ++--
> >  2 files changed, 5 insertions(+), 5 deletions(-)
> >
> >diff --git a/migration/rdma.c b/migration/rdma.c
> >index bc73ff8..44ed996 100644
> >--- a/migration/rdma.c
> >+++ b/migration/rdma.c
> >@@ -1215,7 +1215,7 @@ const char *print_wrid(int wrid)
> >
> >  /*
> >   * Perform a non-optimized memory unregistration after every transfer
> >- * for demonsration purposes, only if pin-all is not requested.
> >+ * for demonstration purposes, only if pin-all is not requested.
> >   *
> >   * Potential optimizations:
> >   * 1. Start a new thread to run this function continuously
> >@@ -3279,7 +3279,7 @@ static void rdma_accept_incoming_migration(void *opaque)
> >      QEMUFile *f;
> >      Error *local_err = NULL, **errp = &local_err;
> >
> >-    trace_qemu_dma_accept_incoming_migration();
> >+    trace_qemu_rdma_accept_incoming_migration();
> >      ret = qemu_rdma_accept(rdma);
> >
> >      if (ret) {
> >@@ -3287,7 +3287,7 @@ static void rdma_accept_incoming_migration(void *opaque)
> >          return;
> >      }
> >
> >-    trace_qemu_dma_accept_incoming_migration_accepted();
> >+    trace_qemu_rdma_accept_incoming_migration_accepted();
> >
> >      f = qemu_fopen_rdma(rdma, "rb");
> >      if (f == NULL) {
> >diff --git a/trace-events b/trace-events
> >index 2662ffa..8b468fe 100644
> >--- a/trace-events
> >+++ b/trace-events
> >@@ -1398,8 +1398,8 @@ migrate_pending(uint64_t size, uint64_t max) "pending size %" PRIu64 " max %" PR
> >  migrate_transferred(uint64_t tranferred, uint64_t time_spent, double bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %g max_size %" PRId64
> >
> >  # migration/rdma.c
> >-qemu_dma_accept_incoming_migration(void) ""
> >-qemu_dma_accept_incoming_migration_accepted(void) ""
> >+qemu_rdma_accept_incoming_migration(void) ""
> >+qemu_rdma_accept_incoming_migration_accepted(void) ""
> 
> What happened to the actual message inside the quotes? =)

You don't need them in most cases; the trace tools normally print
the name of the event.

Dave

> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v2 08/12] Allow rdma_delete_block to work without the hash
  2015-06-11 18:36   ` Michael R. Hines
@ 2015-06-11 18:39     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 36+ messages in thread
From: Dr. David Alan Gilbert @ 2015-06-11 18:39 UTC (permalink / raw)
  To: Michael R. Hines; +Cc: amit.shah, quintela, arei.gonglei, qemu-devel, mrhines

* Michael R. Hines (mrhines@linux.vnet.ibm.com) wrote:
> On 06/11/2015 12:17 PM, Dr. David Alan Gilbert (git) wrote:
> >From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> >In the next patch we remove the hash on the destination,
> >rdma_delete_block does two things with the hash which can be avoided:
> >   a) The caller passes the offset and rdma_delete_block looks it up
> >      in the hash; fixed by getting the caller to pass the block
> >   b) The hash gets recreated after deletion; fixed by making that
> >      conditional on the hash being initialised.
> >
> >While this function is currently only used during cleanup, Michael
> >asked that we keep it general for future dynamic block registration
> >work.
> >
> >Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >---
> >  migration/rdma.c | 27 ++++++++++++++++-----------
> >  trace-events     |  2 +-
> >  2 files changed, 17 insertions(+), 12 deletions(-)
> >
> >diff --git a/migration/rdma.c b/migration/rdma.c
> >index 396329c..8d99378 100644
> >--- a/migration/rdma.c
> >+++ b/migration/rdma.c
> >@@ -617,16 +617,19 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
> >      return 0;
> >  }
> >
> >-static int rdma_delete_block(RDMAContext *rdma, ram_addr_t block_offset)
> >+/*
> >+ * Note: If used outside of cleanup, the caller must ensure that the destination
> >+ * block structures are also updated
> >+ */
> >+static int rdma_delete_block(RDMAContext *rdma, RDMALocalBlock *block)
> >  {
> >      RDMALocalBlocks *local = &rdma->local_ram_blocks;
> >-    RDMALocalBlock *block = g_hash_table_lookup(rdma->blockmap,
> >-        (void *) block_offset);
> >      RDMALocalBlock *old = local->block;
> >      int x;
> >
> >-    assert(block);
> >-
> >+    if (rdma->blockmap) {
> >+        g_hash_table_remove(rdma->blockmap, (void *)(uintptr_t)block->offset);
> >+    }
> >      if (block->pmr) {
> >          int j;
> >
> >@@ -659,8 +662,11 @@ static int rdma_delete_block(RDMAContext *rdma, ram_addr_t block_offset)
> >      g_free(block->block_name);
> >      block->block_name = NULL;
> >
> >-    for (x = 0; x < local->nb_blocks; x++) {
> >-        g_hash_table_remove(rdma->blockmap, (void *)(uintptr_t)old[x].offset);
> >+    if (rdma->blockmap) {
> >+        for (x = 0; x < local->nb_blocks; x++) {
> >+            g_hash_table_remove(rdma->blockmap,
> >+                                (void *)(uintptr_t)old[x].offset);
> >+        }
> >      }
> >
> >      if (local->nb_blocks > 1) {
> >@@ -682,8 +688,7 @@ static int rdma_delete_block(RDMAContext *rdma, ram_addr_t block_offset)
> >          local->block = NULL;
> >      }
> >
> >-    trace_rdma_delete_block(local->nb_blocks,
> >-                           (uintptr_t)block->local_host_addr,
> >+    trace_rdma_delete_block(block, (uintptr_t)block->local_host_addr,
> >                             block->offset, block->length,
> >                              (uintptr_t)(block->local_host_addr + block->length),
> >                             BITS_TO_LONGS(block->nb_chunks) *
> >@@ -693,7 +698,7 @@ static int rdma_delete_block(RDMAContext *rdma, ram_addr_t block_offset)
> >
> >      local->nb_blocks--;
> >
> >-    if (local->nb_blocks) {
> >+    if (local->nb_blocks && rdma->blockmap) {
> >          for (x = 0; x < local->nb_blocks; x++) {
> >              g_hash_table_insert(rdma->blockmap,
> >                                  (void *)(uintptr_t)local->block[x].offset,
> >@@ -2214,7 +2219,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
> >
> >      if (rdma->local_ram_blocks.block) {
> >          while (rdma->local_ram_blocks.nb_blocks) {
> >-            rdma_delete_block(rdma, rdma->local_ram_blocks.block->offset);
> >+            rdma_delete_block(rdma, &rdma->local_ram_blocks.block[0]);
> >          }
> >      }
> 
> Looks good overall. Maybe this is a silly question, but have you done
> a few migrations over actual RDMA hardware yet?

Yes, I wouldn't call it heavy testing but I've done a few basic f22 migrates
with load.

Dave

> 
> Reviewed-by: Michael R. Hines <mrhines@us.ibm.com>
> 
> >diff --git a/trace-events b/trace-events
> >index 0f37a4b..7dff362 100644
> >--- a/trace-events
> >+++ b/trace-events
> >@@ -1452,7 +1452,7 @@ qemu_rdma_write_one_sendreg(uint64_t chunk, int len, int index, int64_t offset)
> >  qemu_rdma_write_one_top(uint64_t chunks, uint64_t size) "Writing %" PRIu64 " chunks, (%" PRIu64 " MB)"
> >  qemu_rdma_write_one_zero(uint64_t chunk, int len, int index, int64_t offset) "Entire chunk is zero, sending compress: %" PRIu64 " for %d bytes, index: %d, offset: %" PRId64
> >  rdma_add_block(const char *block_name, int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Added Block: '%s':%d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
> >-rdma_delete_block(int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Deleted Block: %d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
> >+rdma_delete_block(void *block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Deleted Block: %p, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
> >  rdma_start_incoming_migration(void) ""
> >  rdma_start_incoming_migration_after_dest_init(void) ""
> >  rdma_start_incoming_migration_after_rdma_listen(void) ""
> These messages are also empty? What happened to them? =)
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v2 09/12] Rework ram block hash
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 09/12] Rework ram block hash Dr. David Alan Gilbert (git)
@ 2015-06-11 18:40   ` Michael R. Hines
  2015-06-11 18:45     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 36+ messages in thread
From: Michael R. Hines @ 2015-06-11 18:40 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: amit.shah, arei.gonglei, mrhines, quintela

On 06/11/2015 12:17 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> RDMA uses a hash from block offset->RAM Block; this isn't needed
> on the destination, and it becomes harder to maintain after the next
> patch in the series that sorts the block list.
>
> Split the hash so that it's only generated on the source.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   migration/rdma.c | 32 ++++++++++++++++++++------------
>   1 file changed, 20 insertions(+), 12 deletions(-)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 8d99378..f541586 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -533,23 +533,22 @@ static int rdma_add_block(RDMAContext *rdma, const char *block_name,
>                            ram_addr_t block_offset, uint64_t length)
>   {
>       RDMALocalBlocks *local = &rdma->local_ram_blocks;
> -    RDMALocalBlock *block = g_hash_table_lookup(rdma->blockmap,
> -        (void *)(uintptr_t)block_offset);
> +    RDMALocalBlock *block;
>       RDMALocalBlock *old = local->block;
>
> -    assert(block == NULL);
> -
>       local->block = g_malloc0(sizeof(RDMALocalBlock) * (local->nb_blocks + 1));
>
>       if (local->nb_blocks) {
>           int x;
>
> -        for (x = 0; x < local->nb_blocks; x++) {
> -            g_hash_table_remove(rdma->blockmap,
> -                                (void *)(uintptr_t)old[x].offset);
> -            g_hash_table_insert(rdma->blockmap,
> -                                (void *)(uintptr_t)old[x].offset,
> -                                &local->block[x]);
> +        if (rdma->blockmap) {
> +            for (x = 0; x < local->nb_blocks; x++) {
> +                g_hash_table_remove(rdma->blockmap,
> +                                    (void *)(uintptr_t)old[x].offset);
> +                g_hash_table_insert(rdma->blockmap,
> +                                    (void *)(uintptr_t)old[x].offset,
> +                                    &local->block[x]);
> +            }
>           }
>           memcpy(local->block, old, sizeof(RDMALocalBlock) * local->nb_blocks);
>           g_free(old);
> @@ -571,7 +570,9 @@ static int rdma_add_block(RDMAContext *rdma, const char *block_name,
>
>       block->is_ram_block = local->init ? false : true;
>
> -    g_hash_table_insert(rdma->blockmap, (void *) block_offset, block);
> +    if (rdma->blockmap) {
> +        g_hash_table_insert(rdma->blockmap, (void *) block_offset, block);
> +    }
>
>       trace_rdma_add_block(block_name, local->nb_blocks,
>                            (uintptr_t) block->local_host_addr,
> @@ -607,7 +608,6 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
>       RDMALocalBlocks *local = &rdma->local_ram_blocks;
>
>       assert(rdma->blockmap == NULL);
> -    rdma->blockmap = g_hash_table_new(g_direct_hash, g_direct_equal);
>       memset(local, 0, sizeof *local);
>       qemu_ram_foreach_block(qemu_rdma_init_one_block, rdma);
>       trace_qemu_rdma_init_ram_blocks(local->nb_blocks);
> @@ -2292,6 +2292,14 @@ static int qemu_rdma_source_init(RDMAContext *rdma, Error **errp, bool pin_all)
>           goto err_rdma_source_init;
>       }
>
> +    /* Build the hash that maps from offset to RAMBlock */
> +    rdma->blockmap = g_hash_table_new(g_direct_hash, g_direct_equal);
> +    for (idx = 0; idx < rdma->local_ram_blocks.nb_blocks; idx++) {
> +        g_hash_table_insert(rdma->blockmap,
> +                (void *)(uintptr_t)rdma->local_ram_blocks.block[idx].offset,
> +                &rdma->local_ram_blocks.block[idx]);
> +    }
> +
>       for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
>           ret = qemu_rdma_reg_control(rdma, idx);
>           if (ret) {

You didn't want to use the ID string as a key? I forget....

Reviewed-by: Michael R. Hines <mrhines@us.ibm.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v2 07/12] Rework ram_control_load_hook to hook during block load
  2015-06-11 18:20   ` Michael R. Hines
@ 2015-06-11 18:44     ` Dr. David Alan Gilbert
  2015-07-01  8:47       ` Juan Quintela
  0 siblings, 1 reply; 36+ messages in thread
From: Dr. David Alan Gilbert @ 2015-06-11 18:44 UTC (permalink / raw)
  To: Michael R. Hines; +Cc: amit.shah, quintela, arei.gonglei, qemu-devel, mrhines

* Michael R. Hines (mrhines@linux.vnet.ibm.com) wrote:
> On 06/11/2015 12:17 PM, Dr. David Alan Gilbert (git) wrote:
> >From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> >We need the names of RAMBlocks as they're loaded for RDMA,
> >reuse a slightly modified ram_control_load_hook:
> >   a) Pass a 'data' parameter to use for the name in the block-reg
> >      case
> >   b) Only some hook types now require the presence of a hook function.
> >
> >Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >---
> >  arch_init.c                   |  4 +++-
> >  include/migration/migration.h |  2 +-
> >  include/migration/qemu-file.h | 14 +++++++++-----
> >  migration/qemu-file.c         | 16 +++++++++++-----
> >  migration/rdma.c              | 28 ++++++++++++++++++++++------
> >  trace-events                  |  2 +-
> >  6 files changed, 47 insertions(+), 19 deletions(-)
> >
> >diff --git a/arch_init.c b/arch_init.c
> >index d294474..dc9cc7e 100644
> >--- a/arch_init.c
> >+++ b/arch_init.c
> >@@ -1569,6 +1569,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >                                  error_report_err(local_err);
> >                              }
> >                          }
> >+                        ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG,
> >+                                              block->idstr);
> >                          break;
> >                      }
> >                  }
> >@@ -1637,7 +1639,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >              break;
> >          default:
> >              if (flags & RAM_SAVE_FLAG_HOOK) {
> >-                ram_control_load_hook(f, flags);
> >+                ram_control_load_hook(f, RAM_CONTROL_HOOK, NULL);
> >              } else {
> >                  error_report("Unknown combination of migration flags: %#x",
> >                               flags);
> >diff --git a/include/migration/migration.h b/include/migration/migration.h
> >index a6e025a..096e1ea 100644
> >--- a/include/migration/migration.h
> >+++ b/include/migration/migration.h
> >@@ -164,7 +164,7 @@ int migrate_decompress_threads(void);
> >
> >  void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
> >  void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
> >-void ram_control_load_hook(QEMUFile *f, uint64_t flags);
> >+void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data);
> >
> >  /* Whenever this is found in the data stream, the flags
> >   * will be passed to ram_control_load_hook in the incoming-migration
> >diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
> >index a01c5b8..7aafe19 100644
> >--- a/include/migration/qemu-file.h
> >+++ b/include/migration/qemu-file.h
> >@@ -63,16 +63,20 @@ typedef ssize_t (QEMUFileWritevBufferFunc)(void *opaque, struct iovec *iov,
> >  /*
> >   * This function provides hooks around different
> >   * stages of RAM migration.
> >+ * 'opaque' is the backend specific data in QEMUFile
> >+ * 'data' is call specific data associated with the 'flags' value
> >   */
> >-typedef int (QEMURamHookFunc)(QEMUFile *f, void *opaque, uint64_t flags);
> >+typedef int (QEMURamHookFunc)(QEMUFile *f, void *opaque, uint64_t flags,
> >+                              void *data);
> >
> >  /*
> >   * Constants used by ram_control_* hooks
> >   */
> >-#define RAM_CONTROL_SETUP    0
> >-#define RAM_CONTROL_ROUND    1
> >-#define RAM_CONTROL_HOOK     2
> >-#define RAM_CONTROL_FINISH   3
> >+#define RAM_CONTROL_SETUP     0
> >+#define RAM_CONTROL_ROUND     1
> >+#define RAM_CONTROL_HOOK      2
> >+#define RAM_CONTROL_FINISH    3
> >+#define RAM_CONTROL_BLOCK_REG 4
> >
> >  /*
> >   * This function allows override of where the RAM page
> >diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> >index 2750365..5493977 100644
> >--- a/migration/qemu-file.c
> >+++ b/migration/qemu-file.c
> >@@ -128,7 +128,7 @@ void ram_control_before_iterate(QEMUFile *f, uint64_t flags)
> >      int ret = 0;
> >
> >      if (f->ops->before_ram_iterate) {
> >-        ret = f->ops->before_ram_iterate(f, f->opaque, flags);
> >+        ret = f->ops->before_ram_iterate(f, f->opaque, flags, NULL);
> >          if (ret < 0) {
> >              qemu_file_set_error(f, ret);
> >          }
> >@@ -140,24 +140,30 @@ void ram_control_after_iterate(QEMUFile *f, uint64_t flags)
> >      int ret = 0;
> >
> >      if (f->ops->after_ram_iterate) {
> >-        ret = f->ops->after_ram_iterate(f, f->opaque, flags);
> >+        ret = f->ops->after_ram_iterate(f, f->opaque, flags, NULL);
> >          if (ret < 0) {
> >              qemu_file_set_error(f, ret);
> >          }
> >      }
> >  }
> >
> >-void ram_control_load_hook(QEMUFile *f, uint64_t flags)
> >+void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data)
> >  {
> >      int ret = -EINVAL;
> >
> >      if (f->ops->hook_ram_load) {
> >-        ret = f->ops->hook_ram_load(f, f->opaque, flags);
> >+        ret = f->ops->hook_ram_load(f, f->opaque, flags, data);
> >          if (ret < 0) {
> >              qemu_file_set_error(f, ret);
> >          }
> >      } else {
> >-        qemu_file_set_error(f, ret);
> >+        /*
> >+         * Hook is a hook specifically requested by the source sending a flag
> >+         * that expects there to be a hook on the destination.
> >+         */
> >+        if (flags == RAM_CONTROL_HOOK) {
> >+            qemu_file_set_error(f, ret);
> >+        }
> >      }
> >  }
> >
> >diff --git a/migration/rdma.c b/migration/rdma.c
> >index cb66721..396329c 100644
> >--- a/migration/rdma.c
> >+++ b/migration/rdma.c
> >@@ -2905,8 +2905,7 @@ err_rdma_dest_wait:
> >   *
> >   * Keep doing this until the source tells us to stop.
> >   */
> >-static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque,
> >-                                         uint64_t flags)
> >+static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
> >  {
> >      RDMAControlHeader reg_resp = { .len = sizeof(RDMARegisterResult),
> >                                 .type = RDMA_CONTROL_REGISTER_RESULT,
> >@@ -2936,7 +2935,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque,
> >      CHECK_ERROR_STATE();
> >
> >      do {
> >-        trace_qemu_rdma_registration_handle_wait(flags);
> >+        trace_qemu_rdma_registration_handle_wait();
> >
> >          ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_NONE);
> >
> >@@ -3124,8 +3123,25 @@ out:
> >      return ret;
> >  }
> >
> >+static int rdma_load_hook(QEMUFile *f, void *opaque, uint64_t flags, void *data)
> >+{
> >+    switch (flags) {
> >+    case RAM_CONTROL_BLOCK_REG:
> >+        /* TODO A later patch */
> >+        return 0;
> >+        break;
> >+
> >+    case RAM_CONTROL_HOOK:
> >+        return qemu_rdma_registration_handle(f, opaque);
> >+
> >+    default:
> >+        /* Shouldn't be called with any other values */
> >+        abort();
> >+    }
> >+}
> >+
> >  static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
> >-                                        uint64_t flags)
> >+                                        uint64_t flags, void *data)
> >  {
> >      QEMUFileRDMA *rfile = opaque;
> >      RDMAContext *rdma = rfile->rdma;
> >@@ -3144,7 +3160,7 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
> >   * First, flush writes, if any.
> >   */
> >  static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
> >-                                       uint64_t flags)
> >+                                       uint64_t flags, void *data)
> >  {
> >      Error *local_err = NULL, **errp = &local_err;
> >      QEMUFileRDMA *rfile = opaque;
> >@@ -3266,7 +3282,7 @@ static const QEMUFileOps rdma_read_ops = {
> >      .get_buffer    = qemu_rdma_get_buffer,
> >      .get_fd        = qemu_rdma_get_fd,
> >      .close         = qemu_rdma_close,
> >-    .hook_ram_load = qemu_rdma_registration_handle,
> >+    .hook_ram_load = rdma_load_hook,
> >  };
> >
> >  static const QEMUFileOps rdma_write_ops = {
> >diff --git a/trace-events b/trace-events
> >index 557770c..0f37a4b 100644
> >--- a/trace-events
> >+++ b/trace-events
> >@@ -1432,7 +1432,7 @@ qemu_rdma_registration_handle_register_rkey(int rkey) "%x"
> >  qemu_rdma_registration_handle_unregister(int requests) "%d requests"
> >  qemu_rdma_registration_handle_unregister_loop(int count, int index, uint64_t chunk) "Unregistration request (%d): index %d, chunk %" PRIu64
> >  qemu_rdma_registration_handle_unregister_success(uint64_t chunk) "%" PRIu64
> >-qemu_rdma_registration_handle_wait(uint64_t flags) "Waiting for next request %" PRIu64
> >+qemu_rdma_registration_handle_wait(void) ""
> 
> Are you using some kind of script to generate these trace prototypes?
> What happened to the message? =)

The prototypes I added manually.  You normally have the trace name in your
tool so you don't need the text that gives you the same piece of information.
So I get output like:

25988@1434041422.299944:qemu_rdma_registration_handle_ram_blocks 

without having to write anything in the text.
(That's with the stderr trace backend, and the prefix being PID and time)

Dave

> 
> >  qemu_rdma_registration_start(uint64_t flags) "%" PRIu64
> >  qemu_rdma_registration_stop(uint64_t flags) "%" PRIu64
> >  qemu_rdma_registration_stop_ram(void) ""
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v2 09/12] Rework ram block hash
  2015-06-11 18:40   ` Michael R. Hines
@ 2015-06-11 18:45     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 36+ messages in thread
From: Dr. David Alan Gilbert @ 2015-06-11 18:45 UTC (permalink / raw)
  To: Michael R. Hines; +Cc: amit.shah, quintela, arei.gonglei, qemu-devel, mrhines

* Michael R. Hines (mrhines@linux.vnet.ibm.com) wrote:
> On 06/11/2015 12:17 PM, Dr. David Alan Gilbert (git) wrote:
> >From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> >RDMA uses a hash from block offset->RAM Block; this isn't needed
> >on the destination, and it becomes harder to maintain after the next
> >patch in the series that sorts the block list.
> >
> >Split the hash so that it's only generated on the source.
> >
> >Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >---
> >  migration/rdma.c | 32 ++++++++++++++++++++------------
> >  1 file changed, 20 insertions(+), 12 deletions(-)
> >
> >diff --git a/migration/rdma.c b/migration/rdma.c
> >index 8d99378..f541586 100644
> >--- a/migration/rdma.c
> >+++ b/migration/rdma.c
> >@@ -533,23 +533,22 @@ static int rdma_add_block(RDMAContext *rdma, const char *block_name,
> >                           ram_addr_t block_offset, uint64_t length)
> >  {
> >      RDMALocalBlocks *local = &rdma->local_ram_blocks;
> >-    RDMALocalBlock *block = g_hash_table_lookup(rdma->blockmap,
> >-        (void *)(uintptr_t)block_offset);
> >+    RDMALocalBlock *block;
> >      RDMALocalBlock *old = local->block;
> >
> >-    assert(block == NULL);
> >-
> >      local->block = g_malloc0(sizeof(RDMALocalBlock) * (local->nb_blocks + 1));
> >
> >      if (local->nb_blocks) {
> >          int x;
> >
> >-        for (x = 0; x < local->nb_blocks; x++) {
> >-            g_hash_table_remove(rdma->blockmap,
> >-                                (void *)(uintptr_t)old[x].offset);
> >-            g_hash_table_insert(rdma->blockmap,
> >-                                (void *)(uintptr_t)old[x].offset,
> >-                                &local->block[x]);
> >+        if (rdma->blockmap) {
> >+            for (x = 0; x < local->nb_blocks; x++) {
> >+                g_hash_table_remove(rdma->blockmap,
> >+                                    (void *)(uintptr_t)old[x].offset);
> >+                g_hash_table_insert(rdma->blockmap,
> >+                                    (void *)(uintptr_t)old[x].offset,
> >+                                    &local->block[x]);
> >+            }
> >          }
> >          memcpy(local->block, old, sizeof(RDMALocalBlock) * local->nb_blocks);
> >          g_free(old);
> >@@ -571,7 +570,9 @@ static int rdma_add_block(RDMAContext *rdma, const char *block_name,
> >
> >      block->is_ram_block = local->init ? false : true;
> >
> >-    g_hash_table_insert(rdma->blockmap, (void *) block_offset, block);
> >+    if (rdma->blockmap) {
> >+        g_hash_table_insert(rdma->blockmap, (void *) block_offset, block);
> >+    }
> >
> >      trace_rdma_add_block(block_name, local->nb_blocks,
> >                           (uintptr_t) block->local_host_addr,
> >@@ -607,7 +608,6 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
> >      RDMALocalBlocks *local = &rdma->local_ram_blocks;
> >
> >      assert(rdma->blockmap == NULL);
> >-    rdma->blockmap = g_hash_table_new(g_direct_hash, g_direct_equal);
> >      memset(local, 0, sizeof *local);
> >      qemu_ram_foreach_block(qemu_rdma_init_one_block, rdma);
> >      trace_qemu_rdma_init_ram_blocks(local->nb_blocks);
> >@@ -2292,6 +2292,14 @@ static int qemu_rdma_source_init(RDMAContext *rdma, Error **errp, bool pin_all)
> >          goto err_rdma_source_init;
> >      }
> >
> >+    /* Build the hash that maps from offset to RAMBlock */
> >+    rdma->blockmap = g_hash_table_new(g_direct_hash, g_direct_equal);
> >+    for (idx = 0; idx < rdma->local_ram_blocks.nb_blocks; idx++) {
> >+        g_hash_table_insert(rdma->blockmap,
> >+                (void *)(uintptr_t)rdma->local_ram_blocks.block[idx].offset,
> >+                &rdma->local_ram_blocks.block[idx]);
> >+    }
> >+
> >      for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
> >          ret = qemu_rdma_reg_control(rdma, idx);
> >          if (ret) {
> 
> You didn't want to use the ID string as a key? I forget....

I was trying not to; it sounded expensive to hash on strings
if we didn't need it.

> Reviewed-by: Michael R. Hines <mrhines@us.ibm.com>

Thanks.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/12] Sort destination RAMBlocks to be the same as the source
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 10/12] Sort destination RAMBlocks to be the same as the source Dr. David Alan Gilbert (git)
@ 2015-06-11 18:55   ` Michael R. Hines
  0 siblings, 0 replies; 36+ messages in thread
From: Michael R. Hines @ 2015-06-11 18:55 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: amit.shah, arei.gonglei, mrhines, quintela

On 06/11/2015 12:17 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Use the order of incoming RAMBlocks from the source to record
> an index number; that then allows us to sort the destination
> local RAMBlock list to match the source.
>
> Now that the RAMBlocks are known to be in the same order, this
> simplifies the RDMA Registration step which previously tried to
> match RAMBlocks based on offset (which isn't guaranteed to match).
>
> Looking at the existing compress code, I think it was erroneously
> relying on an assumption of matching ordering, which this fixes.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   migration/rdma.c | 101 ++++++++++++++++++++++++++++++++++++++++---------------
>   trace-events     |   2 ++
>   2 files changed, 75 insertions(+), 28 deletions(-)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index f541586..92dc5c1 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -224,6 +224,7 @@ typedef struct RDMALocalBlock {
>       uint32_t      *remote_keys;     /* rkeys for chunk-level registration */
>       uint32_t       remote_rkey;     /* rkeys for non-chunk-level registration */
>       int            index;           /* which block are we */
> +    unsigned int   src_index;       /* (Only used on dest) */
>       bool           is_ram_block;
>       int            nb_chunks;
>       unsigned long *transit_bitmap;
> @@ -353,6 +354,9 @@ typedef struct RDMAContext {
>       RDMALocalBlocks local_ram_blocks;
>       RDMADestBlock  *dest_blocks;
>
> +    /* Index of the next RAMBlock received during block registration */
> +    unsigned int    next_src_index;
> +
>       /*
>        * Migration on *destination* started.
>        * Then use coroutine yield function.
> @@ -561,6 +565,7 @@ static int rdma_add_block(RDMAContext *rdma, const char *block_name,
>       block->offset = block_offset;
>       block->length = length;
>       block->index = local->nb_blocks;
> +    block->src_index = ~0U; /* Filled in by the receipt of the block list */
>       block->nb_chunks = ram_chunk_index(host_addr, host_addr + length) + 1UL;
>       block->transit_bitmap = bitmap_new(block->nb_chunks);
>       bitmap_clear(block->transit_bitmap, 0, block->nb_chunks);
> @@ -2909,6 +2914,14 @@ err_rdma_dest_wait:
>       return ret;
>   }
>
> +static int dest_ram_sort_func(const void *a, const void *b)
> +{
> +    unsigned int a_index = ((const RDMALocalBlock *)a)->src_index;
> +    unsigned int b_index = ((const RDMALocalBlock *)b)->src_index;
> +
> +    return (a_index < b_index) ? -1 : (a_index != b_index);
> +}
> +
>   /*
>    * During each iteration of the migration, we listen for instructions
>    * by the source VM to perform dynamic page registrations before they
> @@ -2986,6 +2999,13 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
>           case RDMA_CONTROL_RAM_BLOCKS_REQUEST:
>               trace_qemu_rdma_registration_handle_ram_blocks();
>
> +            /* Sort our local RAM Block list so it's the same as the source,
> +             * we can do this since we've filled in a src_index in the list
> +             * as we received the RAMBlock list earlier.
> +             */
> +            qsort(rdma->local_ram_blocks.block,
> +                  rdma->local_ram_blocks.nb_blocks,
> +                  sizeof(RDMALocalBlock), dest_ram_sort_func);
>               if (rdma->pin_all) {
>                   ret = qemu_rdma_reg_whole_ram_blocks(rdma);
>                   if (ret) {
> @@ -3013,6 +3033,12 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
>                   rdma->dest_blocks[i].length = local->block[i].length;
>
>                   dest_block_to_network(&rdma->dest_blocks[i]);
> +                trace_qemu_rdma_registration_handle_ram_blocks_loop(
> +                    local->block[i].block_name,
> +                    local->block[i].offset,
> +                    local->block[i].length,
> +                    local->block[i].local_host_addr,
> +                    local->block[i].src_index);
>               }
>
>               blocks.len = rdma->local_ram_blocks.nb_blocks
> @@ -3136,13 +3162,44 @@ out:
>       return ret;
>   }
>
> +/* Destination:
> + * Called via a ram_control_load_hook during the initial RAM load section which
> + * lists the RAMBlocks by name.  This lets us know the order of the RAMBlocks
> + * on the source.
> + * We've already built our local RAMBlock list, but not yet sent the list to
> + * the source.
> + */
> +static int rdma_block_notification_handle(QEMUFileRDMA *rfile, const char *name)
> +{
> +    RDMAContext *rdma = rfile->rdma;
> +    int curr;
> +    int found = -1;
> +
> +    /* Find the matching RAMBlock in our local list */
> +    for (curr = 0; curr < rdma->local_ram_blocks.nb_blocks; curr++) {
> +        if (!strcmp(rdma->local_ram_blocks.block[curr].block_name, name)) {
> +            found = curr;
> +            break;
> +        }
> +    }
> +
> +    if (found == -1) {
> +        error_report("RAMBlock '%s' not found on destination", name);
> +        return -ENOENT;
> +    }
> +
> +    rdma->local_ram_blocks.block[curr].src_index = rdma->next_src_index;
> +    trace_rdma_block_notification_handle(name, rdma->next_src_index);
> +    rdma->next_src_index++;
> +
> +    return 0;
> +}
> +
>   static int rdma_load_hook(QEMUFile *f, void *opaque, uint64_t flags, void *data)
>   {
>       switch (flags) {
>       case RAM_CONTROL_BLOCK_REG:
> -        /* TODO A later patch */
> -        return 0;
> -        break;
> +        return rdma_block_notification_handle(opaque, data);
>
>       case RAM_CONTROL_HOOK:
>           return qemu_rdma_registration_handle(f, opaque);
> @@ -3193,7 +3250,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>       if (flags == RAM_CONTROL_SETUP) {
>           RDMAControlHeader resp = {.type = RDMA_CONTROL_RAM_BLOCKS_RESULT };
>           RDMALocalBlocks *local = &rdma->local_ram_blocks;
> -        int reg_result_idx, i, j, nb_dest_blocks;
> +        int reg_result_idx, i, nb_dest_blocks;
>
>           head.type = RDMA_CONTROL_RAM_BLOCKS_REQUEST;
>           trace_qemu_rdma_registration_stop_ram();
> @@ -3229,9 +3286,10 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>            */
>
>           if (local->nb_blocks != nb_dest_blocks) {
> -            ERROR(errp, "ram blocks mismatch #1! "
> +            ERROR(errp, "ram blocks mismatch (Number of blocks %d vs %d) "
>                           "Your QEMU command line parameters are probably "
> -                        "not identical on both the source and destination.");
> +                        "not identical on both the source and destination.",
> +                        local->nb_blocks, nb_dest_blocks);
>               return -EINVAL;
>           }
>
> @@ -3241,30 +3299,17 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>           for (i = 0; i < nb_dest_blocks; i++) {
>               network_to_dest_block(&rdma->dest_blocks[i]);
>
> -            /* search local ram blocks */
> -            for (j = 0; j < local->nb_blocks; j++) {
> -                if (rdma->dest_blocks[i].offset != local->block[j].offset) {
> -                    continue;
> -                }
> -
> -                if (rdma->dest_blocks[i].length != local->block[j].length) {
> -                    ERROR(errp, "ram blocks mismatch #2! "
> -                        "Your QEMU command line parameters are probably "
> -                        "not identical on both the source and destination.");
> -                    return -EINVAL;
> -                }
> -                local->block[j].remote_host_addr =
> -                        rdma->dest_blocks[i].remote_host_addr;
> -                local->block[j].remote_rkey = rdma->dest_blocks[i].remote_rkey;
> -                break;
> -            }
> -
> -            if (j >= local->nb_blocks) {
> -                ERROR(errp, "ram blocks mismatch #3! "
> -                        "Your QEMU command line parameters are probably "
> -                        "not identical on both the source and destination.");
> +            /* We require that the blocks are in the same order */
> +            if (rdma->dest_blocks[i].length != local->block[i].length) {
> +                ERROR(errp, "Block %s/%d has a different length %" PRIu64
> +                            "vs %" PRIu64, local->block[i].block_name, i,
> +                            local->block[i].length,
> +                            rdma->dest_blocks[i].length);
>                   return -EINVAL;
>               }
> +            local->block[i].remote_host_addr =
> +                    rdma->dest_blocks[i].remote_host_addr;
> +            local->block[i].remote_rkey = rdma->dest_blocks[i].remote_rkey;
>           }
>       }

Seems pretty good overall, without actually testing it. Deleting code is 
good =).

Reviewed-by: Michael R. Hines <mrhines@us.ibm.com>


> diff --git a/trace-events b/trace-events
> index 7dff362..b2bf8ea 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1426,6 +1426,7 @@ qemu_rdma_register_and_get_keys(uint64_t len, void *start) "Registering %" PRIu6
>   qemu_rdma_registration_handle_compress(int64_t length, int index, int64_t offset) "Zapping zero chunk: %" PRId64 " bytes, index %d, offset %" PRId64
>   qemu_rdma_registration_handle_finished(void) ""
>   qemu_rdma_registration_handle_ram_blocks(void) ""
> +qemu_rdma_registration_handle_ram_blocks_loop(const char *name, uint64_t offset, uint64_t length, void *local_host_addr, unsigned int src_index) "%s: @%" PRIx64 "/%" PRIu64 " host:@%p src_index: %u"
>   qemu_rdma_registration_handle_register(int requests) "%d requests"
>   qemu_rdma_registration_handle_register_loop(int req, int index, uint64_t addr, uint64_t chunks) "Registration request (%d): index %d, current_addr %" PRIu64 " chunks: %" PRIu64
>   qemu_rdma_registration_handle_register_rkey(int rkey) "%x"
> @@ -1452,6 +1453,7 @@ qemu_rdma_write_one_sendreg(uint64_t chunk, int len, int index, int64_t offset)
>   qemu_rdma_write_one_top(uint64_t chunks, uint64_t size) "Writing %" PRIu64 " chunks, (%" PRIu64 " MB)"
>   qemu_rdma_write_one_zero(uint64_t chunk, int len, int index, int64_t offset) "Entire chunk is zero, sending compress: %" PRIu64 " for %d bytes, index: %d, offset: %" PRId64
>   rdma_add_block(const char *block_name, int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Added Block: '%s':%d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
> +rdma_block_notification_handle(const char *name, int index) "%s at %d"
>   rdma_delete_block(void *block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Deleted Block: %p, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
>   rdma_start_incoming_migration(void) ""
>   rdma_start_incoming_migration_after_dest_init(void) ""

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/12] Translate offsets to destination address space
  2015-06-11 18:12   ` Michael R. Hines
@ 2015-06-11 18:58     ` Dr. David Alan Gilbert
  2015-06-11 19:08       ` Michael R. Hines
  0 siblings, 1 reply; 36+ messages in thread
From: Dr. David Alan Gilbert @ 2015-06-11 18:58 UTC (permalink / raw)
  To: Michael R. Hines; +Cc: amit.shah, quintela, arei.gonglei, qemu-devel, mrhines

* Michael R. Hines (mrhines@linux.vnet.ibm.com) wrote:
> On 06/11/2015 12:17 PM, Dr. David Alan Gilbert (git) wrote:
> >From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> >The 'offset' field in RDMACompress and 'current_addr' field
> >in RDMARegister are commented as being offsets within a particular
> >RAMBlock, however they appear to actually be offsets within the
> >ram_addr_t space.
> >
> >The code currently assumes that the offsets on the source/destination
> >match, this change removes the need for the assumption for these
> >structures by translating the addresses into the ram_addr_t space of
> >the destination host.
> >
> >Note: An alternative would be to change the fields to actually
> >take the data they're commented for; this would potentially be
> >simpler but would break stream compatibility for those cases
> >that currently work.
> >
> >Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >---
> >  migration/rdma.c | 31 ++++++++++++++++++++++++-------
> >  1 file changed, 24 insertions(+), 7 deletions(-)
> >
> >diff --git a/migration/rdma.c b/migration/rdma.c
> >index 9532461..cb66721 100644
> >--- a/migration/rdma.c
> >+++ b/migration/rdma.c
> >@@ -411,7 +411,7 @@ static void network_to_control(RDMAControlHeader *control)
> >   */
> >  typedef struct QEMU_PACKED {
> >      union QEMU_PACKED {
> >-        uint64_t current_addr;  /* offset into the ramblock of the chunk */
> >+        uint64_t current_addr;  /* offset into the ram_addr_t space */
> >          uint64_t chunk;         /* chunk to lookup if unregistering */
> >      } key;
> >      uint32_t current_index; /* which ramblock the chunk belongs to */
> >@@ -419,8 +419,19 @@ typedef struct QEMU_PACKED {
> >      uint64_t chunks;            /* how many sequential chunks to register */
> >  } RDMARegister;
> >
> >-static void register_to_network(RDMARegister *reg)
> >+static void register_to_network(RDMAContext *rdma, RDMARegister *reg)
> >  {
> >+    RDMALocalBlock *local_block;
> >+    local_block  = &rdma->local_ram_blocks.block[reg->current_index];
> >+
> >+    if (local_block->is_ram_block) {
> >+        /*
> >+         * current_addr as passed in is an address in the local ram_addr_t
> >+         * space, we need to translate this for the destination
> >+         */
> >+        reg->key.current_addr -= local_block->offset;
> >+        reg->key.current_addr += rdma->dest_blocks[reg->current_index].offset;
> >+    }
> >      reg->key.current_addr = htonll(reg->key.current_addr);
> >      reg->current_index = htonl(reg->current_index);
> >      reg->chunks = htonll(reg->chunks);
> >@@ -436,13 +447,19 @@ static void network_to_register(RDMARegister *reg)
> >  typedef struct QEMU_PACKED {
> >      uint32_t value;     /* if zero, we will madvise() */
> >      uint32_t block_idx; /* which ram block index */
> >-    uint64_t offset;    /* where in the remote ramblock this chunk */
> >+    uint64_t offset;    /* Address in remote ram_addr_t space */
> >      uint64_t length;    /* length of the chunk */
> >  } RDMACompress;
> >
> >-static void compress_to_network(RDMACompress *comp)
> >+static void compress_to_network(RDMAContext *rdma, RDMACompress *comp)
> >  {
> >      comp->value = htonl(comp->value);
> >+    /*
> >+     * comp->offset as passed in is an address in the local ram_addr_t
> >+     * space, we need to translate this for the destination
> >+     */
> >+    comp->offset -= rdma->local_ram_blocks.block[comp->block_idx].offset;
> >+    comp->offset += rdma->dest_blocks[comp->block_idx].offset;
> >      comp->block_idx = htonl(comp->block_idx);
> >      comp->offset = htonll(comp->offset);
> >      comp->length = htonll(comp->length);
> 
> So, why add the destination block's offset on the source side
> just for it to be re-adjusted again when it gets to the destination side?
> 
> Can you just stop at this:
> 
> +        reg->key.current_addr -= local_block->offset;
> 
> Without this:
> 
> +        reg->key.current_addr +=
> rdma->dest_blocks[reg->current_index].offset;
> 
> ... on the source, followed by this on the destionation:
> 
> +    comp->offset -= rdma->local_ram_blocks.block[comp->block_idx].offset;
> 
> Without this:
> 
> +    comp->offset += rdma->dest_blocks[comp->block_idx].offset;
> 
> Did I follow correctly?

Aren't both of those conversions happening on the source?
Anyway, I think what you're saying is that we change the value sent over
the network to be an offset within the block instead of an offset in
the whole ram_addr_t space (i.e. that's what happens if you don't
add back on the dest_blocks[].offset).  As I commented in the commit
message, that would work but it would break compatibility with existing
RDMA migrations since the offset field would now have a different meaning.

Dave

> 
> >@@ -1288,7 +1305,7 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
> >          rdma->total_registrations--;
> >
> >          reg.key.chunk = chunk;
> >-        register_to_network(&reg);
> >+        register_to_network(rdma, &reg);
> >          ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
> >                                  &resp, NULL, NULL);
> >          if (ret < 0) {
> >@@ -1909,7 +1926,7 @@ retry:
> >                  trace_qemu_rdma_write_one_zero(chunk, sge.length,
> >                                                 current_index, current_addr);
> >
> >-                compress_to_network(&comp);
> >+                compress_to_network(rdma, &comp);
> >                  ret = qemu_rdma_exchange_send(rdma, &head,
> >                                  (uint8_t *) &comp, NULL, NULL, NULL);
> >
> >@@ -1936,7 +1953,7 @@ retry:
> >              trace_qemu_rdma_write_one_sendreg(chunk, sge.length, current_index,
> >                                                current_addr);
> >
> >-            register_to_network(&reg);
> >+            register_to_network(rdma, &reg);
> >              ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
> >                                      &resp, &reg_result_idx, NULL);
> >              if (ret < 0) {
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/12] Translate offsets to destination address space
  2015-06-11 18:58     ` Dr. David Alan Gilbert
@ 2015-06-11 19:08       ` Michael R. Hines
  2015-06-12 18:50         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 36+ messages in thread
From: Michael R. Hines @ 2015-06-11 19:08 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: amit.shah, quintela, arei.gonglei, qemu-devel, mrhines

On 06/11/2015 01:58 PM, Dr. David Alan Gilbert wrote:
> * Michael R. Hines (mrhines@linux.vnet.ibm.com) wrote:
>> On 06/11/2015 12:17 PM, Dr. David Alan Gilbert (git) wrote:
>>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>>
>>> The 'offset' field in RDMACompress and 'current_addr' field
>>> in RDMARegister are commented as being offsets within a particular
>>> RAMBlock, however they appear to actually be offsets within the
>>> ram_addr_t space.
>>>
>>> The code currently assumes that the offsets on the source/destination
>>> match, this change removes the need for the assumption for these
>>> structures by translating the addresses into the ram_addr_t space of
>>> the destination host.
>>>
>>> Note: An alternative would be to change the fields to actually
>>> take the data they're commented for; this would potentially be
>>> simpler but would break stream compatibility for those cases
>>> that currently work.
>>>
>>> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>> ---
>>>   migration/rdma.c | 31 ++++++++++++++++++++++++-------
>>>   1 file changed, 24 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/migration/rdma.c b/migration/rdma.c
>>> index 9532461..cb66721 100644
>>> --- a/migration/rdma.c
>>> +++ b/migration/rdma.c
>>> @@ -411,7 +411,7 @@ static void network_to_control(RDMAControlHeader *control)
>>>    */
>>>   typedef struct QEMU_PACKED {
>>>       union QEMU_PACKED {
>>> -        uint64_t current_addr;  /* offset into the ramblock of the chunk */
>>> +        uint64_t current_addr;  /* offset into the ram_addr_t space */
>>>           uint64_t chunk;         /* chunk to lookup if unregistering */
>>>       } key;
>>>       uint32_t current_index; /* which ramblock the chunk belongs to */
>>> @@ -419,8 +419,19 @@ typedef struct QEMU_PACKED {
>>>       uint64_t chunks;            /* how many sequential chunks to register */
>>>   } RDMARegister;
>>>
>>> -static void register_to_network(RDMARegister *reg)
>>> +static void register_to_network(RDMAContext *rdma, RDMARegister *reg)
>>>   {
>>> +    RDMALocalBlock *local_block;
>>> +    local_block  = &rdma->local_ram_blocks.block[reg->current_index];
>>> +
>>> +    if (local_block->is_ram_block) {
>>> +        /*
>>> +         * current_addr as passed in is an address in the local ram_addr_t
>>> +         * space, we need to translate this for the destination
>>> +         */
>>> +        reg->key.current_addr -= local_block->offset;
>>> +        reg->key.current_addr += rdma->dest_blocks[reg->current_index].offset;
>>> +    }
>>>       reg->key.current_addr = htonll(reg->key.current_addr);
>>>       reg->current_index = htonl(reg->current_index);
>>>       reg->chunks = htonll(reg->chunks);
>>> @@ -436,13 +447,19 @@ static void network_to_register(RDMARegister *reg)
>>>   typedef struct QEMU_PACKED {
>>>       uint32_t value;     /* if zero, we will madvise() */
>>>       uint32_t block_idx; /* which ram block index */
>>> -    uint64_t offset;    /* where in the remote ramblock this chunk */
>>> +    uint64_t offset;    /* Address in remote ram_addr_t space */
>>>       uint64_t length;    /* length of the chunk */
>>>   } RDMACompress;
>>>
>>> -static void compress_to_network(RDMACompress *comp)
>>> +static void compress_to_network(RDMAContext *rdma, RDMACompress *comp)
>>>   {
>>>       comp->value = htonl(comp->value);
>>> +    /*
>>> +     * comp->offset as passed in is an address in the local ram_addr_t
>>> +     * space, we need to translate this for the destination
>>> +     */
>>> +    comp->offset -= rdma->local_ram_blocks.block[comp->block_idx].offset;
>>> +    comp->offset += rdma->dest_blocks[comp->block_idx].offset;
>>>       comp->block_idx = htonl(comp->block_idx);
>>>       comp->offset = htonll(comp->offset);
>>>       comp->length = htonll(comp->length);
>> So, why add the destination block's offset on the source side
>> just for it to be re-adjusted again when it gets to the destination side?
>>
>> Can you just stop at this:
>>
>> +        reg->key.current_addr -= local_block->offset;
>>
>> Without this:
>>
>> +        reg->key.current_addr +=
>> rdma->dest_blocks[reg->current_index].offset;
>>
>> ... on the source, followed by this on the destionation:
>>
>> +    comp->offset -= rdma->local_ram_blocks.block[comp->block_idx].offset;
>>
>> Without this:
>>
>> +    comp->offset += rdma->dest_blocks[comp->block_idx].offset;
>>
>> Did I follow correctly?
> Aren't both of those conversions happening on the source?
> Anyway, I think what you're saying is that we change the value sent over
> the network to be an offset within the block instead of an offset in
> the whole ram_addr_t space (i.e. that's what happens if you don't
> add back on the dest_blocks[].offset).

Yes, right. Can you skip adding/subtracting the local block offset on 
each side?

- Michael

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/12] Translate offsets to destination address space
  2015-06-11 19:08       ` Michael R. Hines
@ 2015-06-12 18:50         ` Dr. David Alan Gilbert
  2015-06-12 19:17           ` Michael R. Hines
  0 siblings, 1 reply; 36+ messages in thread
From: Dr. David Alan Gilbert @ 2015-06-12 18:50 UTC (permalink / raw)
  To: Michael R. Hines; +Cc: amit.shah, quintela, arei.gonglei, qemu-devel, mrhines

* Michael R. Hines (mrhines@linux.vnet.ibm.com) wrote:
> On 06/11/2015 01:58 PM, Dr. David Alan Gilbert wrote:
> >* Michael R. Hines (mrhines@linux.vnet.ibm.com) wrote:
> >>On 06/11/2015 12:17 PM, Dr. David Alan Gilbert (git) wrote:
> >>>From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >>>
> >>>The 'offset' field in RDMACompress and 'current_addr' field
> >>>in RDMARegister are commented as being offsets within a particular
> >>>RAMBlock, however they appear to actually be offsets within the
> >>>ram_addr_t space.
> >>>
> >>>The code currently assumes that the offsets on the source/destination
> >>>match, this change removes the need for the assumption for these
> >>>structures by translating the addresses into the ram_addr_t space of
> >>>the destination host.
> >>>
> >>>Note: An alternative would be to change the fields to actually
> >>>take the data they're commented for; this would potentially be
> >>>simpler but would break stream compatibility for those cases
> >>>that currently work.
> >>>
> >>>Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >>>---
> >>>  migration/rdma.c | 31 ++++++++++++++++++++++++-------
> >>>  1 file changed, 24 insertions(+), 7 deletions(-)
> >>>
> >>>diff --git a/migration/rdma.c b/migration/rdma.c
> >>>index 9532461..cb66721 100644
> >>>--- a/migration/rdma.c
> >>>+++ b/migration/rdma.c
> >>>@@ -411,7 +411,7 @@ static void network_to_control(RDMAControlHeader *control)
> >>>   */
> >>>  typedef struct QEMU_PACKED {
> >>>      union QEMU_PACKED {
> >>>-        uint64_t current_addr;  /* offset into the ramblock of the chunk */
> >>>+        uint64_t current_addr;  /* offset into the ram_addr_t space */
> >>>          uint64_t chunk;         /* chunk to lookup if unregistering */
> >>>      } key;
> >>>      uint32_t current_index; /* which ramblock the chunk belongs to */
> >>>@@ -419,8 +419,19 @@ typedef struct QEMU_PACKED {
> >>>      uint64_t chunks;            /* how many sequential chunks to register */
> >>>  } RDMARegister;
> >>>
> >>>-static void register_to_network(RDMARegister *reg)
> >>>+static void register_to_network(RDMAContext *rdma, RDMARegister *reg)
> >>>  {
> >>>+    RDMALocalBlock *local_block;
> >>>+    local_block  = &rdma->local_ram_blocks.block[reg->current_index];
> >>>+
> >>>+    if (local_block->is_ram_block) {
> >>>+        /*
> >>>+         * current_addr as passed in is an address in the local ram_addr_t
> >>>+         * space, we need to translate this for the destination
> >>>+         */
> >>>+        reg->key.current_addr -= local_block->offset;
> >>>+        reg->key.current_addr += rdma->dest_blocks[reg->current_index].offset;
> >>>+    }
> >>>      reg->key.current_addr = htonll(reg->key.current_addr);
> >>>      reg->current_index = htonl(reg->current_index);
> >>>      reg->chunks = htonll(reg->chunks);
> >>>@@ -436,13 +447,19 @@ static void network_to_register(RDMARegister *reg)
> >>>  typedef struct QEMU_PACKED {
> >>>      uint32_t value;     /* if zero, we will madvise() */
> >>>      uint32_t block_idx; /* which ram block index */
> >>>-    uint64_t offset;    /* where in the remote ramblock this chunk */
> >>>+    uint64_t offset;    /* Address in remote ram_addr_t space */
> >>>      uint64_t length;    /* length of the chunk */
> >>>  } RDMACompress;
> >>>
> >>>-static void compress_to_network(RDMACompress *comp)
> >>>+static void compress_to_network(RDMAContext *rdma, RDMACompress *comp)
> >>>  {
> >>>      comp->value = htonl(comp->value);
> >>>+    /*
> >>>+     * comp->offset as passed in is an address in the local ram_addr_t
> >>>+     * space, we need to translate this for the destination
> >>>+     */
> >>>+    comp->offset -= rdma->local_ram_blocks.block[comp->block_idx].offset;
> >>>+    comp->offset += rdma->dest_blocks[comp->block_idx].offset;
> >>>      comp->block_idx = htonl(comp->block_idx);
> >>>      comp->offset = htonll(comp->offset);
> >>>      comp->length = htonll(comp->length);
> >>So, why add the destination block's offset on the source side
> >>just for it to be re-adjusted again when it gets to the destination side?
> >>
> >>Can you just stop at this:
> >>
> >>+        reg->key.current_addr -= local_block->offset;
> >>
> >>Without this:
> >>
> >>+        reg->key.current_addr +=
> >>rdma->dest_blocks[reg->current_index].offset;
> >>
> >>... on the source, followed by this on the destionation:
> >>
> >>+    comp->offset -= rdma->local_ram_blocks.block[comp->block_idx].offset;
> >>
> >>Without this:
> >>
> >>+    comp->offset += rdma->dest_blocks[comp->block_idx].offset;
> >>
> >>Did I follow correctly?
> >Aren't both of those conversions happening on the source?
> >Anyway, I think what you're saying is that we change the value sent over
> >the network to be an offset within the block instead of an offset in
> >the whole ram_addr_t space (i.e. that's what happens if you don't
> >add back on the dest_blocks[].offset).
> 
> Yes, right. Can you skip adding/subtracting the local block offset on each
> side?

I don't understand how I can do that without changing the wire format so
that it would be subtly incompatible, and I'd like to get 2.1ish migrating to 2.3ish.
If I didn't add the local_block->offset on the source, the value on the wire
would now be the offset within the RAMBlock rather than the offset in ram_addr_t.

Except for compatibility I'd agree it would be simpler.

Dave


> 
> - Michael
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/12] Translate offsets to destination address space
  2015-06-12 18:50         ` Dr. David Alan Gilbert
@ 2015-06-12 19:17           ` Michael R. Hines
  0 siblings, 0 replies; 36+ messages in thread
From: Michael R. Hines @ 2015-06-12 19:17 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: amit.shah, quintela, arei.gonglei, qemu-devel, mrhines

On 06/12/2015 01:50 PM, Dr. David Alan Gilbert wrote:
> * Michael R. Hines (mrhines@linux.vnet.ibm.com) wrote:
>> On 06/11/2015 01:58 PM, Dr. David Alan Gilbert wrote:
>>> * Michael R. Hines (mrhines@linux.vnet.ibm.com) wrote:
>>>> On 06/11/2015 12:17 PM, Dr. David Alan Gilbert (git) wrote:
>>>>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>>>>
>>>>> The 'offset' field in RDMACompress and 'current_addr' field
>>>>> in RDMARegister are commented as being offsets within a particular
>>>>> RAMBlock, however they appear to actually be offsets within the
>>>>> ram_addr_t space.
>>>>>
>>>>> The code currently assumes that the offsets on the source/destination
>>>>> match, this change removes the need for the assumption for these
>>>>> structures by translating the addresses into the ram_addr_t space of
>>>>> the destination host.
>>>>>
>>>>> Note: An alternative would be to change the fields to actually
>>>>> take the data they're commented for; this would potentially be
>>>>> simpler but would break stream compatibility for those cases
>>>>> that currently work.
>>>>>
>>>>> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>>> ---
>>>>>   migration/rdma.c | 31 ++++++++++++++++++++++++-------
>>>>>   1 file changed, 24 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/migration/rdma.c b/migration/rdma.c
>>>>> index 9532461..cb66721 100644
>>>>> --- a/migration/rdma.c
>>>>> +++ b/migration/rdma.c
>>>>> @@ -411,7 +411,7 @@ static void network_to_control(RDMAControlHeader *control)
>>>>>    */
>>>>>   typedef struct QEMU_PACKED {
>>>>>       union QEMU_PACKED {
>>>>> -        uint64_t current_addr;  /* offset into the ramblock of the chunk */
>>>>> +        uint64_t current_addr;  /* offset into the ram_addr_t space */
>>>>>           uint64_t chunk;         /* chunk to lookup if unregistering */
>>>>>       } key;
>>>>>       uint32_t current_index; /* which ramblock the chunk belongs to */
>>>>> @@ -419,8 +419,19 @@ typedef struct QEMU_PACKED {
>>>>>       uint64_t chunks;            /* how many sequential chunks to register */
>>>>>   } RDMARegister;
>>>>>
>>>>> -static void register_to_network(RDMARegister *reg)
>>>>> +static void register_to_network(RDMAContext *rdma, RDMARegister *reg)
>>>>>   {
>>>>> +    RDMALocalBlock *local_block;
>>>>> +    local_block  = &rdma->local_ram_blocks.block[reg->current_index];
>>>>> +
>>>>> +    if (local_block->is_ram_block) {
>>>>> +        /*
>>>>> +         * current_addr as passed in is an address in the local ram_addr_t
>>>>> +         * space, we need to translate this for the destination
>>>>> +         */
>>>>> +        reg->key.current_addr -= local_block->offset;
>>>>> +        reg->key.current_addr += rdma->dest_blocks[reg->current_index].offset;
>>>>> +    }
>>>>>       reg->key.current_addr = htonll(reg->key.current_addr);
>>>>>       reg->current_index = htonl(reg->current_index);
>>>>>       reg->chunks = htonll(reg->chunks);
>>>>> @@ -436,13 +447,19 @@ static void network_to_register(RDMARegister *reg)
>>>>>   typedef struct QEMU_PACKED {
>>>>>       uint32_t value;     /* if zero, we will madvise() */
>>>>>       uint32_t block_idx; /* which ram block index */
>>>>> -    uint64_t offset;    /* where in the remote ramblock this chunk */
>>>>> +    uint64_t offset;    /* Address in remote ram_addr_t space */
>>>>>       uint64_t length;    /* length of the chunk */
>>>>>   } RDMACompress;
>>>>>
>>>>> -static void compress_to_network(RDMACompress *comp)
>>>>> +static void compress_to_network(RDMAContext *rdma, RDMACompress *comp)
>>>>>   {
>>>>>       comp->value = htonl(comp->value);
>>>>> +    /*
>>>>> +     * comp->offset as passed in is an address in the local ram_addr_t
>>>>> +     * space, we need to translate this for the destination
>>>>> +     */
>>>>> +    comp->offset -= rdma->local_ram_blocks.block[comp->block_idx].offset;
>>>>> +    comp->offset += rdma->dest_blocks[comp->block_idx].offset;
>>>>>       comp->block_idx = htonl(comp->block_idx);
>>>>>       comp->offset = htonll(comp->offset);
>>>>>       comp->length = htonll(comp->length);
>>>> So, why add the destination block's offset on the source side
>>>> just for it to be re-adjusted again when it gets to the destination side?
>>>>
>>>> Can you just stop at this:
>>>>
>>>> +        reg->key.current_addr -= local_block->offset;
>>>>
>>>> Without this:
>>>>
>>>> +        reg->key.current_addr +=
>>>> rdma->dest_blocks[reg->current_index].offset;
>>>>
>>>> ... on the source, followed by this on the destionation:
>>>>
>>>> +    comp->offset -= rdma->local_ram_blocks.block[comp->block_idx].offset;
>>>>
>>>> Without this:
>>>>
>>>> +    comp->offset += rdma->dest_blocks[comp->block_idx].offset;
>>>>
>>>> Did I follow correctly?
>>> Aren't both of those conversions happening on the source?
>>> Anyway, I think what you're saying is that we change the value sent over
>>> the network to be an offset within the block instead of an offset in
>>> the whole ram_addr_t space (i.e. that's what happens if you don't
>>> add back on the dest_blocks[].offset).
>> Yes, right. Can you skip adding/subtracting the local block offset on each
>> side?
> I don't understand how I can do that without changing the wire format so
> that it would be subtly incompatible, and I'd like to get 2.1ish migrating to 2.3ish.
> If I didn't add the local_block->offset on the source, the value on the wire
> would now be the offset within the RAMBlock rather than the offset in ram_addr_t.
>
> Except for compatibility I'd agree it would be simpler.
>
> Dave
>
>
>> - Michael
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>


Acknowledged.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v2 01/12] Rename RDMA structures to make destination clear
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 01/12] Rename RDMA structures to make destination clear Dr. David Alan Gilbert (git)
@ 2015-07-01  8:36   ` Juan Quintela
  0 siblings, 0 replies; 36+ messages in thread
From: Juan Quintela @ 2015-07-01  8:36 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: amit.shah, arei.gonglei, qemu-devel, mrhines

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> RDMA has two data types that are named confusingly;
>    RDMALocalBlock (pointed to indirectly by local_ram_blocks)
>    RDMARemoteBlock (pointed to by block in RDMAContext)
>
> RDMALocalBlocks, as the name suggests is a data strucuture that
> represents the RDMAable RAM Blocks on the current side of the migration
> whichever that is.
>
> RDMARemoteBlocks is always the shape of the RAMBlocks on the
> destination, even on the destination.
>
> Rename:
>      RDMARemoteBlock -> RDMADestBlock
>      context->'block' -> context->dest_blocks
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Michael R. Hines <mrhines@us.ibm.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v2 02/12] qemu_ram_foreach_block: pass up error value, and down the ramblock name
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 02/12] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
@ 2015-07-01  8:36   ` Juan Quintela
  0 siblings, 0 replies; 36+ messages in thread
From: Juan Quintela @ 2015-07-01  8:36 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: amit.shah, arei.gonglei, qemu-devel, mrhines

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> check the return value of the function it calls and error if it's non-0
> Fixup qemu_rdma_init_one_block that is the only current caller,
>   and rdma_add_block the only function it calls using it.
>
> Pass the name of the ramblock to the function; helps in debugging.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> Reviewed-by: Amit Shah <amit.shah@redhat.com>
> Reviewed-by: Michael R. Hines <mrhines@us.ibm.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v2 03/12] Remove unneeded memset
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 03/12] Remove unneeded memset Dr. David Alan Gilbert (git)
@ 2015-07-01  8:37   ` Juan Quintela
  0 siblings, 0 replies; 36+ messages in thread
From: Juan Quintela @ 2015-07-01  8:37 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: amit.shah, arei.gonglei, qemu-devel, mrhines

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Michael R. Hines <mrhines@us.ibm.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v2 04/12] rdma typos
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 04/12] rdma typos Dr. David Alan Gilbert (git)
  2015-06-11 17:56   ` Michael R. Hines
@ 2015-07-01  8:37   ` Juan Quintela
  1 sibling, 0 replies; 36+ messages in thread
From: Juan Quintela @ 2015-07-01  8:37 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: amit.shah, arei.gonglei, qemu-devel, mrhines

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> A couple of typo fixes.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v2 05/12] Store block name in local blocks structure
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 05/12] Store block name in local blocks structure Dr. David Alan Gilbert (git)
@ 2015-07-01  8:38   ` Juan Quintela
  0 siblings, 0 replies; 36+ messages in thread
From: Juan Quintela @ 2015-07-01  8:38 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: amit.shah, arei.gonglei, qemu-devel, mrhines

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> In a later patch the block name will be used to match up two views
> of the block list.  Keep a copy of the block name with the local block
> list.
>
> (At some point it could be argued that it would be best just to let
> migration see the innards of RAMBlock and avoid the need to use
> foreach).
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Michael R. Hines <mrhines@us.ibm.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/12] Translate offsets to destination address space
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 06/12] Translate offsets to destination address space Dr. David Alan Gilbert (git)
  2015-06-11 18:12   ` Michael R. Hines
@ 2015-07-01  8:43   ` Juan Quintela
  1 sibling, 0 replies; 36+ messages in thread
From: Juan Quintela @ 2015-07-01  8:43 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: amit.shah, arei.gonglei, qemu-devel, mrhines

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> The 'offset' field in RDMACompress and 'current_addr' field
> in RDMARegister are commented as being offsets within a particular
> RAMBlock, however they appear to actually be offsets within the
> ram_addr_t space.
>
> The code currently assumes that the offsets on the source/destination
> match, this change removes the need for the assumption for these
> structures by translating the addresses into the ram_addr_t space of
> the destination host.
>
> Note: An alternative would be to change the fields to actually
> take the data they're commented for; this would potentially be
> simpler but would break stream compatibility for those cases
> that currently work.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>


backwards compatibility, you will live with your errors forever....

> +        /*
> +         * current_addr as passed in is an address in the local ram_addr_t
> +         * space, we need to translate this for the destination
> +         */
> +        reg->key.current_addr -= local_block->offset;
> +        reg->key.current_addr += rdma->dest_blocks[reg->current_index].offset;

I would add a function that is: rdma_adjust_offest() or something, but
it needs three pointer parameters, not sure that it is any easier :-(

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v2 07/12] Rework ram_control_load_hook to hook during block load
  2015-06-11 18:44     ` Dr. David Alan Gilbert
@ 2015-07-01  8:47       ` Juan Quintela
  0 siblings, 0 replies; 36+ messages in thread
From: Juan Quintela @ 2015-07-01  8:47 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: amit.shah, arei.gonglei, qemu-devel, Michael R. Hines, mrhines

"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> * Michael R. Hines (mrhines@linux.vnet.ibm.com) wrote:

>> >  qemu_rdma_registration_handle_unregister_success(uint64_t chunk) "%" PRIu64
>> >-qemu_rdma_registration_handle_wait(uint64_t flags) "Waiting for next request %" PRIu64
>> >+qemu_rdma_registration_handle_wait(void) ""
>> 
>> Are you using some kind of script to generate these trace prototypes?
>> What happened to the message? =)
>
> The prototypes I added manually.  You normally have the trace name in your
> tool so you don't need the text that gives you the same piece of information.
> So I get output like:
>
> 25988@1434041422.299944:qemu_rdma_registration_handle_ram_blocks 
>
> without having to write anything in the text.
> (That's with the stderr trace backend, and the prefix being PID and time)

Notice also that he removed the flags argument on the caller function,
so nothing to print there, really.


>
> Dave
>
>> 
>> >  qemu_rdma_registration_start(uint64_t flags) "%" PRIu64
>> >  qemu_rdma_registration_stop(uint64_t flags) "%" PRIu64
>> >  qemu_rdma_registration_stop_ram(void) ""
>> 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v2 07/12] Rework ram_control_load_hook to hook during block load
  2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 07/12] Rework ram_control_load_hook to hook during block load Dr. David Alan Gilbert (git)
  2015-06-11 18:20   ` Michael R. Hines
@ 2015-07-01  8:49   ` Juan Quintela
  2015-07-01  8:51     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 36+ messages in thread
From: Juan Quintela @ 2015-07-01  8:49 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: amit.shah, arei.gonglei, qemu-devel, mrhines

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> We need the names of RAMBlocks as they're loaded for RDMA,
> reuse a slightly modified ram_control_load_hook:
>   a) Pass a 'data' parameter to use for the name in the block-reg
>      case
>   b) Only some hook types now require the presence of a hook function.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> @@ -1569,6 +1569,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>                                  error_report_err(local_err);
>                              }
>                          }
> +                        ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG,
> +                                              block->idstr);
>                          break;
>                      }
>                  }
> @@ -1637,7 +1639,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>              break;
>          default:
>              if (flags & RAM_SAVE_FLAG_HOOK) {
> -                ram_control_load_hook(f, flags);
> +                ram_control_load_hook(f, RAM_CONTROL_HOOK, NULL);

Using a function in only two places, and passing two additional
parameters for that ....

> +static int rdma_load_hook(QEMUFile *f, void *opaque, uint64_t flags, void *data)
> +{
> +    switch (flags) {
> +    case RAM_CONTROL_BLOCK_REG:
> +        /* TODO A later patch */
> +        return 0;
> +        break;
> +
> +    case RAM_CONTROL_HOOK:
> +        return qemu_rdma_registration_handle(f, opaque);
> +
> +    default:
> +        /* Shouldn't be called with any other values */
> +        abort();
> +    }

And you are doing two completely different things depending of the flag ....

Later, Juan.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v2 07/12] Rework ram_control_load_hook to hook during block load
  2015-07-01  8:49   ` Juan Quintela
@ 2015-07-01  8:51     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 36+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-01  8:51 UTC (permalink / raw)
  To: Juan Quintela; +Cc: amit.shah, arei.gonglei, qemu-devel, mrhines

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > We need the names of RAMBlocks as they're loaded for RDMA,
> > reuse a slightly modified ram_control_load_hook:
> >   a) Pass a 'data' parameter to use for the name in the block-reg
> >      case
> >   b) Only some hook types now require the presence of a hook function.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> > @@ -1569,6 +1569,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >                                  error_report_err(local_err);
> >                              }
> >                          }
> > +                        ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG,
> > +                                              block->idstr);
> >                          break;
> >                      }
> >                  }
> > @@ -1637,7 +1639,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >              break;
> >          default:
> >              if (flags & RAM_SAVE_FLAG_HOOK) {
> > -                ram_control_load_hook(f, flags);
> > +                ram_control_load_hook(f, RAM_CONTROL_HOOK, NULL);
> 
> Using a function in only two places, and passing two additional
> parameters for that ....
> 
> > +static int rdma_load_hook(QEMUFile *f, void *opaque, uint64_t flags, void *data)
> > +{
> > +    switch (flags) {
> > +    case RAM_CONTROL_BLOCK_REG:
> > +        /* TODO A later patch */
> > +        return 0;
> > +        break;
> > +
> > +    case RAM_CONTROL_HOOK:
> > +        return qemu_rdma_registration_handle(f, opaque);
> > +
> > +    default:
> > +        /* Shouldn't be called with any other values */
> > +        abort();
> > +    }
> 
> And you are doing two completely different things depending of the flag ....

The other way would be to add a new function pointer to qemu_file and
wire up the new pointer.  It didn't seem any prettier.

Dave

> Later, Juan.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2015-07-01  8:51 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-11 17:17 [Qemu-devel] [PATCH v2 00/12] Remove RDMA migration dependence on RAMBlock offset Dr. David Alan Gilbert (git)
2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 01/12] Rename RDMA structures to make destination clear Dr. David Alan Gilbert (git)
2015-07-01  8:36   ` Juan Quintela
2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 02/12] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
2015-07-01  8:36   ` Juan Quintela
2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 03/12] Remove unneeded memset Dr. David Alan Gilbert (git)
2015-07-01  8:37   ` Juan Quintela
2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 04/12] rdma typos Dr. David Alan Gilbert (git)
2015-06-11 17:56   ` Michael R. Hines
2015-06-11 18:37     ` Dr. David Alan Gilbert
2015-07-01  8:37   ` Juan Quintela
2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 05/12] Store block name in local blocks structure Dr. David Alan Gilbert (git)
2015-07-01  8:38   ` Juan Quintela
2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 06/12] Translate offsets to destination address space Dr. David Alan Gilbert (git)
2015-06-11 18:12   ` Michael R. Hines
2015-06-11 18:58     ` Dr. David Alan Gilbert
2015-06-11 19:08       ` Michael R. Hines
2015-06-12 18:50         ` Dr. David Alan Gilbert
2015-06-12 19:17           ` Michael R. Hines
2015-07-01  8:43   ` Juan Quintela
2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 07/12] Rework ram_control_load_hook to hook during block load Dr. David Alan Gilbert (git)
2015-06-11 18:20   ` Michael R. Hines
2015-06-11 18:44     ` Dr. David Alan Gilbert
2015-07-01  8:47       ` Juan Quintela
2015-07-01  8:49   ` Juan Quintela
2015-07-01  8:51     ` Dr. David Alan Gilbert
2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 08/12] Allow rdma_delete_block to work without the hash Dr. David Alan Gilbert (git)
2015-06-11 18:36   ` Michael R. Hines
2015-06-11 18:39     ` Dr. David Alan Gilbert
2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 09/12] Rework ram block hash Dr. David Alan Gilbert (git)
2015-06-11 18:40   ` Michael R. Hines
2015-06-11 18:45     ` Dr. David Alan Gilbert
2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 10/12] Sort destination RAMBlocks to be the same as the source Dr. David Alan Gilbert (git)
2015-06-11 18:55   ` Michael R. Hines
2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 11/12] Sanity check RDMA remote data Dr. David Alan Gilbert (git)
2015-06-11 17:17 ` [Qemu-devel] [PATCH v2 12/12] Fail more cleanly in mismatched RAM cases Dr. David Alan Gilbert (git)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.