* [Qemu-devel] [PATCH 00/10] Remove RDMA migration dependence on RAMBlock offset
@ 2015-04-20 15:57 Dr. David Alan Gilbert (git)
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 01/10] Rename RDMA structures to make destination clear Dr. David Alan Gilbert (git)
                   ` (10 more replies)
  0 siblings, 11 replies; 31+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-20 15:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, arei.gonglei, mrhines, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

RDMA migration currently relies on the source and destination RAMBlocks
having the same offsets within ram_addr_t space; unfortunately that's
just not true when:
   a) You hotplug a device on the source but then create it on the
      command line on the destination, or

   b) You migrate across two versions of qemu.

Thus there are migrations that work over TCP but don't over RDMA.

The changes keep stream compatibility with existing RDMA migration,
so cases that already work (i.e. no hotplug) will keep working.

With some light testing this seems to work; hopefully I've got all the
cases that pass offsets back and forth.

Dave

Dr. David Alan Gilbert (10):
  Rename RDMA structures to make destination clear
  qemu_ram_foreach_block: pass up error value, and down the ramblock
    name
  Store block name in local blocks structure
  Translate offsets to destination address space
  Rework ram_control_load_hook to hook during block load
  Remove unneeded memset
  Simplify rdma_delete_block and remove its dependence on the hash
  Rework ram block hash
  Sort destination RAMBlocks to be the same as the source
  Sanity check RDMA remote data

 arch_init.c                   |   4 +-
 exec.c                        |  10 +-
 include/exec/cpu-common.h     |   4 +-
 include/migration/migration.h |   2 +-
 include/migration/qemu-file.h |  14 +-
 migration/qemu-file.c         |   8 +-
 migration/rdma.c              | 367 ++++++++++++++++++++++++++----------------
 trace-events                  |   8 +-
 8 files changed, 257 insertions(+), 160 deletions(-)

-- 
2.3.5

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 01/10] Rename RDMA structures to make destination clear
  2015-04-20 15:57 [Qemu-devel] [PATCH 00/10] Remove RDMA migration dependence on RAMBlock offset Dr. David Alan Gilbert (git)
@ 2015-04-20 15:57 ` Dr. David Alan Gilbert (git)
  2015-05-19 17:52   ` Michael R. Hines
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 02/10] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-20 15:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, arei.gonglei, mrhines, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

RDMA has two data types that are named confusingly:
   RDMALocalBlock (pointed to indirectly by local_ram_blocks)
   RDMARemoteBlock (pointed to by block in RDMAContext)

RDMALocalBlocks, as the name suggests, is a data structure that
represents the RDMAable RAM Blocks on the current side of the migration,
whichever side that is.

RDMARemoteBlocks always describes the shape of the RAMBlocks on the
destination, even when it is held on the destination.

Rename:
     RDMARemoteBlock -> RDMADestBlock
     context->'block' -> context->dest_blocks

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 66 ++++++++++++++++++++++++++++----------------------------
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 77e3444..089adcf 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -236,13 +236,13 @@ typedef struct RDMALocalBlock {
  * corresponding RDMALocalBlock with
  * the information needed to perform the actual RDMA.
  */
-typedef struct QEMU_PACKED RDMARemoteBlock {
+typedef struct QEMU_PACKED RDMADestBlock {
     uint64_t remote_host_addr;
     uint64_t offset;
     uint64_t length;
     uint32_t remote_rkey;
     uint32_t padding;
-} RDMARemoteBlock;
+} RDMADestBlock;
 
 static uint64_t htonll(uint64_t v)
 {
@@ -258,20 +258,20 @@ static uint64_t ntohll(uint64_t v) {
     return ((uint64_t)ntohl(u.lv[0]) << 32) | (uint64_t) ntohl(u.lv[1]);
 }
 
-static void remote_block_to_network(RDMARemoteBlock *rb)
+static void dest_block_to_network(RDMADestBlock *db)
 {
-    rb->remote_host_addr = htonll(rb->remote_host_addr);
-    rb->offset = htonll(rb->offset);
-    rb->length = htonll(rb->length);
-    rb->remote_rkey = htonl(rb->remote_rkey);
+    db->remote_host_addr = htonll(db->remote_host_addr);
+    db->offset = htonll(db->offset);
+    db->length = htonll(db->length);
+    db->remote_rkey = htonl(db->remote_rkey);
 }
 
-static void network_to_remote_block(RDMARemoteBlock *rb)
+static void network_to_dest_block(RDMADestBlock *db)
 {
-    rb->remote_host_addr = ntohll(rb->remote_host_addr);
-    rb->offset = ntohll(rb->offset);
-    rb->length = ntohll(rb->length);
-    rb->remote_rkey = ntohl(rb->remote_rkey);
+    db->remote_host_addr = ntohll(db->remote_host_addr);
+    db->offset = ntohll(db->offset);
+    db->length = ntohll(db->length);
+    db->remote_rkey = ntohl(db->remote_rkey);
 }
 
 /*
@@ -350,7 +350,7 @@ typedef struct RDMAContext {
      * Description of ram blocks used throughout the code.
      */
     RDMALocalBlocks local_ram_blocks;
-    RDMARemoteBlock *block;
+    RDMADestBlock  *dest_blocks;
 
     /*
      * Migration on *destination* started.
@@ -590,7 +590,7 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
     memset(local, 0, sizeof *local);
     qemu_ram_foreach_block(qemu_rdma_init_one_block, rdma);
     trace_qemu_rdma_init_ram_blocks(local->nb_blocks);
-    rdma->block = (RDMARemoteBlock *) g_malloc0(sizeof(RDMARemoteBlock) *
+    rdma->dest_blocks = (RDMADestBlock *) g_malloc0(sizeof(RDMADestBlock) *
                         rdma->local_ram_blocks.nb_blocks);
     local->init = true;
     return 0;
@@ -2177,8 +2177,8 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
         rdma->connected = false;
     }
 
-    g_free(rdma->block);
-    rdma->block = NULL;
+    g_free(rdma->dest_blocks);
+    rdma->dest_blocks = NULL;
 
     for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
         if (rdma->wr_data[idx].control_mr) {
@@ -2967,25 +2967,25 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque,
              * their "local" descriptions with what was sent.
              */
             for (i = 0; i < local->nb_blocks; i++) {
-                rdma->block[i].remote_host_addr =
+                rdma->dest_blocks[i].remote_host_addr =
                     (uintptr_t)(local->block[i].local_host_addr);
 
                 if (rdma->pin_all) {
-                    rdma->block[i].remote_rkey = local->block[i].mr->rkey;
+                    rdma->dest_blocks[i].remote_rkey = local->block[i].mr->rkey;
                 }
 
-                rdma->block[i].offset = local->block[i].offset;
-                rdma->block[i].length = local->block[i].length;
+                rdma->dest_blocks[i].offset = local->block[i].offset;
+                rdma->dest_blocks[i].length = local->block[i].length;
 
-                remote_block_to_network(&rdma->block[i]);
+                dest_block_to_network(&rdma->dest_blocks[i]);
             }
 
             blocks.len = rdma->local_ram_blocks.nb_blocks
-                                                * sizeof(RDMARemoteBlock);
+                                                * sizeof(RDMADestBlock);
 
 
             ret = qemu_rdma_post_send_control(rdma,
-                                        (uint8_t *) rdma->block, &blocks);
+                                        (uint8_t *) rdma->dest_blocks, &blocks);
 
             if (ret < 0) {
                 error_report("rdma migration: error sending remote info");
@@ -3141,7 +3141,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
     if (flags == RAM_CONTROL_SETUP) {
         RDMAControlHeader resp = {.type = RDMA_CONTROL_RAM_BLOCKS_RESULT };
         RDMALocalBlocks *local = &rdma->local_ram_blocks;
-        int reg_result_idx, i, j, nb_remote_blocks;
+        int reg_result_idx, i, j, nb_dest_blocks;
 
         head.type = RDMA_CONTROL_RAM_BLOCKS_REQUEST;
         trace_qemu_rdma_registration_stop_ram();
@@ -3162,7 +3162,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
             return ret;
         }
 
-        nb_remote_blocks = resp.len / sizeof(RDMARemoteBlock);
+        nb_dest_blocks = resp.len / sizeof(RDMADestBlock);
 
         /*
          * The protocol uses two different sets of rkeys (mutually exclusive):
@@ -3176,7 +3176,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
          * and then propagates the remote ram block descriptions to his local copy.
          */
 
-        if (local->nb_blocks != nb_remote_blocks) {
+        if (local->nb_blocks != nb_dest_blocks) {
             ERROR(errp, "ram blocks mismatch #1! "
                         "Your QEMU command line parameters are probably "
                         "not identical on both the source and destination.");
@@ -3184,26 +3184,26 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
         }
 
         qemu_rdma_move_header(rdma, reg_result_idx, &resp);
-        memcpy(rdma->block,
+        memcpy(rdma->dest_blocks,
             rdma->wr_data[reg_result_idx].control_curr, resp.len);
-        for (i = 0; i < nb_remote_blocks; i++) {
-            network_to_remote_block(&rdma->block[i]);
+        for (i = 0; i < nb_dest_blocks; i++) {
+            network_to_dest_block(&rdma->dest_blocks[i]);
 
             /* search local ram blocks */
             for (j = 0; j < local->nb_blocks; j++) {
-                if (rdma->block[i].offset != local->block[j].offset) {
+                if (rdma->dest_blocks[i].offset != local->block[j].offset) {
                     continue;
                 }
 
-                if (rdma->block[i].length != local->block[j].length) {
+                if (rdma->dest_blocks[i].length != local->block[j].length) {
                     ERROR(errp, "ram blocks mismatch #2! "
                         "Your QEMU command line parameters are probably "
                         "not identical on both the source and destination.");
                     return -EINVAL;
                 }
                 local->block[j].remote_host_addr =
-                        rdma->block[i].remote_host_addr;
-                local->block[j].remote_rkey = rdma->block[i].remote_rkey;
+                        rdma->dest_blocks[i].remote_host_addr;
+                local->block[j].remote_rkey = rdma->dest_blocks[i].remote_rkey;
                 break;
             }
 
-- 
2.3.5

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 02/10] qemu_ram_foreach_block: pass up error value, and down the ramblock name
  2015-04-20 15:57 [Qemu-devel] [PATCH 00/10] Remove RDMA migration dependence on RAMBlock offset Dr. David Alan Gilbert (git)
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 01/10] Rename RDMA structures to make destination clear Dr. David Alan Gilbert (git)
@ 2015-04-20 15:57 ` Dr. David Alan Gilbert (git)
  2015-05-19 17:56   ` Michael R. Hines
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 03/10] Store block name in local blocks structure Dr. David Alan Gilbert (git)
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-20 15:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, arei.gonglei, mrhines, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Check the return value of the function it calls and return an error if
it's non-zero.  Fix up qemu_rdma_init_one_block, which is the only
current caller, and rdma_add_block, the only function it calls using it.

Pass the name of the ramblock to the function; this helps in debugging.
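
For reference, the reworked callback contract looks like this; the
function below is only an illustration and not part of the patch (a
non-zero return stops the iteration and is handed back to the caller
of qemu_ram_foreach_block):

    /* Illustration only: stop as soon as the block named by 'opaque' is
     * seen; qemu_ram_foreach_block() then returns this non-zero value.
     */
    static int find_block_by_name(const char *block_name, void *host_addr,
                                  ram_addr_t offset, ram_addr_t length,
                                  void *opaque)
    {
        return strcmp(block_name, (const char *)opaque) ? 0 : 1;
    }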

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 exec.c                    | 10 ++++++++--
 include/exec/cpu-common.h |  4 ++--
 migration/rdma.c          |  4 ++--
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/exec.c b/exec.c
index 874ecfc..7693794 100644
--- a/exec.c
+++ b/exec.c
@@ -3067,14 +3067,20 @@ bool cpu_physical_memory_is_io(hwaddr phys_addr)
              memory_region_is_romd(mr));
 }
 
-void qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
+int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
 {
     RAMBlock *block;
+    int ret = 0;
 
     rcu_read_lock();
     QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
-        func(block->host, block->offset, block->used_length, opaque);
+        ret = func(block->idstr, block->host, block->offset,
+                   block->used_length, opaque);
+        if (ret) {
+            break;
+        }
     }
     rcu_read_unlock();
+    return ret;
 }
 #endif
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index fcc3162..2abecac 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -125,10 +125,10 @@ void cpu_flush_icache_range(hwaddr start, int len);
 extern struct MemoryRegion io_mem_rom;
 extern struct MemoryRegion io_mem_notdirty;
 
-typedef void (RAMBlockIterFunc)(void *host_addr,
+typedef int (RAMBlockIterFunc)(const char *block_name, void *host_addr,
     ram_addr_t offset, ram_addr_t length, void *opaque);
 
-void qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
+int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
 
 #endif
 
diff --git a/migration/rdma.c b/migration/rdma.c
index 089adcf..38e5f44 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -570,10 +570,10 @@ static int rdma_add_block(RDMAContext *rdma, void *host_addr,
  * in advanced before the migration starts. This tells us where the RAM blocks
  * are so that we can register them individually.
  */
-static void qemu_rdma_init_one_block(void *host_addr,
+static int qemu_rdma_init_one_block(const char *block_name, void *host_addr,
     ram_addr_t block_offset, ram_addr_t length, void *opaque)
 {
-    rdma_add_block(opaque, host_addr, block_offset, length);
+    return rdma_add_block(opaque, host_addr, block_offset, length);
 }
 
 /*
-- 
2.3.5

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 03/10] Store block name in local blocks structure
  2015-04-20 15:57 [Qemu-devel] [PATCH 00/10] Remove RDMA migration dependence on RAMBlock offset Dr. David Alan Gilbert (git)
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 01/10] Rename RDMA structures to make destination clear Dr. David Alan Gilbert (git)
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 02/10] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
@ 2015-04-20 15:57 ` Dr. David Alan Gilbert (git)
  2015-05-19 18:00   ` Michael R. Hines
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 04/10] Translate offsets to destination address space Dr. David Alan Gilbert (git)
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-20 15:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, arei.gonglei, mrhines, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

In a later patch the block name will be used to match up two views
of the block list.  Keep a copy of the block name with the local block
list.

(At some point it could be argued that it would be best just to let
migration see the innards of RAMBlock and avoid the need to use
foreach).
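
As a sketch of where this is heading, keeping the name lets the
destination look a block up by name rather than by offset; the helper
below is hypothetical (a later patch in this series does the equivalent
inline):

    /* Hypothetical helper: find a local block by the stored name. */
    static RDMALocalBlock *find_local_block(RDMALocalBlocks *local,
                                            const char *name)
    {
        int i;

        for (i = 0; i < local->nb_blocks; i++) {
            if (!strcmp(local->block[i].block_name, name)) {
                return &local->block[i];
            }
        }
        return NULL;
    }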

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 35 +++++++++++++++++++++--------------
 trace-events     |  2 +-
 2 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 38e5f44..c3814c5 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -214,17 +214,18 @@ static void network_to_caps(RDMACapabilities *cap)
  * the information. It's small anyway, so a list is overkill.
  */
 typedef struct RDMALocalBlock {
-    uint8_t  *local_host_addr; /* local virtual address */
-    uint64_t remote_host_addr; /* remote virtual address */
-    uint64_t offset;
-    uint64_t length;
-    struct   ibv_mr **pmr;     /* MRs for chunk-level registration */
-    struct   ibv_mr *mr;       /* MR for non-chunk-level registration */
-    uint32_t *remote_keys;     /* rkeys for chunk-level registration */
-    uint32_t remote_rkey;      /* rkeys for non-chunk-level registration */
-    int      index;            /* which block are we */
-    bool     is_ram_block;
-    int      nb_chunks;
+    char          *block_name;
+    uint8_t       *local_host_addr; /* local virtual address */
+    uint64_t       remote_host_addr; /* remote virtual address */
+    uint64_t       offset;
+    uint64_t       length;
+    struct         ibv_mr **pmr;    /* MRs for chunk-level registration */
+    struct         ibv_mr *mr;      /* MR for non-chunk-level registration */
+    uint32_t      *remote_keys;     /* rkeys for chunk-level registration */
+    uint32_t       remote_rkey;     /* rkeys for non-chunk-level registration */
+    int            index;           /* which block are we */
+    bool           is_ram_block;
+    int            nb_chunks;
     unsigned long *transit_bitmap;
     unsigned long *unregister_bitmap;
 } RDMALocalBlock;
@@ -510,7 +511,8 @@ static inline uint8_t *ram_chunk_end(const RDMALocalBlock *rdma_ram_block,
     return result;
 }
 
-static int rdma_add_block(RDMAContext *rdma, void *host_addr,
+static int rdma_add_block(RDMAContext *rdma, const char *block_name,
+                         void *host_addr,
                          ram_addr_t block_offset, uint64_t length)
 {
     RDMALocalBlocks *local = &rdma->local_ram_blocks;
@@ -538,6 +540,7 @@ static int rdma_add_block(RDMAContext *rdma, void *host_addr,
 
     block = &local->block[local->nb_blocks];
 
+    block->block_name = g_strdup(block_name);
     block->local_host_addr = host_addr;
     block->offset = block_offset;
     block->length = length;
@@ -553,7 +556,8 @@ static int rdma_add_block(RDMAContext *rdma, void *host_addr,
 
     g_hash_table_insert(rdma->blockmap, (void *) block_offset, block);
 
-    trace_rdma_add_block(local->nb_blocks, (uintptr_t) block->local_host_addr,
+    trace_rdma_add_block(block_name, local->nb_blocks,
+                         (uintptr_t) block->local_host_addr,
                          block->offset, block->length,
                          (uintptr_t) (block->local_host_addr + block->length),
                          BITS_TO_LONGS(block->nb_chunks) *
@@ -573,7 +577,7 @@ static int rdma_add_block(RDMAContext *rdma, void *host_addr,
 static int qemu_rdma_init_one_block(const char *block_name, void *host_addr,
     ram_addr_t block_offset, ram_addr_t length, void *opaque)
 {
-    return rdma_add_block(opaque, host_addr, block_offset, length);
+    return rdma_add_block(opaque, block_name, host_addr, block_offset, length);
 }
 
 /*
@@ -639,6 +643,9 @@ static int rdma_delete_block(RDMAContext *rdma, ram_addr_t block_offset)
         g_hash_table_remove(rdma->blockmap, (void *)(uintptr_t)old[x].offset);
     }
 
+    g_free(block->block_name);
+    block->block_name = NULL;
+
     if (local->nb_blocks > 1) {
 
         local->block = g_malloc0(sizeof(RDMALocalBlock) *
diff --git a/trace-events b/trace-events
index 30eba92..07f15da 100644
--- a/trace-events
+++ b/trace-events
@@ -1435,7 +1435,7 @@ qemu_rdma_write_one_recvregres(int mykey, int theirkey, uint64_t chunk) "Receive
 qemu_rdma_write_one_sendreg(uint64_t chunk, int len, int index, int64_t offset) "Sending registration request chunk %" PRIu64 " for %d bytes, index: %d, offset: %" PRId64
 qemu_rdma_write_one_top(uint64_t chunks, uint64_t size) "Writing %" PRIu64 " chunks, (%" PRIu64 " MB)"
 qemu_rdma_write_one_zero(uint64_t chunk, int len, int index, int64_t offset) "Entire chunk is zero, sending compress: %" PRIu64 " for %d bytes, index: %d, offset: %" PRId64
-rdma_add_block(int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Added Block: %d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
+rdma_add_block(const char *block_name, int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Added Block: '%s':%d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
 rdma_delete_block(int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Deleted Block: %d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
 rdma_start_incoming_migration(void) ""
 rdma_start_incoming_migration_after_dest_init(void) ""
-- 
2.3.5

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 04/10] Translate offsets to destination address space
  2015-04-20 15:57 [Qemu-devel] [PATCH 00/10] Remove RDMA migration dependence on RAMBlock offset Dr. David Alan Gilbert (git)
                   ` (2 preceding siblings ...)
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 03/10] Store block name in local blocks structure Dr. David Alan Gilbert (git)
@ 2015-04-20 15:57 ` Dr. David Alan Gilbert (git)
  2015-05-19 18:28   ` Michael R. Hines
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 05/10] Rework ram_control_load_hook to hook during block load Dr. David Alan Gilbert (git)
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-20 15:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, arei.gonglei, mrhines, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The 'offset' field in RDMACompress and the 'current_addr' field
in RDMARegister are commented as being offsets within a particular
RAMBlock; however, they actually appear to be offsets within the
ram_addr_t space.

The code currently assumes that the offsets on the source and destination
match; this change removes that assumption for these structures by
translating the addresses into the ram_addr_t space of the destination
host.

Note: An alternative would be to change the fields to actually
take the data they're commented for; this would potentially be
simpler but would break stream compatibility for those cases
that currently work.
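
The translation itself is just a rebase between the two per-block
offsets; as a sketch (the real code does this inline in
register_to_network()/compress_to_network(), there is no such helper):

    /* Sketch: rebase an address from the source's ram_addr_t space into
     * the destination's, using the per-block offsets.
     */
    static uint64_t source_to_dest_addr(RDMAContext *rdma, int block_index,
                                        uint64_t src_addr)
    {
        RDMALocalBlock *local = &rdma->local_ram_blocks.block[block_index];
        RDMADestBlock *dest = &rdma->dest_blocks[block_index];

        return src_addr - local->offset + dest->offset;
    }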

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index c3814c5..2c0d11b 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -411,7 +411,7 @@ static void network_to_control(RDMAControlHeader *control)
  */
 typedef struct QEMU_PACKED {
     union QEMU_PACKED {
-        uint64_t current_addr;  /* offset into the ramblock of the chunk */
+        uint64_t current_addr;  /* offset into the ram_addr_t space */
         uint64_t chunk;         /* chunk to lookup if unregistering */
     } key;
     uint32_t current_index; /* which ramblock the chunk belongs to */
@@ -419,8 +419,19 @@ typedef struct QEMU_PACKED {
     uint64_t chunks;            /* how many sequential chunks to register */
 } RDMARegister;
 
-static void register_to_network(RDMARegister *reg)
+static void register_to_network(RDMAContext *rdma, RDMARegister *reg)
 {
+    RDMALocalBlock *local_block;
+    local_block  = &rdma->local_ram_blocks.block[reg->current_index];
+
+    if (local_block->is_ram_block) {
+        /*
+         * current_addr as passed in is an address in the local ram_addr_t
+         * space, we need to translate this for the destination
+         */
+        reg->key.current_addr -= local_block->offset;
+        reg->key.current_addr += rdma->dest_blocks[reg->current_index].offset;
+    }
     reg->key.current_addr = htonll(reg->key.current_addr);
     reg->current_index = htonl(reg->current_index);
     reg->chunks = htonll(reg->chunks);
@@ -436,13 +447,19 @@ static void network_to_register(RDMARegister *reg)
 typedef struct QEMU_PACKED {
     uint32_t value;     /* if zero, we will madvise() */
     uint32_t block_idx; /* which ram block index */
-    uint64_t offset;    /* where in the remote ramblock this chunk */
+    uint64_t offset;    /* Address in remote ram_addr_t space */
     uint64_t length;    /* length of the chunk */
 } RDMACompress;
 
-static void compress_to_network(RDMACompress *comp)
+static void compress_to_network(RDMAContext *rdma, RDMACompress *comp)
 {
     comp->value = htonl(comp->value);
+    /*
+     * comp->offset as passed in is an address in the local ram_addr_t
+     * space, we need to translate this for the destination
+     */
+    comp->offset -= rdma->local_ram_blocks.block[comp->block_idx].offset;
+    comp->offset += rdma->dest_blocks[comp->block_idx].offset;
     comp->block_idx = htonl(comp->block_idx);
     comp->offset = htonll(comp->offset);
     comp->length = htonll(comp->length);
@@ -1288,7 +1305,7 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
         rdma->total_registrations--;
 
         reg.key.chunk = chunk;
-        register_to_network(&reg);
+        register_to_network(rdma, &reg);
         ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
                                 &resp, NULL, NULL);
         if (ret < 0) {
@@ -1909,7 +1926,7 @@ retry:
                 trace_qemu_rdma_write_one_zero(chunk, sge.length,
                                                current_index, current_addr);
 
-                compress_to_network(&comp);
+                compress_to_network(rdma, &comp);
                 ret = qemu_rdma_exchange_send(rdma, &head,
                                 (uint8_t *) &comp, NULL, NULL, NULL);
 
@@ -1936,7 +1953,7 @@ retry:
             trace_qemu_rdma_write_one_sendreg(chunk, sge.length, current_index,
                                               current_addr);
 
-            register_to_network(&reg);
+            register_to_network(rdma, &reg);
             ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
                                     &resp, &reg_result_idx, NULL);
             if (ret < 0) {
-- 
2.3.5

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 05/10] Rework ram_control_load_hook to hook during block load
  2015-04-20 15:57 [Qemu-devel] [PATCH 00/10] Remove RDMA migration dependence on RAMBlock offset Dr. David Alan Gilbert (git)
                   ` (3 preceding siblings ...)
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 04/10] Translate offsets to destination address space Dr. David Alan Gilbert (git)
@ 2015-04-20 15:57 ` Dr. David Alan Gilbert (git)
  2015-05-19 18:35   ` Michael R. Hines
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 06/10] Remove unneeded memset Dr. David Alan Gilbert (git)
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-20 15:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, arei.gonglei, mrhines, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

We need the names of RAMBlocks as they're loaded for RDMA, so reuse an
existing QEMUFile hook with some small modifications.
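
The hook gains a 'data' argument: for RAM_CONTROL_BLOCK_REG it carries
the block's idstr; everywhere else in this series it is NULL.  A minimal
sketch of the new QEMURamHookFunc shape (the function name is made up;
the real RDMA implementation is rdma_load_hook in the diff below):

    static int example_load_hook(QEMUFile *f, void *opaque, uint64_t flags,
                                 void *data)
    {
        switch (flags) {
        case RAM_CONTROL_BLOCK_REG:
            /* 'data' is the RAMBlock name (a char *) sent by the source */
            return 0;
        case RAM_CONTROL_HOOK:
            /* the per-iteration hook point, as before */
            return 0;
        default:
            return -EINVAL;
        }
    }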

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 arch_init.c                   |  4 +++-
 include/migration/migration.h |  2 +-
 include/migration/qemu-file.h | 14 +++++++++-----
 migration/qemu-file.c         |  8 ++++----
 migration/rdma.c              | 28 ++++++++++++++++++++++------
 trace-events                  |  2 +-
 6 files changed, 40 insertions(+), 18 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 4c8fcee..6c0b355 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1164,6 +1164,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
                                 error_report_err(local_err);
                             }
                         }
+                        ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG,
+                                              block->idstr);
                         break;
                     }
                 }
@@ -1215,7 +1217,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             break;
         default:
             if (flags & RAM_SAVE_FLAG_HOOK) {
-                ram_control_load_hook(f, flags);
+                ram_control_load_hook(f, RAM_CONTROL_HOOK, NULL);
             } else {
                 error_report("Unknown combination of migration flags: %#x",
                              flags);
diff --git a/include/migration/migration.h b/include/migration/migration.h
index bf09968..3c2f91a 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -154,7 +154,7 @@ int64_t xbzrle_cache_resize(int64_t new_size);
 
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
-void ram_control_load_hook(QEMUFile *f, uint64_t flags);
+void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data);
 
 /* Whenever this is found in the data stream, the flags
  * will be passed to ram_control_load_hook in the incoming-migration
diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 745a850..a3ceb3b 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -63,16 +63,20 @@ typedef ssize_t (QEMUFileWritevBufferFunc)(void *opaque, struct iovec *iov,
 /*
  * This function provides hooks around different
  * stages of RAM migration.
+ * 'opaque' is the backend specific data in QEMUFile
+ * 'data' is call specific data associated with the 'flags' value
  */
-typedef int (QEMURamHookFunc)(QEMUFile *f, void *opaque, uint64_t flags);
+typedef int (QEMURamHookFunc)(QEMUFile *f, void *opaque, uint64_t flags,
+                              void *data);
 
 /*
  * Constants used by ram_control_* hooks
  */
-#define RAM_CONTROL_SETUP    0
-#define RAM_CONTROL_ROUND    1
-#define RAM_CONTROL_HOOK     2
-#define RAM_CONTROL_FINISH   3
+#define RAM_CONTROL_SETUP     0
+#define RAM_CONTROL_ROUND     1
+#define RAM_CONTROL_HOOK      2
+#define RAM_CONTROL_FINISH    3
+#define RAM_CONTROL_BLOCK_REG 4
 
 /*
  * This function allows override of where the RAM page
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 1a4f986..4986e05 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -127,7 +127,7 @@ void ram_control_before_iterate(QEMUFile *f, uint64_t flags)
     int ret = 0;
 
     if (f->ops->before_ram_iterate) {
-        ret = f->ops->before_ram_iterate(f, f->opaque, flags);
+        ret = f->ops->before_ram_iterate(f, f->opaque, flags, NULL);
         if (ret < 0) {
             qemu_file_set_error(f, ret);
         }
@@ -139,19 +139,19 @@ void ram_control_after_iterate(QEMUFile *f, uint64_t flags)
     int ret = 0;
 
     if (f->ops->after_ram_iterate) {
-        ret = f->ops->after_ram_iterate(f, f->opaque, flags);
+        ret = f->ops->after_ram_iterate(f, f->opaque, flags, NULL);
         if (ret < 0) {
             qemu_file_set_error(f, ret);
         }
     }
 }
 
-void ram_control_load_hook(QEMUFile *f, uint64_t flags)
+void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data)
 {
     int ret = -EINVAL;
 
     if (f->ops->hook_ram_load) {
-        ret = f->ops->hook_ram_load(f, f->opaque, flags);
+        ret = f->ops->hook_ram_load(f, f->opaque, flags, data);
         if (ret < 0) {
             qemu_file_set_error(f, ret);
         }
diff --git a/migration/rdma.c b/migration/rdma.c
index 2c0d11b..e43fae4 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2906,8 +2906,7 @@ err_rdma_dest_wait:
  *
  * Keep doing this until the source tells us to stop.
  */
-static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque,
-                                         uint64_t flags)
+static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
 {
     RDMAControlHeader reg_resp = { .len = sizeof(RDMARegisterResult),
                                .type = RDMA_CONTROL_REGISTER_RESULT,
@@ -2937,7 +2936,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque,
     CHECK_ERROR_STATE();
 
     do {
-        trace_qemu_rdma_registration_handle_wait(flags);
+        trace_qemu_rdma_registration_handle_wait();
 
         ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_NONE);
 
@@ -3125,8 +3124,25 @@ out:
     return ret;
 }
 
+static int rdma_load_hook(QEMUFile *f, void *opaque, uint64_t flags, void *data)
+{
+    switch (flags) {
+    case RAM_CONTROL_BLOCK_REG:
+        /* TODO A later patch */
+        return -1;
+        break;
+
+    case RAM_CONTROL_HOOK:
+        return qemu_rdma_registration_handle(f, opaque);
+
+    default:
+        /* Shouldn't be called with any other values */
+        abort();
+    }
+}
+
 static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
-                                        uint64_t flags)
+                                        uint64_t flags, void *data)
 {
     QEMUFileRDMA *rfile = opaque;
     RDMAContext *rdma = rfile->rdma;
@@ -3145,7 +3161,7 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
  * First, flush writes, if any.
  */
 static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
-                                       uint64_t flags)
+                                       uint64_t flags, void *data)
 {
     Error *local_err = NULL, **errp = &local_err;
     QEMUFileRDMA *rfile = opaque;
@@ -3267,7 +3283,7 @@ static const QEMUFileOps rdma_read_ops = {
     .get_buffer    = qemu_rdma_get_buffer,
     .get_fd        = qemu_rdma_get_fd,
     .close         = qemu_rdma_close,
-    .hook_ram_load = qemu_rdma_registration_handle,
+    .hook_ram_load = rdma_load_hook,
 };
 
 static const QEMUFileOps rdma_write_ops = {
diff --git a/trace-events b/trace-events
index 07f15da..baf8647 100644
--- a/trace-events
+++ b/trace-events
@@ -1416,7 +1416,7 @@ qemu_rdma_registration_handle_register_rkey(int rkey) "%x"
 qemu_rdma_registration_handle_unregister(int requests) "%d requests"
 qemu_rdma_registration_handle_unregister_loop(int count, int index, uint64_t chunk) "Unregistration request (%d): index %d, chunk %" PRIu64
 qemu_rdma_registration_handle_unregister_success(uint64_t chunk) "%" PRIu64
-qemu_rdma_registration_handle_wait(uint64_t flags) "Waiting for next request %" PRIu64
+qemu_rdma_registration_handle_wait(void) ""
 qemu_rdma_registration_start(uint64_t flags) "%" PRIu64
 qemu_rdma_registration_stop(uint64_t flags) "%" PRIu64
 qemu_rdma_registration_stop_ram(void) ""
-- 
2.3.5

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 06/10] Remove unneeded memset
  2015-04-20 15:57 [Qemu-devel] [PATCH 00/10] Remove RDMA migration dependence on RAMBlock offset Dr. David Alan Gilbert (git)
                   ` (4 preceding siblings ...)
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 05/10] Rework ram_control_load_hook to hook during block load Dr. David Alan Gilbert (git)
@ 2015-04-20 15:57 ` Dr. David Alan Gilbert (git)
  2015-05-19 18:35   ` Michael R. Hines
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 07/10] Simplify rdma_delete_block and remove its dependence on the hash Dr. David Alan Gilbert (git)
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-20 15:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, arei.gonglei, mrhines, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index e43fae4..4f7dd0d 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2469,7 +2469,6 @@ static void *qemu_rdma_data_init(const char *host_port, Error **errp)
 
     if (host_port) {
         rdma = g_malloc0(sizeof(RDMAContext));
-        memset(rdma, 0, sizeof(RDMAContext));
         rdma->current_index = -1;
         rdma->current_chunk = -1;
 
-- 
2.3.5

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 07/10] Simplify rdma_delete_block and remove its dependence on the hash
  2015-04-20 15:57 [Qemu-devel] [PATCH 00/10] Remove RDMA migration dependence on RAMBlock offset Dr. David Alan Gilbert (git)
                   ` (5 preceding siblings ...)
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 06/10] Remove unneeded memset Dr. David Alan Gilbert (git)
@ 2015-04-20 15:57 ` Dr. David Alan Gilbert (git)
  2015-05-19 18:44   ` Michael R. Hines
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 08/10] Rework ram block hash Dr. David Alan Gilbert (git)
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-20 15:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, arei.gonglei, mrhines, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

rdma_delete_block is currently very general, but it's only used
in cleanup at the end.  Simplify it, remove its dependence
on the hash table, and remove all of the hash-table regeneration
designed to handle the (unused) case of deleting an arbitrary block.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 57 +++++++++-----------------------------------------------
 trace-events     |  2 +-
 2 files changed, 10 insertions(+), 49 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 4f7dd0d..fe3b76e 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -617,16 +617,11 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
     return 0;
 }
 
-static int rdma_delete_block(RDMAContext *rdma, ram_addr_t block_offset)
+static int rdma_delete_block(RDMAContext *rdma, RDMALocalBlock *block)
 {
-    RDMALocalBlocks *local = &rdma->local_ram_blocks;
-    RDMALocalBlock *block = g_hash_table_lookup(rdma->blockmap,
-        (void *) block_offset);
-    RDMALocalBlock *old = local->block;
-    int x;
-
-    assert(block);
-
+    if (rdma->blockmap) {
+        g_hash_table_remove(rdma->blockmap, (void *)(uintptr_t)block->offset);
+    }
     if (block->pmr) {
         int j;
 
@@ -656,51 +651,15 @@ static int rdma_delete_block(RDMAContext *rdma, ram_addr_t block_offset)
     g_free(block->remote_keys);
     block->remote_keys = NULL;
 
-    for (x = 0; x < local->nb_blocks; x++) {
-        g_hash_table_remove(rdma->blockmap, (void *)(uintptr_t)old[x].offset);
-    }
-
     g_free(block->block_name);
     block->block_name = NULL;
 
-    if (local->nb_blocks > 1) {
-
-        local->block = g_malloc0(sizeof(RDMALocalBlock) *
-                                    (local->nb_blocks - 1));
-
-        if (block->index) {
-            memcpy(local->block, old, sizeof(RDMALocalBlock) * block->index);
-        }
-
-        if (block->index < (local->nb_blocks - 1)) {
-            memcpy(local->block + block->index, old + (block->index + 1),
-                sizeof(RDMALocalBlock) *
-                    (local->nb_blocks - (block->index + 1)));
-        }
-    } else {
-        assert(block == local->block);
-        local->block = NULL;
-    }
-
-    trace_rdma_delete_block(local->nb_blocks,
-                           (uintptr_t)block->local_host_addr,
+    trace_rdma_delete_block(block, (uintptr_t)block->local_host_addr,
                            block->offset, block->length,
                             (uintptr_t)(block->local_host_addr + block->length),
                            BITS_TO_LONGS(block->nb_chunks) *
                                sizeof(unsigned long) * 8, block->nb_chunks);
 
-    g_free(old);
-
-    local->nb_blocks--;
-
-    if (local->nb_blocks) {
-        for (x = 0; x < local->nb_blocks; x++) {
-            g_hash_table_insert(rdma->blockmap,
-                                (void *)(uintptr_t)local->block[x].offset,
-                                &local->block[x]);
-        }
-    }
-
     return 0;
 }
 
@@ -2213,9 +2172,11 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
     }
 
     if (rdma->local_ram_blocks.block) {
-        while (rdma->local_ram_blocks.nb_blocks) {
-            rdma_delete_block(rdma, rdma->local_ram_blocks.block->offset);
+        for (idx = 0; idx < rdma->local_ram_blocks.nb_blocks; idx++) {
+            rdma_delete_block(rdma, &rdma->local_ram_blocks.block[idx]);
         }
+        g_free(rdma->local_ram_blocks.block);
+        rdma->local_ram_blocks.block = NULL;
     }
 
     if (rdma->qp) {
diff --git a/trace-events b/trace-events
index baf8647..bdb0868 100644
--- a/trace-events
+++ b/trace-events
@@ -1436,7 +1436,7 @@ qemu_rdma_write_one_sendreg(uint64_t chunk, int len, int index, int64_t offset)
 qemu_rdma_write_one_top(uint64_t chunks, uint64_t size) "Writing %" PRIu64 " chunks, (%" PRIu64 " MB)"
 qemu_rdma_write_one_zero(uint64_t chunk, int len, int index, int64_t offset) "Entire chunk is zero, sending compress: %" PRIu64 " for %d bytes, index: %d, offset: %" PRId64
 rdma_add_block(const char *block_name, int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Added Block: '%s':%d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
-rdma_delete_block(int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Deleted Block: %d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
+rdma_delete_block(void *block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Deleted Block: %p, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
 rdma_start_incoming_migration(void) ""
 rdma_start_incoming_migration_after_dest_init(void) ""
 rdma_start_incoming_migration_after_rdma_listen(void) ""
-- 
2.3.5

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 08/10] Rework ram block hash
  2015-04-20 15:57 [Qemu-devel] [PATCH 00/10] Remove RDMA migration dependence on RAMBlock offset Dr. David Alan Gilbert (git)
                   ` (6 preceding siblings ...)
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 07/10] Simplify rdma_delete_block and remove its dependence on the hash Dr. David Alan Gilbert (git)
@ 2015-04-20 15:57 ` Dr. David Alan Gilbert (git)
  2015-05-19 18:49   ` Michael R. Hines
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 09/10] Sort destination RAMBlocks to be the same as the source Dr. David Alan Gilbert (git)
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-20 15:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, arei.gonglei, mrhines, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

RDMA uses a hash from block offset->RAMBlock; this isn't needed
on the destination, and now that the destination sorts the ramblock
list, it is harder to maintain.

Split the hash so that it's only generated on the source.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index fe3b76e..185817e 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -533,23 +533,22 @@ static int rdma_add_block(RDMAContext *rdma, const char *block_name,
                          ram_addr_t block_offset, uint64_t length)
 {
     RDMALocalBlocks *local = &rdma->local_ram_blocks;
-    RDMALocalBlock *block = g_hash_table_lookup(rdma->blockmap,
-        (void *)(uintptr_t)block_offset);
+    RDMALocalBlock *block;
     RDMALocalBlock *old = local->block;
 
-    assert(block == NULL);
-
     local->block = g_malloc0(sizeof(RDMALocalBlock) * (local->nb_blocks + 1));
 
     if (local->nb_blocks) {
         int x;
 
-        for (x = 0; x < local->nb_blocks; x++) {
-            g_hash_table_remove(rdma->blockmap,
-                                (void *)(uintptr_t)old[x].offset);
-            g_hash_table_insert(rdma->blockmap,
-                                (void *)(uintptr_t)old[x].offset,
-                                &local->block[x]);
+        if (rdma->blockmap) {
+            for (x = 0; x < local->nb_blocks; x++) {
+                g_hash_table_remove(rdma->blockmap,
+                                    (void *)(uintptr_t)old[x].offset);
+                g_hash_table_insert(rdma->blockmap,
+                                    (void *)(uintptr_t)old[x].offset,
+                                    &local->block[x]);
+            }
         }
         memcpy(local->block, old, sizeof(RDMALocalBlock) * local->nb_blocks);
         g_free(old);
@@ -571,7 +570,9 @@ static int rdma_add_block(RDMAContext *rdma, const char *block_name,
 
     block->is_ram_block = local->init ? false : true;
 
-    g_hash_table_insert(rdma->blockmap, (void *) block_offset, block);
+    if (rdma->blockmap) {
+        g_hash_table_insert(rdma->blockmap, (void *) block_offset, block);
+    }
 
     trace_rdma_add_block(block_name, local->nb_blocks,
                          (uintptr_t) block->local_host_addr,
@@ -607,7 +608,6 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
     RDMALocalBlocks *local = &rdma->local_ram_blocks;
 
     assert(rdma->blockmap == NULL);
-    rdma->blockmap = g_hash_table_new(g_direct_hash, g_direct_equal);
     memset(local, 0, sizeof *local);
     qemu_ram_foreach_block(qemu_rdma_init_one_block, rdma);
     trace_qemu_rdma_init_ram_blocks(local->nb_blocks);
@@ -2248,6 +2248,14 @@ static int qemu_rdma_source_init(RDMAContext *rdma, Error **errp, bool pin_all)
         goto err_rdma_source_init;
     }
 
+    /* Build the hash that maps from offset to RAMBlock */
+    rdma->blockmap = g_hash_table_new(g_direct_hash, g_direct_equal);
+    for (idx = 0; idx < rdma->local_ram_blocks.nb_blocks; idx++) {
+        g_hash_table_insert(rdma->blockmap,
+                (void *)(uintptr_t)rdma->local_ram_blocks.block[idx].offset,
+                &rdma->local_ram_blocks.block[idx]);
+    }
+
     for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
         ret = qemu_rdma_reg_control(rdma, idx);
         if (ret) {
-- 
2.3.5

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 09/10] Sort destination RAMBlocks to be the same as the source
  2015-04-20 15:57 [Qemu-devel] [PATCH 00/10] Remove RDMA migration dependence on RAMBlock offset Dr. David Alan Gilbert (git)
                   ` (7 preceding siblings ...)
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 08/10] Rework ram block hash Dr. David Alan Gilbert (git)
@ 2015-04-20 15:57 ` Dr. David Alan Gilbert (git)
  2015-05-19 18:51   ` Michael R. Hines
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 10/10] Sanity check RDMA remote data Dr. David Alan Gilbert (git)
  2015-05-18 22:01 ` [Qemu-devel] [PATCH 00/10] Remove RDMA migration dependence on RAMBlock offset Michael R. Hines
  10 siblings, 1 reply; 31+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-20 15:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, arei.gonglei, mrhines, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Use the order of incoming RAMBlocks from the source to record
an index number; that then allows us to sort the destination
local RAMBlock list to match the source.

Now that the RAMBlocks are known to be in the same order, this
simplifies the RDMA registration step, which previously tried to
match RAMBlocks based on offset (which isn't guaranteed to match).
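
The idea, boiled down to a toy example outside of QEMU (illustration
only, none of these names are in the patch): tag each local entry with
the position at which its name arrived from the source, then qsort on
that tag so both sides end up in the same order:

    #include <stdlib.h>
    #include <string.h>

    struct blk { const char *name; unsigned src_index; };

    static int by_src_index(const void *a, const void *b)
    {
        unsigned ai = ((const struct blk *)a)->src_index;
        unsigned bi = ((const struct blk *)b)->src_index;

        return (ai < bi) ? -1 : (ai != bi);
    }

    static void sort_like_source(struct blk *local, size_t nb,
                                 const char * const *src_names, size_t nb_src)
    {
        size_t i, j;

        for (i = 0; i < nb_src; i++) {      /* names in source order */
            for (j = 0; j < nb; j++) {
                if (!strcmp(local[j].name, src_names[i])) {
                    local[j].src_index = (unsigned)i;
                }
            }
        }
        qsort(local, nb, sizeof(*local), by_src_index);
    }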

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 99 ++++++++++++++++++++++++++++++++++++++++----------------
 trace-events     |  2 ++
 2 files changed, 74 insertions(+), 27 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 185817e..36b78dc 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -224,6 +224,7 @@ typedef struct RDMALocalBlock {
     uint32_t      *remote_keys;     /* rkeys for chunk-level registration */
     uint32_t       remote_rkey;     /* rkeys for non-chunk-level registration */
     int            index;           /* which block are we */
+    unsigned int   src_index;       /* (Only used on dest) */
     bool           is_ram_block;
     int            nb_chunks;
     unsigned long *transit_bitmap;
@@ -353,6 +354,9 @@ typedef struct RDMAContext {
     RDMALocalBlocks local_ram_blocks;
     RDMADestBlock  *dest_blocks;
 
+    /* Index of the next RAMBlock received during block registration */
+    unsigned int    next_src_index;
+
     /*
      * Migration on *destination* started.
      * Then use coroutine yield function.
@@ -561,6 +565,7 @@ static int rdma_add_block(RDMAContext *rdma, const char *block_name,
     block->offset = block_offset;
     block->length = length;
     block->index = local->nb_blocks;
+    block->src_index = 0xdead; /* Filled in by the receipt of the block list */
     block->nb_chunks = ram_chunk_index(host_addr, host_addr + length) + 1UL;
     block->transit_bitmap = bitmap_new(block->nb_chunks);
     bitmap_clear(block->transit_bitmap, 0, block->nb_chunks);
@@ -2865,6 +2870,14 @@ err_rdma_dest_wait:
     return ret;
 }
 
+static int dest_ram_sort_func(const void *a, const void *b)
+{
+    unsigned int a_index = ((const RDMALocalBlock *)a)->src_index;
+    unsigned int b_index = ((const RDMALocalBlock *)b)->src_index;
+
+    return (a_index < b_index) ? -1 : (a_index != b_index);
+}
+
 /*
  * During each iteration of the migration, we listen for instructions
  * by the source VM to perform dynamic page registrations before they
@@ -2942,6 +2955,13 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
         case RDMA_CONTROL_RAM_BLOCKS_REQUEST:
             trace_qemu_rdma_registration_handle_ram_blocks();
 
+            /* Sort our local RAM Block list so it's the same as the source,
+             * we can do this since we've filled in a src_index in the list
+             * as we received the RAMBlock list earlier.
+             */
+            qsort(rdma->local_ram_blocks.block,
+                  rdma->local_ram_blocks.nb_blocks,
+                  sizeof(RDMALocalBlock), dest_ram_sort_func);
             if (rdma->pin_all) {
                 ret = qemu_rdma_reg_whole_ram_blocks(rdma);
                 if (ret) {
@@ -2969,6 +2989,12 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
                 rdma->dest_blocks[i].length = local->block[i].length;
 
                 dest_block_to_network(&rdma->dest_blocks[i]);
+                trace_qemu_rdma_registration_handle_ram_blocks_loop(
+                    local->block[i].block_name,
+                    local->block[i].offset,
+                    local->block[i].length,
+                    local->block[i].local_host_addr,
+                    local->block[i].src_index);
             }
 
             blocks.len = rdma->local_ram_blocks.nb_blocks
@@ -3092,12 +3118,43 @@ out:
     return ret;
 }
 
+/* Destination:
+ * Called via a ram_control_load_hook during the initial RAM load section which
+ * lists the RAMBlocks by name.  This lets us know the order of the RAMBlocks
+ * on the source.
+ * We've already built our local RAMBlock list, but not yet sent the list to
+ * the source.
+ */
+static int rdma_block_notification_handle(QEMUFileRDMA *rfile, const char *name)
+{
+    RDMAContext *rdma = rfile->rdma;
+    int curr;
+    int found = -1;
+
+    /* Find the matching RAMBlock in our local list */
+    for (curr = 0; curr < rdma->local_ram_blocks.nb_blocks; curr++) {
+        if (!strcmp(rdma->local_ram_blocks.block[curr].block_name, name)) {
+            found = curr;
+        }
+    }
+
+    if (found == -1) {
+        error_report("RAMBlock '%s' not found on destination", name);
+        return -ENOENT;
+    }
+
+    rdma->local_ram_blocks.block[curr].src_index = rdma->next_src_index;
+    trace_rdma_block_notification_handle(name, rdma->next_src_index);
+    rdma->next_src_index++;
+
+    return 0;
+}
+
 static int rdma_load_hook(QEMUFile *f, void *opaque, uint64_t flags, void *data)
 {
     switch (flags) {
     case RAM_CONTROL_BLOCK_REG:
-        /* TODO A later patch */
-        return -1;
+        return rdma_block_notification_handle(opaque, data);
         break;
 
     case RAM_CONTROL_HOOK:
@@ -3149,7 +3206,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
     if (flags == RAM_CONTROL_SETUP) {
         RDMAControlHeader resp = {.type = RDMA_CONTROL_RAM_BLOCKS_RESULT };
         RDMALocalBlocks *local = &rdma->local_ram_blocks;
-        int reg_result_idx, i, j, nb_dest_blocks;
+        int reg_result_idx, i, nb_dest_blocks;
 
         head.type = RDMA_CONTROL_RAM_BLOCKS_REQUEST;
         trace_qemu_rdma_registration_stop_ram();
@@ -3185,9 +3242,10 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
          */
 
         if (local->nb_blocks != nb_dest_blocks) {
-            ERROR(errp, "ram blocks mismatch #1! "
+            ERROR(errp, "ram blocks mismatch (Number of blocks %d vs %d) "
                         "Your QEMU command line parameters are probably "
-                        "not identical on both the source and destination.");
+                        "not identical on both the source and destination.",
+                        local->nb_blocks, nb_dest_blocks);
             return -EINVAL;
         }
 
@@ -3197,30 +3255,17 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
         for (i = 0; i < nb_dest_blocks; i++) {
             network_to_dest_block(&rdma->dest_blocks[i]);
 
-            /* search local ram blocks */
-            for (j = 0; j < local->nb_blocks; j++) {
-                if (rdma->dest_blocks[i].offset != local->block[j].offset) {
-                    continue;
-                }
-
-                if (rdma->dest_blocks[i].length != local->block[j].length) {
-                    ERROR(errp, "ram blocks mismatch #2! "
-                        "Your QEMU command line parameters are probably "
-                        "not identical on both the source and destination.");
-                    return -EINVAL;
-                }
-                local->block[j].remote_host_addr =
-                        rdma->dest_blocks[i].remote_host_addr;
-                local->block[j].remote_rkey = rdma->dest_blocks[i].remote_rkey;
-                break;
-            }
-
-            if (j >= local->nb_blocks) {
-                ERROR(errp, "ram blocks mismatch #3! "
-                        "Your QEMU command line parameters are probably "
-                        "not identical on both the source and destination.");
+            /* We require that the blocks are in the same order */
+            if (rdma->dest_blocks[i].length != local->block[i].length) {
+                ERROR(errp, "Block %s/%d has a different length %" PRIu64
+                            "vs %" PRIu64, local->block[i].block_name, i,
+                            local->block[i].length,
+                            rdma->dest_blocks[i].length);
                 return -EINVAL;
             }
+            local->block[i].remote_host_addr =
+                    rdma->dest_blocks[i].remote_host_addr;
+            local->block[i].remote_rkey = rdma->dest_blocks[i].remote_rkey;
         }
     }
 
diff --git a/trace-events b/trace-events
index bdb0868..9966891 100644
--- a/trace-events
+++ b/trace-events
@@ -1410,6 +1410,7 @@ qemu_rdma_register_and_get_keys(uint64_t len, void *start) "Registering %" PRIu6
 qemu_rdma_registration_handle_compress(int64_t length, int index, int64_t offset) "Zapping zero chunk: %" PRId64 " bytes, index %d, offset %" PRId64
 qemu_rdma_registration_handle_finished(void) ""
 qemu_rdma_registration_handle_ram_blocks(void) ""
+qemu_rdma_registration_handle_ram_blocks_loop(const char *name, uint64_t offset, uint64_t length, void *local_host_addr, unsigned int src_index) "%s: @%" PRIx64 "/%" PRIu64 " host:@%p src_index: %u"
 qemu_rdma_registration_handle_register(int requests) "%d requests"
 qemu_rdma_registration_handle_register_loop(int req, int index, uint64_t addr, uint64_t chunks) "Registration request (%d): index %d, current_addr %" PRIu64 " chunks: %" PRIu64
 qemu_rdma_registration_handle_register_rkey(int rkey) "%x"
@@ -1436,6 +1437,7 @@ qemu_rdma_write_one_sendreg(uint64_t chunk, int len, int index, int64_t offset)
 qemu_rdma_write_one_top(uint64_t chunks, uint64_t size) "Writing %" PRIu64 " chunks, (%" PRIu64 " MB)"
 qemu_rdma_write_one_zero(uint64_t chunk, int len, int index, int64_t offset) "Entire chunk is zero, sending compress: %" PRIu64 " for %d bytes, index: %d, offset: %" PRId64
 rdma_add_block(const char *block_name, int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Added Block: '%s':%d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
+rdma_block_notification_handle(const char *name, int index) "%s at %d"
 rdma_delete_block(void *block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Deleted Block: %p, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
 rdma_start_incoming_migration(void) ""
 rdma_start_incoming_migration_after_dest_init(void) ""
-- 
2.3.5

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 10/10] Sanity check RDMA remote data
  2015-04-20 15:57 [Qemu-devel] [PATCH 00/10] Remove RDMA migration dependence on RAMBlock offset Dr. David Alan Gilbert (git)
                   ` (8 preceding siblings ...)
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 09/10] Sort destination RAMBlocks to be the same as the source Dr. David Alan Gilbert (git)
@ 2015-04-20 15:57 ` Dr. David Alan Gilbert (git)
  2015-05-19 18:52   ` Michael R. Hines
  2015-05-18 22:01 ` [Qemu-devel] [PATCH 00/10] Remove RDMA migration dependence on RAMBlock offset Michael R. Hines
  10 siblings, 1 reply; 31+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-04-20 15:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, arei.gonglei, mrhines, quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Perform some basic (but probably not complete) sanity checking on
requests from the RDMA source.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/migration/rdma.c b/migration/rdma.c
index 36b78dc..0c9d446 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2940,6 +2940,13 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
             trace_qemu_rdma_registration_handle_compress(comp->length,
                                                          comp->block_idx,
                                                          comp->offset);
+            if (comp->block_idx >= rdma->local_ram_blocks.nb_blocks) {
+                error_report("rdma: 'compress' bad block index %u (vs %d)",
+                             (unsigned int)comp->block_idx,
+                             rdma->local_ram_blocks.nb_blocks);
+                ret = -EIO;
+                break;
+            }
             block = &(rdma->local_ram_blocks.block[comp->block_idx]);
 
             host_addr = block->local_host_addr +
@@ -3028,8 +3035,23 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
                 trace_qemu_rdma_registration_handle_register_loop(count,
                          reg->current_index, reg->key.current_addr, reg->chunks);
 
+                if (reg->current_index >= rdma->local_ram_blocks.nb_blocks) {
+                    error_report("rdma: 'register' bad block index %u (vs %d)",
+                                 (unsigned int)reg->current_index,
+                                 rdma->local_ram_blocks.nb_blocks);
+                    ret = -ENOENT;
+                    break;
+                }
                 block = &(rdma->local_ram_blocks.block[reg->current_index]);
                 if (block->is_ram_block) {
+                    if (block->offset > reg->key.current_addr) {
+                        error_report("rdma: bad register address for block %s"
+                            " offset: %" PRIx64 " current_addr: %" PRIx64,
+                            block->block_name, block->offset,
+                            reg->key.current_addr);
+                        ret = -ERANGE;
+                        break;
+                    }
                     host_addr = (block->local_host_addr +
                                 (reg->key.current_addr - block->offset));
                     chunk = ram_chunk_index(block->local_host_addr,
@@ -3038,6 +3060,14 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
                     chunk = reg->key.chunk;
                     host_addr = block->local_host_addr +
                         (reg->key.chunk * (1UL << RDMA_REG_CHUNK_SHIFT));
+                    /* Check for particularly bad chunk value */
+                    if (host_addr < (void *)block->local_host_addr) {
+                        error_report("rdma: bad chunk for block %s"
+                            " chunk: %" PRIx64,
+                            block->block_name, reg->key.chunk);
+                        ret = -ERANGE;
+                        break;
+                    }
                 }
                 chunk_start = ram_chunk_start(block, chunk);
                 chunk_end = ram_chunk_end(block, chunk + reg->chunks);
-- 
2.3.5

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 00/10] Remove RDMA migration dependence on RAMBlock offset
  2015-04-20 15:57 [Qemu-devel] [PATCH 00/10] Remove RDMA migration dependence on RAMBlock offset Dr. David Alan Gilbert (git)
                   ` (9 preceding siblings ...)
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 10/10] Sanity check RDMA remote data Dr. David Alan Gilbert (git)
@ 2015-05-18 22:01 ` Michael R. Hines
  10 siblings, 0 replies; 31+ messages in thread
From: Michael R. Hines @ 2015-05-18 22:01 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: amit.shah, arei.gonglei, mrhines, quintela

On 04/20/2015 10:57 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> RDMA migration currently relies on the source and destination RAMBlocks
> having the same offsets within ram_addr_t space;  unfortunately that's
> just not true when:
>     a) You hotplug on the source but then create the device on the command line
>       on the destination.
>
>     b) Across two versions of qemu
>
> Thus there are migrations that work with TCP that don't with RDMA.
>
> The changes keep stream compatibility with existing RDMA migration,
> so cases that already work (i.e. no hotplug) will keep working.
>
> With some light testing this seems to work; hopefully I've got all the
> cases that pass offsets back and forward.
>
> Dave
>
> Dr. David Alan Gilbert (10):
>    Rename RDMA structures to make destination clear
>    qemu_ram_foreach_block: pass up error value, and down the ramblock
>      name
>    Store block name in local blocks structure
>    Translate offsets to destination address space
>    Rework ram_control_load_hook to hook during block load
>    Remove unneeded memset
>    Simplify rdma_delete_block and remove it's dependence on the hash
>    Rework ram block hash
>    Sort destination RAMBlocks to be the same as the source
>    Sanity check RDMA remote data
>
>   arch_init.c                   |   4 +-
>   exec.c                        |  10 +-
>   include/exec/cpu-common.h     |   4 +-
>   include/migration/migration.h |   2 +-
>   include/migration/qemu-file.h |  14 +-
>   migration/qemu-file.c         |   8 +-
>   migration/rdma.c              | 367 ++++++++++++++++++++++++++----------------
>   trace-events                  |   8 +-
>   8 files changed, 257 insertions(+), 160 deletions(-)
>

OK, I've got the patchset open now in my new office in the US.

Getting ready to go through it..........

- Michael

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] Rename RDMA structures to make destination clear
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 01/10] Rename RDMA structures to make destination clear Dr. David Alan Gilbert (git)
@ 2015-05-19 17:52   ` Michael R. Hines
  0 siblings, 0 replies; 31+ messages in thread
From: Michael R. Hines @ 2015-05-19 17:52 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: amit.shah, arei.gonglei, mrhines, quintela

On 04/20/2015 10:57 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> RDMA has two data types that are named confusingly;
>     RDMALocalBlock (pointed to indirectly by local_ram_blocks)
>     RDMARemoteBlock (pointed to by block in RDMAContext)
>
> RDMALocalBlocks, as the name suggests is a data strucuture that
> represents the RDMAable RAM Blocks on the current side of the migration
> whichever that is.
>
> RDMARemoteBlocks is always the shape of the RAMBlocks on the
> destination, even on the destination.
>
> Rename:
>       RDMARemoteBlock -> RDMADestBlock
>       context->'block' -> context->dest_blocks
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   migration/rdma.c | 66 ++++++++++++++++++++++++++++----------------------------
>   1 file changed, 33 insertions(+), 33 deletions(-)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 77e3444..089adcf 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -236,13 +236,13 @@ typedef struct RDMALocalBlock {
>    * corresponding RDMALocalBlock with
>    * the information needed to perform the actual RDMA.
>    */
> -typedef struct QEMU_PACKED RDMARemoteBlock {
> +typedef struct QEMU_PACKED RDMADestBlock {
>       uint64_t remote_host_addr;
>       uint64_t offset;
>       uint64_t length;
>       uint32_t remote_rkey;
>       uint32_t padding;
> -} RDMARemoteBlock;
> +} RDMADestBlock;
>
>   static uint64_t htonll(uint64_t v)
>   {
> @@ -258,20 +258,20 @@ static uint64_t ntohll(uint64_t v) {
>       return ((uint64_t)ntohl(u.lv[0]) << 32) | (uint64_t) ntohl(u.lv[1]);
>   }
>
> -static void remote_block_to_network(RDMARemoteBlock *rb)
> +static void dest_block_to_network(RDMADestBlock *db)
>   {
> -    rb->remote_host_addr = htonll(rb->remote_host_addr);
> -    rb->offset = htonll(rb->offset);
> -    rb->length = htonll(rb->length);
> -    rb->remote_rkey = htonl(rb->remote_rkey);
> +    db->remote_host_addr = htonll(db->remote_host_addr);
> +    db->offset = htonll(db->offset);
> +    db->length = htonll(db->length);
> +    db->remote_rkey = htonl(db->remote_rkey);
>   }
>
> -static void network_to_remote_block(RDMARemoteBlock *rb)
> +static void network_to_dest_block(RDMADestBlock *db)
>   {
> -    rb->remote_host_addr = ntohll(rb->remote_host_addr);
> -    rb->offset = ntohll(rb->offset);
> -    rb->length = ntohll(rb->length);
> -    rb->remote_rkey = ntohl(rb->remote_rkey);
> +    db->remote_host_addr = ntohll(db->remote_host_addr);
> +    db->offset = ntohll(db->offset);
> +    db->length = ntohll(db->length);
> +    db->remote_rkey = ntohl(db->remote_rkey);
>   }
>
>   /*
> @@ -350,7 +350,7 @@ typedef struct RDMAContext {
>        * Description of ram blocks used throughout the code.
>        */
>       RDMALocalBlocks local_ram_blocks;
> -    RDMARemoteBlock *block;
> +    RDMADestBlock  *dest_blocks;
>
>       /*
>        * Migration on *destination* started.
> @@ -590,7 +590,7 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
>       memset(local, 0, sizeof *local);
>       qemu_ram_foreach_block(qemu_rdma_init_one_block, rdma);
>       trace_qemu_rdma_init_ram_blocks(local->nb_blocks);
> -    rdma->block = (RDMARemoteBlock *) g_malloc0(sizeof(RDMARemoteBlock) *
> +    rdma->dest_blocks = (RDMADestBlock *) g_malloc0(sizeof(RDMADestBlock) *
>                           rdma->local_ram_blocks.nb_blocks);
>       local->init = true;
>       return 0;
> @@ -2177,8 +2177,8 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>           rdma->connected = false;
>       }
>
> -    g_free(rdma->block);
> -    rdma->block = NULL;
> +    g_free(rdma->dest_blocks);
> +    rdma->dest_blocks = NULL;
>
>       for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
>           if (rdma->wr_data[idx].control_mr) {
> @@ -2967,25 +2967,25 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque,
>                * their "local" descriptions with what was sent.
>                */
>               for (i = 0; i < local->nb_blocks; i++) {
> -                rdma->block[i].remote_host_addr =
> +                rdma->dest_blocks[i].remote_host_addr =
>                       (uintptr_t)(local->block[i].local_host_addr);
>
>                   if (rdma->pin_all) {
> -                    rdma->block[i].remote_rkey = local->block[i].mr->rkey;
> +                    rdma->dest_blocks[i].remote_rkey = local->block[i].mr->rkey;
>                   }
>
> -                rdma->block[i].offset = local->block[i].offset;
> -                rdma->block[i].length = local->block[i].length;
> +                rdma->dest_blocks[i].offset = local->block[i].offset;
> +                rdma->dest_blocks[i].length = local->block[i].length;
>
> -                remote_block_to_network(&rdma->block[i]);
> +                dest_block_to_network(&rdma->dest_blocks[i]);
>               }
>
>               blocks.len = rdma->local_ram_blocks.nb_blocks
> -                                                * sizeof(RDMARemoteBlock);
> +                                                * sizeof(RDMADestBlock);
>
>
>               ret = qemu_rdma_post_send_control(rdma,
> -                                        (uint8_t *) rdma->block, &blocks);
> +                                        (uint8_t *) rdma->dest_blocks, &blocks);
>
>               if (ret < 0) {
>                   error_report("rdma migration: error sending remote info");
> @@ -3141,7 +3141,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>       if (flags == RAM_CONTROL_SETUP) {
>           RDMAControlHeader resp = {.type = RDMA_CONTROL_RAM_BLOCKS_RESULT };
>           RDMALocalBlocks *local = &rdma->local_ram_blocks;
> -        int reg_result_idx, i, j, nb_remote_blocks;
> +        int reg_result_idx, i, j, nb_dest_blocks;
>
>           head.type = RDMA_CONTROL_RAM_BLOCKS_REQUEST;
>           trace_qemu_rdma_registration_stop_ram();
> @@ -3162,7 +3162,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>               return ret;
>           }
>
> -        nb_remote_blocks = resp.len / sizeof(RDMARemoteBlock);
> +        nb_dest_blocks = resp.len / sizeof(RDMADestBlock);
>
>           /*
>            * The protocol uses two different sets of rkeys (mutually exclusive):
> @@ -3176,7 +3176,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>            * and then propagates the remote ram block descriptions to his local copy.
>            */
>
> -        if (local->nb_blocks != nb_remote_blocks) {
> +        if (local->nb_blocks != nb_dest_blocks) {
>               ERROR(errp, "ram blocks mismatch #1! "
>                           "Your QEMU command line parameters are probably "
>                           "not identical on both the source and destination.");
> @@ -3184,26 +3184,26 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
>           }
>
>           qemu_rdma_move_header(rdma, reg_result_idx, &resp);
> -        memcpy(rdma->block,
> +        memcpy(rdma->dest_blocks,
>               rdma->wr_data[reg_result_idx].control_curr, resp.len);
> -        for (i = 0; i < nb_remote_blocks; i++) {
> -            network_to_remote_block(&rdma->block[i]);
> +        for (i = 0; i < nb_dest_blocks; i++) {
> +            network_to_dest_block(&rdma->dest_blocks[i]);
>
>               /* search local ram blocks */
>               for (j = 0; j < local->nb_blocks; j++) {
> -                if (rdma->block[i].offset != local->block[j].offset) {
> +                if (rdma->dest_blocks[i].offset != local->block[j].offset) {
>                       continue;
>                   }
>
> -                if (rdma->block[i].length != local->block[j].length) {
> +                if (rdma->dest_blocks[i].length != local->block[j].length) {
>                       ERROR(errp, "ram blocks mismatch #2! "
>                           "Your QEMU command line parameters are probably "
>                           "not identical on both the source and destination.");
>                       return -EINVAL;
>                   }
>                   local->block[j].remote_host_addr =
> -                        rdma->block[i].remote_host_addr;
> -                local->block[j].remote_rkey = rdma->block[i].remote_rkey;
> +                        rdma->dest_blocks[i].remote_host_addr;
> +                local->block[j].remote_rkey = rdma->dest_blocks[i].remote_rkey;
>                   break;
>               }
>

Good to get these renamed, thanks.

Reviewed-by: Michael R. Hines <mrhines@us.ibm.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 02/10] qemu_ram_foreach_block: pass up error value, and down the ramblock name
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 02/10] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
@ 2015-05-19 17:56   ` Michael R. Hines
  0 siblings, 0 replies; 31+ messages in thread
From: Michael R. Hines @ 2015-05-19 17:56 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: amit.shah, arei.gonglei, mrhines, quintela

On 04/20/2015 10:57 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> check the return value of the function it calls and error if it's non-0
> Fixup qemu_rdma_init_one_block that is the only current caller,
>    and rdma_add_block the only function it calls using it.
>
> Pass the name of the ramblock to the function; helps in debugging.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>   exec.c                    | 10 ++++++++--
>   include/exec/cpu-common.h |  4 ++--
>   migration/rdma.c          |  4 ++--
>   3 files changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index 874ecfc..7693794 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -3067,14 +3067,20 @@ bool cpu_physical_memory_is_io(hwaddr phys_addr)
>                memory_region_is_romd(mr));
>   }
>
> -void qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
> +int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
>   {
>       RAMBlock *block;
> +    int ret = 0;
>
>       rcu_read_lock();
>       QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> -        func(block->host, block->offset, block->used_length, opaque);
> +        ret = func(block->idstr, block->host, block->offset,
> +                   block->used_length, opaque);
> +        if (ret) {
> +            break;
> +        }
>       }
>       rcu_read_unlock();
> +    return ret;
>   }
>   #endif
> diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
> index fcc3162..2abecac 100644
> --- a/include/exec/cpu-common.h
> +++ b/include/exec/cpu-common.h
> @@ -125,10 +125,10 @@ void cpu_flush_icache_range(hwaddr start, int len);
>   extern struct MemoryRegion io_mem_rom;
>   extern struct MemoryRegion io_mem_notdirty;
>
> -typedef void (RAMBlockIterFunc)(void *host_addr,
> +typedef int (RAMBlockIterFunc)(const char *block_name, void *host_addr,
>       ram_addr_t offset, ram_addr_t length, void *opaque);
>
> -void qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
> +int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
>
>   #endif
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 089adcf..38e5f44 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -570,10 +570,10 @@ static int rdma_add_block(RDMAContext *rdma, void *host_addr,
>    * in advanced before the migration starts. This tells us where the RAM blocks
>    * are so that we can register them individually.
>    */
> -static void qemu_rdma_init_one_block(void *host_addr,
> +static int qemu_rdma_init_one_block(const char *block_name, void *host_addr,
>       ram_addr_t block_offset, ram_addr_t length, void *opaque)
>   {
> -    rdma_add_block(opaque, host_addr, block_offset, length);
> +    return rdma_add_block(opaque, host_addr, block_offset, length);
>   }
>
>   /*

Shame on me for not checking the return value =)
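
A small usage sketch of the new callback signature (hypothetical function
and caller, not part of the series), just to show how the propagated
return value works:

    static int count_big_blocks(const char *block_name, void *host_addr,
                                ram_addr_t offset, ram_addr_t length,
                                void *opaque)
    {
        size_t *count = opaque;

        if (length >= (1UL << 30)) {
            (*count)++;              /* count RAMBlocks of 1GB or more */
        }
        return 0;                    /* non-zero would stop the foreach */
    }

    /* caller: size_t big = 0;
     *         ret = qemu_ram_foreach_block(count_big_blocks, &big); */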

Reviewed-by: Michael R. Hines <mrhines@us.ibm.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 03/10] Store block name in local blocks structure
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 03/10] Store block name in local blocks structure Dr. David Alan Gilbert (git)
@ 2015-05-19 18:00   ` Michael R. Hines
  2015-05-19 18:46     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 31+ messages in thread
From: Michael R. Hines @ 2015-05-19 18:00 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: amit.shah, arei.gonglei, mrhines, quintela

On 04/20/2015 10:57 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> In a later patch the block name will be used to match up two views
> of the block list.  Keep a copy of the block name with the local block
> list.
>
> (At some point it could be argued that it would be best just to let
> migration see the innards of RAMBlock and avoid the need to use
> foreach).

Maybe a silly question, but is there anything to enforce that
the string names are always unique?

> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   migration/rdma.c | 35 +++++++++++++++++++++--------------
>   trace-events     |  2 +-
>   2 files changed, 22 insertions(+), 15 deletions(-)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 38e5f44..c3814c5 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -214,17 +214,18 @@ static void network_to_caps(RDMACapabilities *cap)
>    * the information. It's small anyway, so a list is overkill.
>    */
>   typedef struct RDMALocalBlock {
> -    uint8_t  *local_host_addr; /* local virtual address */
> -    uint64_t remote_host_addr; /* remote virtual address */
> -    uint64_t offset;
> -    uint64_t length;
> -    struct   ibv_mr **pmr;     /* MRs for chunk-level registration */
> -    struct   ibv_mr *mr;       /* MR for non-chunk-level registration */
> -    uint32_t *remote_keys;     /* rkeys for chunk-level registration */
> -    uint32_t remote_rkey;      /* rkeys for non-chunk-level registration */
> -    int      index;            /* which block are we */
> -    bool     is_ram_block;
> -    int      nb_chunks;
> +    char          *block_name;
> +    uint8_t       *local_host_addr; /* local virtual address */
> +    uint64_t       remote_host_addr; /* remote virtual address */
> +    uint64_t       offset;
> +    uint64_t       length;
> +    struct         ibv_mr **pmr;    /* MRs for chunk-level registration */
> +    struct         ibv_mr *mr;      /* MR for non-chunk-level registration */
> +    uint32_t      *remote_keys;     /* rkeys for chunk-level registration */
> +    uint32_t       remote_rkey;     /* rkeys for non-chunk-level registration */
> +    int            index;           /* which block are we */
> +    bool           is_ram_block;
> +    int            nb_chunks;
>       unsigned long *transit_bitmap;
>       unsigned long *unregister_bitmap;
>   } RDMALocalBlock;
> @@ -510,7 +511,8 @@ static inline uint8_t *ram_chunk_end(const RDMALocalBlock *rdma_ram_block,
>       return result;
>   }
>
> -static int rdma_add_block(RDMAContext *rdma, void *host_addr,
> +static int rdma_add_block(RDMAContext *rdma, const char *block_name,
> +                         void *host_addr,
>                            ram_addr_t block_offset, uint64_t length)
>   {
>       RDMALocalBlocks *local = &rdma->local_ram_blocks;
> @@ -538,6 +540,7 @@ static int rdma_add_block(RDMAContext *rdma, void *host_addr,
>
>       block = &local->block[local->nb_blocks];
>
> +    block->block_name = g_strdup(block_name);
>       block->local_host_addr = host_addr;
>       block->offset = block_offset;
>       block->length = length;
> @@ -553,7 +556,8 @@ static int rdma_add_block(RDMAContext *rdma, void *host_addr,
>
>       g_hash_table_insert(rdma->blockmap, (void *) block_offset, block);
>
> -    trace_rdma_add_block(local->nb_blocks, (uintptr_t) block->local_host_addr,
> +    trace_rdma_add_block(block_name, local->nb_blocks,
> +                         (uintptr_t) block->local_host_addr,
>                            block->offset, block->length,
>                            (uintptr_t) (block->local_host_addr + block->length),
>                            BITS_TO_LONGS(block->nb_chunks) *
> @@ -573,7 +577,7 @@ static int rdma_add_block(RDMAContext *rdma, void *host_addr,
>   static int qemu_rdma_init_one_block(const char *block_name, void *host_addr,
>       ram_addr_t block_offset, ram_addr_t length, void *opaque)
>   {
> -    return rdma_add_block(opaque, host_addr, block_offset, length);
> +    return rdma_add_block(opaque, block_name, host_addr, block_offset, length);
>   }
>
>   /*
> @@ -639,6 +643,9 @@ static int rdma_delete_block(RDMAContext *rdma, ram_addr_t block_offset)
>           g_hash_table_remove(rdma->blockmap, (void *)(uintptr_t)old[x].offset);
>       }
>
> +    g_free(block->block_name);
> +    block->block_name = NULL;
> +
>       if (local->nb_blocks > 1) {
>
>           local->block = g_malloc0(sizeof(RDMALocalBlock) *
> diff --git a/trace-events b/trace-events
> index 30eba92..07f15da 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1435,7 +1435,7 @@ qemu_rdma_write_one_recvregres(int mykey, int theirkey, uint64_t chunk) "Receive
>   qemu_rdma_write_one_sendreg(uint64_t chunk, int len, int index, int64_t offset) "Sending registration request chunk %" PRIu64 " for %d bytes, index: %d, offset: %" PRId64
>   qemu_rdma_write_one_top(uint64_t chunks, uint64_t size) "Writing %" PRIu64 " chunks, (%" PRIu64 " MB)"
>   qemu_rdma_write_one_zero(uint64_t chunk, int len, int index, int64_t offset) "Entire chunk is zero, sending compress: %" PRIu64 " for %d bytes, index: %d, offset: %" PRId64
> -rdma_add_block(int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Added Block: %d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
> +rdma_add_block(const char *block_name, int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Added Block: '%s':%d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
>   rdma_delete_block(int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Deleted Block: %d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
>   rdma_start_incoming_migration(void) ""
>   rdma_start_incoming_migration_after_dest_init(void) ""

Reviewed-by: Michael R. Hines <mrhines@us.ibm.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 04/10] Translate offsets to destination address space
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 04/10] Translate offsets to destination address space Dr. David Alan Gilbert (git)
@ 2015-05-19 18:28   ` Michael R. Hines
  2015-05-19 18:44     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 31+ messages in thread
From: Michael R. Hines @ 2015-05-19 18:28 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: amit.shah, arei.gonglei, mrhines, quintela

On 04/20/2015 10:57 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> The 'offset' field in RDMACompress and 'current_addr' field
> in RDMARegister are commented as being offsets within a particular
> RAMBlock, however they appear to actually be offsets within the
> ram_addr_t space.
>
> The code currently assumes that the offsets on the source/destination
> match, this change removes the need for the assumption for these
> structures by translating the addresses into the ram_addr_t space of
> the destination host.

I don't understand fully: If the offsets are not the same, then
why would the RAMBlocks be the same? If a RAMBlock
is hot-plugged on one side, shouldn't an identical one be
hotplugged on the other side, including the offset into ram_addr_t?
(Even if the base address of the ram_addr_t space is different
between source and destination, then at least the offsets
and list of blocks should be the same, no?)

Is hotplugging an asynchronous operation for post-copy or
something?

>
> Note: An alternative would be to change the fields to actually
> take the data they're commented for; this would potentially be
> simpler but would break stream compatibility for those cases
> that currently work.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   migration/rdma.c | 31 ++++++++++++++++++++++++-------
>   1 file changed, 24 insertions(+), 7 deletions(-)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index c3814c5..2c0d11b 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -411,7 +411,7 @@ static void network_to_control(RDMAControlHeader *control)
>    */
>   typedef struct QEMU_PACKED {
>       union QEMU_PACKED {
> -        uint64_t current_addr;  /* offset into the ramblock of the chunk */
> +        uint64_t current_addr;  /* offset into the ram_addr_t space */
>           uint64_t chunk;         /* chunk to lookup if unregistering */
>       } key;
>       uint32_t current_index; /* which ramblock the chunk belongs to */
> @@ -419,8 +419,19 @@ typedef struct QEMU_PACKED {
>       uint64_t chunks;            /* how many sequential chunks to register */
>   } RDMARegister;

The below seems OK, but I would prefer not to do this translation here.
Can the source and destination apply the offset calculations outside
of the byte-order functions? For example, before register_to_network,
the source removes the offset, and when the message is received, the
destination re-applies the correct offset?
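
A rough sketch of that alternative (hypothetical wrapper, not part of the
posted series), keeping register_to_network() as a pure byte-order helper
and doing the translation at the call site instead:

    static void rdma_register_prepare(RDMAContext *rdma, RDMARegister *reg)
    {
        RDMALocalBlock *lb = &rdma->local_ram_blocks.block[reg->current_index];

        if (lb->is_ram_block) {
            /* Translate into the destination's ram_addr_t space first */
            reg->key.current_addr -= lb->offset;
            reg->key.current_addr += rdma->dest_blocks[reg->current_index].offset;
        }
        register_to_network(reg);    /* byte-order conversion only */
    }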

> -static void register_to_network(RDMARegister *reg)
> +static void register_to_network(RDMAContext *rdma, RDMARegister *reg)
>   {
> +    RDMALocalBlock *local_block;
> +    local_block  = &rdma->local_ram_blocks.block[reg->current_index];
> +
> +    if (local_block->is_ram_block) {
> +        /*
> +         * current_addr as passed in is an address in the local ram_addr_t
> +         * space, we need to translate this for the destination
> +         */
> +        reg->key.current_addr -= local_block->offset;
> +        reg->key.current_addr += rdma->dest_blocks[reg->current_index].offset;
> +    }
>       reg->key.current_addr = htonll(reg->key.current_addr);
>       reg->current_index = htonl(reg->current_index);
>       reg->chunks = htonll(reg->chunks);

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 05/10] Rework ram_control_load_hook to hook during block load
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 05/10] Rework ram_control_load_hook to hook during block load Dr. David Alan Gilbert (git)
@ 2015-05-19 18:35   ` Michael R. Hines
  2015-05-19 18:49     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 31+ messages in thread
From: Michael R. Hines @ 2015-05-19 18:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: amit.shah, arei.gonglei, mrhines, quintela

On 04/20/2015 10:57 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> We need the names of RAMBlocks as they're loaded for RDMA,
> reuse an existing QEMUFile hook with some small mods.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   arch_init.c                   |  4 +++-
>   include/migration/migration.h |  2 +-
>   include/migration/qemu-file.h | 14 +++++++++-----
>   migration/qemu-file.c         |  8 ++++----
>   migration/rdma.c              | 28 ++++++++++++++++++++++------
>   trace-events                  |  2 +-
>   6 files changed, 40 insertions(+), 18 deletions(-)
>
> diff --git a/arch_init.c b/arch_init.c
> index 4c8fcee..6c0b355 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -1164,6 +1164,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>                                   error_report_err(local_err);
>                               }
>                           }
> +                        ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG,
> +                                              block->idstr);
>                           break;
>                       }
>                   }
> @@ -1215,7 +1217,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>               break;
>           default:
>               if (flags & RAM_SAVE_FLAG_HOOK) {
> -                ram_control_load_hook(f, flags);
> +                ram_control_load_hook(f, RAM_CONTROL_HOOK, NULL);
>               } else {
>                   error_report("Unknown combination of migration flags: %#x",
>                                flags);
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index bf09968..3c2f91a 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -154,7 +154,7 @@ int64_t xbzrle_cache_resize(int64_t new_size);
>
>   void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
>   void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
> -void ram_control_load_hook(QEMUFile *f, uint64_t flags);
> +void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data);
>
>   /* Whenever this is found in the data stream, the flags
>    * will be passed to ram_control_load_hook in the incoming-migration
> diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
> index 745a850..a3ceb3b 100644
> --- a/include/migration/qemu-file.h
> +++ b/include/migration/qemu-file.h
> @@ -63,16 +63,20 @@ typedef ssize_t (QEMUFileWritevBufferFunc)(void *opaque, struct iovec *iov,
>   /*
>    * This function provides hooks around different
>    * stages of RAM migration.
> + * 'opaque' is the backend specific data in QEMUFile
> + * 'data' is call specific data associated with the 'flags' value
>    */

What data is this exactly?

> -typedef int (QEMURamHookFunc)(QEMUFile *f, void *opaque, uint64_t flags);
> +typedef int (QEMURamHookFunc)(QEMUFile *f, void *opaque, uint64_t flags,
> +                              void *data);
>
>   /*
>    * Constants used by ram_control_* hooks
>    */
> -#define RAM_CONTROL_SETUP    0
> -#define RAM_CONTROL_ROUND    1
> -#define RAM_CONTROL_HOOK     2
> -#define RAM_CONTROL_FINISH   3
> +#define RAM_CONTROL_SETUP     0
> +#define RAM_CONTROL_ROUND     1
> +#define RAM_CONTROL_HOOK      2
> +#define RAM_CONTROL_FINISH    3
> +#define RAM_CONTROL_BLOCK_REG 4
>
>   /*
>    * This function allows override of where the RAM page
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index 1a4f986..4986e05 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -127,7 +127,7 @@ void ram_control_before_iterate(QEMUFile *f, uint64_t flags)
>       int ret = 0;
>
>       if (f->ops->before_ram_iterate) {
> -        ret = f->ops->before_ram_iterate(f, f->opaque, flags);
> +        ret = f->ops->before_ram_iterate(f, f->opaque, flags, NULL);
>           if (ret < 0) {
>               qemu_file_set_error(f, ret);
>           }
> @@ -139,19 +139,19 @@ void ram_control_after_iterate(QEMUFile *f, uint64_t flags)
>       int ret = 0;
>
>       if (f->ops->after_ram_iterate) {
> -        ret = f->ops->after_ram_iterate(f, f->opaque, flags);
> +        ret = f->ops->after_ram_iterate(f, f->opaque, flags, NULL);
>           if (ret < 0) {
>               qemu_file_set_error(f, ret);
>           }
>       }
>   }
>
> -void ram_control_load_hook(QEMUFile *f, uint64_t flags)
> +void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data)
>   {
>       int ret = -EINVAL;
>
>       if (f->ops->hook_ram_load) {
> -        ret = f->ops->hook_ram_load(f, f->opaque, flags);
> +        ret = f->ops->hook_ram_load(f, f->opaque, flags, data);
>           if (ret < 0) {
>               qemu_file_set_error(f, ret);
>           }
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 2c0d11b..e43fae4 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2906,8 +2906,7 @@ err_rdma_dest_wait:
>    *
>    * Keep doing this until the source tells us to stop.
>    */
> -static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque,
> -                                         uint64_t flags)
> +static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
>   {
>       RDMAControlHeader reg_resp = { .len = sizeof(RDMARegisterResult),
>                                  .type = RDMA_CONTROL_REGISTER_RESULT,
> @@ -2937,7 +2936,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque,
>       CHECK_ERROR_STATE();
>
>       do {
> -        trace_qemu_rdma_registration_handle_wait(flags);
> +        trace_qemu_rdma_registration_handle_wait();
>
>           ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_NONE);
>
> @@ -3125,8 +3124,25 @@ out:
>       return ret;
>   }
>
> +static int rdma_load_hook(QEMUFile *f, void *opaque, uint64_t flags, void *data)
> +{
> +    switch (flags) {
> +    case RAM_CONTROL_BLOCK_REG:
> +        /* TODO A later patch */
> +        return -1;
> +        break;
> +
> +    case RAM_CONTROL_HOOK:
> +        return qemu_rdma_registration_handle(f, opaque);
> +
> +    default:
> +        /* Shouldn't be called with any other values */
> +        abort();
> +    }
> +}
> +

Rename this to qemu_rdma_load_hook()

>   static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
> -                                        uint64_t flags)
> +                                        uint64_t flags, void *data)
>   {
>       QEMUFileRDMA *rfile = opaque;
>       RDMAContext *rdma = rfile->rdma;
> @@ -3145,7 +3161,7 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
>    * First, flush writes, if any.
>    */
>   static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
> -                                       uint64_t flags)
> +                                       uint64_t flags, void *data)
>   {
>       Error *local_err = NULL, **errp = &local_err;
>       QEMUFileRDMA *rfile = opaque;
> @@ -3267,7 +3283,7 @@ static const QEMUFileOps rdma_read_ops = {
>       .get_buffer    = qemu_rdma_get_buffer,
>       .get_fd        = qemu_rdma_get_fd,
>       .close         = qemu_rdma_close,
> -    .hook_ram_load = qemu_rdma_registration_handle,
> +    .hook_ram_load = rdma_load_hook,
>   };
>
>   static const QEMUFileOps rdma_write_ops = {
> diff --git a/trace-events b/trace-events
> index 07f15da..baf8647 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1416,7 +1416,7 @@ qemu_rdma_registration_handle_register_rkey(int rkey) "%x"
>   qemu_rdma_registration_handle_unregister(int requests) "%d requests"
>   qemu_rdma_registration_handle_unregister_loop(int count, int index, uint64_t chunk) "Unregistration request (%d): index %d, chunk %" PRIu64
>   qemu_rdma_registration_handle_unregister_success(uint64_t chunk) "%" PRIu64
> -qemu_rdma_registration_handle_wait(uint64_t flags) "Waiting for next request %" PRIu64
> +qemu_rdma_registration_handle_wait(void) ""
>   qemu_rdma_registration_start(uint64_t flags) "%" PRIu64
>   qemu_rdma_registration_stop(uint64_t flags) "%" PRIu64
>   qemu_rdma_registration_stop_ram(void) ""

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 06/10] Remove unneeded memset
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 06/10] Remove unneeded memset Dr. David Alan Gilbert (git)
@ 2015-05-19 18:35   ` Michael R. Hines
  0 siblings, 0 replies; 31+ messages in thread
From: Michael R. Hines @ 2015-05-19 18:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: amit.shah, arei.gonglei, mrhines, quintela

On 04/20/2015 10:57 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   migration/rdma.c | 1 -
>   1 file changed, 1 deletion(-)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index e43fae4..4f7dd0d 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2469,7 +2469,6 @@ static void *qemu_rdma_data_init(const char *host_port, Error **errp)
>
>       if (host_port) {
>           rdma = g_malloc0(sizeof(RDMAContext));
> -        memset(rdma, 0, sizeof(RDMAContext));
>           rdma->current_index = -1;
>           rdma->current_chunk = -1;
>
Reviewed-by: Michael R. Hines <mrhines@us.ibm.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 07/10] Simplify rdma_delete_block and remove it's dependence on the hash
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 07/10] Simplify rdma_delete_block and remove it's dependence on the hash Dr. David Alan Gilbert (git)
@ 2015-05-19 18:44   ` Michael R. Hines
  0 siblings, 0 replies; 31+ messages in thread
From: Michael R. Hines @ 2015-05-19 18:44 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: amit.shah, arei.gonglei, mrhines, quintela

On 04/20/2015 10:57 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> rdma_delete_block is currently very general, but it's only used
> in cleanup at the end.   Simplify it and remove it's dependence
> on the hash table and remove all of the hash-table regeneration
> designed to handle the (unused) case of deleting an arbitrary block.

Can we not delete this? I have a patch to QEMU that allows
us to perform RDMA transfers of arbitrary regions of memory
in QEMU at any time in the migration process (you might guess,
for the purposes of fault tolerance). This function will in fact
get called more often in the future, particularly if we want
to allow other subsystems, such as storage, to register regions
of memory to be transferred.

> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   migration/rdma.c | 57 +++++++++-----------------------------------------------
>   trace-events     |  2 +-
>   2 files changed, 10 insertions(+), 49 deletions(-)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 4f7dd0d..fe3b76e 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -617,16 +617,11 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
>       return 0;
>   }
>
> -static int rdma_delete_block(RDMAContext *rdma, ram_addr_t block_offset)
> +static int rdma_delete_block(RDMAContext *rdma, RDMALocalBlock *block)
>   {
> -    RDMALocalBlocks *local = &rdma->local_ram_blocks;
> -    RDMALocalBlock *block = g_hash_table_lookup(rdma->blockmap,
> -        (void *) block_offset);
> -    RDMALocalBlock *old = local->block;
> -    int x;
> -
> -    assert(block);
> -
> +    if (rdma->blockmap) {
> +        g_hash_table_remove(rdma->blockmap, (void *)(uintptr_t)block->offset);
> +    }
>       if (block->pmr) {
>           int j;
>
> @@ -656,51 +651,15 @@ static int rdma_delete_block(RDMAContext *rdma, ram_addr_t block_offset)
>       g_free(block->remote_keys);
>       block->remote_keys = NULL;
>
> -    for (x = 0; x < local->nb_blocks; x++) {
> -        g_hash_table_remove(rdma->blockmap, (void *)(uintptr_t)old[x].offset);
> -    }
> -
>       g_free(block->block_name);
>       block->block_name = NULL;
>
> -    if (local->nb_blocks > 1) {
> -
> -        local->block = g_malloc0(sizeof(RDMALocalBlock) *
> -                                    (local->nb_blocks - 1));
> -
> -        if (block->index) {
> -            memcpy(local->block, old, sizeof(RDMALocalBlock) * block->index);
> -        }
> -
> -        if (block->index < (local->nb_blocks - 1)) {
> -            memcpy(local->block + block->index, old + (block->index + 1),
> -                sizeof(RDMALocalBlock) *
> -                    (local->nb_blocks - (block->index + 1)));
> -        }
> -    } else {
> -        assert(block == local->block);
> -        local->block = NULL;
> -    }
> -
> -    trace_rdma_delete_block(local->nb_blocks,
> -                           (uintptr_t)block->local_host_addr,
> +    trace_rdma_delete_block(block, (uintptr_t)block->local_host_addr,
>                              block->offset, block->length,
>                               (uintptr_t)(block->local_host_addr + block->length),
>                              BITS_TO_LONGS(block->nb_chunks) *
>                                  sizeof(unsigned long) * 8, block->nb_chunks);
>
> -    g_free(old);
> -
> -    local->nb_blocks--;
> -
> -    if (local->nb_blocks) {
> -        for (x = 0; x < local->nb_blocks; x++) {
> -            g_hash_table_insert(rdma->blockmap,
> -                                (void *)(uintptr_t)local->block[x].offset,
> -                                &local->block[x]);
> -        }
> -    }
> -
>       return 0;
>   }
>
> @@ -2213,9 +2172,11 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>       }
>
>       if (rdma->local_ram_blocks.block) {
> -        while (rdma->local_ram_blocks.nb_blocks) {
> -            rdma_delete_block(rdma, rdma->local_ram_blocks.block->offset);
> +        for (idx = 0; idx < rdma->local_ram_blocks.nb_blocks; idx++) {
> +            rdma_delete_block(rdma, &rdma->local_ram_blocks.block[idx]);
>           }
> +        g_free(rdma->local_ram_blocks.block);
> +        rdma->local_ram_blocks.block = NULL;
>       }
>
>       if (rdma->qp) {
> diff --git a/trace-events b/trace-events
> index baf8647..bdb0868 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1436,7 +1436,7 @@ qemu_rdma_write_one_sendreg(uint64_t chunk, int len, int index, int64_t offset)
>   qemu_rdma_write_one_top(uint64_t chunks, uint64_t size) "Writing %" PRIu64 " chunks, (%" PRIu64 " MB)"
>   qemu_rdma_write_one_zero(uint64_t chunk, int len, int index, int64_t offset) "Entire chunk is zero, sending compress: %" PRIu64 " for %d bytes, index: %d, offset: %" PRId64
>   rdma_add_block(const char *block_name, int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Added Block: '%s':%d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
> -rdma_delete_block(int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Deleted Block: %d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
> +rdma_delete_block(void *block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Deleted Block: %p, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
>   rdma_start_incoming_migration(void) ""
>   rdma_start_incoming_migration_after_dest_init(void) ""
>   rdma_start_incoming_migration_after_rdma_listen(void) ""

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 04/10] Translate offsets to destination address space
  2015-05-19 18:28   ` Michael R. Hines
@ 2015-05-19 18:44     ` Dr. David Alan Gilbert
  2015-05-19 18:57       ` Michael R. Hines
  0 siblings, 1 reply; 31+ messages in thread
From: Dr. David Alan Gilbert @ 2015-05-19 18:44 UTC (permalink / raw)
  To: Michael R. Hines; +Cc: amit.shah, quintela, arei.gonglei, qemu-devel, mrhines

* Michael R. Hines (mrhines@linux.vnet.ibm.com) wrote:
> On 04/20/2015 10:57 AM, Dr. David Alan Gilbert (git) wrote:
> >From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> >The 'offset' field in RDMACompress and 'current_addr' field
> >in RDMARegister are commented as being offsets within a particular
> >RAMBlock, however they appear to actually be offsets within the
> >ram_addr_t space.
> >
> >The code currently assumes that the offsets on the source/destination
> >match, this change removes the need for the assumption for these
> >structures by translating the addresses into the ram_addr_t space of
> >the destination host.
> 
> I don't understand fully: If the offsets are not the same, then
> why would the RAMBlocks be the same? If a RAMBlock
> is hot-plugged on one side, shouldn't an identical one be
> hotplugged on the other side, including the offset into ram_addr_t?

If a RAMBlock is hotplugged on the source, it's normally passed in on
the command line at startup on the destination, not hotplugged.

This difference in order of allocation of the RAMBlocks can cause
the allocation of space in ram_addr_t to be different.

In addition, changes between qemu versions in the order in which
devices are initialised can cause differences in the allocation in ram_addr_t.

Indeed, the allocation algorithm isn't very deterministic; I think it looks
for the smallest gap that will fit the allocation.
Also, let's say that you hot plugged 6 devices, and hot unplugged 5,
then migrated; on the destination you only see one of them, not the history
of the other allocations that had to be performed.

> (Even if the base address of the ram_addr_t space is different
> between source and destination, then at least the offsets
> and list of blocks should be the same, no?)

Neither the offsets nor the order is deterministic (as mentioned above);
only the naming.
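
A minimal sketch of matching by name instead of by offset (hypothetical
helper, assuming the block_name field added earlier in the series):

    static RDMALocalBlock *find_block_by_name(RDMALocalBlocks *local,
                                              const char *name)
    {
        int i;

        for (i = 0; i < local->nb_blocks; i++) {
            if (!strcmp(local->block[i].block_name, name)) {
                return &local->block[i];
            }
        }
        return NULL;    /* no block of that name on this side */
    }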

> Is hotplugging an asynchronous operation for post-copy or
> something?

Postcopy doesn't change that; just don't try hotplugging during the
migration.

Dave



> 
> >
> >Note: An alternative would be to change the fields to actually
> >take the data they're commented for; this would potentially be
> >simpler but would break stream compatibility for those cases
> >that currently work.
> >
> >Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >---
> >  migration/rdma.c | 31 ++++++++++++++++++++++++-------
> >  1 file changed, 24 insertions(+), 7 deletions(-)
> >
> >diff --git a/migration/rdma.c b/migration/rdma.c
> >index c3814c5..2c0d11b 100644
> >--- a/migration/rdma.c
> >+++ b/migration/rdma.c
> >@@ -411,7 +411,7 @@ static void network_to_control(RDMAControlHeader *control)
> >   */
> >  typedef struct QEMU_PACKED {
> >      union QEMU_PACKED {
> >-        uint64_t current_addr;  /* offset into the ramblock of the chunk */
> >+        uint64_t current_addr;  /* offset into the ram_addr_t space */
> >          uint64_t chunk;         /* chunk to lookup if unregistering */
> >      } key;
> >      uint32_t current_index; /* which ramblock the chunk belongs to */
> >@@ -419,8 +419,19 @@ typedef struct QEMU_PACKED {
> >      uint64_t chunks;            /* how many sequential chunks to register */
> >  } RDMARegister;
> 
> The below seems OK, but I would prefer not to do this translation here.
> Can the source and destination apply the offset calculations outside
> of the byte-order functions? Like, before register_to_network, the
> source removes the offset, and then when the message is received,
> the destination then again re-applies the correct offset?
> 
> >-static void register_to_network(RDMARegister *reg)
> >+static void register_to_network(RDMAContext *rdma, RDMARegister *reg)
> >  {
> >+    RDMALocalBlock *local_block;
> >+    local_block  = &rdma->local_ram_blocks.block[reg->current_index];
> >+
> >+    if (local_block->is_ram_block) {
> >+        /*
> >+         * current_addr as passed in is an address in the local ram_addr_t
> >+         * space, we need to translate this for the destination
> >+         */
> >+        reg->key.current_addr -= local_block->offset;
> >+        reg->key.current_addr += rdma->dest_blocks[reg->current_index].offset;
> >+    }
> >      reg->key.current_addr = htonll(reg->key.current_addr);
> >      reg->current_index = htonl(reg->current_index);
> >      reg->chunks = htonll(reg->chunks);
> 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 03/10] Store block name in local blocks structure
  2015-05-19 18:00   ` Michael R. Hines
@ 2015-05-19 18:46     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 31+ messages in thread
From: Dr. David Alan Gilbert @ 2015-05-19 18:46 UTC (permalink / raw)
  To: Michael R. Hines; +Cc: amit.shah, quintela, arei.gonglei, qemu-devel, mrhines

* Michael R. Hines (mrhines@linux.vnet.ibm.com) wrote:
> On 04/20/2015 10:57 AM, Dr. David Alan Gilbert (git) wrote:
> >From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> >In a later patch the block name will be used to match up two views
> >of the block list.  Keep a copy of the block name with the local block
> >list.
> >
> >(At some point it could be argued that it would be best just to let
> >migration see the innards of RAMBlock and avoid the need to use
> >foreach).
> 
> Maybe a silly question, but is there anything to enforce that
> the string names are always unique?

Good question...hmm; exec.c's qemu_ram_set_idstr looks like it has
a check:

    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
        if (block != new_block && !strcmp(block->idstr, new_block->idstr)) {
            fprintf(stderr, "RAMBlock \"%s\" already registered, abort!\n",
                    new_block->idstr);
            abort();
        }
    }

Dave

> 
> >Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >---
> >  migration/rdma.c | 35 +++++++++++++++++++++--------------
> >  trace-events     |  2 +-
> >  2 files changed, 22 insertions(+), 15 deletions(-)
> >
> >diff --git a/migration/rdma.c b/migration/rdma.c
> >index 38e5f44..c3814c5 100644
> >--- a/migration/rdma.c
> >+++ b/migration/rdma.c
> >@@ -214,17 +214,18 @@ static void network_to_caps(RDMACapabilities *cap)
> >   * the information. It's small anyway, so a list is overkill.
> >   */
> >  typedef struct RDMALocalBlock {
> >-    uint8_t  *local_host_addr; /* local virtual address */
> >-    uint64_t remote_host_addr; /* remote virtual address */
> >-    uint64_t offset;
> >-    uint64_t length;
> >-    struct   ibv_mr **pmr;     /* MRs for chunk-level registration */
> >-    struct   ibv_mr *mr;       /* MR for non-chunk-level registration */
> >-    uint32_t *remote_keys;     /* rkeys for chunk-level registration */
> >-    uint32_t remote_rkey;      /* rkeys for non-chunk-level registration */
> >-    int      index;            /* which block are we */
> >-    bool     is_ram_block;
> >-    int      nb_chunks;
> >+    char          *block_name;
> >+    uint8_t       *local_host_addr; /* local virtual address */
> >+    uint64_t       remote_host_addr; /* remote virtual address */
> >+    uint64_t       offset;
> >+    uint64_t       length;
> >+    struct         ibv_mr **pmr;    /* MRs for chunk-level registration */
> >+    struct         ibv_mr *mr;      /* MR for non-chunk-level registration */
> >+    uint32_t      *remote_keys;     /* rkeys for chunk-level registration */
> >+    uint32_t       remote_rkey;     /* rkeys for non-chunk-level registration */
> >+    int            index;           /* which block are we */
> >+    bool           is_ram_block;
> >+    int            nb_chunks;
> >      unsigned long *transit_bitmap;
> >      unsigned long *unregister_bitmap;
> >  } RDMALocalBlock;
> >@@ -510,7 +511,8 @@ static inline uint8_t *ram_chunk_end(const RDMALocalBlock *rdma_ram_block,
> >      return result;
> >  }
> >
> >-static int rdma_add_block(RDMAContext *rdma, void *host_addr,
> >+static int rdma_add_block(RDMAContext *rdma, const char *block_name,
> >+                         void *host_addr,
> >                           ram_addr_t block_offset, uint64_t length)
> >  {
> >      RDMALocalBlocks *local = &rdma->local_ram_blocks;
> >@@ -538,6 +540,7 @@ static int rdma_add_block(RDMAContext *rdma, void *host_addr,
> >
> >      block = &local->block[local->nb_blocks];
> >
> >+    block->block_name = g_strdup(block_name);
> >      block->local_host_addr = host_addr;
> >      block->offset = block_offset;
> >      block->length = length;
> >@@ -553,7 +556,8 @@ static int rdma_add_block(RDMAContext *rdma, void *host_addr,
> >
> >      g_hash_table_insert(rdma->blockmap, (void *) block_offset, block);
> >
> >-    trace_rdma_add_block(local->nb_blocks, (uintptr_t) block->local_host_addr,
> >+    trace_rdma_add_block(block_name, local->nb_blocks,
> >+                         (uintptr_t) block->local_host_addr,
> >                           block->offset, block->length,
> >                           (uintptr_t) (block->local_host_addr + block->length),
> >                           BITS_TO_LONGS(block->nb_chunks) *
> >@@ -573,7 +577,7 @@ static int rdma_add_block(RDMAContext *rdma, void *host_addr,
> >  static int qemu_rdma_init_one_block(const char *block_name, void *host_addr,
> >      ram_addr_t block_offset, ram_addr_t length, void *opaque)
> >  {
> >-    return rdma_add_block(opaque, host_addr, block_offset, length);
> >+    return rdma_add_block(opaque, block_name, host_addr, block_offset, length);
> >  }
> >
> >  /*
> >@@ -639,6 +643,9 @@ static int rdma_delete_block(RDMAContext *rdma, ram_addr_t block_offset)
> >          g_hash_table_remove(rdma->blockmap, (void *)(uintptr_t)old[x].offset);
> >      }
> >
> >+    g_free(block->block_name);
> >+    block->block_name = NULL;
> >+
> >      if (local->nb_blocks > 1) {
> >
> >          local->block = g_malloc0(sizeof(RDMALocalBlock) *
> >diff --git a/trace-events b/trace-events
> >index 30eba92..07f15da 100644
> >--- a/trace-events
> >+++ b/trace-events
> >@@ -1435,7 +1435,7 @@ qemu_rdma_write_one_recvregres(int mykey, int theirkey, uint64_t chunk) "Receive
> >  qemu_rdma_write_one_sendreg(uint64_t chunk, int len, int index, int64_t offset) "Sending registration request chunk %" PRIu64 " for %d bytes, index: %d, offset: %" PRId64
> >  qemu_rdma_write_one_top(uint64_t chunks, uint64_t size) "Writing %" PRIu64 " chunks, (%" PRIu64 " MB)"
> >  qemu_rdma_write_one_zero(uint64_t chunk, int len, int index, int64_t offset) "Entire chunk is zero, sending compress: %" PRIu64 " for %d bytes, index: %d, offset: %" PRId64
> >-rdma_add_block(int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Added Block: %d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
> >+rdma_add_block(const char *block_name, int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Added Block: '%s':%d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
> >  rdma_delete_block(int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Deleted Block: %d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " chunks %d"
> >  rdma_start_incoming_migration(void) ""
> >  rdma_start_incoming_migration_after_dest_init(void) ""
> 
> Reviewed-by: Michael R. Hines <mrhines@us.ibm.com>
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 08/10] Rework ram block hash
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 08/10] Rework ram block hash Dr. David Alan Gilbert (git)
@ 2015-05-19 18:49   ` Michael R. Hines
  2015-05-19 18:55     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 31+ messages in thread
From: Michael R. Hines @ 2015-05-19 18:49 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: amit.shah, arei.gonglei, mrhines, quintela

On 04/20/2015 10:57 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> RDMA uses a hash from block offset->RAM Block; this isn't needed
> on the destination, and now that the destination sorts the ramblock
> list, it is harder to maintain.

Destination sorts the ramblock list? Is the patchset out-of-order?
I didn't see that yet..... Why is it sorted?

I would like to keep the ramblock list directly addressable by hash
on both sides, because, as I mentioned earlier, we want as much
flexibility in registering RAMBlock memory as possible by being
> able to add or delete arbitrary blocks in the list at any time during
a migration.

I will try to get the patchset that allows anyone to register memory
for transfer out as soon as I can.

>
> Split the hash so that it's only generated on the source.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   migration/rdma.c | 32 ++++++++++++++++++++------------
>   1 file changed, 20 insertions(+), 12 deletions(-)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index fe3b76e..185817e 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -533,23 +533,22 @@ static int rdma_add_block(RDMAContext *rdma, const char *block_name,
>                            ram_addr_t block_offset, uint64_t length)
>   {
>       RDMALocalBlocks *local = &rdma->local_ram_blocks;
> -    RDMALocalBlock *block = g_hash_table_lookup(rdma->blockmap,
> -        (void *)(uintptr_t)block_offset);
> +    RDMALocalBlock *block;
>       RDMALocalBlock *old = local->block;
>
> -    assert(block == NULL);
> -
>       local->block = g_malloc0(sizeof(RDMALocalBlock) * (local->nb_blocks + 1));
>
>       if (local->nb_blocks) {
>           int x;
>
> -        for (x = 0; x < local->nb_blocks; x++) {
> -            g_hash_table_remove(rdma->blockmap,
> -                                (void *)(uintptr_t)old[x].offset);
> -            g_hash_table_insert(rdma->blockmap,
> -                                (void *)(uintptr_t)old[x].offset,
> -                                &local->block[x]);
> +        if (rdma->blockmap) {
> +            for (x = 0; x < local->nb_blocks; x++) {
> +                g_hash_table_remove(rdma->blockmap,
> +                                    (void *)(uintptr_t)old[x].offset);
> +                g_hash_table_insert(rdma->blockmap,
> +                                    (void *)(uintptr_t)old[x].offset,
> +                                    &local->block[x]);
> +            }
>           }
>           memcpy(local->block, old, sizeof(RDMALocalBlock) * local->nb_blocks);
>           g_free(old);
> @@ -571,7 +570,9 @@ static int rdma_add_block(RDMAContext *rdma, const char *block_name,
>
>       block->is_ram_block = local->init ? false : true;
>
> -    g_hash_table_insert(rdma->blockmap, (void *) block_offset, block);
> +    if (rdma->blockmap) {
> +        g_hash_table_insert(rdma->blockmap, (void *) block_offset, block);
> +    }
>
>       trace_rdma_add_block(block_name, local->nb_blocks,
>                            (uintptr_t) block->local_host_addr,
> @@ -607,7 +608,6 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
>       RDMALocalBlocks *local = &rdma->local_ram_blocks;
>
>       assert(rdma->blockmap == NULL);
> -    rdma->blockmap = g_hash_table_new(g_direct_hash, g_direct_equal);
>       memset(local, 0, sizeof *local);
>       qemu_ram_foreach_block(qemu_rdma_init_one_block, rdma);
>       trace_qemu_rdma_init_ram_blocks(local->nb_blocks);
> @@ -2248,6 +2248,14 @@ static int qemu_rdma_source_init(RDMAContext *rdma, Error **errp, bool pin_all)
>           goto err_rdma_source_init;
>       }
>
> +    /* Build the hash that maps from offset to RAMBlock */
> +    rdma->blockmap = g_hash_table_new(g_direct_hash, g_direct_equal);
> +    for (idx = 0; idx < rdma->local_ram_blocks.nb_blocks; idx++) {
> +        g_hash_table_insert(rdma->blockmap,
> +                (void *)(uintptr_t)rdma->local_ram_blocks.block[idx].offset,
> +                &rdma->local_ram_blocks.block[idx]);
> +    }
> +
>       for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
>           ret = qemu_rdma_reg_control(rdma, idx);
>           if (ret) {

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 05/10] Rework ram_control_load_hook to hook during block load
  2015-05-19 18:35   ` Michael R. Hines
@ 2015-05-19 18:49     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 31+ messages in thread
From: Dr. David Alan Gilbert @ 2015-05-19 18:49 UTC (permalink / raw)
  To: Michael R. Hines; +Cc: amit.shah, quintela, arei.gonglei, qemu-devel, mrhines

* Michael R. Hines (mrhines@linux.vnet.ibm.com) wrote:
> On 04/20/2015 10:57 AM, Dr. David Alan Gilbert (git) wrote:
> >From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> >We need the names of RAMBlocks as they're loaded for RDMA,
> >reuse an existing QEMUFile hook with some small mods.
> >
> >Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >---
> >  arch_init.c                   |  4 +++-
> >  include/migration/migration.h |  2 +-
> >  include/migration/qemu-file.h | 14 +++++++++-----
> >  migration/qemu-file.c         |  8 ++++----
> >  migration/rdma.c              | 28 ++++++++++++++++++++++------
> >  trace-events                  |  2 +-
> >  6 files changed, 40 insertions(+), 18 deletions(-)
> >
> >diff --git a/arch_init.c b/arch_init.c
> >index 4c8fcee..6c0b355 100644
> >--- a/arch_init.c
> >+++ b/arch_init.c
> >@@ -1164,6 +1164,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >                                  error_report_err(local_err);
> >                              }
> >                          }
> >+                        ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG,
> >+                                              block->idstr);
> >                          break;
> >                      }
> >                  }
> >@@ -1215,7 +1217,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >              break;
> >          default:
> >              if (flags & RAM_SAVE_FLAG_HOOK) {
> >-                ram_control_load_hook(f, flags);
> >+                ram_control_load_hook(f, RAM_CONTROL_HOOK, NULL);
> >              } else {
> >                  error_report("Unknown combination of migration flags: %#x",
> >                               flags);
> >diff --git a/include/migration/migration.h b/include/migration/migration.h
> >index bf09968..3c2f91a 100644
> >--- a/include/migration/migration.h
> >+++ b/include/migration/migration.h
> >@@ -154,7 +154,7 @@ int64_t xbzrle_cache_resize(int64_t new_size);
> >
> >  void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
> >  void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
> >-void ram_control_load_hook(QEMUFile *f, uint64_t flags);
> >+void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data);
> >
> >  /* Whenever this is found in the data stream, the flags
> >   * will be passed to ram_control_load_hook in the incoming-migration
> >diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
> >index 745a850..a3ceb3b 100644
> >--- a/include/migration/qemu-file.h
> >+++ b/include/migration/qemu-file.h
> >@@ -63,16 +63,20 @@ typedef ssize_t (QEMUFileWritevBufferFunc)(void *opaque, struct iovec *iov,
> >  /*
> >   * This function provides hooks around different
> >   * stages of RAM migration.
> >+ * 'opaque' is the backend specific data in QEMUFile
> >+ * 'data' is call specific data associated with the 'flags' value
> >   */
> 
> What data is this exactly?

In the RAM_CONTROL_BLOCK_REG case we're passing the name of the block.
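To make it concrete, once the later patch fills in that TODO the hook
ends up looking roughly like this (a sketch only - the helper name and
its empty body here are placeholders, not the actual later patch):

    static int rdma_block_notification_handle(QEMUFileRDMA *rfile,
                                              const char *name)
    {
        /* e.g. record that the source announced a RAMBlock called
         * 'name', or find it in local_ram_blocks by block_name */
        return 0;
    }

    static int rdma_load_hook(QEMUFile *f, void *opaque, uint64_t flags,
                              void *data)
    {
        switch (flags) {
        case RAM_CONTROL_BLOCK_REG:
            /* 'data' is the RAMBlock idstr passed down from ram_load() */
            return rdma_block_notification_handle(opaque, data);

        case RAM_CONTROL_HOOK:
            return qemu_rdma_registration_handle(f, opaque);

        default:
            /* Shouldn't be called with any other values */
            abort();
        }
    }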

> >-typedef int (QEMURamHookFunc)(QEMUFile *f, void *opaque, uint64_t flags);
> >+typedef int (QEMURamHookFunc)(QEMUFile *f, void *opaque, uint64_t flags,
> >+                              void *data);
> >
> >  /*
> >   * Constants used by ram_control_* hooks
> >   */
> >-#define RAM_CONTROL_SETUP    0
> >-#define RAM_CONTROL_ROUND    1
> >-#define RAM_CONTROL_HOOK     2
> >-#define RAM_CONTROL_FINISH   3
> >+#define RAM_CONTROL_SETUP     0
> >+#define RAM_CONTROL_ROUND     1
> >+#define RAM_CONTROL_HOOK      2
> >+#define RAM_CONTROL_FINISH    3
> >+#define RAM_CONTROL_BLOCK_REG 4
> >
> >  /*
> >   * This function allows override of where the RAM page
> >diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> >index 1a4f986..4986e05 100644
> >--- a/migration/qemu-file.c
> >+++ b/migration/qemu-file.c
> >@@ -127,7 +127,7 @@ void ram_control_before_iterate(QEMUFile *f, uint64_t flags)
> >      int ret = 0;
> >
> >      if (f->ops->before_ram_iterate) {
> >-        ret = f->ops->before_ram_iterate(f, f->opaque, flags);
> >+        ret = f->ops->before_ram_iterate(f, f->opaque, flags, NULL);
> >          if (ret < 0) {
> >              qemu_file_set_error(f, ret);
> >          }
> >@@ -139,19 +139,19 @@ void ram_control_after_iterate(QEMUFile *f, uint64_t flags)
> >      int ret = 0;
> >
> >      if (f->ops->after_ram_iterate) {
> >-        ret = f->ops->after_ram_iterate(f, f->opaque, flags);
> >+        ret = f->ops->after_ram_iterate(f, f->opaque, flags, NULL);
> >          if (ret < 0) {
> >              qemu_file_set_error(f, ret);
> >          }
> >      }
> >  }
> >
> >-void ram_control_load_hook(QEMUFile *f, uint64_t flags)
> >+void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data)
> >  {
> >      int ret = -EINVAL;
> >
> >      if (f->ops->hook_ram_load) {
> >-        ret = f->ops->hook_ram_load(f, f->opaque, flags);
> >+        ret = f->ops->hook_ram_load(f, f->opaque, flags, data);
> >          if (ret < 0) {
> >              qemu_file_set_error(f, ret);
> >          }
> >diff --git a/migration/rdma.c b/migration/rdma.c
> >index 2c0d11b..e43fae4 100644
> >--- a/migration/rdma.c
> >+++ b/migration/rdma.c
> >@@ -2906,8 +2906,7 @@ err_rdma_dest_wait:
> >   *
> >   * Keep doing this until the source tells us to stop.
> >   */
> >-static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque,
> >-                                         uint64_t flags)
> >+static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
> >  {
> >      RDMAControlHeader reg_resp = { .len = sizeof(RDMARegisterResult),
> >                                 .type = RDMA_CONTROL_REGISTER_RESULT,
> >@@ -2937,7 +2936,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque,
> >      CHECK_ERROR_STATE();
> >
> >      do {
> >-        trace_qemu_rdma_registration_handle_wait(flags);
> >+        trace_qemu_rdma_registration_handle_wait();
> >
> >          ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_NONE);
> >
> >@@ -3125,8 +3124,25 @@ out:
> >      return ret;
> >  }
> >
> >+static int rdma_load_hook(QEMUFile *f, void *opaque, uint64_t flags, void *data)
> >+{
> >+    switch (flags) {
> >+    case RAM_CONTROL_BLOCK_REG:
> >+        /* TODO A later patch */
> >+        return -1;
> >+        break;
> >+
> >+    case RAM_CONTROL_HOOK:
> >+        return qemu_rdma_registration_handle(f, opaque);
> >+
> >+    default:
> >+        /* Shouldn't be called with any other values */
> >+        abort();
> >+    }
> >+}
> >+
> 
> Rename this to qemu_rdma_load_hook()

I've been trying to remove the qemu_ prefixes on static functions
generally - there's no need to make the names unique if they're static.

Dave

> >  static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
> >-                                        uint64_t flags)
> >+                                        uint64_t flags, void *data)
> >  {
> >      QEMUFileRDMA *rfile = opaque;
> >      RDMAContext *rdma = rfile->rdma;
> >@@ -3145,7 +3161,7 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
> >   * First, flush writes, if any.
> >   */
> >  static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
> >-                                       uint64_t flags)
> >+                                       uint64_t flags, void *data)
> >  {
> >      Error *local_err = NULL, **errp = &local_err;
> >      QEMUFileRDMA *rfile = opaque;
> >@@ -3267,7 +3283,7 @@ static const QEMUFileOps rdma_read_ops = {
> >      .get_buffer    = qemu_rdma_get_buffer,
> >      .get_fd        = qemu_rdma_get_fd,
> >      .close         = qemu_rdma_close,
> >-    .hook_ram_load = qemu_rdma_registration_handle,
> >+    .hook_ram_load = rdma_load_hook,
> >  };
> >
> >  static const QEMUFileOps rdma_write_ops = {
> >diff --git a/trace-events b/trace-events
> >index 07f15da..baf8647 100644
> >--- a/trace-events
> >+++ b/trace-events
> >@@ -1416,7 +1416,7 @@ qemu_rdma_registration_handle_register_rkey(int rkey) "%x"
> >  qemu_rdma_registration_handle_unregister(int requests) "%d requests"
> >  qemu_rdma_registration_handle_unregister_loop(int count, int index, uint64_t chunk) "Unregistration request (%d): index %d, chunk %" PRIu64
> >  qemu_rdma_registration_handle_unregister_success(uint64_t chunk) "%" PRIu64
> >-qemu_rdma_registration_handle_wait(uint64_t flags) "Waiting for next request %" PRIu64
> >+qemu_rdma_registration_handle_wait(void) ""
> >  qemu_rdma_registration_start(uint64_t flags) "%" PRIu64
> >  qemu_rdma_registration_stop(uint64_t flags) "%" PRIu64
> >  qemu_rdma_registration_stop_ram(void) ""
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 09/10] Sort destination RAMBlocks to be the same as the source
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 09/10] Sort destination RAMBlocks to be the same as the source Dr. David Alan Gilbert (git)
@ 2015-05-19 18:51   ` Michael R. Hines
  2015-06-01 11:53     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 31+ messages in thread
From: Michael R. Hines @ 2015-05-19 18:51 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: amit.shah, arei.gonglei, mrhines, quintela

On 04/20/2015 10:57 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Use the order of incoming RAMBlocks from the source to record
> an index number; that then allows us to sort the destination
> local RAMBlock list to match the source.
>
> Now that the RAMBlocks are known to be in the same order, this
> simplifies the RDMA Registration step which previously tried to
> match RAMBlocks based on offset (which isn't guaranteed to match).

OK, so, what's the reason for sorting?

If the offset is not guaranteed to match (based on a new patch
that I assume you have coming), then we need to index into
the hashtable based on something that does match, such
as the name you added or some other key.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 10/10] Sanity check RDMA remote data
  2015-04-20 15:57 ` [Qemu-devel] [PATCH 10/10] Sanity check RDMA remote data Dr. David Alan Gilbert (git)
@ 2015-05-19 18:52   ` Michael R. Hines
  0 siblings, 0 replies; 31+ messages in thread
From: Michael R. Hines @ 2015-05-19 18:52 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: amit.shah, arei.gonglei, mrhines, quintela

On 04/20/2015 10:57 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Perform some basic (but probably not complete) sanity checking on
> requests from the RDMA source.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   migration/rdma.c | 30 ++++++++++++++++++++++++++++++
>   1 file changed, 30 insertions(+)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 36b78dc..0c9d446 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2940,6 +2940,13 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
>               trace_qemu_rdma_registration_handle_compress(comp->length,
>                                                            comp->block_idx,
>                                                            comp->offset);
> +            if (comp->block_idx >= rdma->local_ram_blocks.nb_blocks) {
> +                error_report("rdma: 'compress' bad block index %u (vs %d)",
> +                             (unsigned int)comp->block_idx,
> +                             rdma->local_ram_blocks.nb_blocks);
> +                ret = -EIO;
> +                break;
> +            }
>               block = &(rdma->local_ram_blocks.block[comp->block_idx]);
>
>               host_addr = block->local_host_addr +
> @@ -3028,8 +3035,23 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
>                   trace_qemu_rdma_registration_handle_register_loop(count,
>                            reg->current_index, reg->key.current_addr, reg->chunks);
>
> +                if (reg->current_index >= rdma->local_ram_blocks.nb_blocks) {
> +                    error_report("rdma: 'register' bad block index %u (vs %d)",
> +                                 (unsigned int)reg->current_index,
> +                                 rdma->local_ram_blocks.nb_blocks);
> +                    ret = -ENOENT;
> +                    break;
> +                }
>                   block = &(rdma->local_ram_blocks.block[reg->current_index]);
>                   if (block->is_ram_block) {
> +                    if (block->offset > reg->key.current_addr) {
> +                        error_report("rdma: bad register address for block %s"
> +                            " offset: %" PRIx64 " current_addr: %" PRIx64,
> +                            block->block_name, block->offset,
> +                            reg->key.current_addr);
> +                        ret = -ERANGE;
> +                        break;
> +                    }
>                       host_addr = (block->local_host_addr +
>                                   (reg->key.current_addr - block->offset));
>                       chunk = ram_chunk_index(block->local_host_addr,
> @@ -3038,6 +3060,14 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
>                       chunk = reg->key.chunk;
>                       host_addr = block->local_host_addr +
>                           (reg->key.chunk * (1UL << RDMA_REG_CHUNK_SHIFT));
> +                    /* Check for particularly bad chunk value */
> +                    if (host_addr < (void *)block->local_host_addr) {
> +                        error_report("rdma: bad chunk for block %s"
> +                            " chunk: %" PRIx64,
> +                            block->block_name, reg->key.chunk);
> +                        ret = -ERANGE;
> +                        break;
> +                    }
>                   }
>                   chunk_start = ram_chunk_start(block, chunk);
>                   chunk_end = ram_chunk_end(block, chunk + reg->chunks);

Reviewed-by: Michael R. Hines <mrhines@us.ibm.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 08/10] Rework ram block hash
  2015-05-19 18:49   ` Michael R. Hines
@ 2015-05-19 18:55     ` Dr. David Alan Gilbert
  2015-05-19 19:02       ` Michael R. Hines
  0 siblings, 1 reply; 31+ messages in thread
From: Dr. David Alan Gilbert @ 2015-05-19 18:55 UTC (permalink / raw)
  To: Michael R. Hines; +Cc: amit.shah, quintela, arei.gonglei, qemu-devel, mrhines

* Michael R. Hines (mrhines@linux.vnet.ibm.com) wrote:
> On 04/20/2015 10:57 AM, Dr. David Alan Gilbert (git) wrote:
> >From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> >RDMA uses a hash from block offset->RAM Block; this isn't needed
> >on the destination, and now that the destination sorts the ramblock
> >list, it is harder to maintain.
> 
> Destination sorts the ramblock list? Is the patchset out-of-order?
> I didn't see that yet..... Why is it sorted?

It's the next patch in the list - please see that one.
Since I use an index into the list it made it the easiest thing to index
on.

> I would like to keep the ramblock list directly addressable by hash
> on both sides, because, as I mentioned earlier, we want as much
> flexibility in registering RAMBlock memory as possible by being
> able to add or delete arbitrary blocks in the list at any time during
> a migration.
> 
> I will try to get the patchset that allows anyone to register memory
> for transfer out as soon as I can.

Hmm OK, I think I can rework that to regenerate the hash; it's a little
difficult without knowing how you're intending to use it.

Dave

> 
> >
> >Split the hash so that it's only generated on the source.
> >
> >Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >---
> >  migration/rdma.c | 32 ++++++++++++++++++++------------
> >  1 file changed, 20 insertions(+), 12 deletions(-)
> >
> >diff --git a/migration/rdma.c b/migration/rdma.c
> >index fe3b76e..185817e 100644
> >--- a/migration/rdma.c
> >+++ b/migration/rdma.c
> >@@ -533,23 +533,22 @@ static int rdma_add_block(RDMAContext *rdma, const char *block_name,
> >                           ram_addr_t block_offset, uint64_t length)
> >  {
> >      RDMALocalBlocks *local = &rdma->local_ram_blocks;
> >-    RDMALocalBlock *block = g_hash_table_lookup(rdma->blockmap,
> >-        (void *)(uintptr_t)block_offset);
> >+    RDMALocalBlock *block;
> >      RDMALocalBlock *old = local->block;
> >
> >-    assert(block == NULL);
> >-
> >      local->block = g_malloc0(sizeof(RDMALocalBlock) * (local->nb_blocks + 1));
> >
> >      if (local->nb_blocks) {
> >          int x;
> >
> >-        for (x = 0; x < local->nb_blocks; x++) {
> >-            g_hash_table_remove(rdma->blockmap,
> >-                                (void *)(uintptr_t)old[x].offset);
> >-            g_hash_table_insert(rdma->blockmap,
> >-                                (void *)(uintptr_t)old[x].offset,
> >-                                &local->block[x]);
> >+        if (rdma->blockmap) {
> >+            for (x = 0; x < local->nb_blocks; x++) {
> >+                g_hash_table_remove(rdma->blockmap,
> >+                                    (void *)(uintptr_t)old[x].offset);
> >+                g_hash_table_insert(rdma->blockmap,
> >+                                    (void *)(uintptr_t)old[x].offset,
> >+                                    &local->block[x]);
> >+            }
> >          }
> >          memcpy(local->block, old, sizeof(RDMALocalBlock) * local->nb_blocks);
> >          g_free(old);
> >@@ -571,7 +570,9 @@ static int rdma_add_block(RDMAContext *rdma, const char *block_name,
> >
> >      block->is_ram_block = local->init ? false : true;
> >
> >-    g_hash_table_insert(rdma->blockmap, (void *) block_offset, block);
> >+    if (rdma->blockmap) {
> >+        g_hash_table_insert(rdma->blockmap, (void *) block_offset, block);
> >+    }
> >
> >      trace_rdma_add_block(block_name, local->nb_blocks,
> >                           (uintptr_t) block->local_host_addr,
> >@@ -607,7 +608,6 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
> >      RDMALocalBlocks *local = &rdma->local_ram_blocks;
> >
> >      assert(rdma->blockmap == NULL);
> >-    rdma->blockmap = g_hash_table_new(g_direct_hash, g_direct_equal);
> >      memset(local, 0, sizeof *local);
> >      qemu_ram_foreach_block(qemu_rdma_init_one_block, rdma);
> >      trace_qemu_rdma_init_ram_blocks(local->nb_blocks);
> >@@ -2248,6 +2248,14 @@ static int qemu_rdma_source_init(RDMAContext *rdma, Error **errp, bool pin_all)
> >          goto err_rdma_source_init;
> >      }
> >
> >+    /* Build the hash that maps from offset to RAMBlock */
> >+    rdma->blockmap = g_hash_table_new(g_direct_hash, g_direct_equal);
> >+    for (idx = 0; idx < rdma->local_ram_blocks.nb_blocks; idx++) {
> >+        g_hash_table_insert(rdma->blockmap,
> >+                (void *)(uintptr_t)rdma->local_ram_blocks.block[idx].offset,
> >+                &rdma->local_ram_blocks.block[idx]);
> >+    }
> >+
> >      for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
> >          ret = qemu_rdma_reg_control(rdma, idx);
> >          if (ret) {
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 04/10] Translate offsets to destination address space
  2015-05-19 18:44     ` Dr. David Alan Gilbert
@ 2015-05-19 18:57       ` Michael R. Hines
  2015-05-19 19:02         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 31+ messages in thread
From: Michael R. Hines @ 2015-05-19 18:57 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: amit.shah, quintela, arei.gonglei, qemu-devel, mrhines

On 05/19/2015 01:44 PM, Dr. David Alan Gilbert wrote:
> * Michael R. Hines (mrhines@linux.vnet.ibm.com) wrote:
>> On 04/20/2015 10:57 AM, Dr. David Alan Gilbert (git) wrote:
>>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>>
>>> The 'offset' field in RDMACompress and 'current_addr' field
>>> in RDMARegister are commented as being offsets within a particular
>>> RAMBlock, however they appear to actually be offsets within the
>>> ram_addr_t space.
>>>
>>> The code currently assumes that the offsets on the source/destination
>>> match, this change removes the need for the assumption for these
>>> structures by translating the addresses into the ram_addr_t space of
>>> the destination host.
>> I don't understand fully: If the offsets are not the same, then
>> why would the RAMBlocks be the same? If a RAMBlock
>> is hot-plugged on one side, shouldn't an identical one be
>> hotplugged on the other side, including the offset into ram_addr_t?
> If a RAMBlock is hotplugged on the source, it's normally passed in on
> the command line at startup on the destination, not hotplugged.
>
> This difference in order of allocation of the RAMBlocks can cause
> the allocation of space in ram_addr_t to be different.
>
> In addition changes between qemu versions as to the order in which
> devices are initialised can cause differences in the allocation in ram_addr_t.
>
> Indeed the allocation algorithm isn't very deterministic, I think it looks
> for the smallest gap that will fit for the allocation.
> Also, let's say that you hot plugged 6 devices, and hot unplugged 5,
> then migrated;  on the destination you only see one of them, not the history
> of the other allocations that had to be performed.

I see ---- so until now, I just got "lucky" that the RAMBlocks were
in the same order with the same offsets =).
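
The "smallest gap" behaviour described above is roughly of this shape
(a simplified, self-contained sketch over an imaginary block array,
not the real exec.c allocator):

    #include <stdint.h>

    typedef struct { uint64_t offset, length; } Blk;

    /* Pick the start of the smallest gap (between blocks sorted by
     * offset) that can hold 'size'; which gap wins depends entirely
     * on what has already been allocated, so source and destination
     * can legitimately end up with different offsets. */
    static uint64_t find_smallest_gap(const Blk *blocks, int n,
                                      uint64_t size)
    {
        uint64_t best_off = UINT64_MAX, best_gap = UINT64_MAX;
        int i;

        for (i = 0; i < n; i++) {
            uint64_t end = blocks[i].offset + blocks[i].length;
            uint64_t next = (i + 1 < n) ? blocks[i + 1].offset
                                        : UINT64_MAX;
            uint64_t gap = next - end;

            if (gap >= size && gap < best_gap) {
                best_gap = gap;
                best_off = end;
            }
        }
        return best_off;
    }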

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 04/10] Translate offsets to destination address space
  2015-05-19 18:57       ` Michael R. Hines
@ 2015-05-19 19:02         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 31+ messages in thread
From: Dr. David Alan Gilbert @ 2015-05-19 19:02 UTC (permalink / raw)
  To: Michael R. Hines; +Cc: amit.shah, arei.gonglei, mrhines, qemu-devel, quintela

* Michael R. Hines (mrhines@linux.vnet.ibm.com) wrote:
> On 05/19/2015 01:44 PM, Dr. David Alan Gilbert wrote:
> >* Michael R. Hines (mrhines@linux.vnet.ibm.com) wrote:
> >>On 04/20/2015 10:57 AM, Dr. David Alan Gilbert (git) wrote:
> >>>From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >>>
> >>>The 'offset' field in RDMACompress and 'current_addr' field
> >>>in RDMARegister are commented as being offsets within a particular
> >>>RAMBlock, however they appear to actually be offsets within the
> >>>ram_addr_t space.
> >>>
> >>>The code currently assumes that the offsets on the source/destination
> >>>match, this change removes the need for the assumption for these
> >>>structures by translating the addresses into the ram_addr_t space of
> >>>the destination host.
> >>I don't understand fully: If the offsets are not the same, then
> >>why would the RAMBlocks be the same? If a RAMBlock
> >>is hot-plugged on one side, shouldn't an identical one be
> >>hotplugged on the other side, including the offset into ram_addr_t?
> >If a RAMBlock is hotplugged on the source, it's normally passed in on
> >the command line at startup on the destination, not hotplugged.
> >
> >This difference in order of allocation of the RAMBlocks can cause
> >the allocation of space in ram_addr_t to be different.
> >
> >In addition changes between qemu versions as to the order in which
> >devices are initialised can cause differences in the allocation in ram_addr_t.
> >
> >Indeed the allocation algorithm isn't very deterministic, I think it looks
> >for the smallest gap that will fit for the allocation.
> >Also, let's say that you hot plugged 6 devices, and hot unplugged 5,
> >then migrated;  on the destination you only see one of them, not the history
> >of the other allocations that had to be performed.
> 
> I see ---- so until now, I just got "lucky" that the RAMBlocks were
> in the same order with the same offsets =).

Yep.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 08/10] Rework ram block hash
  2015-05-19 18:55     ` Dr. David Alan Gilbert
@ 2015-05-19 19:02       ` Michael R. Hines
  2015-05-19 19:07         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 31+ messages in thread
From: Michael R. Hines @ 2015-05-19 19:02 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: amit.shah, arei.gonglei, mrhines, qemu-devel, quintela

On 05/19/2015 01:55 PM, Dr. David Alan Gilbert wrote:
>> I would like to keep the ramblock list directly addressable by hash
>> on both sides, because, as I mentioned earlier, we want as much
>> flexibility in registering RAMBlock memory as possible by being
>> able to add or delete arbitrary blocks in the list at any time during
>> a migration.
>>
>> I will try to get the patchset that allows anyone to register memory
>> for transfer out as soon as I can.
> Hmm OK, I think I can rework that to regenerate the hash; it's a little
> difficult without knowing how you're intending to use it.
>
> Dave

We can use the RAMBlock name as a key to the hash, right?
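Something like this, presumably (a minimal sketch assuming a GLib
string-keyed table; the helper name is made up, not from the series):

    /* Build a blockmap keyed by RAMBlock name instead of by its
     * ram_addr_t offset, so lookups survive offset mismatches. */
    static void rdma_build_name_blockmap(RDMAContext *rdma)
    {
        int idx;

        rdma->blockmap = g_hash_table_new(g_str_hash, g_str_equal);
        for (idx = 0; idx < rdma->local_ram_blocks.nb_blocks; idx++) {
            RDMALocalBlock *b = &rdma->local_ram_blocks.block[idx];
            g_hash_table_insert(rdma->blockmap, b->block_name, b);
        }
    }

and then the stream only ever needs to carry the name:

    RDMALocalBlock *block = g_hash_table_lookup(rdma->blockmap,
                                                block_name);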

I see a "future" where storage replication also uses RDMA,
(not that I'm volunteering to write it), but I don't want to
lose the ability for arbitrary QEMU callers to be able to
register/unregister ramblocks dynamically.

- Michael

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 08/10] Rework ram block hash
  2015-05-19 19:02       ` Michael R. Hines
@ 2015-05-19 19:07         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 31+ messages in thread
From: Dr. David Alan Gilbert @ 2015-05-19 19:07 UTC (permalink / raw)
  To: Michael R. Hines; +Cc: amit.shah, quintela, arei.gonglei, mrhines, qemu-devel

* Michael R. Hines (mrhines@linux.vnet.ibm.com) wrote:
> On 05/19/2015 01:55 PM, Dr. David Alan Gilbert wrote:
> >>I would like to keep the ramblock list directly addressable by hash
> >>on both sides, because, as I mentioned earlier, we want as much
> >>flexibility in registering RAMBlock memory as possible by being
> >>able to add or delete arbitrary blocks in the list at any time during
> >>a migration.
> >>
> >>I will try to get the patchset that allows anyone to register memory
> >>for transfer out as soon as I can.
> >Hmm OK, I think I can rework that to regenerate the hash; it's a little
> >difficult without knowing how you're intending to use it.
> >
> >Dave
> 
> We can use the RAMBlock name as a key to the hash, right?

Hmm, OK, I need to look at that; I guess some of those pointers are
constant once received.

> I see a "future" where storage replication also uses RDMA,
> (not that I'm volunteering to write it), but I don't want to
> lose the ability for arbitrary QEMU callers to be able to
> register/unregister ramblocks dynamically.

Yes, I'm going to try and wire RDMA up for COLO like you did
for the microcheckpoint work, so I can see the need to keep
it more general.

Let me go away and think about it; I think I can probably
keep the destination side hash, but it was just easier
to do it like this since it was unused.

If you do have code lying around that uses it, please post
it and cc me and I can try and keep it so it looks like it might
work.

Dave

> - Michael
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 09/10] Sort destination RAMBlocks to be the same as the source
  2015-05-19 18:51   ` Michael R. Hines
@ 2015-06-01 11:53     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 31+ messages in thread
From: Dr. David Alan Gilbert @ 2015-06-01 11:53 UTC (permalink / raw)
  To: Michael R. Hines; +Cc: amit.shah, quintela, arei.gonglei, qemu-devel, mrhines

* Michael R. Hines (mrhines@linux.vnet.ibm.com) wrote:
> On 04/20/2015 10:57 AM, Dr. David Alan Gilbert (git) wrote:
> >From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> >Use the order of incoming RAMBlocks from the source to record
> >an index number; that then allows us to sort the destination
> >local RAMBlock list to match the source.
> >
> >Now that the RAMBlocks are known to be in the same order, this
> >simplifies the RDMA Registration step which previously tried to
> >match RAMBlocks based on offset (which isn't guaranteed to match).
> 
> OK, so, what's the reason for sorting?

I'm worried that the code might already be using the RAMBlock index
assuming they're the same on both sides and not taking care
to look it up in the RAMBlock list sent over from the destination;
sorting them so they are both the same makes it simple and safe.
For example, 'qemu_rdma_write_one' has a 'current_index' which it does:

  RDMALocalBlock *block = &(rdma->local_ram_blocks.block[current_index]);

then that becomes:
                RDMACompress comp = {
                                        .offset = current_addr,
                                        .value = 0,
                                        .block_idx = current_index,
                                        .length = length,
                                    };

and on the destination:

        case RDMA_CONTROL_COMPRESS:
            comp = (RDMACompress *) rdma->wr_data[idx].control_curr;
            network_to_compress(comp);

            trace_qemu_rdma_registration_handle_compress(comp->length,
                                                         comp->block_idx,
                                                         comp->offset);
            block = &(rdma->local_ram_blocks.block[comp->block_idx]);

So I think that 'current_index' value is assumed to be the same
on both sides.

If I'm right, and that's wrong, then keeping the destination
RAMBlocks in the same order just makes it work and we don't need
to worry how many other places have the same problem.

> If the offset is not guaranteed to match (based on a new patch
> that I assume you have coming), then we need to index into
> the hashtable based on something that does match, such
> as the name you added or some other key.

or just use a matching index that makes life simple.
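
i.e. record the index the source announced each block with and sort the
local list on it - roughly (a sketch; the src_index field and the
comparator name here are illustrative, not the patch itself):

    static int dest_ram_sort_func(const void *a, const void *b)
    {
        unsigned int sa = ((const RDMALocalBlock *)a)->src_index;
        unsigned int sb = ((const RDMALocalBlock *)b)->src_index;

        return (sa > sb) - (sa < sb);
    }

    /* once every source block has been seen: */
    qsort(rdma->local_ram_blocks.block, rdma->local_ram_blocks.nb_blocks,
          sizeof(RDMALocalBlock), dest_ram_sort_func);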

Dave

> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2015-06-01 11:53 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-20 15:57 [Qemu-devel] [PATCH 00/10] Remove RDMA migration dependence on RAMBlock offset Dr. David Alan Gilbert (git)
2015-04-20 15:57 ` [Qemu-devel] [PATCH 01/10] Rename RDMA structures to make destination clear Dr. David Alan Gilbert (git)
2015-05-19 17:52   ` Michael R. Hines
2015-04-20 15:57 ` [Qemu-devel] [PATCH 02/10] qemu_ram_foreach_block: pass up error value, and down the ramblock name Dr. David Alan Gilbert (git)
2015-05-19 17:56   ` Michael R. Hines
2015-04-20 15:57 ` [Qemu-devel] [PATCH 03/10] Store block name in local blocks structure Dr. David Alan Gilbert (git)
2015-05-19 18:00   ` Michael R. Hines
2015-05-19 18:46     ` Dr. David Alan Gilbert
2015-04-20 15:57 ` [Qemu-devel] [PATCH 04/10] Translate offsets to destination address space Dr. David Alan Gilbert (git)
2015-05-19 18:28   ` Michael R. Hines
2015-05-19 18:44     ` Dr. David Alan Gilbert
2015-05-19 18:57       ` Michael R. Hines
2015-05-19 19:02         ` Dr. David Alan Gilbert
2015-04-20 15:57 ` [Qemu-devel] [PATCH 05/10] Rework ram_control_load_hook to hook during block load Dr. David Alan Gilbert (git)
2015-05-19 18:35   ` Michael R. Hines
2015-05-19 18:49     ` Dr. David Alan Gilbert
2015-04-20 15:57 ` [Qemu-devel] [PATCH 06/10] Remove unneeded memset Dr. David Alan Gilbert (git)
2015-05-19 18:35   ` Michael R. Hines
2015-04-20 15:57 ` [Qemu-devel] [PATCH 07/10] Simplify rdma_delete_block and remove it's dependence on the hash Dr. David Alan Gilbert (git)
2015-05-19 18:44   ` Michael R. Hines
2015-04-20 15:57 ` [Qemu-devel] [PATCH 08/10] Rework ram block hash Dr. David Alan Gilbert (git)
2015-05-19 18:49   ` Michael R. Hines
2015-05-19 18:55     ` Dr. David Alan Gilbert
2015-05-19 19:02       ` Michael R. Hines
2015-05-19 19:07         ` Dr. David Alan Gilbert
2015-04-20 15:57 ` [Qemu-devel] [PATCH 09/10] Sort destination RAMBlocks to be the same as the source Dr. David Alan Gilbert (git)
2015-05-19 18:51   ` Michael R. Hines
2015-06-01 11:53     ` Dr. David Alan Gilbert
2015-04-20 15:57 ` [Qemu-devel] [PATCH 10/10] Sanity check RDMA remote data Dr. David Alan Gilbert (git)
2015-05-19 18:52   ` Michael R. Hines
2015-05-18 22:01 ` [Qemu-devel] [PATCH 00/10] Remove RDMA migration dependence on RAMBlock offset Michael R. Hines
