* [PATCH 00/52] migration/rdma: Error handling fixes
@ 2023-09-18 14:41 Markus Armbruster
  2023-09-18 14:41 ` [PATCH 01/52] migration/rdma: Clean up qemu_rdma_poll()'s return type Markus Armbruster
                   ` (53 more replies)
  0 siblings, 54 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

Oh dear, where to start.  There's so much wrong, and in pretty obvious
ways.  This code should never have passed review.  I'm refraining from
saying more; see the commit messages instead.

Issues remaining after this series include:

* Terrible error messages

* Some error message cascades remain

* There is no written contract for QEMUFileHooks, and the
  responsibility for reporting errors is unclear

* There seem to be no tests whatsoever

Related: [PATCH 1/7] migration/rdma: Fix save_page method to fail on
polling error

Markus Armbruster (52):
  migration/rdma: Clean up qemu_rdma_poll()'s return type
  migration/rdma: Clean up qemu_rdma_data_init()'s return type
  migration/rdma: Clean up rdma_delete_block()'s return type
  migration/rdma: Drop fragile wr_id formatting
  migration/rdma: Consistently use uint64_t for work request IDs
  migration/rdma: Clean up two more harmless signed vs. unsigned issues
  migration/rdma: Give qio_channel_rdma_source_funcs internal linkage
  migration/rdma: Fix qemu_rdma_accept() to return failure on errors
  migration/rdma: Put @errp parameter last
  migration/rdma: Eliminate error_propagate()
  migration/rdma: Drop rdma_add_block() error handling
  migration/rdma: Drop qemu_rdma_search_ram_block() error handling
  migration/rdma: Make qemu_rdma_buffer_mergable() return bool
  migration/rdma: Use bool for two RDMAContext flags
  migration/rdma: Ditch useless numeric error codes in error messages
  migration/rdma: Fix io_writev(), io_readv() methods to obey contract
  migration/rdma: Replace dangerous macro CHECK_ERROR_STATE()
  migration/rdma: Fix qemu_rdma_broken_ipv6_kernel() to set error
  migration/rdma: Fix qemu_get_cm_event_timeout() to always set error
  migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port
  migration/rdma: Fix QEMUFileHooks method return values
  migration/rdma: Fix rdma_getaddrinfo() error checking
  migration/rdma: Clean up qemu_rdma_wait_comp_channel()'s error value
  migration/rdma: Return -1 instead of negative errno code
  migration/rdma: Dumb down remaining int error values to -1
  migration/rdma: Replace int error_state by bool errored
  migration/rdma: Drop superfluous assignments to @ret
  migration/rdma: Check negative error values the same way everywhere
  migration/rdma: Plug a memory leak and improve a message
  migration/rdma: Delete inappropriate error_report() in macro ERROR()
  migration/rdma: Retire macro ERROR()
  migration/rdma: Fix error handling around rdma_getaddrinfo()
  migration/rdma: Drop "@errp is clear" guards around error_setg()
  migration/rdma: Convert qemu_rdma_exchange_recv() to Error
  migration/rdma: Convert qemu_rdma_exchange_send() to Error
  migration/rdma: Convert qemu_rdma_exchange_get_response() to Error
  migration/rdma: Convert qemu_rdma_reg_whole_ram_blocks() to Error
  migration/rdma: Convert qemu_rdma_write_flush() to Error
  migration/rdma: Convert qemu_rdma_write_one() to Error
  migration/rdma: Convert qemu_rdma_write() to Error
  migration/rdma: Convert qemu_rdma_post_send_control() to Error
  migration/rdma: Convert qemu_rdma_post_recv_control() to Error
  migration/rdma: Convert qemu_rdma_alloc_pd_cq() to Error
  migration/rdma: Silence qemu_rdma_resolve_host()
  migration/rdma: Silence qemu_rdma_connect()
  migration/rdma: Silence qemu_rdma_reg_control()
  migration/rdma: Don't report received completion events as error
  migration/rdma: Silence qemu_rdma_block_for_wrid()
  migration/rdma: Silence qemu_rdma_register_and_get_keys()
  migration/rdma: Silence qemu_rdma_cleanup()
  migration/rdma: Use error_report() & friends instead of stderr
  migration/rdma: Fix how we show device details on open

 migration/rdma.c       | 977 ++++++++++++++++++++---------------------
 migration/trace-events |   8 +-
 2 files changed, 487 insertions(+), 498 deletions(-)

-- 
2.41.0




* [PATCH 01/52] migration/rdma: Clean up qemu_rdma_poll()'s return type
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-18 16:50   ` Fabiano Rosas
  2023-09-21  8:52   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 02/52] migration/rdma: Clean up qemu_rdma_data_init()'s " Markus Armbruster
                   ` (52 subsequent siblings)
  53 siblings, 2 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

qemu_rdma_poll()'s return type is uint64_t, even though it returns 0,
-1, or @ret, which is int.  Its callers assign the return value to int
variables, then check whether it's negative.  Unclean.

Return int instead.
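
For illustration only (not part of the patch), a minimal sketch of why
the old signature merely happens to work; old_style_poll() is a
made-up stand-in:

    #include <stdint.h>
    #include <stdio.h>

    static uint64_t old_style_poll(void)
    {
        return -1;              /* silently converted to UINT64_MAX */
    }

    int main(void)
    {
        /* Converting UINT64_MAX back to int is implementation-defined;
         * it yields -1 on the usual ABIs, so the callers' negative
         * checks work only by that accident.
         */
        int ret = old_style_poll();

        printf("%d\n", ret < 0);        /* prints 1 */
        return 0;
    }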

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index ca430d319d..544d83be7e 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1469,8 +1469,8 @@ static uint64_t qemu_rdma_make_wrid(uint64_t wr_id, uint64_t index,
  * (of any kind) has completed.
  * Return the work request ID that completed.
  */
-static uint64_t qemu_rdma_poll(RDMAContext *rdma, struct ibv_cq *cq,
-                               uint64_t *wr_id_out, uint32_t *byte_len)
+static int qemu_rdma_poll(RDMAContext *rdma, struct ibv_cq *cq,
+                          uint64_t *wr_id_out, uint32_t *byte_len)
 {
     int ret;
     struct ibv_wc wc;
-- 
2.41.0




* [PATCH 02/52] migration/rdma: Clean up qemu_rdma_data_init()'s return type
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
  2023-09-18 14:41 ` [PATCH 01/52] migration/rdma: Clean up qemu_rdma_poll()'s return type Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-18 16:51   ` Fabiano Rosas
  2023-09-21  8:52   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 03/52] migration/rdma: Clean up rdma_delete_block()'s " Markus Armbruster
                   ` (51 subsequent siblings)
  53 siblings, 2 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

qemu_rdma_data_init()'s return type is void *.  It actually returns
RDMAContext *, and all its callers assign the value to an
RDMAContext *.  Unclean.

Return RDMAContext * instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 544d83be7e..3b9e21f8de 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2739,7 +2739,7 @@ static void qemu_rdma_return_path_dest_init(RDMAContext *rdma_return_path,
     rdma_return_path->is_return_path = true;
 }
 
-static void *qemu_rdma_data_init(const char *host_port, Error **errp)
+static RDMAContext *qemu_rdma_data_init(const char *host_port, Error **errp)
 {
     RDMAContext *rdma = NULL;
     InetSocketAddress *addr;
-- 
2.41.0




* [PATCH 03/52] migration/rdma: Clean up rdma_delete_block()'s return type
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
  2023-09-18 14:41 ` [PATCH 01/52] migration/rdma: Clean up qemu_rdma_poll()'s return type Markus Armbruster
  2023-09-18 14:41 ` [PATCH 02/52] migration/rdma: Clean up qemu_rdma_data_init()'s " Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-18 16:53   ` Fabiano Rosas
  2023-09-21  8:53   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 04/52] migration/rdma: Drop fragile wr_id formatting Markus Armbruster
                   ` (50 subsequent siblings)
  53 siblings, 2 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

rdma_delete_block() always returns 0, which its only caller ignores.
Return void instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 3b9e21f8de..320c291a96 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -668,7 +668,7 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
  * Note: If used outside of cleanup, the caller must ensure that the destination
  * block structures are also updated
  */
-static int rdma_delete_block(RDMAContext *rdma, RDMALocalBlock *block)
+static void rdma_delete_block(RDMAContext *rdma, RDMALocalBlock *block)
 {
     RDMALocalBlocks *local = &rdma->local_ram_blocks;
     RDMALocalBlock *old = local->block;
@@ -754,8 +754,6 @@ static int rdma_delete_block(RDMAContext *rdma, RDMALocalBlock *block)
                                 &local->block[x]);
         }
     }
-
-    return 0;
 }
 
 /*
-- 
2.41.0




* [PATCH 04/52] migration/rdma: Drop fragile wr_id formatting
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (2 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 03/52] migration/rdma: Clean up rdma_delete_block()'s " Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-18 17:01   ` Fabiano Rosas
  2023-09-21  8:54   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 05/52] migration/rdma: Consistently use uint64_t for work request IDs Markus Armbruster
                   ` (49 subsequent siblings)
  53 siblings, 2 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

wrid_desc[] uses 4001 pointers to map four integer values to strings.

print_wrid() accesses wrid_desc[] out of bounds when passed a negative
argument.  It returns null for values 2..1999 and 2001..3999.

qemu_rdma_poll() and qemu_rdma_block_for_wrid() print wrid_desc[wr_id]
and pass print_wrid(wr_id) to tracepoints.  Could conceivably crash
trying to format a null string.  I believe access out of bounds is not
possible.

Not worth cleaning up.  Dumb down to show just numeric wr_id.
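
For context, a stripped-down illustration of why the lookup is
fragile; the indices follow the enumeration values visible in the diff
below, everything else is made up:

    #include <stdio.h>

    static const char *desc[4001] = {
        [0]    = "NONE",
        [1]    = "WRITE RDMA",
        [2000] = "CONTROL SEND",
        [4000] = "CONTROL RECV",
    };

    int main(void)
    {
        /* Gaps in a designated initializer are zero-filled, so most
         * entries are NULL, and a negative index reads out of bounds.
         */
        printf("%p\n", (void *)desc[5]);    /* prints (nil) or 0 */
        return 0;
    }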

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c       | 32 +++++++-------------------------
 migration/trace-events |  8 ++++----
 2 files changed, 11 insertions(+), 29 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 320c291a96..cda22be3f7 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -133,13 +133,6 @@ enum {
     RDMA_WRID_RECV_CONTROL = 4000,
 };
 
-static const char *wrid_desc[] = {
-    [RDMA_WRID_NONE] = "NONE",
-    [RDMA_WRID_RDMA_WRITE] = "WRITE RDMA",
-    [RDMA_WRID_SEND_CONTROL] = "CONTROL SEND",
-    [RDMA_WRID_RECV_CONTROL] = "CONTROL RECV",
-};
-
 /*
  * Work request IDs for IB SEND messages only (not RDMA writes).
  * This is used by the migration protocol to transmit
@@ -535,7 +528,6 @@ static void network_to_result(RDMARegisterResult *result)
     result->host_addr = ntohll(result->host_addr);
 };
 
-const char *print_wrid(int wrid);
 static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                    uint8_t *data, RDMAControlHeader *resp,
                                    int *resp_idx,
@@ -1362,14 +1354,6 @@ static int qemu_rdma_reg_control(RDMAContext *rdma, int idx)
     return -1;
 }
 
-const char *print_wrid(int wrid)
-{
-    if (wrid >= RDMA_WRID_RECV_CONTROL) {
-        return wrid_desc[RDMA_WRID_RECV_CONTROL];
-    }
-    return wrid_desc[wrid];
-}
-
 /*
  * Perform a non-optimized memory unregistration after every transfer
  * for demonstration purposes, only if pin-all is not requested.
@@ -1491,15 +1475,15 @@ static int qemu_rdma_poll(RDMAContext *rdma, struct ibv_cq *cq,
     if (wc.status != IBV_WC_SUCCESS) {
         fprintf(stderr, "ibv_poll_cq wc.status=%d %s!\n",
                         wc.status, ibv_wc_status_str(wc.status));
-        fprintf(stderr, "ibv_poll_cq wrid=%s!\n", wrid_desc[wr_id]);
+        fprintf(stderr, "ibv_poll_cq wrid=%" PRIu64 "!\n", wr_id);
 
         return -1;
     }
 
     if (rdma->control_ready_expected &&
         (wr_id >= RDMA_WRID_RECV_CONTROL)) {
-        trace_qemu_rdma_poll_recv(wrid_desc[RDMA_WRID_RECV_CONTROL],
-                  wr_id - RDMA_WRID_RECV_CONTROL, wr_id, rdma->nb_sent);
+        trace_qemu_rdma_poll_recv(wr_id - RDMA_WRID_RECV_CONTROL, wr_id,
+                                  rdma->nb_sent);
         rdma->control_ready_expected = 0;
     }
 
@@ -1510,7 +1494,7 @@ static int qemu_rdma_poll(RDMAContext *rdma, struct ibv_cq *cq,
             (wc.wr_id & RDMA_WRID_BLOCK_MASK) >> RDMA_WRID_BLOCK_SHIFT;
         RDMALocalBlock *block = &(rdma->local_ram_blocks.block[index]);
 
-        trace_qemu_rdma_poll_write(print_wrid(wr_id), wr_id, rdma->nb_sent,
+        trace_qemu_rdma_poll_write(wr_id, rdma->nb_sent,
                                    index, chunk, block->local_host_addr,
                                    (void *)(uintptr_t)block->remote_host_addr);
 
@@ -1520,7 +1504,7 @@ static int qemu_rdma_poll(RDMAContext *rdma, struct ibv_cq *cq,
             rdma->nb_sent--;
         }
     } else {
-        trace_qemu_rdma_poll_other(print_wrid(wr_id), wr_id, rdma->nb_sent);
+        trace_qemu_rdma_poll_other(wr_id, rdma->nb_sent);
     }
 
     *wr_id_out = wc.wr_id;
@@ -1665,8 +1649,7 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma, int wrid_requested,
             break;
         }
         if (wr_id != wrid_requested) {
-            trace_qemu_rdma_block_for_wrid_miss(print_wrid(wrid_requested),
-                       wrid_requested, print_wrid(wr_id), wr_id);
+            trace_qemu_rdma_block_for_wrid_miss(wrid_requested, wr_id);
         }
     }
 
@@ -1705,8 +1688,7 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma, int wrid_requested,
                 break;
             }
             if (wr_id != wrid_requested) {
-                trace_qemu_rdma_block_for_wrid_miss(print_wrid(wrid_requested),
-                                   wrid_requested, print_wrid(wr_id), wr_id);
+                trace_qemu_rdma_block_for_wrid_miss(wrid_requested, wr_id);
             }
         }
 
diff --git a/migration/trace-events b/migration/trace-events
index 4666f19325..b78808f28b 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -207,7 +207,7 @@ qemu_rdma_accept_incoming_migration(void) ""
 qemu_rdma_accept_incoming_migration_accepted(void) ""
 qemu_rdma_accept_pin_state(bool pin) "%d"
 qemu_rdma_accept_pin_verbsc(void *verbs) "Verbs context after listen: %p"
-qemu_rdma_block_for_wrid_miss(const char *wcompstr, int wcomp, const char *gcompstr, uint64_t req) "A Wanted wrid %s (%d) but got %s (%" PRIu64 ")"
+qemu_rdma_block_for_wrid_miss(int wcomp, uint64_t req) "A Wanted wrid %d but got %" PRIu64
 qemu_rdma_cleanup_disconnect(void) ""
 qemu_rdma_close(void) ""
 qemu_rdma_connect_pin_all_requested(void) ""
@@ -221,9 +221,9 @@ qemu_rdma_exchange_send_waiting(const char *desc) "Waiting for response %s"
 qemu_rdma_exchange_send_received(const char *desc) "Response %s received."
 qemu_rdma_fill(size_t control_len, size_t size) "RDMA %zd of %zd bytes already in buffer"
 qemu_rdma_init_ram_blocks(int blocks) "Allocated %d local ram block structures"
-qemu_rdma_poll_recv(const char *compstr, int64_t comp, int64_t id, int sent) "completion %s #%" PRId64 " received (%" PRId64 ") left %d"
-qemu_rdma_poll_write(const char *compstr, int64_t comp, int left, uint64_t block, uint64_t chunk, void *local, void *remote) "completions %s (%" PRId64 ") left %d, block %" PRIu64 ", chunk: %" PRIu64 " %p %p"
-qemu_rdma_poll_other(const char *compstr, int64_t comp, int left) "other completion %s (%" PRId64 ") received left %d"
+qemu_rdma_poll_recv(int64_t comp, int64_t id, int sent) "completion %" PRId64 " received (%" PRId64 ") left %d"
+qemu_rdma_poll_write(int64_t comp, int left, uint64_t block, uint64_t chunk, void *local, void *remote) "completions %" PRId64 " left %d, block %" PRIu64 ", chunk: %" PRIu64 " %p %p"
+qemu_rdma_poll_other(int64_t comp, int left) "other completion %" PRId64 " received left %d"
 qemu_rdma_post_send_control(const char *desc) "CONTROL: sending %s.."
 qemu_rdma_register_and_get_keys(uint64_t len, void *start) "Registering %" PRIu64 " bytes @ %p"
 qemu_rdma_register_odp_mr(const char *name) "Try to register On-Demand Paging memory region: %s"
-- 
2.41.0




* [PATCH 05/52] migration/rdma: Consistently use uint64_t for work request IDs
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (3 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 04/52] migration/rdma: Drop fragile wr_id formatting Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-18 17:03   ` Fabiano Rosas
  2023-09-19  5:39   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 06/52] migration/rdma: Clean up two more harmless signed vs. unsigned issues Markus Armbruster
                   ` (48 subsequent siblings)
  53 siblings, 2 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

We use int instead of uint64_t in a few places.  Change them to
uint64_t.

This cleans up a comparison of signed qemu_rdma_block_for_wrid()
parameter @wrid_requested with unsigned @wr_id.  Harmless, because the
actual arguments are non-negative enumeration constants.
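
For illustration only, a minimal sketch of the conversion the old code
relied on not mattering:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        int wrid_requested = -1;    /* hypothetical negative value */
        uint64_t wr_id = 4000;

        /* The usual arithmetic conversions turn -1 into UINT64_MAX
         * before comparing, so the negative value compares as huge.
         */
        printf("%d\n", wrid_requested < wr_id);     /* prints 0 */
        return 0;
    }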

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c       | 7 ++++---
 migration/trace-events | 8 ++++----
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index cda22be3f7..4328610a4c 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1599,13 +1599,13 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
     return rdma->error_state;
 }
 
-static struct ibv_comp_channel *to_channel(RDMAContext *rdma, int wrid)
+static struct ibv_comp_channel *to_channel(RDMAContext *rdma, uint64_t wrid)
 {
     return wrid < RDMA_WRID_RECV_CONTROL ? rdma->send_comp_channel :
            rdma->recv_comp_channel;
 }
 
-static struct ibv_cq *to_cq(RDMAContext *rdma, int wrid)
+static struct ibv_cq *to_cq(RDMAContext *rdma, uint64_t wrid)
 {
     return wrid < RDMA_WRID_RECV_CONTROL ? rdma->send_cq : rdma->recv_cq;
 }
@@ -1623,7 +1623,8 @@ static struct ibv_cq *to_cq(RDMAContext *rdma, int wrid)
  * completions only need to be recorded, but do not actually
  * need further processing.
  */
-static int qemu_rdma_block_for_wrid(RDMAContext *rdma, int wrid_requested,
+static int qemu_rdma_block_for_wrid(RDMAContext *rdma,
+                                    uint64_t wrid_requested,
                                     uint32_t *byte_len)
 {
     int num_cq_events = 0, ret = 0;
diff --git a/migration/trace-events b/migration/trace-events
index b78808f28b..d733107ec6 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -207,7 +207,7 @@ qemu_rdma_accept_incoming_migration(void) ""
 qemu_rdma_accept_incoming_migration_accepted(void) ""
 qemu_rdma_accept_pin_state(bool pin) "%d"
 qemu_rdma_accept_pin_verbsc(void *verbs) "Verbs context after listen: %p"
-qemu_rdma_block_for_wrid_miss(int wcomp, uint64_t req) "A Wanted wrid %d but got %" PRIu64
+qemu_rdma_block_for_wrid_miss(uint64_t wcomp, uint64_t req) "A Wanted wrid %" PRIu64 " but got %" PRIu64
 qemu_rdma_cleanup_disconnect(void) ""
 qemu_rdma_close(void) ""
 qemu_rdma_connect_pin_all_requested(void) ""
@@ -221,9 +221,9 @@ qemu_rdma_exchange_send_waiting(const char *desc) "Waiting for response %s"
 qemu_rdma_exchange_send_received(const char *desc) "Response %s received."
 qemu_rdma_fill(size_t control_len, size_t size) "RDMA %zd of %zd bytes already in buffer"
 qemu_rdma_init_ram_blocks(int blocks) "Allocated %d local ram block structures"
-qemu_rdma_poll_recv(int64_t comp, int64_t id, int sent) "completion %" PRId64 " received (%" PRId64 ") left %d"
-qemu_rdma_poll_write(int64_t comp, int left, uint64_t block, uint64_t chunk, void *local, void *remote) "completions %" PRId64 " left %d, block %" PRIu64 ", chunk: %" PRIu64 " %p %p"
-qemu_rdma_poll_other(int64_t comp, int left) "other completion %" PRId64 " received left %d"
+qemu_rdma_poll_recv(uint64_t comp, int64_t id, int sent) "completion %" PRIu64 " received (%" PRId64 ") left %d"
+qemu_rdma_poll_write(uint64_t comp, int left, uint64_t block, uint64_t chunk, void *local, void *remote) "completions %" PRIu64 " left %d, block %" PRIu64 ", chunk: %" PRIu64 " %p %p"
+qemu_rdma_poll_other(uint64_t comp, int left) "other completion %" PRIu64 " received left %d"
 qemu_rdma_post_send_control(const char *desc) "CONTROL: sending %s.."
 qemu_rdma_register_and_get_keys(uint64_t len, void *start) "Registering %" PRIu64 " bytes @ %p"
 qemu_rdma_register_odp_mr(const char *name) "Try to register On-Demand Paging memory region: %s"
-- 
2.41.0




* [PATCH 06/52] migration/rdma: Clean up two more harmless signed vs. unsigned issues
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (4 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 05/52] migration/rdma: Consistently use uint64_t for work request IDs Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-18 17:10   ` Fabiano Rosas
  2023-09-18 14:41 ` [PATCH 07/52] migration/rdma: Give qio_channel_rdma_source_funcs internal linkage Markus Armbruster
                   ` (47 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

qemu_rdma_exchange_get_response() compares its int parameter @expecting
with uint32_t head->type.  Actual arguments are non-negative
enumeration constants, RDMAControlHeader's uint32_t member @type, or
qemu_rdma_exchange_recv()'s int parameter @expecting.  Actual arguments
for the latter are non-negative enumeration constants.  Change both
parameters to uint32_t.

In qio_channel_rdma_readv(), loop control variable @i is ssize_t, and
counts from 0 up to @niov, which is size_t.  Change @i to size_t.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 4328610a4c..e3b8d0506c 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1801,7 +1801,7 @@ static int qemu_rdma_post_recv_control(RDMAContext *rdma, int idx)
  * Block and wait for a RECV control channel message to arrive.
  */
 static int qemu_rdma_exchange_get_response(RDMAContext *rdma,
-                RDMAControlHeader *head, int expecting, int idx)
+                RDMAControlHeader *head, uint32_t expecting, int idx)
 {
     uint32_t byte_len;
     int ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RECV_CONTROL + idx,
@@ -1959,7 +1959,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
  * control-channel message.
  */
 static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
-                                int expecting)
+                                   uint32_t expecting)
 {
     RDMAControlHeader ready = {
                                 .len = 0,
@@ -2851,7 +2851,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
     RDMAContext *rdma;
     RDMAControlHeader head;
     int ret = 0;
-    ssize_t i;
+    size_t i;
     size_t done = 0;
 
     RCU_READ_LOCK_GUARD();
-- 
2.41.0




* [PATCH 07/52] migration/rdma: Give qio_channel_rdma_source_funcs internal linkage
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (5 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 06/52] migration/rdma: Clean up two more harmless signed vs. unsigned issues Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-18 17:11   ` Fabiano Rosas
  2023-09-21  9:00   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 08/52] migration/rdma: Fix qemu_rdma_accept() to return failure on errors Markus Armbruster
                   ` (46 subsequent siblings)
  53 siblings, 2 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index e3b8d0506c..177d73a2ba 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3057,7 +3057,7 @@ qio_channel_rdma_source_finalize(GSource *source)
     object_unref(OBJECT(ssource->rioc));
 }
 
-GSourceFuncs qio_channel_rdma_source_funcs = {
+static GSourceFuncs qio_channel_rdma_source_funcs = {
     qio_channel_rdma_source_prepare,
     qio_channel_rdma_source_check,
     qio_channel_rdma_source_dispatch,
-- 
2.41.0




* [PATCH 08/52] migration/rdma: Fix qemu_rdma_accept() to return failure on errors
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (6 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 07/52] migration/rdma: Give qio_channel_rdma_source_funcs internal linkage Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-18 17:15   ` Fabiano Rosas
  2023-09-21  9:07   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 09/52] migration/rdma: Put @errp parameter last Markus Armbruster
                   ` (45 subsequent siblings)
  53 siblings, 2 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

qemu_rdma_accept() returns 0 in some cases even when it didn't
complete its job due to errors.  Impact is not obvious.  I figure the
caller will soon fail again with a misleading error message.

Fix it to return -1 on any failure.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 177d73a2ba..4339fcc7b2 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3352,6 +3352,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
 
     if (cm_event->event != RDMA_CM_EVENT_CONNECT_REQUEST) {
         rdma_ack_cm_event(cm_event);
+        ret = -1;
         goto err_rdma_dest_wait;
     }
 
@@ -3364,6 +3365,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
         rdma_return_path = qemu_rdma_data_init(rdma->host_port, NULL);
         if (rdma_return_path == NULL) {
             rdma_ack_cm_event(cm_event);
+            ret = -1;
             goto err_rdma_dest_wait;
         }
 
@@ -3375,10 +3377,11 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     network_to_caps(&cap);
 
     if (cap.version < 1 || cap.version > RDMA_CONTROL_VERSION_CURRENT) {
-            error_report("Unknown source RDMA version: %d, bailing...",
-                            cap.version);
-            rdma_ack_cm_event(cm_event);
-            goto err_rdma_dest_wait;
+        error_report("Unknown source RDMA version: %d, bailing...",
+                     cap.version);
+        rdma_ack_cm_event(cm_event);
+        ret = -1;
+        goto err_rdma_dest_wait;
     }
 
     /*
@@ -3408,9 +3411,10 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     if (!rdma->verbs) {
         rdma->verbs = verbs;
     } else if (rdma->verbs != verbs) {
-            error_report("ibv context not matching %p, %p!", rdma->verbs,
-                         verbs);
-            goto err_rdma_dest_wait;
+        error_report("ibv context not matching %p, %p!", rdma->verbs,
+                     verbs);
+        ret = -1;
+        goto err_rdma_dest_wait;
     }
 
     qemu_rdma_dump_id("dest_init", verbs);
@@ -3467,6 +3471,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     if (cm_event->event != RDMA_CM_EVENT_ESTABLISHED) {
         error_report("rdma_accept not event established");
         rdma_ack_cm_event(cm_event);
+        ret = -1;
         goto err_rdma_dest_wait;
     }
 
-- 
2.41.0




* [PATCH 09/52] migration/rdma: Put @errp parameter last
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (7 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 08/52] migration/rdma: Fix qemu_rdma_accept() to return failure on errors Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-18 17:17   ` Fabiano Rosas
  2023-09-21  9:08   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 10/52] migration/rdma: Eliminate error_propagate() Markus Armbruster
                   ` (44 subsequent siblings)
  53 siblings, 2 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

include/qapi/error.h demands:

 * - Functions that use Error to report errors have an Error **errp
 *   parameter.  It should be the last parameter, except for functions
 *   taking variable arguments.

qemu_rdma_connect() does not conform.  Clean it up.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 4339fcc7b2..2b40bbcbb0 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2532,7 +2532,8 @@ static int qemu_get_cm_event_timeout(RDMAContext *rdma,
     }
 }
 
-static int qemu_rdma_connect(RDMAContext *rdma, Error **errp, bool return_path)
+static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
+                             Error **errp)
 {
     RDMACapabilities cap = {
                                 .version = RDMA_CONTROL_VERSION_CURRENT,
@@ -4175,7 +4176,7 @@ void rdma_start_outgoing_migration(void *opaque,
     }
 
     trace_rdma_start_outgoing_migration_after_rdma_source_init();
-    ret = qemu_rdma_connect(rdma, errp, false);
+    ret = qemu_rdma_connect(rdma, false, errp);
 
     if (ret) {
         goto err;
@@ -4196,7 +4197,7 @@ void rdma_start_outgoing_migration(void *opaque,
             goto return_path_err;
         }
 
-        ret = qemu_rdma_connect(rdma_return_path, errp, true);
+        ret = qemu_rdma_connect(rdma_return_path, true, errp);
 
         if (ret) {
             goto return_path_err;
-- 
2.41.0




* [PATCH 10/52] migration/rdma: Eliminate error_propagate()
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (8 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 09/52] migration/rdma: Put @errp parameter last Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-18 17:20   ` Fabiano Rosas
                     ` (2 more replies)
  2023-09-18 14:41 ` [PATCH 11/52] migration/rdma: Drop rdma_add_block() error handling Markus Armbruster
                   ` (43 subsequent siblings)
  53 siblings, 3 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

When all we do with an Error received into a local variable is
propagate it somewhere else, we can just as well receive it there
right away.
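
The pattern, as a minimal before/after sketch; do_step() and the
callers are made up, only the Error API calls are real:

    #include "qemu/osdep.h"
    #include "qapi/error.h"

    static int do_step(Error **errp)
    {
        error_setg(errp, "step failed");
        return -1;
    }

    /* Before: receive into a local, then propagate. */
    static int caller_old(Error **errp)
    {
        Error *local_err = NULL;

        if (do_step(&local_err) < 0) {
            error_propagate(errp, local_err);
            return -1;
        }
        return 0;
    }

    /* After: receive it where it is wanted in the first place. */
    static int caller_new(Error **errp)
    {
        return do_step(errp);
    }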

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 2b40bbcbb0..960fff5860 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2445,7 +2445,6 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
 static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
 {
     int ret, idx;
-    Error *local_err = NULL, **temp = &local_err;
 
     /*
      * Will be validated against destination's actual capabilities
@@ -2453,14 +2452,14 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
      */
     rdma->pin_all = pin_all;
 
-    ret = qemu_rdma_resolve_host(rdma, temp);
+    ret = qemu_rdma_resolve_host(rdma, errp);
     if (ret) {
         goto err_rdma_source_init;
     }
 
     ret = qemu_rdma_alloc_pd_cq(rdma);
     if (ret) {
-        ERROR(temp, "rdma migration: error allocating pd and cq! Your mlock()"
+        ERROR(errp, "rdma migration: error allocating pd and cq! Your mlock()"
                     " limits may be too low. Please check $ ulimit -a # and "
                     "search for 'ulimit -l' in the output");
         goto err_rdma_source_init;
@@ -2468,13 +2467,13 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
 
     ret = qemu_rdma_alloc_qp(rdma);
     if (ret) {
-        ERROR(temp, "rdma migration: error allocating qp!");
+        ERROR(errp, "rdma migration: error allocating qp!");
         goto err_rdma_source_init;
     }
 
     ret = qemu_rdma_init_ram_blocks(rdma);
     if (ret) {
-        ERROR(temp, "rdma migration: error initializing ram blocks!");
+        ERROR(errp, "rdma migration: error initializing ram blocks!");
         goto err_rdma_source_init;
     }
 
@@ -2489,7 +2488,7 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
     for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
         ret = qemu_rdma_reg_control(rdma, idx);
         if (ret) {
-            ERROR(temp, "rdma migration: error registering %d control!",
+            ERROR(errp, "rdma migration: error registering %d control!",
                                                             idx);
             goto err_rdma_source_init;
         }
@@ -2498,7 +2497,6 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
     return 0;
 
 err_rdma_source_init:
-    error_propagate(errp, local_err);
     qemu_rdma_cleanup(rdma);
     return -1;
 }
@@ -4103,7 +4101,6 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
 {
     int ret;
     RDMAContext *rdma;
-    Error *local_err = NULL;
 
     trace_rdma_start_incoming_migration();
 
@@ -4113,13 +4110,12 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
         return;
     }
 
-    rdma = qemu_rdma_data_init(host_port, &local_err);
+    rdma = qemu_rdma_data_init(host_port, errp);
     if (rdma == NULL) {
         goto err;
     }
 
-    ret = qemu_rdma_dest_init(rdma, &local_err);
-
+    ret = qemu_rdma_dest_init(rdma, errp);
     if (ret) {
         goto err;
     }
@@ -4142,7 +4138,6 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
 cleanup_rdma:
     qemu_rdma_cleanup(rdma);
 err:
-    error_propagate(errp, local_err);
     if (rdma) {
         g_free(rdma->host);
         g_free(rdma->host_port);
-- 
2.41.0




* [PATCH 11/52] migration/rdma: Drop rdma_add_block() error handling
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (9 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 10/52] migration/rdma: Eliminate error_propagate() Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-18 17:32   ` Fabiano Rosas
  2023-09-21  9:39   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 12/52] migration/rdma: Drop qemu_rdma_search_ram_block() " Markus Armbruster
                   ` (42 subsequent siblings)
  53 siblings, 2 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

rdma_add_block() can't fail.  Return void, and drop the unreachable
error handling.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 30 +++++++++---------------------
 1 file changed, 9 insertions(+), 21 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 960fff5860..2b0f9d52d8 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -559,9 +559,9 @@ static inline uint8_t *ram_chunk_end(const RDMALocalBlock *rdma_ram_block,
     return result;
 }
 
-static int rdma_add_block(RDMAContext *rdma, const char *block_name,
-                         void *host_addr,
-                         ram_addr_t block_offset, uint64_t length)
+static void rdma_add_block(RDMAContext *rdma, const char *block_name,
+                           void *host_addr,
+                           ram_addr_t block_offset, uint64_t length)
 {
     RDMALocalBlocks *local = &rdma->local_ram_blocks;
     RDMALocalBlock *block;
@@ -615,8 +615,6 @@ static int rdma_add_block(RDMAContext *rdma, const char *block_name,
                          block->nb_chunks);
 
     local->nb_blocks++;
-
-    return 0;
 }
 
 /*
@@ -630,7 +628,8 @@ static int qemu_rdma_init_one_block(RAMBlock *rb, void *opaque)
     void *host_addr = qemu_ram_get_host_addr(rb);
     ram_addr_t block_offset = qemu_ram_get_offset(rb);
     ram_addr_t length = qemu_ram_get_used_length(rb);
-    return rdma_add_block(opaque, block_name, host_addr, block_offset, length);
+    rdma_add_block(opaque, block_name, host_addr, block_offset, length);
+    return 0;
 }
 
 /*
@@ -638,7 +637,7 @@ static int qemu_rdma_init_one_block(RAMBlock *rb, void *opaque)
  * identify chunk boundaries inside each RAMBlock and also be referenced
  * during dynamic page registration.
  */
-static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
+static void qemu_rdma_init_ram_blocks(RDMAContext *rdma)
 {
     RDMALocalBlocks *local = &rdma->local_ram_blocks;
     int ret;
@@ -646,14 +645,11 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
     assert(rdma->blockmap == NULL);
     memset(local, 0, sizeof *local);
     ret = foreach_not_ignored_block(qemu_rdma_init_one_block, rdma);
-    if (ret) {
-        return ret;
-    }
+    assert(!ret);
     trace_qemu_rdma_init_ram_blocks(local->nb_blocks);
     rdma->dest_blocks = g_new0(RDMADestBlock,
                                rdma->local_ram_blocks.nb_blocks);
     local->init = true;
-    return 0;
 }
 
 /*
@@ -2471,11 +2467,7 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
         goto err_rdma_source_init;
     }
 
-    ret = qemu_rdma_init_ram_blocks(rdma);
-    if (ret) {
-        ERROR(errp, "rdma migration: error initializing ram blocks!");
-        goto err_rdma_source_init;
-    }
+    qemu_rdma_init_ram_blocks(rdma);
 
     /* Build the hash that maps from offset to RAMBlock */
     rdma->blockmap = g_hash_table_new(g_direct_hash, g_direct_equal);
@@ -3430,11 +3422,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
         goto err_rdma_dest_wait;
     }
 
-    ret = qemu_rdma_init_ram_blocks(rdma);
-    if (ret) {
-        error_report("rdma migration: error initializing ram blocks!");
-        goto err_rdma_dest_wait;
-    }
+    qemu_rdma_init_ram_blocks(rdma);
 
     for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
         ret = qemu_rdma_reg_control(rdma, idx);
-- 
2.41.0




* [PATCH 12/52] migration/rdma: Drop qemu_rdma_search_ram_block() error handling
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (10 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 11/52] migration/rdma: Drop rdma_add_block() error handling Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-18 17:35   ` Fabiano Rosas
  2023-09-22  7:50   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 13/52] migration/rdma: Make qemu_rdma_buffer_mergable() return bool Markus Armbruster
                   ` (41 subsequent siblings)
  53 siblings, 2 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

qemu_rdma_search_ram_block() can't fail.  Return void, and drop the
unreachable error handling.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 2b0f9d52d8..98520a42b4 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1234,12 +1234,12 @@ static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
  *
  * This search cannot fail or the migration will fail.
  */
-static int qemu_rdma_search_ram_block(RDMAContext *rdma,
-                                      uintptr_t block_offset,
-                                      uint64_t offset,
-                                      uint64_t length,
-                                      uint64_t *block_index,
-                                      uint64_t *chunk_index)
+static void qemu_rdma_search_ram_block(RDMAContext *rdma,
+                                       uintptr_t block_offset,
+                                       uint64_t offset,
+                                       uint64_t length,
+                                       uint64_t *block_index,
+                                       uint64_t *chunk_index)
 {
     uint64_t current_addr = block_offset + offset;
     RDMALocalBlock *block = g_hash_table_lookup(rdma->blockmap,
@@ -1251,8 +1251,6 @@ static int qemu_rdma_search_ram_block(RDMAContext *rdma,
     *block_index = block->index;
     *chunk_index = ram_chunk_index(block->local_host_addr,
                 block->local_host_addr + (current_addr - block->offset));
-
-    return 0;
 }
 
 /*
@@ -2321,12 +2319,8 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
         rdma->current_length = 0;
         rdma->current_addr = current_addr;
 
-        ret = qemu_rdma_search_ram_block(rdma, block_offset,
-                                         offset, len, &index, &chunk);
-        if (ret) {
-            error_report("ram block search failed");
-            return ret;
-        }
+        qemu_rdma_search_ram_block(rdma, block_offset,
+                                   offset, len, &index, &chunk);
         rdma->current_index = index;
         rdma->current_chunk = chunk;
     }
-- 
2.41.0




* [PATCH 13/52] migration/rdma: Make qemu_rdma_buffer_mergable() return bool
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (11 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 12/52] migration/rdma: Drop qemu_rdma_search_ram_block() " Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-18 17:36   ` Fabiano Rosas
                     ` (2 more replies)
  2023-09-18 14:41 ` [PATCH 14/52] migration/rdma: Use bool for two RDMAContext flags Markus Armbruster
                   ` (40 subsequent siblings)
  53 siblings, 3 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

qemu_rdma_buffer_mergable() is semantically a predicate.  It returns
int 0 or 1.  Return bool instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 98520a42b4..97715dbd78 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2246,7 +2246,7 @@ static int qemu_rdma_write_flush(QEMUFile *f, RDMAContext *rdma)
     return 0;
 }
 
-static inline int qemu_rdma_buffer_mergable(RDMAContext *rdma,
+static inline bool qemu_rdma_buffer_mergable(RDMAContext *rdma,
                     uint64_t offset, uint64_t len)
 {
     RDMALocalBlock *block;
@@ -2254,11 +2254,11 @@ static inline int qemu_rdma_buffer_mergable(RDMAContext *rdma,
     uint8_t *chunk_end;
 
     if (rdma->current_index < 0) {
-        return 0;
+        return false;
     }
 
     if (rdma->current_chunk < 0) {
-        return 0;
+        return false;
     }
 
     block = &(rdma->local_ram_blocks.block[rdma->current_index]);
@@ -2266,29 +2266,29 @@ static inline int qemu_rdma_buffer_mergable(RDMAContext *rdma,
     chunk_end = ram_chunk_end(block, rdma->current_chunk);
 
     if (rdma->current_length == 0) {
-        return 0;
+        return false;
     }
 
     /*
      * Only merge into chunk sequentially.
      */
     if (offset != (rdma->current_addr + rdma->current_length)) {
-        return 0;
+        return false;
     }
 
     if (offset < block->offset) {
-        return 0;
+        return false;
     }
 
     if ((offset + len) > (block->offset + block->length)) {
-        return 0;
+        return false;
     }
 
     if ((host_addr + len) > chunk_end) {
-        return 0;
+        return false;
     }
 
-    return 1;
+    return true;
 }
 
 /*
-- 
2.41.0




* [PATCH 14/52] migration/rdma: Use bool for two RDMAContext flags
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (12 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 13/52] migration/rdma: Make qemu_rdma_buffer_mergable() return bool Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-18 17:37   ` Fabiano Rosas
  2023-09-22  7:54   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 15/52] migration/rdma: Ditch useless numeric error codes in error messages Markus Armbruster
                   ` (39 subsequent siblings)
  53 siblings, 2 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

@error_reported and @received_error are flags.  The latter is even
assigned literal true.  Change them from int to bool.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 97715dbd78..c02a1c83b2 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -91,7 +91,7 @@ static uint32_t known_capabilities = RDMA_CAPABILITY_PIN_ALL;
             if (!rdma->error_reported) { \
                 error_report("RDMA is in an error state waiting migration" \
                                 " to abort!"); \
-                rdma->error_reported = 1; \
+                rdma->error_reported = true; \
             } \
             return rdma->error_state; \
         } \
@@ -365,8 +365,8 @@ typedef struct RDMAContext {
      * and remember the error state.
      */
     int error_state;
-    int error_reported;
-    int received_error;
+    bool error_reported;
+    bool received_error;
 
     /*
      * Description of ram blocks used throughout the code.
-- 
2.41.0




* [PATCH 15/52] migration/rdma: Ditch useless numeric error codes in error messages
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (13 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 14/52] migration/rdma: Use bool for two RDMAContext flags Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-18 18:47   ` Fabiano Rosas
  2023-09-22  8:44   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 16/52] migration/rdma: Fix io_writev(), io_readv() methods to obey contract Markus Armbruster
                   ` (38 subsequent siblings)
  53 siblings, 2 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

Several error messages include numeric error codes returned by failed
functions:

* ibv_poll_cq() returns an unspecified negative value.  Useless.

* rdma_accept() and rdma_get_cm_event() return -1.  Useless.

* qemu_rdma_poll() returns either -1 or an unspecified negative
  value.  Useless.

* qemu_rdma_block_for_wrid(), qemu_rdma_write_flush(),
  qemu_rdma_exchange_send(), qemu_rdma_exchange_recv(),
  qemu_rdma_write() return a negative value that may or may not be an
  errno value.  While reporting human-readable errno
  information (which a number is not) can be useful, reporting an
  error code that may or may not be an errno value is useless.

Drop these error codes from the error messages.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index c02a1c83b2..2173cb076f 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1460,7 +1460,7 @@ static int qemu_rdma_poll(RDMAContext *rdma, struct ibv_cq *cq,
     }
 
     if (ret < 0) {
-        error_report("ibv_poll_cq return %d", ret);
+        error_report("ibv_poll_cq failed");
         return ret;
     }
 
@@ -2194,7 +2194,7 @@ retry:
         ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RDMA_WRITE, NULL);
         if (ret < 0) {
             error_report("rdma migration: failed to make "
-                         "room in full send queue! %d", ret);
+                         "room in full send queue!");
             return ret;
         }
 
@@ -2770,7 +2770,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
     ret = qemu_rdma_write_flush(f, rdma);
     if (ret < 0) {
         rdma->error_state = ret;
-        error_setg(errp, "qemu_rdma_write_flush returned %d", ret);
+        error_setg(errp, "qemu_rdma_write_flush failed");
         return -1;
     }
 
@@ -2790,7 +2790,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
 
             if (ret < 0) {
                 rdma->error_state = ret;
-                error_setg(errp, "qemu_rdma_exchange_send returned %d", ret);
+                error_setg(errp, "qemu_rdma_exchange_send failed");
                 return -1;
             }
 
@@ -2880,7 +2880,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
 
         if (ret < 0) {
             rdma->error_state = ret;
-            error_setg(errp, "qemu_rdma_exchange_recv returned %d", ret);
+            error_setg(errp, "qemu_rdma_exchange_recv failed");
             return -1;
         }
 
@@ -3222,7 +3222,7 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
      */
     ret = qemu_rdma_write(f, rdma, block_offset, offset, size);
     if (ret < 0) {
-        error_report("rdma migration: write error! %d", ret);
+        error_report("rdma migration: write error");
         goto err;
     }
 
@@ -3249,7 +3249,7 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
         uint64_t wr_id, wr_id_in;
         int ret = qemu_rdma_poll(rdma, rdma->recv_cq, &wr_id_in, NULL);
         if (ret < 0) {
-            error_report("rdma migration: polling error! %d", ret);
+            error_report("rdma migration: polling error");
             goto err;
         }
 
@@ -3264,7 +3264,7 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
         uint64_t wr_id, wr_id_in;
         int ret = qemu_rdma_poll(rdma, rdma->send_cq, &wr_id_in, NULL);
         if (ret < 0) {
-            error_report("rdma migration: polling error! %d", ret);
+            error_report("rdma migration: polling error");
             goto err;
         }
 
@@ -3439,13 +3439,13 @@ static int qemu_rdma_accept(RDMAContext *rdma)
 
     ret = rdma_accept(rdma->cm_id, &conn_param);
     if (ret) {
-        error_report("rdma_accept returns %d", ret);
+        error_report("rdma_accept failed");
         goto err_rdma_dest_wait;
     }
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
     if (ret) {
-        error_report("rdma_accept get_cm_event failed %d", ret);
+        error_report("rdma_accept get_cm_event failed");
         goto err_rdma_dest_wait;
     }
 
-- 
2.41.0




* [PATCH 16/52] migration/rdma: Fix io_writev(), io_readv() methods to obey contract
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (14 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 15/52] migration/rdma: Ditch useless numeric error codes in error messages Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-18 18:57   ` Fabiano Rosas
  2023-09-22  8:59   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 17/52] migration/rdma: Replace dangerous macro CHECK_ERROR_STATE() Markus Armbruster
                   ` (37 subsequent siblings)
  53 siblings, 2 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

QIOChannelClass methods qio_channel_rdma_readv() and
qio_channel_rdma_writev() violate their method contract when
rdma->error_state is non-zero:

1. They return whatever is in rdma->error_state then.  Only -1 will be
   fine.  -2 will be misinterpreted as "would block".  Anything less
   than -2 isn't defined in the contract.  A positive value would be
   misinterpreted as success, but I believe that's not actually
   possible.

2. They neglect to set an error then.  If something up the call stack
   dereferences the error when failure is returned, it will crash.  If
   it ignores the return value and checks the error instead, it will
   miss the error.

Crap like this happens when return statements hide in macros,
especially when their uses are far away from the definition.

I elected not to investigate how callers are impacted.
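
For reference, a sketch of a hypothetical caller obeying the
QIOChannel contract (simplified; qio_channel_writev() and
error_report_err() are real, the rest is made up):

    ssize_t n = qio_channel_writev(ioc, iov, niov, &err);
    if (n == QIO_CHANNEL_ERR_BLOCK) {
        /* -2: channel would block, wait and retry */
    } else if (n < 0) {
        /* must be -1, with err set */
        error_report_err(err);      /* crashes if err was left NULL */
    }
    /* Returning rdma->error_state instead of -1 sends -2 into the
     * "would block" branch and any other negative value into
     * error_report_err() with err still NULL.
     */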

Expand the two bad macro uses, so we can set an error and return -1.
The next commit will then get rid of the macro altogether.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 2173cb076f..30e6dff875 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2761,7 +2761,11 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
         return -1;
     }
 
-    CHECK_ERROR_STATE();
+    if (rdma->error_state) {
+        error_setg(errp,
+                   "RDMA is in an error state waiting migration to abort!");
+        return -1;
+    }
 
     /*
      * Push out any writes that
@@ -2847,7 +2851,11 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
         return -1;
     }
 
-    CHECK_ERROR_STATE();
+    if (rdma->error_state) {
+        error_setg(errp,
+                   "RDMA is in an error state waiting migration to abort!");
+        return -1;
+    }
 
     for (i = 0; i < niov; i++) {
         size_t want = iov[i].iov_len;
-- 
2.41.0




* [PATCH 17/52] migration/rdma: Replace dangerous macro CHECK_ERROR_STATE()
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (15 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 16/52] migration/rdma: Fix io_writev(), io_readv() methods to obey contract Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-18 18:57   ` Fabiano Rosas
  2023-09-22  9:01   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 18/52] migration/rdma: Fix qemu_rdma_broken_ipv6_kernel() to set error Markus Armbruster
                   ` (36 subsequent siblings)
  53 siblings, 2 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

Hiding return statements in macros is a bad idea.  Use a function
instead, and open code the return part.
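
For illustration only, a made-up example of the hazard; CHECK(),
do_step() and hypothetical_op() do not exist in the tree:

    #include <glib.h>

    #define CHECK(expr) \
        do { if (!(expr)) return -1; } while (0)

    static int do_step(char *buf);      /* made-up callee */

    static int hypothetical_op(void)
    {
        char *buf = g_malloc(64);

        CHECK(do_step(buf));    /* hidden return on failure leaks buf */
        g_free(buf);
        return 0;
    }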

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 43 +++++++++++++++++++++++++++----------------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 30e6dff875..be66f53489 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -85,18 +85,6 @@
  */
 static uint32_t known_capabilities = RDMA_CAPABILITY_PIN_ALL;
 
-#define CHECK_ERROR_STATE() \
-    do { \
-        if (rdma->error_state) { \
-            if (!rdma->error_reported) { \
-                error_report("RDMA is in an error state waiting migration" \
-                                " to abort!"); \
-                rdma->error_reported = true; \
-            } \
-            return rdma->error_state; \
-        } \
-    } while (0)
-
 /*
  * A work request ID is 64-bits and we split up these bits
  * into 3 parts:
@@ -451,6 +439,16 @@ typedef struct QEMU_PACKED {
     uint64_t chunks;            /* how many sequential chunks to register */
 } RDMARegister;
 
+static int check_error_state(RDMAContext *rdma)
+{
+    if (rdma->error_state && !rdma->error_reported) {
+        error_report("RDMA is in an error state waiting migration"
+                     " to abort!");
+        rdma->error_reported = true;
+    }
+    return rdma->error_state;
+}
+
 static void register_to_network(RDMAContext *rdma, RDMARegister *reg)
 {
     RDMALocalBlock *local_block;
@@ -3219,7 +3217,10 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
         return -EIO;
     }
 
-    CHECK_ERROR_STATE();
+    ret = check_error_state(rdma);
+    if (ret) {
+        return ret;
+    }
 
     qemu_fflush(f);
 
@@ -3535,7 +3536,10 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
         return -EIO;
     }
 
-    CHECK_ERROR_STATE();
+    ret = check_error_state(rdma);
+    if (ret) {
+        return ret;
+    }
 
     local = &rdma->local_ram_blocks;
     do {
@@ -3839,6 +3843,7 @@ static int qemu_rdma_registration_start(QEMUFile *f,
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
     RDMAContext *rdma;
+    int ret;
 
     if (migration_in_postcopy()) {
         return 0;
@@ -3850,7 +3855,10 @@ static int qemu_rdma_registration_start(QEMUFile *f,
         return -EIO;
     }
 
-    CHECK_ERROR_STATE();
+    ret = check_error_state(rdma);
+    if (ret) {
+        return ret;
+    }
 
     trace_qemu_rdma_registration_start(flags);
     qemu_put_be64(f, RAM_SAVE_FLAG_HOOK);
@@ -3881,7 +3889,10 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
         return -EIO;
     }
 
-    CHECK_ERROR_STATE();
+    ret = check_error_state(rdma);
+    if (ret) {
+        return ret;
+    }
 
     qemu_fflush(f);
     ret = qemu_rdma_drain_cq(f, rdma);
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 18/52] migration/rdma: Fix qemu_rdma_broken_ipv6_kernel() to set error
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (16 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 17/52] migration/rdma: Replace dangerous macro CHECK_ERROR_STATE() Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-18 19:00   ` Fabiano Rosas
  2023-09-22  9:10   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 19/52] migration/rdma: Fix qemu_get_cm_event_timeout() to always " Markus Armbruster
                   ` (35 subsequent siblings)
  53 siblings, 2 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

qemu_rdma_resolve_host() and qemu_rdma_dest_init() try addresses until
they find one that works.  If none works, they return the first Error
set by qemu_rdma_broken_ipv6_kernel(), or else return a generic one.

qemu_rdma_broken_ipv6_kernel() neglects to set an Error when
ibv_open_device() fails.  If a later address fails differently, we use
that Error instead, or else the generic one.  Harmless enough, but
needs fixing all the same.
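
To spell out the pattern at work, here is a minimal sketch of such a
"first Error wins" loop.  AddrInfo and try_one() are hypothetical
stand-ins; the Error calls are QEMU's qapi/error.h API:

    static int try_addresses(AddrInfo *list, Error **errp)
    {
        Error *first_err = NULL;

        for (AddrInfo *a = list; a; a = a->next) {
            Error *local_err = NULL;

            if (try_one(a, &local_err) == 0) {
                error_free(first_err);
                return 0;               /* success: drop any saved error */
            }
            /*
             * Remember the first failure's Error.  If try_one() fails
             * *without* setting local_err -- the omission fixed below --
             * first_err stays null and a later, unrelated Error (or the
             * generic fallback) gets reported instead.
             */
            if (!first_err) {
                first_err = local_err;
                local_err = NULL;
            }
            error_free(local_err);      /* error_free(NULL) is a no-op */
        }

        if (first_err) {
            error_propagate(errp, first_err);
        } else {
            error_setg(errp, "no working address found");
        }
        return -1;
    }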

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/migration/rdma.c b/migration/rdma.c
index be66f53489..08cd186385 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -855,6 +855,8 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
                 if (errno == EPERM) {
                     continue;
                 } else {
+                    error_setg_errno(errp, errno,
+                                     "could not open RDMA device context");
                     return -EINVAL;
                 }
             }
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 19/52] migration/rdma: Fix qemu_get_cm_event_timeout() to always set error
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (17 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 18/52] migration/rdma: Fix qemu_rdma_broken_ipv6_kernel() to set error Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-19 16:02   ` Peter Xu
  2023-09-22  9:12   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 20/52] migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port Markus Armbruster
                   ` (34 subsequent siblings)
  53 siblings, 2 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

qemu_get_cm_event_timeout() neglects to set an error when it fails
because rdma_get_cm_event() fails.  Harmless, as its caller
qemu_rdma_connect() substitutes a generic error then.  Fix it anyway.

qemu_rdma_connect() also sets the generic error when its own call of
rdma_get_cm_event() fails.  Make the error handling more obvious: set
a specific error right after rdma_get_cm_event() fails.  Delete the
generic error.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 08cd186385..d3dc162363 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2509,7 +2509,11 @@ static int qemu_get_cm_event_timeout(RDMAContext *rdma,
         ERROR(errp, "failed to poll cm event, errno=%i", errno);
         return -1;
     } else if (poll_fd.revents & POLLIN) {
-        return rdma_get_cm_event(rdma->channel, cm_event);
+        if (rdma_get_cm_event(rdma->channel, cm_event) < 0) {
+            ERROR(errp, "failed to get cm event");
+            return -1;
+        }
+        return 0;
     } else {
         ERROR(errp, "no POLLIN event, revent=%x", poll_fd.revents);
         return -1;
@@ -2559,10 +2563,12 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
         ret = qemu_get_cm_event_timeout(rdma, &cm_event, 5000, errp);
     } else {
         ret = rdma_get_cm_event(rdma->channel, &cm_event);
+        if (ret < 0) {
+            ERROR(errp, "failed to get cm event");
+        }
     }
     if (ret) {
         perror("rdma_get_cm_event after rdma_connect");
-        ERROR(errp, "connecting to destination!");
         goto err_rdma_source_connect;
     }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 20/52] migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (18 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 19/52] migration/rdma: Fix qemu_get_cm_event_timeout() to always " Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-19 16:02   ` Peter Xu
  2023-09-18 14:41 ` [PATCH 21/52] migration/rdma: Fix QEMUFileHooks method return values Markus Armbruster
                   ` (33 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

qemu_rdma_data_init() neglects to set an Error when it fails because
@host_port is null.  Fortunately, no caller passes null, so this is
merely a latent bug.  Drop the flawed code handling null argument.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index d3dc162363..cc59155a50 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2716,25 +2716,22 @@ static RDMAContext *qemu_rdma_data_init(const char *host_port, Error **errp)
     RDMAContext *rdma = NULL;
     InetSocketAddress *addr;
 
-    if (host_port) {
-        rdma = g_new0(RDMAContext, 1);
-        rdma->current_index = -1;
-        rdma->current_chunk = -1;
+    rdma = g_new0(RDMAContext, 1);
+    rdma->current_index = -1;
+    rdma->current_chunk = -1;
 
-        addr = g_new(InetSocketAddress, 1);
-        if (!inet_parse(addr, host_port, NULL)) {
-            rdma->port = atoi(addr->port);
-            rdma->host = g_strdup(addr->host);
-            rdma->host_port = g_strdup(host_port);
-        } else {
-            ERROR(errp, "bad RDMA migration address '%s'", host_port);
-            g_free(rdma);
-            rdma = NULL;
-        }
-
-        qapi_free_InetSocketAddress(addr);
+    addr = g_new(InetSocketAddress, 1);
+    if (!inet_parse(addr, host_port, NULL)) {
+        rdma->port = atoi(addr->port);
+        rdma->host = g_strdup(addr->host);
+        rdma->host_port = g_strdup(host_port);
+    } else {
+        ERROR(errp, "bad RDMA migration address '%s'", host_port);
+        g_free(rdma);
+        rdma = NULL;
     }
 
+    qapi_free_InetSocketAddress(addr);
     return rdma;
 }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 21/52] migration/rdma: Fix QEMUFileHooks method return values
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (19 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 20/52] migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-25  4:08   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 22/52] migration/rdma: Fix rdma_getaddrinfo() error checking Markus Armbruster
                   ` (32 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

The QEMUFileHooks methods don't come with a written contract.  Digging
through the code calling them, we find:

* save_page():

  Negative values RAM_SAVE_CONTROL_DELAYED and
  RAM_SAVE_CONTROL_NOT_SUPP are special.  Any other negative value is
  an unspecified error.

  qemu_rdma_save_page() returns -EIO or rdma->error_state on error.  I
  believe the latter is always negative.  Nothing stops either of them
  from clashing with the special values, though.  Feels unlikely, but
  fix it anyway to return only the special values and -1 (see the
  sketch after this list).

* before_ram_iterate(), after_ram_iterate():

  Negative value means error.  qemu_rdma_registration_start() and
  qemu_rdma_registration_stop() comply as far as I can tell.  Make
  them comply *obviously*, by returning -1 on error.

* hook_ram_load:

  Negative value means error.  rdma_load_hook() already returns -1 on
  error.  Leave it alone.
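
To make the save_page() case concrete, here is roughly the decision its
caller has to make on the returned value.  handle_save_page_result() is
a made-up helper, shown only to spell out the branches; it assumes the
two special values named above are in scope, and the real logic lives
in migration/ram.c:

    /* returns 0 = write the page the normal way, 1 = the hook handled it,
     * -1 = unspecified error, fail the migration */
    static int handle_save_page_result(long ret)
    {
        if (ret == RAM_SAVE_CONTROL_NOT_SUPP) {
            return 0;       /* hook declined this page */
        }
        if (ret == RAM_SAVE_CONTROL_DELAYED) {
            return 1;       /* page is in flight; completion comes later */
        }
        if (ret < 0) {
            return -1;      /* any other negative value: error */
        }
        return 1;           /* non-negative: hook transferred the page */
    }

If qemu_rdma_save_page() ever returned an errno code that happened to
collide with one of the two special values, the first two branches
would fire by accident -- hence "return only the special values and -1".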

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 79 +++++++++++++++++++++++-------------------------
 1 file changed, 37 insertions(+), 42 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index cc59155a50..46b5859268 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3219,12 +3219,11 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
     rdma = qatomic_rcu_read(&rioc->rdmaout);
 
     if (!rdma) {
-        return -EIO;
+        return -1;
     }
 
-    ret = check_error_state(rdma);
-    if (ret) {
-        return ret;
+    if (check_error_state(rdma)) {
+        return -1;
     }
 
     qemu_fflush(f);
@@ -3290,9 +3289,10 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
     }
 
     return RAM_SAVE_CONTROL_DELAYED;
+
 err:
     rdma->error_state = ret;
-    return ret;
+    return -1;
 }
 
 static void rdma_accept_incoming_migration(void *opaque);
@@ -3538,12 +3538,11 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
     rdma = qatomic_rcu_read(&rioc->rdmain);
 
     if (!rdma) {
-        return -EIO;
+        return -1;
     }
 
-    ret = check_error_state(rdma);
-    if (ret) {
-        return ret;
+    if (check_error_state(rdma)) {
+        return -1;
     }
 
     local = &rdma->local_ram_blocks;
@@ -3576,7 +3575,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                              (unsigned int)comp->block_idx,
                              rdma->local_ram_blocks.nb_blocks);
                 ret = -EIO;
-                goto out;
+                goto err;
             }
             block = &(rdma->local_ram_blocks.block[comp->block_idx]);
 
@@ -3588,7 +3587,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
 
         case RDMA_CONTROL_REGISTER_FINISHED:
             trace_qemu_rdma_registration_handle_finished();
-            goto out;
+            return 0;
 
         case RDMA_CONTROL_RAM_BLOCKS_REQUEST:
             trace_qemu_rdma_registration_handle_ram_blocks();
@@ -3609,7 +3608,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                 if (ret) {
                     error_report("rdma migration: error dest "
                                     "registering ram blocks");
-                    goto out;
+                    goto err;
                 }
             }
 
@@ -3648,7 +3647,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
 
             if (ret < 0) {
                 error_report("rdma migration: error sending remote info");
-                goto out;
+                goto err;
             }
 
             break;
@@ -3675,7 +3674,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                                  (unsigned int)reg->current_index,
                                  rdma->local_ram_blocks.nb_blocks);
                     ret = -ENOENT;
-                    goto out;
+                    goto err;
                 }
                 block = &(rdma->local_ram_blocks.block[reg->current_index]);
                 if (block->is_ram_block) {
@@ -3685,7 +3684,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                             block->block_name, block->offset,
                             reg->key.current_addr);
                         ret = -ERANGE;
-                        goto out;
+                        goto err;
                     }
                     host_addr = (block->local_host_addr +
                                 (reg->key.current_addr - block->offset));
@@ -3701,7 +3700,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                             " chunk: %" PRIx64,
                             block->block_name, reg->key.chunk);
                         ret = -ERANGE;
-                        goto out;
+                        goto err;
                     }
                 }
                 chunk_start = ram_chunk_start(block, chunk);
@@ -3713,7 +3712,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                             chunk, chunk_start, chunk_end)) {
                     error_report("cannot get rkey");
                     ret = -EINVAL;
-                    goto out;
+                    goto err;
                 }
                 reg_result->rkey = tmp_rkey;
 
@@ -3730,7 +3729,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
 
             if (ret < 0) {
                 error_report("Failed to send control buffer");
-                goto out;
+                goto err;
             }
             break;
         case RDMA_CONTROL_UNREGISTER_REQUEST:
@@ -3753,7 +3752,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                 if (ret != 0) {
                     perror("rdma unregistration chunk failed");
                     ret = -ret;
-                    goto out;
+                    goto err;
                 }
 
                 rdma->total_registrations--;
@@ -3766,24 +3765,23 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
 
             if (ret < 0) {
                 error_report("Failed to send control buffer");
-                goto out;
+                goto err;
             }
             break;
         case RDMA_CONTROL_REGISTER_RESULT:
             error_report("Invalid RESULT message at dest.");
             ret = -EIO;
-            goto out;
+            goto err;
         default:
             error_report("Unknown control message %s", control_desc(head.type));
             ret = -EIO;
-            goto out;
+            goto err;
         }
     } while (1);
-out:
-    if (ret < 0) {
-        rdma->error_state = ret;
-    }
-    return ret;
+
+err:
+    rdma->error_state = ret;
+    return -1;
 }
 
 /* Destination:
@@ -3805,7 +3803,7 @@ rdma_block_notification_handle(QEMUFile *f, const char *name)
     rdma = qatomic_rcu_read(&rioc->rdmain);
 
     if (!rdma) {
-        return -EIO;
+        return -1;
     }
 
     /* Find the matching RAMBlock in our local list */
@@ -3818,7 +3816,7 @@ rdma_block_notification_handle(QEMUFile *f, const char *name)
 
     if (found == -1) {
         error_report("RAMBlock '%s' not found on destination", name);
-        return -ENOENT;
+        return -1;
     }
 
     rdma->local_ram_blocks.block[curr].src_index = rdma->next_src_index;
@@ -3848,7 +3846,6 @@ static int qemu_rdma_registration_start(QEMUFile *f,
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
     RDMAContext *rdma;
-    int ret;
 
     if (migration_in_postcopy()) {
         return 0;
@@ -3857,12 +3854,11 @@ static int qemu_rdma_registration_start(QEMUFile *f,
     RCU_READ_LOCK_GUARD();
     rdma = qatomic_rcu_read(&rioc->rdmaout);
     if (!rdma) {
-        return -EIO;
+        return -1;
     }
 
-    ret = check_error_state(rdma);
-    if (ret) {
-        return ret;
+    if (check_error_state(rdma)) {
+        return -1;
     }
 
     trace_qemu_rdma_registration_start(flags);
@@ -3891,12 +3887,11 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
     RCU_READ_LOCK_GUARD();
     rdma = qatomic_rcu_read(&rioc->rdmaout);
     if (!rdma) {
-        return -EIO;
+        return -1;
     }
 
-    ret = check_error_state(rdma);
-    if (ret) {
-        return ret;
+    if (check_error_state(rdma)) {
+        return -1;
     }
 
     qemu_fflush(f);
@@ -3927,7 +3922,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
                     qemu_rdma_reg_whole_ram_blocks : NULL);
         if (ret < 0) {
             fprintf(stderr, "receiving remote info!");
-            return ret;
+            return -1;
         }
 
         nb_dest_blocks = resp.len / sizeof(RDMADestBlock);
@@ -3950,7 +3945,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
                     "not identical on both the source and destination.",
                     local->nb_blocks, nb_dest_blocks);
             rdma->error_state = -EINVAL;
-            return -EINVAL;
+            return -1;
         }
 
         qemu_rdma_move_header(rdma, reg_result_idx, &resp);
@@ -3966,7 +3961,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
                         local->block[i].length,
                         rdma->dest_blocks[i].length);
                 rdma->error_state = -EINVAL;
-                return -EINVAL;
+                return -1;
             }
             local->block[i].remote_host_addr =
                     rdma->dest_blocks[i].remote_host_addr;
@@ -3986,7 +3981,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
     return 0;
 err:
     rdma->error_state = ret;
-    return ret;
+    return -1;
 }
 
 static const QEMUFileHooks rdma_read_hooks = {
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 22/52] migration/rdma: Fix rdma_getaddrinfo() error checking
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (20 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 21/52] migration/rdma: Fix QEMUFileHooks method return values Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-25  5:21   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 23/52] migration/rdma: Clean up qemu_rdma_wait_comp_channel()'s error value Markus Armbruster
                   ` (31 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

rdma_getaddrinfo() returns 0 on success.  On error, it returns one of
the EAI_ error codes like getaddrinfo() does, or -1 with errno set.
This is broken by design: POSIX implicitly specifies the EAI_ error
codes to be non-zero, nothing more.  They could clash with -1.  Nothing
we can do about this design flaw.

Both callers of rdma_getaddrinfo() recognize only negative values as
errors.  This works only because systems elect to make the EAI_ error
codes negative.

Best not to rely on that: change the callers to treat any non-zero
value as failure.  Also change them to return -1 instead of the value
received from rdma_getaddrinfo() on failure, to avoid passing on
positive error values.
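
Condensed, the convention the two hunks below switch to looks like this
(a fragment for illustration; ERROR() is the file's own convenience
macro at this point in the series):

    ret = rdma_getaddrinfo(rdma->host, port_str, NULL, &res);
    if (ret) {      /* any non-zero value is a failure; the EAI_ codes
                       are only guaranteed to be non-zero, not negative */
        ERROR(errp, "could not rdma_getaddrinfo address %s", rdma->host);
        goto err_resolve_get_addr;      /* the error path returns -1 */
    }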

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 46b5859268..3421ae0796 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -935,14 +935,14 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
 
     if (rdma->host == NULL || !strcmp(rdma->host, "")) {
         ERROR(errp, "RDMA hostname has not been set");
-        return -EINVAL;
+        return -1;
     }
 
     /* create CM channel */
     rdma->channel = rdma_create_event_channel();
     if (!rdma->channel) {
         ERROR(errp, "could not create CM channel");
-        return -EINVAL;
+        return -1;
     }
 
     /* create CM id */
@@ -956,7 +956,7 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
     port_str[15] = '\0';
 
     ret = rdma_getaddrinfo(rdma->host, port_str, NULL, &res);
-    if (ret < 0) {
+    if (ret) {
         ERROR(errp, "could not rdma_getaddrinfo address %s", rdma->host);
         goto err_resolve_get_addr;
     }
@@ -998,7 +998,6 @@ route:
                 rdma_event_str(cm_event->event));
         error_report("rdma_resolve_addr");
         rdma_ack_cm_event(cm_event);
-        ret = -EINVAL;
         goto err_resolve_get_addr;
     }
     rdma_ack_cm_event(cm_event);
@@ -1019,7 +1018,6 @@ route:
         ERROR(errp, "result not equal to event_route_resolved: %s",
                         rdma_event_str(cm_event->event));
         rdma_ack_cm_event(cm_event);
-        ret = -EINVAL;
         goto err_resolve_get_addr;
     }
     rdma_ack_cm_event(cm_event);
@@ -1034,7 +1032,7 @@ err_resolve_get_addr:
 err_resolve_create_id:
     rdma_destroy_event_channel(rdma->channel);
     rdma->channel = NULL;
-    return ret;
+    return -1;
 }
 
 /*
@@ -2644,7 +2642,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
     port_str[15] = '\0';
 
     ret = rdma_getaddrinfo(rdma->host, port_str, NULL, &res);
-    if (ret < 0) {
+    if (ret) {
         ERROR(errp, "could not rdma_getaddrinfo address %s", rdma->host);
         goto err_dest_init_bind_addr;
     }
@@ -2688,7 +2686,7 @@ err_dest_init_create_listen_id:
     rdma_destroy_event_channel(rdma->channel);
     rdma->channel = NULL;
     rdma->error_state = ret;
-    return ret;
+    return -1;
 
 }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 23/52] migration/rdma: Clean up qemu_rdma_wait_comp_channel()'s error value
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (21 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 22/52] migration/rdma: Fix rdma_getaddrinfo() error checking Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-25  5:43   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 24/52] migration/rdma: Return -1 instead of negative errno code Markus Armbruster
                   ` (30 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

qemu_rdma_wait_comp_channel() returns 0 on success, and either -1 or
rdma->error_state on failure.  Callers actually expect a negative
error value.  I believe rdma->error_state can't be positive, but let's
make things more obvious by simply returning -1 on any failure.
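
The hunk below uses the slightly cryptic expression -!!rdma->error_state
as an intermediate step (a later patch in this series replaces it with
-rdma->errored).  A tiny, runnable reminder of what -!!x evaluates to:

    #include <assert.h>

    int main(void)
    {
        /* -!!x maps 0 to 0 and any non-zero value to -1 */
        assert(-!!0 == 0);
        assert(-!!5 == -1);
        assert(-!!(-3) == -1);
        return 0;
    }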

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 3421ae0796..efbb3c7754 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1588,7 +1588,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
     if (rdma->received_error) {
         return -EPIPE;
     }
-    return rdma->error_state;
+    return -!!rdma->error_state;
 }
 
 static struct ibv_comp_channel *to_channel(RDMAContext *rdma, uint64_t wrid)
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 24/52] migration/rdma: Return -1 instead of negative errno code
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (22 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 23/52] migration/rdma: Clean up qemu_rdma_wait_comp_channel()'s error value Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-26 10:15   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 25/52] migration/rdma: Dumb down remaining int error values to -1 Markus Armbruster
                   ` (29 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

Several functions return negative errno codes on failure.  Callers
check for specific codes exactly never.  For some of the functions,
callers couldn't check even if they wanted to, because the functions
also return negative values that aren't errno codes, leaving readers
confused about what the function actually returns.
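
A made-up example of the ambiguity meant here: if a helper can return
both -EPIPE and a stashed rdma->error_state (which may be -1 or any
other negative number), the caller cannot act on the distinction even
if it wanted to:

    ret = some_rdma_helper(rdma);   /* hypothetical helper */
    if (ret == -EPIPE) {
        /* Really a broken pipe, or just an old error_state that happens
         * to be -EPIPE?  No way to tell, so no caller even tries. */
    }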

Clean up and simplify: return -1 instead of negative errno code.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 44 ++++++++++++++++++++++----------------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index efbb3c7754..d0af258468 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -857,14 +857,14 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
                 } else {
                     error_setg_errno(errp, errno,
                                      "could not open RDMA device context");
-                    return -EINVAL;
+                    return -1;
                 }
             }
 
             if (ibv_query_port(verbs, 1, &port_attr)) {
                 ibv_close_device(verbs);
                 ERROR(errp, "Could not query initial IB port");
-                return -EINVAL;
+                return -1;
             }
 
             if (port_attr.link_layer == IBV_LINK_LAYER_INFINIBAND) {
@@ -889,7 +889,7 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
                 ERROR(errp, "You only have RoCE / iWARP devices in your systems"
                             " and your management software has specified '[::]'"
                             ", but IPv6 over RoCE / iWARP is not supported in Linux.");
-                return -ENONET;
+                return -1;
             }
         }
 
@@ -905,13 +905,13 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
     /* IB ports start with 1, not 0 */
     if (ibv_query_port(verbs, 1, &port_attr)) {
         ERROR(errp, "Could not query initial IB port");
-        return -EINVAL;
+        return -1;
     }
 
     if (port_attr.link_layer == IBV_LINK_LAYER_ETHERNET) {
         ERROR(errp, "Linux kernel's RoCE / iWARP does not support IPv6 "
                     "(but patches on linux-rdma in progress)");
-        return -ENONET;
+        return -1;
     }
 
 #endif
@@ -1409,7 +1409,7 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
 
         if (ret != 0) {
             perror("unregistration chunk failed");
-            return -ret;
+            return -1;
         }
         rdma->total_registrations--;
 
@@ -1554,7 +1554,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
                     if (ret) {
                         error_report("failed to get cm event while wait "
                                      "completion channel");
-                        return -EPIPE;
+                        return -1;
                     }
 
                     error_report("receive cm event while wait comp channel,"
@@ -1562,7 +1562,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
                     if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
                         cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
                         rdma_ack_cm_event(cm_event);
-                        return -EPIPE;
+                        return -1;
                     }
                     rdma_ack_cm_event(cm_event);
                 }
@@ -1575,18 +1575,18 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
                       * I don't trust errno from qemu_poll_ns
                      */
                 error_report("%s: poll failed", __func__);
-                return -EPIPE;
+                return -1;
             }
 
             if (migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) {
                 /* Bail out and let the cancellation happen */
-                return -EPIPE;
+                return -1;
             }
         }
     }
 
     if (rdma->received_error) {
-        return -EPIPE;
+        return -1;
     }
     return -!!rdma->error_state;
 }
@@ -1751,7 +1751,7 @@ static int qemu_rdma_post_send_control(RDMAContext *rdma, uint8_t *buf,
 
     if (ret > 0) {
         error_report("Failed to use post IB SEND for control");
-        return -ret;
+        return -1;
     }
 
     ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_SEND_CONTROL, NULL);
@@ -1820,15 +1820,15 @@ static int qemu_rdma_exchange_get_response(RDMAContext *rdma,
         if (head->type == RDMA_CONTROL_ERROR) {
             rdma->received_error = true;
         }
-        return -EIO;
+        return -1;
     }
     if (head->len > RDMA_CONTROL_MAX_BUFFER - sizeof(*head)) {
         error_report("too long length: %d", head->len);
-        return -EINVAL;
+        return -1;
     }
     if (sizeof(*head) + head->len != byte_len) {
         error_report("Malformed length: %d byte_len %d", head->len, byte_len);
-        return -EINVAL;
+        return -1;
     }
 
     return 0;
@@ -2092,7 +2092,7 @@ retry:
                                 (uint8_t *) &comp, NULL, NULL, NULL);
 
                 if (ret < 0) {
-                    return -EIO;
+                    return -1;
                 }
 
                 stat64_add(&mig_stats.zero_pages,
@@ -2127,7 +2127,7 @@ retry:
                                                 &sge.lkey, NULL, chunk,
                                                 chunk_start, chunk_end)) {
                 error_report("cannot get lkey");
-                return -EINVAL;
+                return -1;
             }
 
             reg_result = (RDMARegisterResult *)
@@ -2146,7 +2146,7 @@ retry:
                                                 &sge.lkey, NULL, chunk,
                                                 chunk_start, chunk_end)) {
                 error_report("cannot get lkey!");
-                return -EINVAL;
+                return -1;
             }
         }
 
@@ -2158,7 +2158,7 @@ retry:
                                                      &sge.lkey, NULL, chunk,
                                                      chunk_start, chunk_end)) {
             error_report("cannot get lkey!");
-            return -EINVAL;
+            return -1;
         }
     }
 
@@ -2200,7 +2200,7 @@ retry:
 
     } else if (ret > 0) {
         perror("rdma migration: post rdma write failed");
-        return -ret;
+        return -1;
     }
 
     set_bit(chunk, block->transit_bitmap);
@@ -2920,14 +2920,14 @@ static int qemu_rdma_drain_cq(QEMUFile *f, RDMAContext *rdma)
     int ret;
 
     if (qemu_rdma_write_flush(f, rdma) < 0) {
-        return -EIO;
+        return -1;
     }
 
     while (rdma->nb_sent) {
         ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RDMA_WRITE, NULL);
         if (ret < 0) {
             error_report("rdma migration: complete polling error!");
-            return -EIO;
+            return -1;
         }
     }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 25/52] migration/rdma: Dumb down remaining int error values to -1
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (23 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 24/52] migration/rdma: Return -1 instead of negative errno code Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-26 10:16   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 26/52] migration/rdma: Replace int error_state by bool errored Markus Armbruster
                   ` (28 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

This is just to make the error value more obvious.  Callers don't
mind.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 43 ++++++++++++++++++++++---------------------
 1 file changed, 22 insertions(+), 21 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index d0af258468..ad314cc10a 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1418,7 +1418,7 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
         ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
                                 &resp, NULL, NULL);
         if (ret < 0) {
-            return ret;
+            return -1;
         }
 
         trace_qemu_rdma_unregister_waiting_complete(chunk);
@@ -1459,7 +1459,7 @@ static int qemu_rdma_poll(RDMAContext *rdma, struct ibv_cq *cq,
 
     if (ret < 0) {
         error_report("ibv_poll_cq failed");
-        return ret;
+        return -1;
     }
 
     wr_id = wc.wr_id & RDMA_WRID_TYPE_MASK;
@@ -1633,7 +1633,7 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma,
     while (wr_id != wrid_requested) {
         ret = qemu_rdma_poll(rdma, poll_cq, &wr_id_in, byte_len);
         if (ret < 0) {
-            return ret;
+            return -1;
         }
 
         wr_id = wr_id_in & RDMA_WRID_TYPE_MASK;
@@ -1702,7 +1702,7 @@ err_block_for_wrid:
     }
 
     rdma->error_state = ret;
-    return ret;
+    return -1;
 }
 
 /*
@@ -1757,9 +1757,10 @@ static int qemu_rdma_post_send_control(RDMAContext *rdma, uint8_t *buf,
     ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_SEND_CONTROL, NULL);
     if (ret < 0) {
         error_report("rdma migration: send polling control error");
+        return -1;
     }
 
-    return ret;
+    return 0;
 }
 
 /*
@@ -1801,7 +1802,7 @@ static int qemu_rdma_exchange_get_response(RDMAContext *rdma,
 
     if (ret < 0) {
         error_report("rdma migration: recv polling control error!");
-        return ret;
+        return -1;
     }
 
     network_to_control((void *) rdma->wr_data[idx].control);
@@ -1879,7 +1880,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
         ret = qemu_rdma_exchange_get_response(rdma,
                                     &resp, RDMA_CONTROL_READY, RDMA_WRID_READY);
         if (ret < 0) {
-            return ret;
+            return -1;
         }
     }
 
@@ -1891,7 +1892,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
         if (ret) {
             error_report("rdma migration: error posting"
                     " extra control recv for anticipated result!");
-            return ret;
+            return -1;
         }
     }
 
@@ -1901,7 +1902,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
     if (ret) {
         error_report("rdma migration: error posting first control recv!");
-        return ret;
+        return -1;
     }
 
     /*
@@ -1911,7 +1912,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
 
     if (ret < 0) {
         error_report("Failed to send control buffer!");
-        return ret;
+        return -1;
     }
 
     /*
@@ -1922,7 +1923,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
             trace_qemu_rdma_exchange_send_issue_callback();
             ret = callback(rdma);
             if (ret < 0) {
-                return ret;
+                return -1;
             }
         }
 
@@ -1931,7 +1932,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                               resp->type, RDMA_WRID_DATA);
 
         if (ret < 0) {
-            return ret;
+            return -1;
         }
 
         qemu_rdma_move_header(rdma, RDMA_WRID_DATA, resp);
@@ -1967,7 +1968,7 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
 
     if (ret < 0) {
         error_report("Failed to send control buffer!");
-        return ret;
+        return -1;
     }
 
     /*
@@ -1977,7 +1978,7 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
                                           expecting, RDMA_WRID_READY);
 
     if (ret < 0) {
-        return ret;
+        return -1;
     }
 
     qemu_rdma_move_header(rdma, RDMA_WRID_READY, head);
@@ -1988,7 +1989,7 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
     if (ret) {
         error_report("rdma migration: error posting second control recv!");
-        return ret;
+        return -1;
     }
 
     return 0;
@@ -2061,7 +2062,7 @@ retry:
                     "block %d chunk %" PRIu64
                     " current %" PRIu64 " len %" PRIu64 " %d",
                     current_index, chunk, sge.addr, length, rdma->nb_sent);
-            return ret;
+            return -1;
         }
     }
 
@@ -2119,7 +2120,7 @@ retry:
             ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
                                     &resp, &reg_result_idx, NULL);
             if (ret < 0) {
-                return ret;
+                return -1;
             }
 
             /* try to overlap this single registration with the one we sent. */
@@ -2193,7 +2194,7 @@ retry:
         if (ret < 0) {
             error_report("rdma migration: failed to make "
                          "room in full send queue!");
-            return ret;
+            return -1;
         }
 
         goto retry;
@@ -2230,7 +2231,7 @@ static int qemu_rdma_write_flush(QEMUFile *f, RDMAContext *rdma)
             rdma->current_index, rdma->current_addr, rdma->current_length);
 
     if (ret < 0) {
-        return ret;
+        return -1;
     }
 
     if (ret == 0) {
@@ -2312,7 +2313,7 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
     if (!qemu_rdma_buffer_mergable(rdma, current_addr, len)) {
         ret = qemu_rdma_write_flush(f, rdma);
         if (ret) {
-            return ret;
+            return -1;
         }
         rdma->current_length = 0;
         rdma->current_addr = current_addr;
@@ -3485,7 +3486,7 @@ err_rdma_dest_wait:
     rdma->error_state = ret;
     qemu_rdma_cleanup(rdma);
     g_free(rdma_return_path);
-    return ret;
+    return -1;
 }
 
 static int dest_ram_sort_func(const void *a, const void *b)
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 26/52] migration/rdma: Replace int error_state by bool errored
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (24 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 25/52] migration/rdma: Dumb down remaining int error values to -1 Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-25  6:17   ` Zhijian Li (Fujitsu)
  2023-09-27 17:38   ` Eric Blake
  2023-09-18 14:41 ` [PATCH 27/52] migration/rdma: Drop superfluous assignments to @ret Markus Armbruster
                   ` (27 subsequent siblings)
  53 siblings, 2 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

All we do with the value of RDMAContext member @error_state is test
whether it's zero.  Change to bool and rename to @errored.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 66 ++++++++++++++++++++++++------------------------
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index ad314cc10a..85f6b274bf 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -352,7 +352,7 @@ typedef struct RDMAContext {
      * memory registration, then do not attempt any future work
      * and remember the error state.
      */
-    int error_state;
+    int errored;
     bool error_reported;
     bool received_error;
 
@@ -439,14 +439,14 @@ typedef struct QEMU_PACKED {
     uint64_t chunks;            /* how many sequential chunks to register */
 } RDMARegister;
 
-static int check_error_state(RDMAContext *rdma)
+static bool rdma_errored(RDMAContext *rdma)
 {
-    if (rdma->error_state && !rdma->error_reported) {
+    if (rdma->errored && !rdma->error_reported) {
         error_report("RDMA is in an error state waiting migration"
                      " to abort!");
         rdma->error_reported = true;
     }
-    return rdma->error_state;
+    return rdma->errored;
 }
 
 static void register_to_network(RDMAContext *rdma, RDMARegister *reg)
@@ -1531,7 +1531,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
          * But we need to be able to handle 'cancel' or an error
          * without hanging forever.
          */
-        while (!rdma->error_state  && !rdma->received_error) {
+        while (!rdma->errored && !rdma->received_error) {
             GPollFD pfds[2];
             pfds[0].fd = comp_channel->fd;
             pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
@@ -1588,7 +1588,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
     if (rdma->received_error) {
         return -1;
     }
-    return -!!rdma->error_state;
+    return -rdma->errored;
 }
 
 static struct ibv_comp_channel *to_channel(RDMAContext *rdma, uint64_t wrid)
@@ -1701,7 +1701,7 @@ err_block_for_wrid:
         ibv_ack_cq_events(cq, num_cq_events);
     }
 
-    rdma->error_state = ret;
+    rdma->errored = true;
     return -1;
 }
 
@@ -2340,7 +2340,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
     int idx;
 
     if (rdma->cm_id && rdma->connected) {
-        if ((rdma->error_state ||
+        if ((rdma->errored ||
              migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) &&
             !rdma->received_error) {
             RDMAControlHeader head = { .len = 0,
@@ -2621,14 +2621,14 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
 
     if (!rdma->host || !rdma->host[0]) {
         ERROR(errp, "RDMA host is not set!");
-        rdma->error_state = -EINVAL;
+        rdma->errored = true;
         return -1;
     }
     /* create CM channel */
     rdma->channel = rdma_create_event_channel();
     if (!rdma->channel) {
         ERROR(errp, "could not create rdma event channel");
-        rdma->error_state = -EINVAL;
+        rdma->errored = true;
         return -1;
     }
 
@@ -2686,7 +2686,7 @@ err_dest_init_bind_addr:
 err_dest_init_create_listen_id:
     rdma_destroy_event_channel(rdma->channel);
     rdma->channel = NULL;
-    rdma->error_state = ret;
+    rdma->errored = true;
     return -1;
 
 }
@@ -2763,7 +2763,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
         return -1;
     }
 
-    if (rdma->error_state) {
+    if (rdma->errored) {
         error_setg(errp,
                    "RDMA is in an error state waiting migration to abort!");
         return -1;
@@ -2775,7 +2775,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
      */
     ret = qemu_rdma_write_flush(f, rdma);
     if (ret < 0) {
-        rdma->error_state = ret;
+        rdma->errored = true;
         error_setg(errp, "qemu_rdma_write_flush failed");
         return -1;
     }
@@ -2795,7 +2795,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
             ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);
 
             if (ret < 0) {
-                rdma->error_state = ret;
+                rdma->errored = true;
                 error_setg(errp, "qemu_rdma_exchange_send failed");
                 return -1;
             }
@@ -2853,7 +2853,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
         return -1;
     }
 
-    if (rdma->error_state) {
+    if (rdma->errored) {
         error_setg(errp,
                    "RDMA is in an error state waiting migration to abort!");
         return -1;
@@ -2889,7 +2889,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
         ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_QEMU_FILE);
 
         if (ret < 0) {
-            rdma->error_state = ret;
+            rdma->errored = true;
             error_setg(errp, "qemu_rdma_exchange_recv failed");
             return -1;
         }
@@ -3162,21 +3162,21 @@ qio_channel_rdma_shutdown(QIOChannel *ioc,
     switch (how) {
     case QIO_CHANNEL_SHUTDOWN_READ:
         if (rdmain) {
-            rdmain->error_state = -1;
+            rdmain->errored = true;
         }
         break;
     case QIO_CHANNEL_SHUTDOWN_WRITE:
         if (rdmaout) {
-            rdmaout->error_state = -1;
+            rdmaout->errored = true;
         }
         break;
     case QIO_CHANNEL_SHUTDOWN_BOTH:
     default:
         if (rdmain) {
-            rdmain->error_state = -1;
+            rdmain->errored = true;
         }
         if (rdmaout) {
-            rdmaout->error_state = -1;
+            rdmaout->errored = true;
         }
         break;
     }
@@ -3221,7 +3221,7 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
         return -1;
     }
 
-    if (check_error_state(rdma)) {
+    if (rdma_errored(rdma)) {
         return -1;
     }
 
@@ -3290,7 +3290,7 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
     return RAM_SAVE_CONTROL_DELAYED;
 
 err:
-    rdma->error_state = ret;
+    rdma->errored = true;
     return -1;
 }
 
@@ -3311,13 +3311,13 @@ static void rdma_cm_poll_handler(void *opaque)
 
     if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
         cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
-        if (!rdma->error_state &&
+        if (!rdma->errored &&
             migration_incoming_get_current()->state !=
               MIGRATION_STATUS_COMPLETED) {
             error_report("receive cm event, cm event is %d", cm_event->event);
-            rdma->error_state = -EPIPE;
+            rdma->errored = true;
             if (rdma->return_path) {
-                rdma->return_path->error_state = -EPIPE;
+                rdma->return_path->errored = true;
             }
         }
         rdma_ack_cm_event(cm_event);
@@ -3483,7 +3483,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     return 0;
 
 err_rdma_dest_wait:
-    rdma->error_state = ret;
+    rdma->errored = true;
     qemu_rdma_cleanup(rdma);
     g_free(rdma_return_path);
     return -1;
@@ -3540,7 +3540,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
         return -1;
     }
 
-    if (check_error_state(rdma)) {
+    if (rdma_errored(rdma)) {
         return -1;
     }
 
@@ -3779,7 +3779,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
     } while (1);
 
 err:
-    rdma->error_state = ret;
+    rdma->errored = true;
     return -1;
 }
 
@@ -3856,7 +3856,7 @@ static int qemu_rdma_registration_start(QEMUFile *f,
         return -1;
     }
 
-    if (check_error_state(rdma)) {
+    if (rdma_errored(rdma)) {
         return -1;
     }
 
@@ -3889,7 +3889,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
         return -1;
     }
 
-    if (check_error_state(rdma)) {
+    if (rdma_errored(rdma)) {
         return -1;
     }
 
@@ -3943,7 +3943,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
                     "Your QEMU command line parameters are probably "
                     "not identical on both the source and destination.",
                     local->nb_blocks, nb_dest_blocks);
-            rdma->error_state = -EINVAL;
+            rdma->errored = true;
             return -1;
         }
 
@@ -3959,7 +3959,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
                         "vs %" PRIu64, local->block[i].block_name, i,
                         local->block[i].length,
                         rdma->dest_blocks[i].length);
-                rdma->error_state = -EINVAL;
+                rdma->errored = true;
                 return -1;
             }
             local->block[i].remote_host_addr =
@@ -3979,7 +3979,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
 
     return 0;
 err:
-    rdma->error_state = ret;
+    rdma->errored = true;
     return -1;
 }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 27/52] migration/rdma: Drop superfluous assignments to @ret
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (25 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 26/52] migration/rdma: Replace int error_state by bool errored Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-25  6:20   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 28/52] migration/rdma: Check negative error values the same way everywhere Markus Armbruster
                   ` (26 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 35 ++++++++++-------------------------
 1 file changed, 10 insertions(+), 25 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 85f6b274bf..62d95b7d2c 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1514,7 +1514,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
                                        struct ibv_comp_channel *comp_channel)
 {
     struct rdma_cm_event *cm_event;
-    int ret = -1;
+    int ret;
 
     /*
      * Coroutine doesn't start until migration_fd_process_incoming()
@@ -1619,7 +1619,7 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma,
                                     uint64_t wrid_requested,
                                     uint32_t *byte_len)
 {
-    int num_cq_events = 0, ret = 0;
+    int num_cq_events = 0, ret;
     struct ibv_cq *cq;
     void *cq_ctx;
     uint64_t wr_id = RDMA_WRID_NONE, wr_id_in;
@@ -1664,8 +1664,7 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma,
 
         num_cq_events++;
 
-        ret = -ibv_req_notify_cq(cq, 0);
-        if (ret) {
+        if (ibv_req_notify_cq(cq, 0)) {
             goto err_block_for_wrid;
         }
 
@@ -1712,7 +1711,7 @@ err_block_for_wrid:
 static int qemu_rdma_post_send_control(RDMAContext *rdma, uint8_t *buf,
                                        RDMAControlHeader *head)
 {
-    int ret = 0;
+    int ret;
     RDMAWorkRequestData *wr = &rdma->wr_data[RDMA_WRID_CONTROL];
     struct ibv_send_wr *bad_wr;
     struct ibv_sge sge = {
@@ -1869,7 +1868,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                    int *resp_idx,
                                    int (*callback)(RDMAContext *rdma))
 {
-    int ret = 0;
+    int ret;
 
     /*
      * Wait until the dest is ready before attempting to deliver the message
@@ -2841,7 +2840,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
     RDMAContext *rdma;
     RDMAControlHeader head;
-    int ret = 0;
+    int ret;
     size_t i;
     size_t done = 0;
 
@@ -3340,7 +3339,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     RDMAContext *rdma_return_path = NULL;
     struct rdma_cm_event *cm_event;
     struct ibv_context *verbs;
-    int ret = -EINVAL;
+    int ret;
     int idx;
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
@@ -3350,7 +3349,6 @@ static int qemu_rdma_accept(RDMAContext *rdma)
 
     if (cm_event->event != RDMA_CM_EVENT_CONNECT_REQUEST) {
         rdma_ack_cm_event(cm_event);
-        ret = -1;
         goto err_rdma_dest_wait;
     }
 
@@ -3363,7 +3361,6 @@ static int qemu_rdma_accept(RDMAContext *rdma)
         rdma_return_path = qemu_rdma_data_init(rdma->host_port, NULL);
         if (rdma_return_path == NULL) {
             rdma_ack_cm_event(cm_event);
-            ret = -1;
             goto err_rdma_dest_wait;
         }
 
@@ -3378,7 +3375,6 @@ static int qemu_rdma_accept(RDMAContext *rdma)
         error_report("Unknown source RDMA version: %d, bailing...",
                      cap.version);
         rdma_ack_cm_event(cm_event);
-        ret = -1;
         goto err_rdma_dest_wait;
     }
 
@@ -3411,7 +3407,6 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     } else if (rdma->verbs != verbs) {
         error_report("ibv context not matching %p, %p!", rdma->verbs,
                      verbs);
-        ret = -1;
         goto err_rdma_dest_wait;
     }
 
@@ -3465,7 +3460,6 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     if (cm_event->event != RDMA_CM_EVENT_ESTABLISHED) {
         error_report("rdma_accept not event established");
         rdma_ack_cm_event(cm_event);
-        ret = -1;
         goto err_rdma_dest_wait;
     }
 
@@ -3528,7 +3522,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
     static RDMARegisterResult results[RDMA_CONTROL_MAX_COMMANDS_PER_MESSAGE];
     RDMALocalBlock *block;
     void *host_addr;
-    int ret = 0;
+    int ret;
     int idx = 0;
     int count = 0;
     int i = 0;
@@ -3557,7 +3551,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
         if (head.repeat > RDMA_CONTROL_MAX_COMMANDS_PER_MESSAGE) {
             error_report("rdma: Too many requests in this message (%d)."
                             "Bailing.", head.repeat);
-            ret = -EIO;
             break;
         }
 
@@ -3573,7 +3566,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                 error_report("rdma: 'compress' bad block index %u (vs %d)",
                              (unsigned int)comp->block_idx,
                              rdma->local_ram_blocks.nb_blocks);
-                ret = -EIO;
                 goto err;
             }
             block = &(rdma->local_ram_blocks.block[comp->block_idx]);
@@ -3672,7 +3664,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                     error_report("rdma: 'register' bad block index %u (vs %d)",
                                  (unsigned int)reg->current_index,
                                  rdma->local_ram_blocks.nb_blocks);
-                    ret = -ENOENT;
                     goto err;
                 }
                 block = &(rdma->local_ram_blocks.block[reg->current_index]);
@@ -3682,7 +3673,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                             " offset: %" PRIx64 " current_addr: %" PRIx64,
                             block->block_name, block->offset,
                             reg->key.current_addr);
-                        ret = -ERANGE;
                         goto err;
                     }
                     host_addr = (block->local_host_addr +
@@ -3698,7 +3688,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                         error_report("rdma: bad chunk for block %s"
                             " chunk: %" PRIx64,
                             block->block_name, reg->key.chunk);
-                        ret = -ERANGE;
                         goto err;
                     }
                 }
@@ -3710,7 +3699,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                             (uintptr_t)host_addr, NULL, &tmp_rkey,
                             chunk, chunk_start, chunk_end)) {
                     error_report("cannot get rkey");
-                    ret = -EINVAL;
                     goto err;
                 }
                 reg_result->rkey = tmp_rkey;
@@ -3750,7 +3738,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
 
                 if (ret != 0) {
                     perror("rdma unregistration chunk failed");
-                    ret = -ret;
                     goto err;
                 }
 
@@ -3769,11 +3756,9 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
             break;
         case RDMA_CONTROL_REGISTER_RESULT:
             error_report("Invalid RESULT message at dest.");
-            ret = -EIO;
             goto err;
         default:
             error_report("Unknown control message %s", control_desc(head.type));
-            ret = -EIO;
             goto err;
         }
     } while (1);
@@ -3877,7 +3862,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
     RDMAContext *rdma;
     RDMAControlHeader head = { .len = 0, .repeat = 1 };
-    int ret = 0;
+    int ret;
 
     if (migration_in_postcopy()) {
         return 0;
@@ -4151,7 +4136,7 @@ void rdma_start_outgoing_migration(void *opaque,
     MigrationState *s = opaque;
     RDMAContext *rdma_return_path = NULL;
     RDMAContext *rdma;
-    int ret = 0;
+    int ret;
 
     /* Avoid ram_block_discard_disable(), cannot change during migration. */
     if (ram_block_discard_is_required()) {
-- 
2.41.0




* [PATCH 28/52] migration/rdma: Check negative error values the same way everywhere
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (26 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 27/52] migration/rdma: Drop superfluous assignments to @ret Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-25  6:26   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 29/52] migration/rdma: Plug a memory leak and improve a message Markus Armbruster
                   ` (25 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

When a function returns 0 on success and a negative value on error,
checking for non-zero suffices, but checking for negative is clearer.
So do that.
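
For instance, the preferred shape (a sketch; the diff below has the
real instances):

    ret = rdma_resolve_route(rdma->cm_id, RDMA_RESOLVE_TIMEOUT_MS);
    if (ret < 0) {          /* clearer than "if (ret)" */
        goto err_resolve_get_addr;
    }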

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 82 ++++++++++++++++++++++++------------------------
 1 file changed, 41 insertions(+), 41 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 62d95b7d2c..6c643a1b30 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -947,7 +947,7 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
 
     /* create CM id */
     ret = rdma_create_id(rdma->channel, &rdma->cm_id, NULL, RDMA_PS_TCP);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "could not create channel id");
         goto err_resolve_create_id;
     }
@@ -968,10 +968,10 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
 
         ret = rdma_resolve_addr(rdma->cm_id, NULL, e->ai_dst_addr,
                 RDMA_RESOLVE_TIMEOUT_MS);
-        if (!ret) {
+        if (ret >= 0) {
             if (e->ai_family == AF_INET6) {
                 ret = qemu_rdma_broken_ipv6_kernel(rdma->cm_id->verbs, errp);
-                if (ret) {
+                if (ret < 0) {
                     continue;
                 }
             }
@@ -988,7 +988,7 @@ route:
     qemu_rdma_dump_gid("source_resolve_addr", rdma->cm_id);
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "could not perform event_addr_resolved");
         goto err_resolve_get_addr;
     }
@@ -1004,13 +1004,13 @@ route:
 
     /* resolve route */
     ret = rdma_resolve_route(rdma->cm_id, RDMA_RESOLVE_TIMEOUT_MS);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "could not resolve rdma route");
         goto err_resolve_get_addr;
     }
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "could not perform event_route_resolved");
         goto err_resolve_get_addr;
     }
@@ -1118,7 +1118,7 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma)
     attr.qp_type = IBV_QPT_RC;
 
     ret = rdma_create_qp(rdma->cm_id, rdma->pd, &attr);
-    if (ret) {
+    if (ret < 0) {
         return -1;
     }
 
@@ -1551,7 +1551,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
 
                 if (pfds[1].revents) {
                     ret = rdma_get_cm_event(rdma->channel, &cm_event);
-                    if (ret) {
+                    if (ret < 0) {
                         error_report("failed to get cm event while wait "
                                      "completion channel");
                         return -1;
@@ -1652,12 +1652,12 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma,
 
     while (1) {
         ret = qemu_rdma_wait_comp_channel(rdma, ch);
-        if (ret) {
+        if (ret < 0) {
             goto err_block_for_wrid;
         }
 
         ret = ibv_get_cq_event(ch, &cq, &cq_ctx);
-        if (ret) {
+        if (ret < 0) {
             perror("ibv_get_cq_event");
             goto err_block_for_wrid;
         }
@@ -1888,7 +1888,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
      */
     if (resp) {
         ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_DATA);
-        if (ret) {
+        if (ret < 0) {
             error_report("rdma migration: error posting"
                     " extra control recv for anticipated result!");
             return -1;
@@ -1899,7 +1899,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
      * Post a WR to replace the one we just consumed for the READY message.
      */
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
-    if (ret) {
+    if (ret < 0) {
         error_report("rdma migration: error posting first control recv!");
         return -1;
     }
@@ -1986,7 +1986,7 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
      * Post a new RECV work request to replace the one we just consumed.
      */
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
-    if (ret) {
+    if (ret < 0) {
         error_report("rdma migration: error posting second control recv!");
         return -1;
     }
@@ -2311,7 +2311,7 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
     /* If we cannot merge it, we flush the current buffer first. */
     if (!qemu_rdma_buffer_mergable(rdma, current_addr, len)) {
         ret = qemu_rdma_write_flush(f, rdma);
-        if (ret) {
+        if (ret < 0) {
             return -1;
         }
         rdma->current_length = 0;
@@ -2441,12 +2441,12 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
     rdma->pin_all = pin_all;
 
     ret = qemu_rdma_resolve_host(rdma, errp);
-    if (ret) {
+    if (ret < 0) {
         goto err_rdma_source_init;
     }
 
     ret = qemu_rdma_alloc_pd_cq(rdma);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "rdma migration: error allocating pd and cq! Your mlock()"
                     " limits may be too low. Please check $ ulimit -a # and "
                     "search for 'ulimit -l' in the output");
@@ -2454,7 +2454,7 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
     }
 
     ret = qemu_rdma_alloc_qp(rdma);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "rdma migration: error allocating qp!");
         goto err_rdma_source_init;
     }
@@ -2471,7 +2471,7 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
 
     for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
         ret = qemu_rdma_reg_control(rdma, idx);
-        if (ret) {
+        if (ret < 0) {
             ERROR(errp, "rdma migration: error registering %d control!",
                                                             idx);
             goto err_rdma_source_init;
@@ -2545,13 +2545,13 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
     caps_to_network(&cap);
 
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "posting second control recv");
         goto err_rdma_source_connect;
     }
 
     ret = rdma_connect(rdma->cm_id, &conn_param);
-    if (ret) {
+    if (ret < 0) {
         perror("rdma_connect");
         ERROR(errp, "connecting to destination!");
         goto err_rdma_source_connect;
@@ -2565,7 +2565,7 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
             ERROR(errp, "failed to get cm event");
         }
     }
-    if (ret) {
+    if (ret < 0) {
         perror("rdma_get_cm_event after rdma_connect");
         goto err_rdma_source_connect;
     }
@@ -2633,7 +2633,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
 
     /* create CM id */
     ret = rdma_create_id(rdma->channel, &listen_id, NULL, RDMA_PS_TCP);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "could not create cm_id!");
         goto err_dest_init_create_listen_id;
     }
@@ -2649,7 +2649,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
 
     ret = rdma_set_option(listen_id, RDMA_OPTION_ID, RDMA_OPTION_ID_REUSEADDR,
                           &reuse, sizeof reuse);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "Error: could not set REUSEADDR option");
         goto err_dest_init_bind_addr;
     }
@@ -2658,12 +2658,12 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
             &((struct sockaddr_in *) e->ai_dst_addr)->sin_addr, ip, sizeof ip);
         trace_qemu_rdma_dest_init_trying(rdma->host, ip);
         ret = rdma_bind_addr(listen_id, e->ai_dst_addr);
-        if (ret) {
+        if (ret < 0) {
             continue;
         }
         if (e->ai_family == AF_INET6) {
             ret = qemu_rdma_broken_ipv6_kernel(listen_id->verbs, errp);
-            if (ret) {
+            if (ret < 0) {
                 continue;
             }
         }
@@ -3303,7 +3303,7 @@ static void rdma_cm_poll_handler(void *opaque)
     MigrationIncomingState *mis = migration_incoming_get_current();
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
-    if (ret) {
+    if (ret < 0) {
         error_report("get_cm_event failed %d", errno);
         return;
     }
@@ -3343,7 +3343,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     int idx;
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
-    if (ret) {
+    if (ret < 0) {
         goto err_rdma_dest_wait;
     }
 
@@ -3413,13 +3413,13 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     qemu_rdma_dump_id("dest_init", verbs);
 
     ret = qemu_rdma_alloc_pd_cq(rdma);
-    if (ret) {
+    if (ret < 0) {
         error_report("rdma migration: error allocating pd and cq!");
         goto err_rdma_dest_wait;
     }
 
     ret = qemu_rdma_alloc_qp(rdma);
-    if (ret) {
+    if (ret < 0) {
         error_report("rdma migration: error allocating qp!");
         goto err_rdma_dest_wait;
     }
@@ -3428,7 +3428,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
 
     for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
         ret = qemu_rdma_reg_control(rdma, idx);
-        if (ret) {
+        if (ret < 0) {
             error_report("rdma: error registering %d control", idx);
             goto err_rdma_dest_wait;
         }
@@ -3446,13 +3446,13 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     }
 
     ret = rdma_accept(rdma->cm_id, &conn_param);
-    if (ret) {
+    if (ret < 0) {
         error_report("rdma_accept failed");
         goto err_rdma_dest_wait;
     }
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
-    if (ret) {
+    if (ret < 0) {
         error_report("rdma_accept get_cm_event failed");
         goto err_rdma_dest_wait;
     }
@@ -3467,7 +3467,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     rdma->connected = true;
 
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
-    if (ret) {
+    if (ret < 0) {
         error_report("rdma migration: error posting second control recv");
         goto err_rdma_dest_wait;
     }
@@ -3596,7 +3596,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
 
             if (rdma->pin_all) {
                 ret = qemu_rdma_reg_whole_ram_blocks(rdma);
-                if (ret) {
+                if (ret < 0) {
                     error_report("rdma migration: error dest "
                                     "registering ram blocks");
                     goto err;
@@ -4057,7 +4057,7 @@ static void rdma_accept_incoming_migration(void *opaque)
     trace_qemu_rdma_accept_incoming_migration();
     ret = qemu_rdma_accept(rdma);
 
-    if (ret) {
+    if (ret < 0) {
         fprintf(stderr, "RDMA ERROR: Migration initialization failed\n");
         return;
     }
@@ -4101,7 +4101,7 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
     }
 
     ret = qemu_rdma_dest_init(rdma, errp);
-    if (ret) {
+    if (ret < 0) {
         goto err;
     }
 
@@ -4109,7 +4109,7 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
 
     ret = rdma_listen(rdma->listen_id, 5);
 
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "listening on socket!");
         goto cleanup_rdma;
     }
@@ -4151,14 +4151,14 @@ void rdma_start_outgoing_migration(void *opaque,
 
     ret = qemu_rdma_source_init(rdma, migrate_rdma_pin_all(), errp);
 
-    if (ret) {
+    if (ret < 0) {
         goto err;
     }
 
     trace_rdma_start_outgoing_migration_after_rdma_source_init();
     ret = qemu_rdma_connect(rdma, false, errp);
 
-    if (ret) {
+    if (ret < 0) {
         goto err;
     }
 
@@ -4173,13 +4173,13 @@ void rdma_start_outgoing_migration(void *opaque,
         ret = qemu_rdma_source_init(rdma_return_path,
                                     migrate_rdma_pin_all(), errp);
 
-        if (ret) {
+        if (ret < 0) {
             goto return_path_err;
         }
 
         ret = qemu_rdma_connect(rdma_return_path, true, errp);
 
-        if (ret) {
+        if (ret < 0) {
             goto return_path_err;
         }
 
-- 
2.41.0




* [PATCH 29/52] migration/rdma: Plug a memory leak and improve a message
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (27 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 28/52] migration/rdma: Check negative error values the same way everywhere Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-25  6:31   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 30/52] migration/rdma: Delete inappropriate error_report() in macro ERROR() Markus Armbruster
                   ` (24 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

When migration capability @rdma-pin-all is true, but the server cannot
honor it, qemu_rdma_connect() calls macro ERROR(), then returns
success.

ERROR() sets an error.  Since qemu_rdma_connect() returns success, its
caller rdma_start_outgoing_migration() duly assumes @errp is still
clear.  The Error object leaks.

ERROR() additionally reports the situation to the user as an error:

    RDMA ERROR: Server cannot support pinning all memory. Will register memory dynamically.

Is this an error or not?  It actually isn't; we disable @rdma-pin-all
and carry on.  "Correcting" the user's configuration decisions that
way feels problematic, but that's a topic for another day.

Replace ERROR() by warn_report().  This plugs the memory leak, and
emits a clearer message to the user.
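
The result is a plain warning for the recoverable case (a sketch; the
diff below has the real change):

    if (rdma->pin_all && !(cap.flags & RDMA_CAPABILITY_PIN_ALL)) {
        warn_report("RDMA: Server cannot support pinning all memory. "
                    "Will register memory dynamically.");
        rdma->pin_all = false;   /* recovered: nothing stored in @errp, nothing leaks */
    }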

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 6c643a1b30..d52de857c5 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2586,8 +2586,8 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
      * and disable them otherwise.
      */
     if (rdma->pin_all && !(cap.flags & RDMA_CAPABILITY_PIN_ALL)) {
-        ERROR(errp, "Server cannot support pinning all memory. "
-                        "Will register memory dynamically.");
+        warn_report("RDMA: Server cannot support pinning all memory. "
+                    "Will register memory dynamically.");
         rdma->pin_all = false;
     }
 
-- 
2.41.0




* [PATCH 30/52] migration/rdma: Delete inappropriate error_report() in macro ERROR()
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (28 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 29/52] migration/rdma: Plug a memory leak and improve a message Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-25  6:35   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 31/52] migration/rdma: Retire " Markus Armbruster
                   ` (23 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

Macro ERROR() violates this principle.  Delete the error_report()
there.
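
A minimal sketch of the rule, with hypothetical helper names do_thing()
and thing_failed():

    /* callee: set @errp, do not also report */
    static int do_thing(Error **errp)
    {
        if (thing_failed()) {
            error_setg(errp, "RDMA ERROR: thing failed");
            return -1;
        }
        return 0;
    }

    /* caller: report exactly once, or recover and report nothing */
    Error *err = NULL;

    if (do_thing(&err) < 0) {
        error_report_err(err);
    }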

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index d52de857c5..be31694d4f 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -40,12 +40,8 @@
 #include "options.h"
 #include <poll.h>
 
-/*
- * Print and error on both the Monitor and the Log file.
- */
 #define ERROR(errp, fmt, ...) \
     do { \
-        fprintf(stderr, "RDMA ERROR: " fmt "\n", ## __VA_ARGS__); \
         if (errp && (*(errp) == NULL)) { \
             error_setg(errp, "RDMA ERROR: " fmt, ## __VA_ARGS__); \
         } \
-- 
2.41.0




* [PATCH 31/52] migration/rdma: Retire macro ERROR()
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (29 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 30/52] migration/rdma: Delete inappropriate error_report() in macro ERROR() Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-25  7:31   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 32/52] migration/rdma: Fix error handling around rdma_getaddrinfo() Markus Armbruster
                   ` (22 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

ERROR() has become "error_setg() unless an error has been set
already".  Hiding the conditional in the macro is in the way of
further work.  Replace the macro uses by their expansion, and delete
the macro.
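
The transformation is mechanical (a sketch; the diff applies it to
every call site):

    /* before: the conditional hides inside the macro */
    ERROR(errp, "could not create channel id");

    /* after: the same thing, spelled out at the call site */
    if (errp && !*errp) {
        error_setg(errp, "RDMA ERROR: could not create channel id");
    }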

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 168 +++++++++++++++++++++++++++++++++--------------
 1 file changed, 120 insertions(+), 48 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index be31694d4f..df5b3a8e2c 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -40,13 +40,6 @@
 #include "options.h"
 #include <poll.h>
 
-#define ERROR(errp, fmt, ...) \
-    do { \
-        if (errp && (*(errp) == NULL)) { \
-            error_setg(errp, "RDMA ERROR: " fmt, ## __VA_ARGS__); \
-        } \
-    } while (0)
-
 #define RDMA_RESOLVE_TIMEOUT_MS 10000
 
 /* Do not merge data if larger than this. */
@@ -859,7 +852,10 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
 
             if (ibv_query_port(verbs, 1, &port_attr)) {
                 ibv_close_device(verbs);
-                ERROR(errp, "Could not query initial IB port");
+                if (errp && !*errp) {
+                    error_setg(errp,
+                               "RDMA ERROR: Could not query initial IB port");
+                }
                 return -1;
             }
 
@@ -882,9 +878,12 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
                                 " migrate over the IB fabric until the kernel "
                                 " fixes the bug.\n");
             } else {
-                ERROR(errp, "You only have RoCE / iWARP devices in your systems"
-                            " and your management software has specified '[::]'"
-                            ", but IPv6 over RoCE / iWARP is not supported in Linux.");
+                if (errp && !*errp) {
+                    error_setg(errp, "RDMA ERROR: "
+                               "You only have RoCE / iWARP devices in your systems"
+                               " and your management software has specified '[::]'"
+                               ", but IPv6 over RoCE / iWARP is not supported in Linux.");
+                }
                 return -1;
             }
         }
@@ -900,13 +899,18 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
 
     /* IB ports start with 1, not 0 */
     if (ibv_query_port(verbs, 1, &port_attr)) {
-        ERROR(errp, "Could not query initial IB port");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: Could not query initial IB port");
+        }
         return -1;
     }
 
     if (port_attr.link_layer == IBV_LINK_LAYER_ETHERNET) {
-        ERROR(errp, "Linux kernel's RoCE / iWARP does not support IPv6 "
-                    "(but patches on linux-rdma in progress)");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: "
+                       "Linux kernel's RoCE / iWARP does not support IPv6 "
+                       "(but patches on linux-rdma in progress)");
+        }
         return -1;
     }
 
@@ -930,21 +934,27 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
     struct rdma_addrinfo *e;
 
     if (rdma->host == NULL || !strcmp(rdma->host, "")) {
-        ERROR(errp, "RDMA hostname has not been set");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: RDMA hostname has not been set");
+        }
         return -1;
     }
 
     /* create CM channel */
     rdma->channel = rdma_create_event_channel();
     if (!rdma->channel) {
-        ERROR(errp, "could not create CM channel");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not create CM channel");
+        }
         return -1;
     }
 
     /* create CM id */
     ret = rdma_create_id(rdma->channel, &rdma->cm_id, NULL, RDMA_PS_TCP);
     if (ret < 0) {
-        ERROR(errp, "could not create channel id");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not create channel id");
+        }
         goto err_resolve_create_id;
     }
 
@@ -953,7 +963,10 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
 
     ret = rdma_getaddrinfo(rdma->host, port_str, NULL, &res);
     if (ret) {
-        ERROR(errp, "could not rdma_getaddrinfo address %s", rdma->host);
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not rdma_getaddrinfo address %s",
+                       rdma->host);
+        }
         goto err_resolve_get_addr;
     }
 
@@ -976,7 +989,10 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
     }
 
     rdma_freeaddrinfo(res);
-    ERROR(errp, "could not resolve address %s", rdma->host);
+    if (errp && !*errp) {
+        error_setg(errp, "RDMA ERROR: could not resolve address %s",
+                   rdma->host);
+    }
     goto err_resolve_get_addr;
 
 route:
@@ -985,13 +1001,18 @@ route:
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
     if (ret < 0) {
-        ERROR(errp, "could not perform event_addr_resolved");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not perform event_addr_resolved");
+        }
         goto err_resolve_get_addr;
     }
 
     if (cm_event->event != RDMA_CM_EVENT_ADDR_RESOLVED) {
-        ERROR(errp, "result not equal to event_addr_resolved %s",
-                rdma_event_str(cm_event->event));
+        if (errp && !*errp) {
+            error_setg(errp,
+                       "RDMA ERROR: result not equal to event_addr_resolved %s",
+                       rdma_event_str(cm_event->event));
+        }
         error_report("rdma_resolve_addr");
         rdma_ack_cm_event(cm_event);
         goto err_resolve_get_addr;
@@ -1001,18 +1022,25 @@ route:
     /* resolve route */
     ret = rdma_resolve_route(rdma->cm_id, RDMA_RESOLVE_TIMEOUT_MS);
     if (ret < 0) {
-        ERROR(errp, "could not resolve rdma route");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not resolve rdma route");
+        }
         goto err_resolve_get_addr;
     }
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
     if (ret < 0) {
-        ERROR(errp, "could not perform event_route_resolved");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not perform event_route_resolved");
+        }
         goto err_resolve_get_addr;
     }
     if (cm_event->event != RDMA_CM_EVENT_ROUTE_RESOLVED) {
-        ERROR(errp, "result not equal to event_route_resolved: %s",
-                        rdma_event_str(cm_event->event));
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: "
+                       "result not equal to event_route_resolved: %s",
+                       rdma_event_str(cm_event->event));
+        }
         rdma_ack_cm_event(cm_event);
         goto err_resolve_get_addr;
     }
@@ -2443,15 +2471,20 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
 
     ret = qemu_rdma_alloc_pd_cq(rdma);
     if (ret < 0) {
-        ERROR(errp, "rdma migration: error allocating pd and cq! Your mlock()"
-                    " limits may be too low. Please check $ ulimit -a # and "
-                    "search for 'ulimit -l' in the output");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: "
+                       "rdma migration: error allocating pd and cq! Your mlock()"
+                       " limits may be too low. Please check $ ulimit -a # and "
+                       "search for 'ulimit -l' in the output");
+        }
         goto err_rdma_source_init;
     }
 
     ret = qemu_rdma_alloc_qp(rdma);
     if (ret < 0) {
-        ERROR(errp, "rdma migration: error allocating qp!");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: rdma migration: error allocating qp!");
+        }
         goto err_rdma_source_init;
     }
 
@@ -2468,8 +2501,11 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
     for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
         ret = qemu_rdma_reg_control(rdma, idx);
         if (ret < 0) {
-            ERROR(errp, "rdma migration: error registering %d control!",
-                                                            idx);
+            if (errp && !*errp) {
+                error_setg(errp,
+                           "RDMA ERROR: rdma migration: error registering %d control!",
+                           idx);
+            }
             goto err_rdma_source_init;
         }
     }
@@ -2497,19 +2533,29 @@ static int qemu_get_cm_event_timeout(RDMAContext *rdma,
     } while (ret < 0 && errno == EINTR);
 
     if (ret == 0) {
-        ERROR(errp, "poll cm event timeout");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: poll cm event timeout");
+        }
         return -1;
     } else if (ret < 0) {
-        ERROR(errp, "failed to poll cm event, errno=%i", errno);
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: failed to poll cm event, errno=%i",
+                       errno);
+        }
         return -1;
     } else if (poll_fd.revents & POLLIN) {
         if (rdma_get_cm_event(rdma->channel, cm_event) < 0) {
-            ERROR(errp, "failed to get cm event");
+            if (errp && !*errp) {
+                error_setg(errp, "RDMA ERROR: failed to get cm event");
+            }
             return -1;
         }
         return 0;
     } else {
-        ERROR(errp, "no POLLIN event, revent=%x", poll_fd.revents);
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: no POLLIN event, revent=%x",
+                       poll_fd.revents);
+        }
         return -1;
     }
 }
@@ -2542,14 +2588,18 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
 
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
     if (ret < 0) {
-        ERROR(errp, "posting second control recv");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: posting second control recv");
+        }
         goto err_rdma_source_connect;
     }
 
     ret = rdma_connect(rdma->cm_id, &conn_param);
     if (ret < 0) {
         perror("rdma_connect");
-        ERROR(errp, "connecting to destination!");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: connecting to destination!");
+        }
         goto err_rdma_source_connect;
     }
 
@@ -2558,7 +2608,9 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
     } else {
         ret = rdma_get_cm_event(rdma->channel, &cm_event);
         if (ret < 0) {
-            ERROR(errp, "failed to get cm event");
+            if (errp && !*errp) {
+                error_setg(errp, "RDMA ERROR: failed to get cm event");
+            }
         }
     }
     if (ret < 0) {
@@ -2568,7 +2620,9 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
 
     if (cm_event->event != RDMA_CM_EVENT_ESTABLISHED) {
         error_report("rdma_get_cm_event != EVENT_ESTABLISHED after rdma_connect");
-        ERROR(errp, "connecting to destination!");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: connecting to destination!");
+        }
         rdma_ack_cm_event(cm_event);
         goto err_rdma_source_connect;
     }
@@ -2615,14 +2669,18 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
     }
 
     if (!rdma->host || !rdma->host[0]) {
-        ERROR(errp, "RDMA host is not set!");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: RDMA host is not set!");
+        }
         rdma->errored = true;
         return -1;
     }
     /* create CM channel */
     rdma->channel = rdma_create_event_channel();
     if (!rdma->channel) {
-        ERROR(errp, "could not create rdma event channel");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not create rdma event channel");
+        }
         rdma->errored = true;
         return -1;
     }
@@ -2630,7 +2688,9 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
     /* create CM id */
     ret = rdma_create_id(rdma->channel, &listen_id, NULL, RDMA_PS_TCP);
     if (ret < 0) {
-        ERROR(errp, "could not create cm_id!");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not create cm_id!");
+        }
         goto err_dest_init_create_listen_id;
     }
 
@@ -2639,14 +2699,19 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
 
     ret = rdma_getaddrinfo(rdma->host, port_str, NULL, &res);
     if (ret) {
-        ERROR(errp, "could not rdma_getaddrinfo address %s", rdma->host);
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not rdma_getaddrinfo address %s",
+                       rdma->host);
+        }
         goto err_dest_init_bind_addr;
     }
 
     ret = rdma_set_option(listen_id, RDMA_OPTION_ID, RDMA_OPTION_ID_REUSEADDR,
                           &reuse, sizeof reuse);
     if (ret < 0) {
-        ERROR(errp, "Error: could not set REUSEADDR option");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: Error: could not set REUSEADDR option");
+        }
         goto err_dest_init_bind_addr;
     }
     for (e = res; e != NULL; e = e->ai_next) {
@@ -2668,7 +2733,9 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
 
     rdma_freeaddrinfo(res);
     if (!e) {
-        ERROR(errp, "Error: could not rdma_bind_addr!");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: Error: could not rdma_bind_addr!");
+        }
         goto err_dest_init_bind_addr;
     }
 
@@ -2720,7 +2787,10 @@ static RDMAContext *qemu_rdma_data_init(const char *host_port, Error **errp)
         rdma->host = g_strdup(addr->host);
         rdma->host_port = g_strdup(host_port);
     } else {
-        ERROR(errp, "bad RDMA migration address '%s'", host_port);
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: bad RDMA migration address '%s'",
+                       host_port);
+        }
         g_free(rdma);
         rdma = NULL;
     }
@@ -4106,7 +4176,9 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
     ret = rdma_listen(rdma->listen_id, 5);
 
     if (ret < 0) {
-        ERROR(errp, "listening on socket!");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: listening on socket!");
+        }
         goto cleanup_rdma;
     }
 
-- 
2.41.0




* [PATCH 32/52] migration/rdma: Fix error handling around rdma_getaddrinfo()
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (30 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 31/52] migration/rdma: Retire " Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-25  7:32   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 33/52] migration/rdma: Drop "@errp is clear" guards around error_setg() Markus Armbruster
                   ` (21 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

qemu_rdma_resolve_host() and qemu_rdma_dest_init() iterate over
addresses to find one that works, holding onto the first Error from
qemu_rdma_broken_ipv6_kernel() for use when no address works.  Issues:

1. If @errp was &error_abort or &error_fatal, we'd terminate instead
   of trying the next address.  Can't actually happen, since no caller
   passes these arguments.

2. When @errp is a pointer to a variable containing NULL, and
   qemu_rdma_broken_ipv6_kernel() fails, the variable no longer
   contains NULL.  Subsequent iterations pass it again, violating
   Error usage rules.  Dangerous, as setting an error would then trip
   error_setv()'s assertion.  Works only because
   qemu_rdma_broken_ipv6_kernel() and the code following the loops
   carefully avoid setting a second error.

3. If qemu_rdma_broken_ipv6_kernel() fails, and then a later iteration
   finds a working address, @errp still holds the first error from
   qemu_rdma_broken_ipv6_kernel().  If we then run into another error,
   we report the qemu_rdma_broken_ipv6_kernel() failure instead.

4. If we don't run into another error, we leak the Error object.

Use a local error variable, and propagate to @errp.  This fixes 3, and
also cleans up 1 and partly 2.

Free this error when we have a working address.  This fixes 4.

Pass the local error variable to qemu_rdma_broken_ipv6_kernel() only
until it fails.  Pass null on any later iterations.  This cleans up
the remainder of 2.
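
The resulting loop shape, with try_address() standing in for the
rdma_resolve_addr() / qemu_rdma_broken_ipv6_kernel() sequence (a
sketch; the real code is in the diff):

    Error *err = NULL;

    /* Try all addresses, saving the first error in @err */
    for (e = res; e != NULL; e = e->ai_next) {
        Error **local_errp = err ? NULL : &err;   /* only the first failure sets @err */

        if (try_address(e, local_errp) == 0) {
            error_free(err);          /* a later address worked; drop the saved error */
            goto found;
        }
    }
    if (err) {
        error_propagate(errp, err);   /* no address worked: hand the first error to the caller */
    }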

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index df5b3a8e2c..d29affe410 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -926,6 +926,7 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
  */
 static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
 {
+    Error *err = NULL;
     int ret;
     struct rdma_addrinfo *res;
     char port_str[16];
@@ -970,7 +971,10 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
         goto err_resolve_get_addr;
     }
 
+    /* Try all addresses, saving the first error in @err */
     for (e = res; e != NULL; e = e->ai_next) {
+        Error **local_errp = err ? NULL : &err;
+
         inet_ntop(e->ai_family,
             &((struct sockaddr_in *) e->ai_dst_addr)->sin_addr, ip, sizeof ip);
         trace_qemu_rdma_resolve_host_trying(rdma->host, ip);
@@ -979,17 +983,21 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
                 RDMA_RESOLVE_TIMEOUT_MS);
         if (ret >= 0) {
             if (e->ai_family == AF_INET6) {
-                ret = qemu_rdma_broken_ipv6_kernel(rdma->cm_id->verbs, errp);
+                ret = qemu_rdma_broken_ipv6_kernel(rdma->cm_id->verbs,
+                                                   local_errp);
                 if (ret < 0) {
                     continue;
                 }
             }
+            error_free(err);
             goto route;
         }
     }
 
     rdma_freeaddrinfo(res);
-    if (errp && !*errp) {
+    if (err) {
+        error_propagate(errp, err);
+    } else {
         error_setg(errp, "RDMA ERROR: could not resolve address %s",
                    rdma->host);
     }
@@ -2656,6 +2664,7 @@ err_rdma_source_connect:
 
 static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
 {
+    Error *err = NULL;
     int ret, idx;
     struct rdma_cm_id *listen_id;
     char ip[40] = "unknown";
@@ -2714,7 +2723,11 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
         }
         goto err_dest_init_bind_addr;
     }
+
+    /* Try all addresses, saving the first error in @err */
     for (e = res; e != NULL; e = e->ai_next) {
+        Error **local_errp = err ? NULL : &err;
+
         inet_ntop(e->ai_family,
             &((struct sockaddr_in *) e->ai_dst_addr)->sin_addr, ip, sizeof ip);
         trace_qemu_rdma_dest_init_trying(rdma->host, ip);
@@ -2723,17 +2736,21 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
             continue;
         }
         if (e->ai_family == AF_INET6) {
-            ret = qemu_rdma_broken_ipv6_kernel(listen_id->verbs, errp);
+            ret = qemu_rdma_broken_ipv6_kernel(listen_id->verbs,
+                                               local_errp);
             if (ret < 0) {
                 continue;
             }
         }
+        error_free(err);
         break;
     }
 
     rdma_freeaddrinfo(res);
     if (!e) {
-        if (errp && !*errp) {
+        if (err) {
+            error_propagate(errp, err);
+        } else {
             error_setg(errp, "RDMA ERROR: Error: could not rdma_bind_addr!");
         }
         goto err_dest_init_bind_addr;
-- 
2.41.0




* [PATCH 33/52] migration/rdma: Drop "@errp is clear" guards around error_setg()
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (31 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 32/52] migration/rdma: Fix error handling around rdma_getaddrinfo() Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-25  7:32   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 34/52] migration/rdma: Convert qemu_rdma_exchange_recv() to Error Markus Armbruster
                   ` (20 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

These guards are all redundant now.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 164 +++++++++++++++--------------------------------
 1 file changed, 51 insertions(+), 113 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index d29affe410..c88cd1f468 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -852,10 +852,8 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
 
             if (ibv_query_port(verbs, 1, &port_attr)) {
                 ibv_close_device(verbs);
-                if (errp && !*errp) {
-                    error_setg(errp,
-                               "RDMA ERROR: Could not query initial IB port");
-                }
+                error_setg(errp,
+                           "RDMA ERROR: Could not query initial IB port");
                 return -1;
             }
 
@@ -878,12 +876,10 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
                                 " migrate over the IB fabric until the kernel "
                                 " fixes the bug.\n");
             } else {
-                if (errp && !*errp) {
-                    error_setg(errp, "RDMA ERROR: "
-                               "You only have RoCE / iWARP devices in your systems"
-                               " and your management software has specified '[::]'"
-                               ", but IPv6 over RoCE / iWARP is not supported in Linux.");
-                }
+                error_setg(errp, "RDMA ERROR: "
+                           "You only have RoCE / iWARP devices in your systems"
+                           " and your management software has specified '[::]'"
+                           ", but IPv6 over RoCE / iWARP is not supported in Linux.");
                 return -1;
             }
         }
@@ -899,18 +895,14 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
 
     /* IB ports start with 1, not 0 */
     if (ibv_query_port(verbs, 1, &port_attr)) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: Could not query initial IB port");
-        }
+        error_setg(errp, "RDMA ERROR: Could not query initial IB port");
         return -1;
     }
 
     if (port_attr.link_layer == IBV_LINK_LAYER_ETHERNET) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: "
-                       "Linux kernel's RoCE / iWARP does not support IPv6 "
-                       "(but patches on linux-rdma in progress)");
-        }
+        error_setg(errp, "RDMA ERROR: "
+                   "Linux kernel's RoCE / iWARP does not support IPv6 "
+                   "(but patches on linux-rdma in progress)");
         return -1;
     }
 
@@ -935,27 +927,21 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
     struct rdma_addrinfo *e;
 
     if (rdma->host == NULL || !strcmp(rdma->host, "")) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: RDMA hostname has not been set");
-        }
+        error_setg(errp, "RDMA ERROR: RDMA hostname has not been set");
         return -1;
     }
 
     /* create CM channel */
     rdma->channel = rdma_create_event_channel();
     if (!rdma->channel) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not create CM channel");
-        }
+        error_setg(errp, "RDMA ERROR: could not create CM channel");
         return -1;
     }
 
     /* create CM id */
     ret = rdma_create_id(rdma->channel, &rdma->cm_id, NULL, RDMA_PS_TCP);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not create channel id");
-        }
+        error_setg(errp, "RDMA ERROR: could not create channel id");
         goto err_resolve_create_id;
     }
 
@@ -964,10 +950,8 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
 
     ret = rdma_getaddrinfo(rdma->host, port_str, NULL, &res);
     if (ret) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not rdma_getaddrinfo address %s",
-                       rdma->host);
-        }
+        error_setg(errp, "RDMA ERROR: could not rdma_getaddrinfo address %s",
+                   rdma->host);
         goto err_resolve_get_addr;
     }
 
@@ -1009,18 +993,14 @@ route:
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not perform event_addr_resolved");
-        }
+        error_setg(errp, "RDMA ERROR: could not perform event_addr_resolved");
         goto err_resolve_get_addr;
     }
 
     if (cm_event->event != RDMA_CM_EVENT_ADDR_RESOLVED) {
-        if (errp && !*errp) {
-            error_setg(errp,
-                       "RDMA ERROR: result not equal to event_addr_resolved %s",
-                       rdma_event_str(cm_event->event));
-        }
+        error_setg(errp,
+                   "RDMA ERROR: result not equal to event_addr_resolved %s",
+                   rdma_event_str(cm_event->event));
         error_report("rdma_resolve_addr");
         rdma_ack_cm_event(cm_event);
         goto err_resolve_get_addr;
@@ -1030,25 +1010,19 @@ route:
     /* resolve route */
     ret = rdma_resolve_route(rdma->cm_id, RDMA_RESOLVE_TIMEOUT_MS);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not resolve rdma route");
-        }
+        error_setg(errp, "RDMA ERROR: could not resolve rdma route");
         goto err_resolve_get_addr;
     }
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not perform event_route_resolved");
-        }
+        error_setg(errp, "RDMA ERROR: could not perform event_route_resolved");
         goto err_resolve_get_addr;
     }
     if (cm_event->event != RDMA_CM_EVENT_ROUTE_RESOLVED) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: "
-                       "result not equal to event_route_resolved: %s",
-                       rdma_event_str(cm_event->event));
-        }
+        error_setg(errp, "RDMA ERROR: "
+                   "result not equal to event_route_resolved: %s",
+                   rdma_event_str(cm_event->event));
         rdma_ack_cm_event(cm_event);
         goto err_resolve_get_addr;
     }
@@ -2479,20 +2453,16 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
 
     ret = qemu_rdma_alloc_pd_cq(rdma);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: "
-                       "rdma migration: error allocating pd and cq! Your mlock()"
-                       " limits may be too low. Please check $ ulimit -a # and "
-                       "search for 'ulimit -l' in the output");
-        }
+        error_setg(errp, "RDMA ERROR: "
+                   "rdma migration: error allocating pd and cq! Your mlock()"
+                   " limits may be too low. Please check $ ulimit -a # and "
+                   "search for 'ulimit -l' in the output");
         goto err_rdma_source_init;
     }
 
     ret = qemu_rdma_alloc_qp(rdma);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: rdma migration: error allocating qp!");
-        }
+        error_setg(errp, "RDMA ERROR: rdma migration: error allocating qp!");
         goto err_rdma_source_init;
     }
 
@@ -2509,11 +2479,9 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
     for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
         ret = qemu_rdma_reg_control(rdma, idx);
         if (ret < 0) {
-            if (errp && !*errp) {
-                error_setg(errp,
-                           "RDMA ERROR: rdma migration: error registering %d control!",
-                           idx);
-            }
+            error_setg(errp,
+                       "RDMA ERROR: rdma migration: error registering %d control!",
+                       idx);
             goto err_rdma_source_init;
         }
     }
@@ -2541,29 +2509,21 @@ static int qemu_get_cm_event_timeout(RDMAContext *rdma,
     } while (ret < 0 && errno == EINTR);
 
     if (ret == 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: poll cm event timeout");
-        }
+        error_setg(errp, "RDMA ERROR: poll cm event timeout");
         return -1;
     } else if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: failed to poll cm event, errno=%i",
-                       errno);
-        }
+        error_setg(errp, "RDMA ERROR: failed to poll cm event, errno=%i",
+                   errno);
         return -1;
     } else if (poll_fd.revents & POLLIN) {
         if (rdma_get_cm_event(rdma->channel, cm_event) < 0) {
-            if (errp && !*errp) {
-                error_setg(errp, "RDMA ERROR: failed to get cm event");
-            }
+            error_setg(errp, "RDMA ERROR: failed to get cm event");
             return -1;
         }
         return 0;
     } else {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: no POLLIN event, revent=%x",
-                       poll_fd.revents);
-        }
+        error_setg(errp, "RDMA ERROR: no POLLIN event, revent=%x",
+                   poll_fd.revents);
         return -1;
     }
 }
@@ -2596,18 +2556,14 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
 
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: posting second control recv");
-        }
+        error_setg(errp, "RDMA ERROR: posting second control recv");
         goto err_rdma_source_connect;
     }
 
     ret = rdma_connect(rdma->cm_id, &conn_param);
     if (ret < 0) {
         perror("rdma_connect");
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: connecting to destination!");
-        }
+        error_setg(errp, "RDMA ERROR: connecting to destination!");
         goto err_rdma_source_connect;
     }
 
@@ -2616,9 +2572,7 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
     } else {
         ret = rdma_get_cm_event(rdma->channel, &cm_event);
         if (ret < 0) {
-            if (errp && !*errp) {
-                error_setg(errp, "RDMA ERROR: failed to get cm event");
-            }
+            error_setg(errp, "RDMA ERROR: failed to get cm event");
         }
     }
     if (ret < 0) {
@@ -2628,9 +2582,7 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
 
     if (cm_event->event != RDMA_CM_EVENT_ESTABLISHED) {
         error_report("rdma_get_cm_event != EVENT_ESTABLISHED after rdma_connect");
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: connecting to destination!");
-        }
+        error_setg(errp, "RDMA ERROR: connecting to destination!");
         rdma_ack_cm_event(cm_event);
         goto err_rdma_source_connect;
     }
@@ -2678,18 +2630,14 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
     }
 
     if (!rdma->host || !rdma->host[0]) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: RDMA host is not set!");
-        }
+        error_setg(errp, "RDMA ERROR: RDMA host is not set!");
         rdma->errored = true;
         return -1;
     }
     /* create CM channel */
     rdma->channel = rdma_create_event_channel();
     if (!rdma->channel) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not create rdma event channel");
-        }
+        error_setg(errp, "RDMA ERROR: could not create rdma event channel");
         rdma->errored = true;
         return -1;
     }
@@ -2697,9 +2645,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
     /* create CM id */
     ret = rdma_create_id(rdma->channel, &listen_id, NULL, RDMA_PS_TCP);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not create cm_id!");
-        }
+        error_setg(errp, "RDMA ERROR: could not create cm_id!");
         goto err_dest_init_create_listen_id;
     }
 
@@ -2708,19 +2654,15 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
 
     ret = rdma_getaddrinfo(rdma->host, port_str, NULL, &res);
     if (ret) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not rdma_getaddrinfo address %s",
-                       rdma->host);
-        }
+        error_setg(errp, "RDMA ERROR: could not rdma_getaddrinfo address %s",
+                   rdma->host);
         goto err_dest_init_bind_addr;
     }
 
     ret = rdma_set_option(listen_id, RDMA_OPTION_ID, RDMA_OPTION_ID_REUSEADDR,
                           &reuse, sizeof reuse);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: Error: could not set REUSEADDR option");
-        }
+        error_setg(errp, "RDMA ERROR: Error: could not set REUSEADDR option");
         goto err_dest_init_bind_addr;
     }
 
@@ -2804,10 +2746,8 @@ static RDMAContext *qemu_rdma_data_init(const char *host_port, Error **errp)
         rdma->host = g_strdup(addr->host);
         rdma->host_port = g_strdup(host_port);
     } else {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: bad RDMA migration address '%s'",
-                       host_port);
-        }
+        error_setg(errp, "RDMA ERROR: bad RDMA migration address '%s'",
+                   host_port);
         g_free(rdma);
         rdma = NULL;
     }
@@ -4193,9 +4133,7 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
     ret = rdma_listen(rdma->listen_id, 5);
 
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: listening on socket!");
-        }
+        error_setg(errp, "RDMA ERROR: listening on socket!");
         goto cleanup_rdma;
     }
 
-- 
2.41.0




* [PATCH 34/52] migration/rdma: Convert qemu_rdma_exchange_recv() to Error
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (32 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 33/52] migration/rdma: Drop "@errp is clear" guards around error_setg() Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-26  1:37   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 35/52] migration/rdma: Convert qemu_rdma_exchange_send() " Markus Armbruster
                   ` (19 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qio_channel_rdma_readv() violates this principle: it calls
error_report() via qemu_rdma_exchange_recv().  I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.

Clean this up by converting qemu_rdma_exchange_recv() to Error.

This necessitates setting an error when qemu_rdma_exchange_get_response()
fails.  Since this error will go away later in this series, simply
use "FIXME temporary error message" there.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index c88cd1f468..50546b3a27 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1957,7 +1957,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
  * control-channel message.
  */
 static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
-                                   uint32_t expecting)
+                                   uint32_t expecting, Error **errp)
 {
     RDMAControlHeader ready = {
                                 .len = 0,
@@ -1972,7 +1972,7 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
     ret = qemu_rdma_post_send_control(rdma, NULL, &ready);
 
     if (ret < 0) {
-        error_report("Failed to send control buffer!");
+        error_setg(errp, "Failed to send control buffer!");
         return -1;
     }
 
@@ -1983,6 +1983,7 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
                                           expecting, RDMA_WRID_READY);
 
     if (ret < 0) {
+        error_setg(errp, "FIXME temporary error message");
         return -1;
     }
 
@@ -1993,7 +1994,7 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
      */
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
     if (ret < 0) {
-        error_report("rdma migration: error posting second control recv!");
+        error_setg(errp, "rdma migration: error posting second control recv!");
         return -1;
     }
 
@@ -2908,11 +2909,11 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
         /* We've got nothing at all, so lets wait for
          * more to arrive
          */
-        ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_QEMU_FILE);
+        ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_QEMU_FILE,
+                                      errp);
 
         if (ret < 0) {
             rdma->errored = true;
-            error_setg(errp, "qemu_rdma_exchange_recv failed");
             return -1;
         }
 
@@ -3536,6 +3537,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
     RDMAControlHeader blocks = { .type = RDMA_CONTROL_RAM_BLOCKS_RESULT,
                                  .repeat = 1 };
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
+    Error *err = NULL;
     RDMAContext *rdma;
     RDMALocalBlocks *local;
     RDMAControlHeader head;
@@ -3565,9 +3567,10 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
     do {
         trace_qemu_rdma_registration_handle_wait();
 
-        ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_NONE);
+        ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_NONE, &err);
 
         if (ret < 0) {
+            error_report_err(err);
             break;
         }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 35/52] migration/rdma: Convert qemu_rdma_exchange_send() to Error
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (33 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 34/52] migration/rdma: Convert qemu_rdma_exchange_recv() to Error Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-26  1:42   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 36/52] migration/rdma: Convert qemu_rdma_exchange_get_response() " Markus Armbruster
                   ` (18 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qio_channel_rdma_writev() violates this principle: it calls
error_report() via qemu_rdma_exchange_send().  I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.

Clean this up by converting qemu_rdma_exchange_send() to Error.

Necessitates setting an error when qemu_rdma_post_recv_control(),
callback(), or qemu_rdma_exchange_get_response() failed.  Since these
errors will go away later in this series, simply use "FIXME temporary
error message" there.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 40 +++++++++++++++++++++++++++-------------
 1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 50546b3a27..c1bfc20824 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -518,7 +518,8 @@ static void network_to_result(RDMARegisterResult *result)
 static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                    uint8_t *data, RDMAControlHeader *resp,
                                    int *resp_idx,
-                                   int (*callback)(RDMAContext *rdma));
+                                   int (*callback)(RDMAContext *rdma),
+                                   Error **errp);
 
 static inline uint64_t ram_chunk_index(const uint8_t *start,
                                        const uint8_t *host)
@@ -1365,6 +1366,8 @@ static int qemu_rdma_reg_control(RDMAContext *rdma, int idx)
  */
 static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
 {
+    Error *err = NULL;
+
     while (rdma->unregistrations[rdma->unregister_current]) {
         int ret;
         uint64_t wr_id = rdma->unregistrations[rdma->unregister_current];
@@ -1422,8 +1425,9 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
         reg.key.chunk = chunk;
         register_to_network(rdma, &reg);
         ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
-                                &resp, NULL, NULL);
+                                      &resp, NULL, NULL, &err);
         if (ret < 0) {
+            error_report_err(err);
             return -1;
         }
 
@@ -1872,7 +1876,8 @@ static void qemu_rdma_move_header(RDMAContext *rdma, int idx,
 static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                    uint8_t *data, RDMAControlHeader *resp,
                                    int *resp_idx,
-                                   int (*callback)(RDMAContext *rdma))
+                                   int (*callback)(RDMAContext *rdma),
+                                   Error **errp)
 {
     int ret;
 
@@ -1885,6 +1890,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
         ret = qemu_rdma_exchange_get_response(rdma,
                                     &resp, RDMA_CONTROL_READY, RDMA_WRID_READY);
         if (ret < 0) {
+            error_setg(errp, "FIXME temporary error message");
             return -1;
         }
     }
@@ -1895,7 +1901,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
     if (resp) {
         ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_DATA);
         if (ret < 0) {
-            error_report("rdma migration: error posting"
+            error_setg(errp, "rdma migration: error posting"
                     " extra control recv for anticipated result!");
             return -1;
         }
@@ -1906,7 +1912,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
      */
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
     if (ret < 0) {
-        error_report("rdma migration: error posting first control recv!");
+        error_setg(errp, "rdma migration: error posting first control recv!");
         return -1;
     }
 
@@ -1916,7 +1922,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
     ret = qemu_rdma_post_send_control(rdma, data, head);
 
     if (ret < 0) {
-        error_report("Failed to send control buffer!");
+        error_setg(errp, "Failed to send control buffer!");
         return -1;
     }
 
@@ -1928,6 +1934,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
             trace_qemu_rdma_exchange_send_issue_callback();
             ret = callback(rdma);
             if (ret < 0) {
+                error_setg(errp, "FIXME temporary error message");
                 return -1;
             }
         }
@@ -1937,6 +1944,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                               resp->type, RDMA_WRID_DATA);
 
         if (ret < 0) {
+            error_setg(errp, "FIXME temporary error message");
             return -1;
         }
 
@@ -2011,6 +2019,7 @@ static int qemu_rdma_write_one(QEMUFile *f, RDMAContext *rdma,
                                int current_index, uint64_t current_addr,
                                uint64_t length)
 {
+    Error *err = NULL;
     struct ibv_sge sge;
     struct ibv_send_wr send_wr = { 0 };
     struct ibv_send_wr *bad_wr;
@@ -2096,9 +2105,10 @@ retry:
 
                 compress_to_network(rdma, &comp);
                 ret = qemu_rdma_exchange_send(rdma, &head,
-                                (uint8_t *) &comp, NULL, NULL, NULL);
+                                (uint8_t *) &comp, NULL, NULL, NULL, &err);
 
                 if (ret < 0) {
+                    error_report_err(err);
                     return -1;
                 }
 
@@ -2124,8 +2134,9 @@ retry:
 
             register_to_network(rdma, &reg);
             ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
-                                    &resp, &reg_result_idx, NULL);
+                                    &resp, &reg_result_idx, NULL, &err);
             if (ret < 0) {
+                error_report_err(err);
                 return -1;
             }
 
@@ -2815,11 +2826,11 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
             head.len = len;
             head.type = RDMA_CONTROL_QEMU_FILE;
 
-            ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);
+            ret = qemu_rdma_exchange_send(rdma, &head,
+                                          data, NULL, NULL, NULL, errp);
 
             if (ret < 0) {
                 rdma->errored = true;
-                error_setg(errp, "qemu_rdma_exchange_send failed");
                 return -1;
             }
 
@@ -3886,6 +3897,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
                                        uint64_t flags, void *data)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
+    Error *err = NULL;
     RDMAContext *rdma;
     RDMAControlHeader head = { .len = 0, .repeat = 1 };
     int ret;
@@ -3929,9 +3941,10 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
          */
         ret = qemu_rdma_exchange_send(rdma, &head, NULL, &resp,
                     &reg_result_idx, rdma->pin_all ?
-                    qemu_rdma_reg_whole_ram_blocks : NULL);
+                    qemu_rdma_reg_whole_ram_blocks : NULL,
+                    &err);
         if (ret < 0) {
-            fprintf(stderr, "receiving remote info!");
+            error_report_err(err);
             return -1;
         }
 
@@ -3982,9 +3995,10 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
     trace_qemu_rdma_registration_stop(flags);
 
     head.type = RDMA_CONTROL_REGISTER_FINISHED;
-    ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL);
+    ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL, &err);
 
     if (ret < 0) {
+        error_report_err(err);
         goto err;
     }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 36/52] migration/rdma: Convert qemu_rdma_exchange_get_response() to Error
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (34 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 35/52] migration/rdma: Convert qemu_rdma_exchange_send() " Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-26  1:45   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 37/52] migration/rdma: Convert qemu_rdma_reg_whole_ram_blocks() " Markus Armbruster
                   ` (17 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_exchange_send() and qemu_rdma_exchange_recv() violate this
principle: they call error_report() via
qemu_rdma_exchange_get_response().  I elected not to investigate how
callers handle the error, i.e. precise impact is not known.

Clean this up by converting qemu_rdma_exchange_get_response() to
Error.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index c1bfc20824..34f05dd541 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1803,14 +1803,15 @@ static int qemu_rdma_post_recv_control(RDMAContext *rdma, int idx)
  * Block and wait for a RECV control channel message to arrive.
  */
 static int qemu_rdma_exchange_get_response(RDMAContext *rdma,
-                RDMAControlHeader *head, uint32_t expecting, int idx)
+                RDMAControlHeader *head, uint32_t expecting, int idx,
+                Error **errp)
 {
     uint32_t byte_len;
     int ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RECV_CONTROL + idx,
                                        &byte_len);
 
     if (ret < 0) {
-        error_report("rdma migration: recv polling control error!");
+        error_setg(errp, "rdma migration: recv polling control error!");
         return -1;
     }
 
@@ -1823,7 +1824,7 @@ static int qemu_rdma_exchange_get_response(RDMAContext *rdma,
         trace_qemu_rdma_exchange_get_response_none(control_desc(head->type),
                                              head->type);
     } else if (head->type != expecting || head->type == RDMA_CONTROL_ERROR) {
-        error_report("Was expecting a %s (%d) control message"
+        error_setg(errp, "Was expecting a %s (%d) control message"
                 ", but got: %s (%d), length: %d",
                 control_desc(expecting), expecting,
                 control_desc(head->type), head->type, head->len);
@@ -1833,11 +1834,12 @@ static int qemu_rdma_exchange_get_response(RDMAContext *rdma,
         return -1;
     }
     if (head->len > RDMA_CONTROL_MAX_BUFFER - sizeof(*head)) {
-        error_report("too long length: %d", head->len);
+        error_setg(errp, "too long length: %d", head->len);
         return -1;
     }
     if (sizeof(*head) + head->len != byte_len) {
-        error_report("Malformed length: %d byte_len %d", head->len, byte_len);
+        error_setg(errp, "Malformed length: %d byte_len %d",
+                   head->len, byte_len);
         return -1;
     }
 
@@ -1887,10 +1889,10 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
      */
     if (rdma->control_ready_expected) {
         RDMAControlHeader resp;
-        ret = qemu_rdma_exchange_get_response(rdma,
-                                    &resp, RDMA_CONTROL_READY, RDMA_WRID_READY);
+        ret = qemu_rdma_exchange_get_response(rdma, &resp,
+                                              RDMA_CONTROL_READY,
+                                              RDMA_WRID_READY, errp);
         if (ret < 0) {
-            error_setg(errp, "FIXME temporary error message");
             return -1;
         }
     }
@@ -1941,10 +1943,10 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
 
         trace_qemu_rdma_exchange_send_waiting(control_desc(resp->type));
         ret = qemu_rdma_exchange_get_response(rdma, resp,
-                                              resp->type, RDMA_WRID_DATA);
+                                              resp->type, RDMA_WRID_DATA,
+                                              errp);
 
         if (ret < 0) {
-            error_setg(errp, "FIXME temporary error message");
             return -1;
         }
 
@@ -1988,10 +1990,9 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
      * Block and wait for the message.
      */
     ret = qemu_rdma_exchange_get_response(rdma, head,
-                                          expecting, RDMA_WRID_READY);
+                                          expecting, RDMA_WRID_READY, errp);
 
     if (ret < 0) {
-        error_setg(errp, "FIXME temporary error message");
         return -1;
     }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 37/52] migration/rdma: Convert qemu_rdma_reg_whole_ram_blocks() to Error
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (35 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 36/52] migration/rdma: Convert qemu_rdma_exchange_get_response() " Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-26  1:51   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 38/52] migration/rdma: Convert qemu_rdma_write_flush() " Markus Armbruster
                   ` (16 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_exchange_send() violates this principle: it calls
error_report() via callback qemu_rdma_reg_whole_ram_blocks().  I
elected not to investigate how callers handle the error, i.e. precise
impact is not known.

Clean this up by converting the callback to Error.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 34f05dd541..f1cd659a1f 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -518,7 +518,8 @@ static void network_to_result(RDMARegisterResult *result)
 static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                    uint8_t *data, RDMAControlHeader *resp,
                                    int *resp_idx,
-                                   int (*callback)(RDMAContext *rdma),
+                                   int (*callback)(RDMAContext *rdma,
+                                                   Error **errp),
                                    Error **errp);
 
 static inline uint64_t ram_chunk_index(const uint8_t *start,
@@ -1175,7 +1176,7 @@ static void qemu_rdma_advise_prefetch_mr(struct ibv_pd *pd, uint64_t addr,
 #endif
 }
 
-static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
+static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma, Error **errp)
 {
     int i;
     RDMALocalBlocks *local = &rdma->local_ram_blocks;
@@ -1210,16 +1211,16 @@ static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
         }
 
         if (!local->block[i].mr) {
-            perror("Failed to register local dest ram block!");
-            break;
+            error_setg_errno(errp, errno,
+                             "Failed to register local dest ram block!");
+            goto err;
         }
         rdma->total_registrations++;
     }
 
-    if (i >= local->nb_blocks) {
-        return 0;
-    }
+    return 0;
 
+err:
     for (i--; i >= 0; i--) {
         ibv_dereg_mr(local->block[i].mr);
         local->block[i].mr = NULL;
@@ -1878,7 +1879,8 @@ static void qemu_rdma_move_header(RDMAContext *rdma, int idx,
 static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                    uint8_t *data, RDMAControlHeader *resp,
                                    int *resp_idx,
-                                   int (*callback)(RDMAContext *rdma),
+                                   int (*callback)(RDMAContext *rdma,
+                                                   Error **errp),
                                    Error **errp)
 {
     int ret;
@@ -1934,9 +1936,8 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
     if (resp) {
         if (callback) {
             trace_qemu_rdma_exchange_send_issue_callback();
-            ret = callback(rdma);
+            ret = callback(rdma, errp);
             if (ret < 0) {
-                error_setg(errp, "FIXME temporary error message");
                 return -1;
             }
         }
@@ -3633,10 +3634,9 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
             }
 
             if (rdma->pin_all) {
-                ret = qemu_rdma_reg_whole_ram_blocks(rdma);
+                ret = qemu_rdma_reg_whole_ram_blocks(rdma, &err);
                 if (ret < 0) {
-                    error_report("rdma migration: error dest "
-                                    "registering ram blocks");
+                    error_report_err(err);
                     goto err;
                 }
             }
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 38/52] migration/rdma: Convert qemu_rdma_write_flush() to Error
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (36 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 37/52] migration/rdma: Convert qemu_rdma_reg_whole_ram_blocks() " Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-26  1:56   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() " Markus Armbruster
                   ` (15 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qio_channel_rdma_writev() violates this principle: it calls
error_report() via qemu_rdma_write_flush().  I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.

Clean this up by converting qemu_rdma_write_flush() to Error.

Necessitates setting an error when qemu_rdma_write_one() failed.
Since this error will go away later in this series, simply use "FIXME
temporary error message" there.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index f1cd659a1f..c3c33fe242 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2238,7 +2238,8 @@ retry:
  * We support sending out multiple chunks at the same time.
  * Not all of them need to get signaled in the completion queue.
  */
-static int qemu_rdma_write_flush(QEMUFile *f, RDMAContext *rdma)
+static int qemu_rdma_write_flush(QEMUFile *f, RDMAContext *rdma,
+                                 Error **errp)
 {
     int ret;
 
@@ -2250,6 +2251,7 @@ static int qemu_rdma_write_flush(QEMUFile *f, RDMAContext *rdma)
             rdma->current_index, rdma->current_addr, rdma->current_length);
 
     if (ret < 0) {
+        error_setg(errp, "FIXME temporary error message");
         return -1;
     }
 
@@ -2323,6 +2325,7 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
                            uint64_t block_offset, uint64_t offset,
                            uint64_t len)
 {
+    Error *err = NULL;
     uint64_t current_addr = block_offset + offset;
     uint64_t index = rdma->current_index;
     uint64_t chunk = rdma->current_chunk;
@@ -2330,8 +2333,9 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
 
     /* If we cannot merge it, we flush the current buffer first. */
     if (!qemu_rdma_buffer_mergable(rdma, current_addr, len)) {
-        ret = qemu_rdma_write_flush(f, rdma);
+        ret = qemu_rdma_write_flush(f, rdma, &err);
         if (ret < 0) {
+            error_report_err(err);
             return -1;
         }
         rdma->current_length = 0;
@@ -2348,7 +2352,10 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
 
     /* flush it if buffer is too large */
     if (rdma->current_length >= RDMA_MERGE_MAX) {
-        return qemu_rdma_write_flush(f, rdma);
+        if (qemu_rdma_write_flush(f, rdma, &err) < 0) {
+            error_report_err(err);
+            return -1;
+        }
     }
 
     return 0;
@@ -2809,10 +2816,9 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
      * Push out any writes that
      * we're queued up for VM's ram.
      */
-    ret = qemu_rdma_write_flush(f, rdma);
+    ret = qemu_rdma_write_flush(f, rdma, errp);
     if (ret < 0) {
         rdma->errored = true;
-        error_setg(errp, "qemu_rdma_write_flush failed");
         return -1;
     }
 
@@ -2954,9 +2960,11 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
  */
 static int qemu_rdma_drain_cq(QEMUFile *f, RDMAContext *rdma)
 {
+    Error *err = NULL;
     int ret;
 
-    if (qemu_rdma_write_flush(f, rdma) < 0) {
+    if (qemu_rdma_write_flush(f, rdma, &err) < 0) {
+        error_report_err(err);
         return -1;
     }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() to Error
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (37 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 38/52] migration/rdma: Convert qemu_rdma_write_flush() " Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-26  5:24   ` Zhijian Li (Fujitsu)
  2023-09-26  5:50   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 40/52] migration/rdma: Convert qemu_rdma_write() " Markus Armbruster
                   ` (14 subsequent siblings)
  53 siblings, 2 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_write_flush() violates this principle: it calls
error_report() via qemu_rdma_write_one().  I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.

Clean this up by converting qemu_rdma_write_one() to Error.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index c3c33fe242..9b8cbadfcd 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2019,9 +2019,8 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
  */
 static int qemu_rdma_write_one(QEMUFile *f, RDMAContext *rdma,
                                int current_index, uint64_t current_addr,
-                               uint64_t length)
+                               uint64_t length, Error **errp)
 {
-    Error *err = NULL;
     struct ibv_sge sge;
     struct ibv_send_wr send_wr = { 0 };
     struct ibv_send_wr *bad_wr;
@@ -2075,7 +2074,7 @@ retry:
         ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RDMA_WRITE, NULL);
 
         if (ret < 0) {
-            error_report("Failed to Wait for previous write to complete "
+            error_setg(errp, "Failed to Wait for previous write to complete "
                     "block %d chunk %" PRIu64
                     " current %" PRIu64 " len %" PRIu64 " %d",
                     current_index, chunk, sge.addr, length, rdma->nb_sent);
@@ -2107,10 +2106,9 @@ retry:
 
                 compress_to_network(rdma, &comp);
                 ret = qemu_rdma_exchange_send(rdma, &head,
-                                (uint8_t *) &comp, NULL, NULL, NULL, &err);
+                                (uint8_t *) &comp, NULL, NULL, NULL, errp);
 
                 if (ret < 0) {
-                    error_report_err(err);
                     return -1;
                 }
 
@@ -2136,9 +2134,8 @@ retry:
 
             register_to_network(rdma, &reg);
             ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
-                                    &resp, &reg_result_idx, NULL, &err);
+                                    &resp, &reg_result_idx, NULL, errp);
             if (ret < 0) {
-                error_report_err(err);
                 return -1;
             }
 
@@ -2146,7 +2143,7 @@ retry:
             if (qemu_rdma_register_and_get_keys(rdma, block, sge.addr,
                                                 &sge.lkey, NULL, chunk,
                                                 chunk_start, chunk_end)) {
-                error_report("cannot get lkey");
+                error_setg(errp, "cannot get lkey");
                 return -1;
             }
 
@@ -2165,7 +2162,7 @@ retry:
             if (qemu_rdma_register_and_get_keys(rdma, block, sge.addr,
                                                 &sge.lkey, NULL, chunk,
                                                 chunk_start, chunk_end)) {
-                error_report("cannot get lkey!");
+                error_setg(errp, "cannot get lkey!");
                 return -1;
             }
         }
@@ -2177,7 +2174,7 @@ retry:
         if (qemu_rdma_register_and_get_keys(rdma, block, sge.addr,
                                                      &sge.lkey, NULL, chunk,
                                                      chunk_start, chunk_end)) {
-            error_report("cannot get lkey!");
+            error_setg(errp, "cannot get lkey!");
             return -1;
         }
     }
@@ -2211,7 +2208,7 @@ retry:
         trace_qemu_rdma_write_one_queue_full();
         ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RDMA_WRITE, NULL);
         if (ret < 0) {
-            error_report("rdma migration: failed to make "
+            error_setg(errp, "rdma migration: failed to make "
                          "room in full send queue!");
             return -1;
         }
@@ -2219,7 +2216,7 @@ retry:
         goto retry;
 
     } else if (ret > 0) {
-        perror("rdma migration: post rdma write failed");
+        error_setg(errp, "rdma migration: post rdma write failed");
         return -1;
     }
 
@@ -2248,10 +2245,10 @@ static int qemu_rdma_write_flush(QEMUFile *f, RDMAContext *rdma,
     }
 
     ret = qemu_rdma_write_one(f, rdma,
-            rdma->current_index, rdma->current_addr, rdma->current_length);
+            rdma->current_index, rdma->current_addr, rdma->current_length,
+            errp);
 
     if (ret < 0) {
-        error_setg(errp, "FIXME temporary error message");
         return -1;
     }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 40/52] migration/rdma: Convert qemu_rdma_write() to Error
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (38 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() " Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-26  5:25   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 41/52] migration/rdma: Convert qemu_rdma_post_send_control() " Markus Armbruster
                   ` (13 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

Just for consistency with qemu_rdma_write_one() and
qemu_rdma_write_flush(), and for slightly simpler code.
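
The simplification is the usual one once a whole call chain has been
converted: an intermediate function can hand its own errp straight
down instead of juggling a local Error * and error_report_err().  A
rough sketch (bottom() is a made-up callee, not from the patch):

    /* Before: collect locally, report, translate to -1 */
    static int middle_old(RDMAContext *rdma)
    {
        Error *err = NULL;

        if (bottom(rdma, &err) < 0) {
            error_report_err(err);
            return -1;
        }
        return 0;
    }

    /* After: pass errp through; the outermost caller reports once */
    static int middle_new(RDMAContext *rdma, Error **errp)
    {
        return bottom(rdma, errp);
    }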

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 9b8cbadfcd..5bb78a6ad8 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2320,9 +2320,8 @@ static inline bool qemu_rdma_buffer_mergable(RDMAContext *rdma,
  */
 static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
                            uint64_t block_offset, uint64_t offset,
-                           uint64_t len)
+                           uint64_t len, Error **errp)
 {
-    Error *err = NULL;
     uint64_t current_addr = block_offset + offset;
     uint64_t index = rdma->current_index;
     uint64_t chunk = rdma->current_chunk;
@@ -2330,9 +2329,8 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
 
     /* If we cannot merge it, we flush the current buffer first. */
     if (!qemu_rdma_buffer_mergable(rdma, current_addr, len)) {
-        ret = qemu_rdma_write_flush(f, rdma, &err);
+        ret = qemu_rdma_write_flush(f, rdma, errp);
         if (ret < 0) {
-            error_report_err(err);
             return -1;
         }
         rdma->current_length = 0;
@@ -2349,10 +2347,7 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
 
     /* flush it if buffer is too large */
     if (rdma->current_length >= RDMA_MERGE_MAX) {
-        if (qemu_rdma_write_flush(f, rdma, &err) < 0) {
-            error_report_err(err);
-            return -1;
-        }
+        return qemu_rdma_write_flush(f, rdma, errp);
     }
 
     return 0;
@@ -3248,6 +3243,7 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
                                   size_t size, uint64_t *bytes_sent)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
+    Error *err = NULL;
     RDMAContext *rdma;
     int ret;
 
@@ -3273,9 +3269,9 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
      * is full, or the page doesn't belong to the current chunk,
      * an actual RDMA write will occur and a new chunk will be formed.
      */
-    ret = qemu_rdma_write(f, rdma, block_offset, offset, size);
+    ret = qemu_rdma_write(f, rdma, block_offset, offset, size, &err);
     if (ret < 0) {
-        error_report("rdma migration: write error");
+        error_report_err(err);
         goto err;
     }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 41/52] migration/rdma: Convert qemu_rdma_post_send_control() to Error
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (39 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 40/52] migration/rdma: Convert qemu_rdma_write() " Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-26  5:29   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 42/52] migration/rdma: Convert qemu_rdma_post_recv_control() " Markus Armbruster
                   ` (12 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_exchange_send() violates this principle: it calls
error_report() via qemu_rdma_post_send_control().  I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.

Clean this up by converting qemu_rdma_post_send_control() to Error.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 5bb78a6ad8..25caf67aac 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1720,7 +1720,8 @@ err_block_for_wrid:
  * containing some data and block until the post completes.
  */
 static int qemu_rdma_post_send_control(RDMAContext *rdma, uint8_t *buf,
-                                       RDMAControlHeader *head)
+                                       RDMAControlHeader *head,
+                                       Error **errp)
 {
     int ret;
     RDMAWorkRequestData *wr = &rdma->wr_data[RDMA_WRID_CONTROL];
@@ -1760,13 +1761,13 @@ static int qemu_rdma_post_send_control(RDMAContext *rdma, uint8_t *buf,
     ret = ibv_post_send(rdma->qp, &send_wr, &bad_wr);
 
     if (ret > 0) {
-        error_report("Failed to use post IB SEND for control");
+        error_setg(errp, "Failed to use post IB SEND for control");
         return -1;
     }
 
     ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_SEND_CONTROL, NULL);
     if (ret < 0) {
-        error_report("rdma migration: send polling control error");
+        error_setg(errp, "rdma migration: send polling control error");
         return -1;
     }
 
@@ -1923,10 +1924,9 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
     /*
      * Deliver the control message that was requested.
      */
-    ret = qemu_rdma_post_send_control(rdma, data, head);
+    ret = qemu_rdma_post_send_control(rdma, data, head, errp);
 
     if (ret < 0) {
-        error_setg(errp, "Failed to send control buffer!");
         return -1;
     }
 
@@ -1980,10 +1980,9 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
     /*
      * Inform the source that we're ready to receive a message.
      */
-    ret = qemu_rdma_post_send_control(rdma, NULL, &ready);
+    ret = qemu_rdma_post_send_control(rdma, NULL, &ready, errp);
 
     if (ret < 0) {
-        error_setg(errp, "Failed to send control buffer!");
         return -1;
     }
 
@@ -2355,6 +2354,7 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
 
 static void qemu_rdma_cleanup(RDMAContext *rdma)
 {
+    Error *err = NULL;
     int idx;
 
     if (rdma->cm_id && rdma->connected) {
@@ -2366,7 +2366,9 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
                                        .repeat = 1,
                                      };
             error_report("Early error. Sending error.");
-            qemu_rdma_post_send_control(rdma, NULL, &head);
+            if (qemu_rdma_post_send_control(rdma, NULL, &head, &err) < 0) {
+                error_report_err(err);
+            }
         }
 
         rdma_disconnect(rdma->cm_id);
@@ -3673,10 +3675,11 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
 
 
             ret = qemu_rdma_post_send_control(rdma,
-                                        (uint8_t *) rdma->dest_blocks, &blocks);
+                                    (uint8_t *) rdma->dest_blocks, &blocks,
+                                    &err);
 
             if (ret < 0) {
-                error_report("rdma migration: error sending remote info");
+                error_report_err(err);
                 goto err;
             }
 
@@ -3751,10 +3754,10 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
             }
 
             ret = qemu_rdma_post_send_control(rdma,
-                            (uint8_t *) results, &reg_resp);
+                            (uint8_t *) results, &reg_resp, &err);
 
             if (ret < 0) {
-                error_report("Failed to send control buffer");
+                error_report_err(err);
                 goto err;
             }
             break;
@@ -3786,10 +3789,10 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                                                        reg->key.chunk);
             }
 
-            ret = qemu_rdma_post_send_control(rdma, NULL, &unreg_resp);
+            ret = qemu_rdma_post_send_control(rdma, NULL, &unreg_resp, &err);
 
             if (ret < 0) {
-                error_report("Failed to send control buffer");
+                error_report_err(err);
                 goto err;
             }
             break;
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 42/52] migration/rdma: Convert qemu_rdma_post_recv_control() to Error
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (40 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 41/52] migration/rdma: Convert qemu_rdma_post_send_control() " Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-26  5:31   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 43/52] migration/rdma: Convert qemu_rdma_alloc_pd_cq() " Markus Armbruster
                   ` (11 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

Just for symmetry with qemu_rdma_post_send_control().  The error
messages lose detail that I consider of no use to users.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 25caf67aac..a727aa35d1 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1778,7 +1778,8 @@ static int qemu_rdma_post_send_control(RDMAContext *rdma, uint8_t *buf,
  * Post a RECV work request in anticipation of some future receipt
  * of data on the control channel.
  */
-static int qemu_rdma_post_recv_control(RDMAContext *rdma, int idx)
+static int qemu_rdma_post_recv_control(RDMAContext *rdma, int idx,
+                                       Error **errp)
 {
     struct ibv_recv_wr *bad_wr;
     struct ibv_sge sge = {
@@ -1795,6 +1796,7 @@ static int qemu_rdma_post_recv_control(RDMAContext *rdma, int idx)
 
 
     if (ibv_post_recv(rdma->qp, &recv_wr, &bad_wr)) {
+        error_setg(errp, "error posting control recv");
         return -1;
     }
 
@@ -1904,10 +1906,8 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
      * If the user is expecting a response, post a WR in anticipation of it.
      */
     if (resp) {
-        ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_DATA);
+        ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_DATA, errp);
         if (ret < 0) {
-            error_setg(errp, "rdma migration: error posting"
-                    " extra control recv for anticipated result!");
             return -1;
         }
     }
@@ -1915,9 +1915,8 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
     /*
      * Post a WR to replace the one we just consumed for the READY message.
      */
-    ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
+    ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY, errp);
     if (ret < 0) {
-        error_setg(errp, "rdma migration: error posting first control recv!");
         return -1;
     }
 
@@ -2001,9 +2000,8 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
     /*
      * Post a new RECV work request to replace the one we just consumed.
      */
-    ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
+    ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY, errp);
     if (ret < 0) {
-        error_setg(errp, "rdma migration: error posting second control recv!");
         return -1;
     }
 
@@ -2569,9 +2567,8 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
 
     caps_to_network(&cap);
 
-    ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
+    ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY, errp);
     if (ret < 0) {
-        error_setg(errp, "RDMA ERROR: posting second control recv");
         goto err_rdma_source_connect;
     }
 
@@ -3370,6 +3367,7 @@ static void rdma_cm_poll_handler(void *opaque)
 
 static int qemu_rdma_accept(RDMAContext *rdma)
 {
+    Error *err = NULL;
     RDMACapabilities cap;
     struct rdma_conn_param conn_param = {
                                             .responder_resources = 2,
@@ -3506,9 +3504,9 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     rdma_ack_cm_event(cm_event);
     rdma->connected = true;
 
-    ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
+    ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY, &err);
     if (ret < 0) {
-        error_report("rdma migration: error posting second control recv");
+        error_report_err(err);
         goto err_rdma_dest_wait;
     }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 43/52] migration/rdma: Convert qemu_rdma_alloc_pd_cq() to Error
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (41 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 42/52] migration/rdma: Convert qemu_rdma_post_recv_control() " Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-26  5:43   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 44/52] migration/rdma: Silence qemu_rdma_resolve_host() Markus Armbruster
                   ` (10 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_source_init() violates this principle: it calls
error_report() via qemu_rdma_alloc_pd_cq().  I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.

Clean this up by converting qemu_rdma_alloc_pd_cq() to Error.
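
Note the ERRP_GUARD() the hunk below adds to qemu_rdma_source_init():
QEMU's error API requires it in any function that passes its own errp
to error_append_hint(), so that the hint is still attached when the
caller passed NULL or &error_fatal.  A minimal sketch of the pattern
(inner() is a made-up callee):

    int outer(Error **errp)
    {
        ERRP_GUARD();            /* needed for error_append_hint() below */

        if (inner(errp) < 0) {   /* inner() sets *errp on failure */
            error_append_hint(errp, "Your mlock() limits may be too low\n");
            return -1;
        }
        return 0;
    }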

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index a727aa35d1..41f0ae4ddb 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1046,19 +1046,19 @@ err_resolve_create_id:
 /*
  * Create protection domain and completion queues
  */
-static int qemu_rdma_alloc_pd_cq(RDMAContext *rdma)
+static int qemu_rdma_alloc_pd_cq(RDMAContext *rdma, Error **errp)
 {
     /* allocate pd */
     rdma->pd = ibv_alloc_pd(rdma->verbs);
     if (!rdma->pd) {
-        error_report("failed to allocate protection domain");
+        error_setg(errp, "failed to allocate protection domain");
         return -1;
     }
 
     /* create receive completion channel */
     rdma->recv_comp_channel = ibv_create_comp_channel(rdma->verbs);
     if (!rdma->recv_comp_channel) {
-        error_report("failed to allocate receive completion channel");
+        error_setg(errp, "failed to allocate receive completion channel");
         goto err_alloc_pd_cq;
     }
 
@@ -1068,21 +1068,21 @@ static int qemu_rdma_alloc_pd_cq(RDMAContext *rdma)
     rdma->recv_cq = ibv_create_cq(rdma->verbs, (RDMA_SIGNALED_SEND_MAX * 3),
                                   NULL, rdma->recv_comp_channel, 0);
     if (!rdma->recv_cq) {
-        error_report("failed to allocate receive completion queue");
+        error_setg(errp, "failed to allocate receive completion queue");
         goto err_alloc_pd_cq;
     }
 
     /* create send completion channel */
     rdma->send_comp_channel = ibv_create_comp_channel(rdma->verbs);
     if (!rdma->send_comp_channel) {
-        error_report("failed to allocate send completion channel");
+        error_setg(errp, "failed to allocate send completion channel");
         goto err_alloc_pd_cq;
     }
 
     rdma->send_cq = ibv_create_cq(rdma->verbs, (RDMA_SIGNALED_SEND_MAX * 3),
                                   NULL, rdma->send_comp_channel, 0);
     if (!rdma->send_cq) {
-        error_report("failed to allocate send completion queue");
+        error_setg(errp, "failed to allocate send completion queue");
         goto err_alloc_pd_cq;
     }
 
@@ -2451,6 +2451,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
 
 static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
 {
+    ERRP_GUARD();
     int ret, idx;
 
     /*
@@ -2464,12 +2465,12 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
         goto err_rdma_source_init;
     }
 
-    ret = qemu_rdma_alloc_pd_cq(rdma);
+    ret = qemu_rdma_alloc_pd_cq(rdma, errp);
     if (ret < 0) {
-        error_setg(errp, "RDMA ERROR: "
-                   "rdma migration: error allocating pd and cq! Your mlock()"
-                   " limits may be too low. Please check $ ulimit -a # and "
-                   "search for 'ulimit -l' in the output");
+        error_append_hint(errp,
+                          "Your mlock() limits may be too low. "
+                          "Please check $ ulimit -a # and "
+                          "search for 'ulimit -l' in the output\n");
         goto err_rdma_source_init;
     }
 
@@ -3450,9 +3451,9 @@ static int qemu_rdma_accept(RDMAContext *rdma)
 
     qemu_rdma_dump_id("dest_init", verbs);
 
-    ret = qemu_rdma_alloc_pd_cq(rdma);
+    ret = qemu_rdma_alloc_pd_cq(rdma, &err);
     if (ret < 0) {
-        error_report("rdma migration: error allocating pd and cq!");
+        error_report_err(err);
         goto err_rdma_dest_wait;
     }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 44/52] migration/rdma: Silence qemu_rdma_resolve_host()
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (42 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 43/52] migration/rdma: Convert qemu_rdma_alloc_pd_cq() " Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-26  5:44   ` Zhijian Li (Fujitsu)
  2023-09-18 14:41 ` [PATCH 45/52] migration/rdma: Silence qemu_rdma_connect() Markus Armbruster
                   ` (9 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_resolve_host() violates this principle: it calls
error_report().

Clean this up: drop error_report().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 41f0ae4ddb..0e365db06a 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1003,7 +1003,6 @@ route:
         error_setg(errp,
                    "RDMA ERROR: result not equal to event_addr_resolved %s",
                    rdma_event_str(cm_event->event));
-        error_report("rdma_resolve_addr");
         rdma_ack_cm_event(cm_event);
         goto err_resolve_get_addr;
     }
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 45/52] migration/rdma: Silence qemu_rdma_connect()
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (43 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 44/52] migration/rdma: Silence qemu_rdma_resolve_host() Markus Armbruster
@ 2023-09-18 14:41 ` Markus Armbruster
  2023-09-26  6:00   ` Zhijian Li (Fujitsu)
  2023-09-18 14:42 ` [PATCH 46/52] migration/rdma: Silence qemu_rdma_reg_control() Markus Armbruster
                   ` (8 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_connect() violates this principle: it calls error_report()
and perror().  I elected not to investigate how callers handle the
error, i.e. precise impact is not known.

Clean this up: replace perror() by changing error_setg() to
error_setg_errno(), and drop error_report().  I believe the callers'
error reports suffice then.  If they don't, we need to convert to
Error instead.
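
The error_setg_errno() substitution keeps the information perror()
used to print: it records the current errno in the Error object, so
the eventual report still carries the strerror() text.  Roughly, and
mirroring the first hunk below:

    if (rdma_connect(rdma->cm_id, &conn_param) < 0) {
        /* was: perror("rdma_connect"); error_setg(errp, ...); */
        error_setg_errno(errp, errno,
                         "RDMA ERROR: connecting to destination!");
        goto err_rdma_source_connect;
    }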

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 0e365db06a..bf4e67d68d 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2574,8 +2574,8 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
 
     ret = rdma_connect(rdma->cm_id, &conn_param);
     if (ret < 0) {
-        perror("rdma_connect");
-        error_setg(errp, "RDMA ERROR: connecting to destination!");
+        error_setg_errno(errp, errno,
+                         "RDMA ERROR: connecting to destination!");
         goto err_rdma_source_connect;
     }
 
@@ -2584,16 +2584,15 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
     } else {
         ret = rdma_get_cm_event(rdma->channel, &cm_event);
         if (ret < 0) {
-            error_setg(errp, "RDMA ERROR: failed to get cm event");
+            error_setg_errno(errp, errno,
+                             "RDMA ERROR: failed to get cm event");
         }
     }
     if (ret < 0) {
-        perror("rdma_get_cm_event after rdma_connect");
         goto err_rdma_source_connect;
     }
 
     if (cm_event->event != RDMA_CM_EVENT_ESTABLISHED) {
-        error_report("rdma_get_cm_event != EVENT_ESTABLISHED after rdma_connect");
         error_setg(errp, "RDMA ERROR: connecting to destination!");
         rdma_ack_cm_event(cm_event);
         goto err_rdma_source_connect;
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 46/52] migration/rdma: Silence qemu_rdma_reg_control()
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (44 preceding siblings ...)
  2023-09-18 14:41 ` [PATCH 45/52] migration/rdma: Silence qemu_rdma_connect() Markus Armbruster
@ 2023-09-18 14:42 ` Markus Armbruster
  2023-09-26  6:00   ` Zhijian Li (Fujitsu)
  2023-09-18 14:42 ` [PATCH 47/52] migration/rdma: Don't report received completion events as error Markus Armbruster
                   ` (7 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_source_init() and qemu_rdma_accept() violate this principle:
they call error_report() via qemu_rdma_reg_control().  I elected not
to investigate how callers handle the error, i.e. precise impact is
not known.

Clean this up by dropping the error reporting from
qemu_rdma_reg_control().  I believe the callers' error reports
suffice.  If they don't, we need to convert to Error instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index bf4e67d68d..29ad8ae832 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1349,7 +1349,6 @@ static int qemu_rdma_reg_control(RDMAContext *rdma, int idx)
         rdma->total_registrations++;
         return 0;
     }
-    error_report("qemu_rdma_reg_control failed");
     return -1;
 }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 47/52] migration/rdma: Don't report received completion events as error
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (45 preceding siblings ...)
  2023-09-18 14:42 ` [PATCH 46/52] migration/rdma: Silence qemu_rdma_reg_control() Markus Armbruster
@ 2023-09-18 14:42 ` Markus Armbruster
  2023-09-26  6:06   ` Zhijian Li (Fujitsu)
  2023-09-18 14:42 ` [PATCH 48/52] migration/rdma: Silence qemu_rdma_block_for_wrid() Markus Armbruster
                   ` (6 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

When qemu_rdma_wait_comp_channel() receives an event from the
completion channel, it reports an error "receive cm event while wait
comp channel,cm event is T", where T is the numeric event type.
However, the function fails only when T is a disconnect or device
removal.  Events other than these two are not actually an error, and
reporting them as an error is wrong.  If we need to report them to the
user, we should use something else, and what to use depends on why we
need to report them to the user.

For now, report this error only when the function actually fails.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 29ad8ae832..cbf5e6b9a8 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1566,11 +1566,11 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
                         return -1;
                     }
 
-                    error_report("receive cm event while wait comp channel,"
-                                 "cm event is %d", cm_event->event);
                     if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
                         cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
                         rdma_ack_cm_event(cm_event);
+                        error_report("receive cm event while wait comp channel,"
+                                     "cm event is %d", cm_event->event);
                         return -1;
                     }
                     rdma_ack_cm_event(cm_event);
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 48/52] migration/rdma: Silence qemu_rdma_block_for_wrid()
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (46 preceding siblings ...)
  2023-09-18 14:42 ` [PATCH 47/52] migration/rdma: Don't report received completion events as error Markus Armbruster
@ 2023-09-18 14:42 ` Markus Armbruster
  2023-09-26  6:17   ` Zhijian Li (Fujitsu)
  2023-09-18 14:42 ` [PATCH 49/52] migration/rdma: Silence qemu_rdma_register_and_get_keys() Markus Armbruster
                   ` (5 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_post_send_control(), qemu_rdma_exchange_get_response(), and
qemu_rdma_write_one() violate this principle: they call
error_report(), fprintf(stderr, ...), and perror() via
qemu_rdma_block_for_wrid(), qemu_rdma_poll(), and
qemu_rdma_wait_comp_channel().  I elected not to investigate how
callers handle the error, i.e. precise impact is not known.

Clean this up by dropping the error reporting from qemu_rdma_poll(),
qemu_rdma_wait_comp_channel(), and qemu_rdma_block_for_wrid().  I
believe the callers' error reports suffice.  If they don't, we need to
convert to Error instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index cbf5e6b9a8..99dccdeae5 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1467,17 +1467,12 @@ static int qemu_rdma_poll(RDMAContext *rdma, struct ibv_cq *cq,
     }
 
     if (ret < 0) {
-        error_report("ibv_poll_cq failed");
         return -1;
     }
 
     wr_id = wc.wr_id & RDMA_WRID_TYPE_MASK;
 
     if (wc.status != IBV_WC_SUCCESS) {
-        fprintf(stderr, "ibv_poll_cq wc.status=%d %s!\n",
-                        wc.status, ibv_wc_status_str(wc.status));
-        fprintf(stderr, "ibv_poll_cq wrid=%" PRIu64 "!\n", wr_id);
-
         return -1;
     }
 
@@ -1561,16 +1556,12 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
                 if (pfds[1].revents) {
                     ret = rdma_get_cm_event(rdma->channel, &cm_event);
                     if (ret < 0) {
-                        error_report("failed to get cm event while wait "
-                                     "completion channel");
                         return -1;
                     }
 
                     if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
                         cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
                         rdma_ack_cm_event(cm_event);
-                        error_report("receive cm event while wait comp channel,"
-                                     "cm event is %d", cm_event->event);
                         return -1;
                     }
                     rdma_ack_cm_event(cm_event);
@@ -1583,7 +1574,6 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
             default: /* Error of some type -
                       * I don't trust errno from qemu_poll_ns
                      */
-                error_report("%s: poll failed", __func__);
                 return -1;
             }
 
@@ -1667,7 +1657,6 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma,
 
         ret = ibv_get_cq_event(ch, &cq, &cq_ctx);
         if (ret < 0) {
-            perror("ibv_get_cq_event");
             goto err_block_for_wrid;
         }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 49/52] migration/rdma: Silence qemu_rdma_register_and_get_keys()
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (47 preceding siblings ...)
  2023-09-18 14:42 ` [PATCH 48/52] migration/rdma: Silence qemu_rdma_block_for_wrid() Markus Armbruster
@ 2023-09-18 14:42 ` Markus Armbruster
  2023-09-26  6:21   ` Zhijian Li (Fujitsu)
  2023-09-18 14:42 ` [PATCH 50/52] migration/rdma: Silence qemu_rdma_cleanup() Markus Armbruster
                   ` (4 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_write_one() violates this principle: it reports errors to
stderr via qemu_rdma_register_and_get_keys().  I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.

Clean this up: silence qemu_rdma_register_and_get_keys().  I believe
the caller's error reports suffice.  If they don't, we need to convert
to Error instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 99dccdeae5..d9f80ef390 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1314,15 +1314,6 @@ static int qemu_rdma_register_and_get_keys(RDMAContext *rdma,
         }
     }
     if (!block->pmr[chunk]) {
-        perror("Failed to register chunk!");
-        fprintf(stderr, "Chunk details: block: %d chunk index %d"
-                        " start %" PRIuPTR " end %" PRIuPTR
-                        " host %" PRIuPTR
-                        " local %" PRIuPTR " registrations: %d\n",
-                        block->index, chunk, (uintptr_t)chunk_start,
-                        (uintptr_t)chunk_end, host_addr,
-                        (uintptr_t)block->local_host_addr,
-                        rdma->total_registrations);
         return -1;
     }
     rdma->total_registrations++;
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 50/52] migration/rdma: Silence qemu_rdma_cleanup()
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (48 preceding siblings ...)
  2023-09-18 14:42 ` [PATCH 49/52] migration/rdma: Silence qemu_rdma_register_and_get_keys() Markus Armbruster
@ 2023-09-18 14:42 ` Markus Armbruster
  2023-09-26 10:12   ` Zhijian Li (Fujitsu)
  2023-09-18 14:42 ` [PATCH 51/52] migration/rdma: Use error_report() & friends instead of stderr Markus Armbruster
                   ` (3 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_source_init(), qemu_rdma_connect(),
rdma_start_incoming_migration(), and rdma_start_outgoing_migration()
violate this principle: they call error_report() via
qemu_rdma_cleanup().

Moreover, qemu_rdma_cleanup() can't fail.  It is called on error
paths, and on QIOChannel close and finalization.  Are the conditions it
reports really errors?  I doubt it.

Clean this up: silence qemu_rdma_cleanup().  I believe that's fine for
all these callers.  If it isn't, we need to convert to Error instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index d9f80ef390..be2db7946d 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2330,7 +2330,6 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
 
 static void qemu_rdma_cleanup(RDMAContext *rdma)
 {
-    Error *err = NULL;
     int idx;
 
     if (rdma->cm_id && rdma->connected) {
@@ -2341,10 +2340,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
                                        .type = RDMA_CONTROL_ERROR,
                                        .repeat = 1,
                                      };
-            error_report("Early error. Sending error.");
-            if (qemu_rdma_post_send_control(rdma, NULL, &head, &err) < 0) {
-                error_report_err(err);
-            }
+            qemu_rdma_post_send_control(rdma, NULL, &head, NULL);
         }
 
         rdma_disconnect(rdma->cm_id);
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 51/52] migration/rdma: Use error_report() & friends instead of stderr
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (49 preceding siblings ...)
  2023-09-18 14:42 ` [PATCH 50/52] migration/rdma: Silence qemu_rdma_cleanup() Markus Armbruster
@ 2023-09-18 14:42 ` Markus Armbruster
  2023-09-26  6:35   ` Zhijian Li (Fujitsu)
  2023-09-18 14:42 ` [PATCH 52/52] migration/rdma: Fix how we show device details on open Markus Armbruster
                   ` (2 subsequent siblings)
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

error_report() obeys -msg, reports the current error location if any,
and reports to the current monitor if any.  Reporting to stderr
directly with fprintf() or perror() is wrong, because it loses all
this.

Fix the offenders.
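
For instance, the typical conversion in the hunks below is from

    perror("unregistration chunk failed");

to

    error_report("unregistration chunk failed: %s", strerror(errno));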

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 39 +++++++++++++++++++++------------------
 1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index be2db7946d..9e9904984e 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -871,12 +871,12 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
 
         if (roce_found) {
             if (ib_found) {
-                fprintf(stderr, "WARN: migrations may fail:"
-                                " IPv6 over RoCE / iWARP in linux"
-                                " is broken. But since you appear to have a"
-                                " mixed RoCE / IB environment, be sure to only"
-                                " migrate over the IB fabric until the kernel "
-                                " fixes the bug.\n");
+                warn_report("WARN: migrations may fail:"
+                            " IPv6 over RoCE / iWARP in linux"
+                            " is broken. But since you appear to have a"
+                            " mixed RoCE / IB environment, be sure to only"
+                            " migrate over the IB fabric until the kernel "
+                            " fixes the bug.");
             } else {
                 error_setg(errp, "RDMA ERROR: "
                            "You only have RoCE / iWARP devices in your systems"
@@ -1407,7 +1407,8 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
         block->remote_keys[chunk] = 0;
 
         if (ret != 0) {
-            perror("unregistration chunk failed");
+            error_report("unregistration chunk failed: %s",
+                         strerror(errno));
             return -1;
         }
         rdma->total_registrations--;
@@ -3751,7 +3752,8 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                 block->pmr[reg->key.chunk] = NULL;
 
                 if (ret != 0) {
-                    perror("rdma unregistration chunk failed");
+                    error_report("rdma unregistration chunk failed: %s",
+                                 strerror(errno));
                     goto err;
                 }
 
@@ -3940,10 +3942,10 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
          */
 
         if (local->nb_blocks != nb_dest_blocks) {
-            fprintf(stderr, "ram blocks mismatch (Number of blocks %d vs %d) "
-                    "Your QEMU command line parameters are probably "
-                    "not identical on both the source and destination.",
-                    local->nb_blocks, nb_dest_blocks);
+            error_report("ram blocks mismatch (Number of blocks %d vs %d)",
+                         local->nb_blocks, nb_dest_blocks);
+            error_printf("Your QEMU command line parameters are probably "
+                         "not identical on both the source and destination.");
             rdma->errored = true;
             return -1;
         }
@@ -3956,10 +3958,11 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
 
             /* We require that the blocks are in the same order */
             if (rdma->dest_blocks[i].length != local->block[i].length) {
-                fprintf(stderr, "Block %s/%d has a different length %" PRIu64
-                        "vs %" PRIu64, local->block[i].block_name, i,
-                        local->block[i].length,
-                        rdma->dest_blocks[i].length);
+                error_report("Block %s/%d has a different length %" PRIu64
+                             "vs %" PRIu64,
+                             local->block[i].block_name, i,
+                             local->block[i].length,
+                             rdma->dest_blocks[i].length);
                 rdma->errored = true;
                 return -1;
             }
@@ -4075,7 +4078,7 @@ static void rdma_accept_incoming_migration(void *opaque)
     ret = qemu_rdma_accept(rdma);
 
     if (ret < 0) {
-        fprintf(stderr, "RDMA ERROR: Migration initialization failed\n");
+        error_report("RDMA ERROR: Migration initialization failed");
         return;
     }
 
@@ -4087,7 +4090,7 @@ static void rdma_accept_incoming_migration(void *opaque)
 
     f = rdma_new_input(rdma);
     if (f == NULL) {
-        fprintf(stderr, "RDMA ERROR: could not open RDMA for input\n");
+        error_report("RDMA ERROR: could not open RDMA for input");
         qemu_rdma_cleanup(rdma);
         return;
     }
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* [PATCH 52/52] migration/rdma: Fix how we show device details on open
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (50 preceding siblings ...)
  2023-09-18 14:42 ` [PATCH 51/52] migration/rdma: Use error_report() & friends instead of stderr Markus Armbruster
@ 2023-09-18 14:42 ` Markus Armbruster
  2023-09-26  6:49   ` Zhijian Li (Fujitsu)
  2023-09-19 16:49 ` [PATCH 00/52] migration/rdma: Error handling fixes Peter Xu
  2023-09-28 11:08 ` Markus Armbruster
  53 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-18 14:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras

qemu_rdma_dump_id() dumps RDMA device details to stdout.

rdma_start_outgoing_migration() calls it via qemu_rdma_source_init()
and qemu_rdma_resolve_host() to show source device details.
rdma_start_incoming_migration() arranges its call via
rdma_accept_incoming_migration() and qemu_rdma_accept() to show
destination device details.

Two issues:

1. rdma_start_outgoing_migration() can run in HMP context.  The
   information should arguably go to the monitor, not stdout.

2. ibv_query_port() failure is reported as an error.  Its callers remain
   unaware of this failure (qemu_rdma_dump_id() can't fail), so
   reporting this to the user as an error is problematic.

Use qemu_printf() instead of printf() and error_report().
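
As a rough sketch of the difference (illustrative, not a hunk from
this patch):

    printf("...\n");        /* always stdout, even under an HMP monitor */
    qemu_printf("...\n");   /* current monitor if there is one, else stdout */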

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 9e9904984e..8c84fbab7a 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -30,6 +30,7 @@
 #include "qemu/sockets.h"
 #include "qemu/bitmap.h"
 #include "qemu/coroutine.h"
+#include "qemu/qemu-print.h"
 #include "exec/memory.h"
 #include <sys/socket.h>
 #include <netdb.h>
@@ -742,24 +743,25 @@ static void qemu_rdma_dump_id(const char *who, struct ibv_context *verbs)
     struct ibv_port_attr port;
 
     if (ibv_query_port(verbs, 1, &port)) {
-        error_report("Failed to query port information");
+        qemu_printf("%s RDMA Device opened, but can't query port information",
+                    who);
         return;
     }
 
-    printf("%s RDMA Device opened: kernel name %s "
-           "uverbs device name %s, "
-           "infiniband_verbs class device path %s, "
-           "infiniband class device path %s, "
-           "transport: (%d) %s\n",
+    qemu_printf("%s RDMA Device opened: kernel name %s "
+                "uverbs device name %s, "
+                "infiniband_verbs class device path %s, "
+                "infiniband class device path %s, "
+                "transport: (%d) %s\n",
                 who,
                 verbs->device->name,
                 verbs->device->dev_name,
                 verbs->device->dev_path,
                 verbs->device->ibdev_path,
                 port.link_layer,
-                (port.link_layer == IBV_LINK_LAYER_INFINIBAND) ? "Infiniband" :
-                 ((port.link_layer == IBV_LINK_LAYER_ETHERNET)
-                    ? "Ethernet" : "Unknown"));
+                port.link_layer == IBV_LINK_LAYER_INFINIBAND ? "Infiniband"
+                : port.link_layer == IBV_LINK_LAYER_ETHERNET ? "Ethernet"
+                : "Unknown");
 }
 
 /*
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 174+ messages in thread

* Re: [PATCH 01/52] migration/rdma: Clean up qemu_rdma_poll()'s return type
  2023-09-18 14:41 ` [PATCH 01/52] migration/rdma: Clean up qemu_rdma_poll()'s return type Markus Armbruster
@ 2023-09-18 16:50   ` Fabiano Rosas
  2023-09-21  8:52   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Fabiano Rosas @ 2023-09-18 16:50 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras

Markus Armbruster <armbru@redhat.com> writes:

> qemu_rdma_poll()'s return type is uint64_t, even though it returns 0,
> -1, or @ret, which is int.  Its callers assign the return value to int
> variables, then check whether it's negative.  Unclean.
>
> Return int instead.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 02/52] migration/rdma: Clean up qemu_rdma_data_init()'s return type
  2023-09-18 14:41 ` [PATCH 02/52] migration/rdma: Clean up qemu_rdma_data_init()'s " Markus Armbruster
@ 2023-09-18 16:51   ` Fabiano Rosas
  2023-09-21  8:52   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Fabiano Rosas @ 2023-09-18 16:51 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras

Markus Armbruster <armbru@redhat.com> writes:

> qemu_rdma_data_init() return type is void *.  It actually returns
> RDMAContext *, and all its callers assign the value to an
> RDMAContext *.  Unclean.
>
> Return RDMAContext * instead.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 03/52] migration/rdma: Clean up rdma_delete_block()'s return type
  2023-09-18 14:41 ` [PATCH 03/52] migration/rdma: Clean up rdma_delete_block()'s " Markus Armbruster
@ 2023-09-18 16:53   ` Fabiano Rosas
  2023-09-21  8:53   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Fabiano Rosas @ 2023-09-18 16:53 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras

Markus Armbruster <armbru@redhat.com> writes:

> rdma_delete_block() always returns 0, which its only caller ignores.
> Return void instead.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 04/52] migration/rdma: Drop fragile wr_id formatting
  2023-09-18 14:41 ` [PATCH 04/52] migration/rdma: Drop fragile wr_id formatting Markus Armbruster
@ 2023-09-18 17:01   ` Fabiano Rosas
  2023-09-21  8:54   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Fabiano Rosas @ 2023-09-18 17:01 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras

Markus Armbruster <armbru@redhat.com> writes:

> wrid_desc[] uses 4001 pointers to map four integer values to strings.
>
> print_wrid() accesses wrid_desc[] out of bounds when passed a negative
> argument.  It returns null for values 2..1999 and 2001..3999.
>
> qemu_rdma_poll() and qemu_rdma_block_for_wrid() print wrid_desc[wr_id]
> and pass print_wrid(wr_id) to tracepoints.  Could conceivably crash
> trying to format a null string.  I believe access out of bounds is not
> possible.
>
> Not worth cleaning up.  Dumb down to show just numeric wr_id.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 05/52] migration/rdma: Consistently use uint64_t for work request IDs
  2023-09-18 14:41 ` [PATCH 05/52] migration/rdma: Consistently use uint64_t for work request IDs Markus Armbruster
@ 2023-09-18 17:03   ` Fabiano Rosas
  2023-09-19  5:39   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Fabiano Rosas @ 2023-09-18 17:03 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras

Markus Armbruster <armbru@redhat.com> writes:

> We use int instead of uint64_t in a few places.  Change them to
> uint64_t.
>
> This cleans up a comparison of signed qemu_rdma_block_for_wrid()
> parameter @wrid_requested with unsigned @wr_id.  Harmless, because the
> actual arguments are non-negative enumeration constants.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 06/52] migration/rdma: Clean up two more harmless signed vs. unsigned issues
  2023-09-18 14:41 ` [PATCH 06/52] migration/rdma: Clean up two more harmless signed vs. unsigned issues Markus Armbruster
@ 2023-09-18 17:10   ` Fabiano Rosas
  2023-09-20 13:11     ` Markus Armbruster
  0 siblings, 1 reply; 174+ messages in thread
From: Fabiano Rosas @ 2023-09-18 17:10 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras

Markus Armbruster <armbru@redhat.com> writes:

> qemu_rdma_exchange_get_response() compares int parameter @expecting
> with uint32_t head->type.  Actual arguments are non-negative
> enumeration constants, RDMAControlHeader uint32_t member type, or
> qemu_rdma_exchange_recv() int parameter expecting.  Actual arguments
> for the latter are non-negative enumeration constants.  Change both
> parameters to uint32_t.
>
> In qio_channel_rdma_readv(), loop control variable @i is ssize_t, and
> counts from 0 up to @niov, which is size_t.  Change @i to size_t.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>

just a comment...

>  static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
> -                                int expecting)
> +                                   uint32_t expecting)
>  {
>      RDMAControlHeader ready = {
>                                  .len = 0,
> @@ -2851,7 +2851,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>      RDMAContext *rdma;
>      RDMAControlHeader head;
>      int ret = 0;
> -    ssize_t i;
> +    size_t i;
>      size_t done = 0;

It seems the idea was for 'done' to be ssize_t like in
qio_channel_rdma_writev()



^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 07/52] migration/rdma: Give qio_channel_rdma_source_funcs internal linkage
  2023-09-18 14:41 ` [PATCH 07/52] migration/rdma: Give qio_channel_rdma_source_funcs internal linkage Markus Armbruster
@ 2023-09-18 17:11   ` Fabiano Rosas
  2023-09-21  9:00   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Fabiano Rosas @ 2023-09-18 17:11 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras

Markus Armbruster <armbru@redhat.com> writes:

> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 08/52] migration/rdma: Fix qemu_rdma_accept() to return failure on errors
  2023-09-18 14:41 ` [PATCH 08/52] migration/rdma: Fix qemu_rdma_accept() to return failure on errors Markus Armbruster
@ 2023-09-18 17:15   ` Fabiano Rosas
  2023-09-21  9:07   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Fabiano Rosas @ 2023-09-18 17:15 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras

Markus Armbruster <armbru@redhat.com> writes:

> qemu_rdma_accept() returns 0 in some cases even when it didn't
> complete its job due to errors.  Impact is not obvious.  I figure the
> caller will soon fail again with a misleading error message.
>
> Fix it to return -1 on any failure.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 09/52] migration/rdma: Put @errp parameter last
  2023-09-18 14:41 ` [PATCH 09/52] migration/rdma: Put @errp parameter last Markus Armbruster
@ 2023-09-18 17:17   ` Fabiano Rosas
  2023-09-21  9:08   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Fabiano Rosas @ 2023-09-18 17:17 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras

Markus Armbruster <armbru@redhat.com> writes:

> include/qapi/error.h demands:
>
>  * - Functions that use Error to report errors have an Error **errp
>  *   parameter.  It should be the last parameter, except for functions
>  *   taking variable arguments.
>
> qemu_rdma_connect() does not conform.  Clean it up.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 10/52] migration/rdma: Eliminate error_propagate()
  2023-09-18 14:41 ` [PATCH 10/52] migration/rdma: Eliminate error_propagate() Markus Armbruster
@ 2023-09-18 17:20   ` Fabiano Rosas
  2023-09-21  9:31   ` Zhijian Li (Fujitsu)
  2023-09-27 16:20   ` Eric Blake
  2 siblings, 0 replies; 174+ messages in thread
From: Fabiano Rosas @ 2023-09-18 17:20 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras

Markus Armbruster <armbru@redhat.com> writes:

> When all we do with an Error we receive into a local variable is
> propagating to somewhere else, we can just as well receive it there
> right away.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 11/52] migration/rdma: Drop rdma_add_block() error handling
  2023-09-18 14:41 ` [PATCH 11/52] migration/rdma: Drop rdma_add_block() error handling Markus Armbruster
@ 2023-09-18 17:32   ` Fabiano Rosas
  2023-09-21  9:39   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Fabiano Rosas @ 2023-09-18 17:32 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras

Markus Armbruster <armbru@redhat.com> writes:

> rdma_add_block() can't fail.  Return void, and drop the unreachable
> error handling.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 12/52] migration/rdma: Drop qemu_rdma_search_ram_block() error handling
  2023-09-18 14:41 ` [PATCH 12/52] migration/rdma: Drop qemu_rdma_search_ram_block() " Markus Armbruster
@ 2023-09-18 17:35   ` Fabiano Rosas
  2023-09-20 13:11     ` Markus Armbruster
  2023-09-22  7:50   ` Zhijian Li (Fujitsu)
  1 sibling, 1 reply; 174+ messages in thread
From: Fabiano Rosas @ 2023-09-18 17:35 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras

Markus Armbruster <armbru@redhat.com> writes:

> qemu_rdma_search_ram_block() can't fail.  Return void, and drop the
> unreachable error handling.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  migration/rdma.c | 22 ++++++++--------------
>  1 file changed, 8 insertions(+), 14 deletions(-)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 2b0f9d52d8..98520a42b4 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -1234,12 +1234,12 @@ static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
>   *
>   * This search cannot fail or the migration will fail.
>   */

This comment can be removed as well.

> -static int qemu_rdma_search_ram_block(RDMAContext *rdma,
> -                                      uintptr_t block_offset,
> -                                      uint64_t offset,
> -                                      uint64_t length,
> -                                      uint64_t *block_index,
> -                                      uint64_t *chunk_index)
> +static void qemu_rdma_search_ram_block(RDMAContext *rdma,
> +                                       uintptr_t block_offset,
> +                                       uint64_t offset,
> +                                       uint64_t length,
> +                                       uint64_t *block_index,
> +                                       uint64_t *chunk_index)
>  {
>      uint64_t current_addr = block_offset + offset;
>      RDMALocalBlock *block = g_hash_table_lookup(rdma->blockmap,
> @@ -1251,8 +1251,6 @@ static int qemu_rdma_search_ram_block(RDMAContext *rdma,
>      *block_index = block->index;
>      *chunk_index = ram_chunk_index(block->local_host_addr,
>                  block->local_host_addr + (current_addr - block->offset));
> -
> -    return 0;
>  }
>  
>  /*
> @@ -2321,12 +2319,8 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
>          rdma->current_length = 0;
>          rdma->current_addr = current_addr;
>  
> -        ret = qemu_rdma_search_ram_block(rdma, block_offset,
> -                                         offset, len, &index, &chunk);
> -        if (ret) {
> -            error_report("ram block search failed");
> -            return ret;
> -        }
> +        qemu_rdma_search_ram_block(rdma, block_offset,
> +                                   offset, len, &index, &chunk);
>          rdma->current_index = index;
>          rdma->current_chunk = chunk;
>      }


^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 13/52] migration/rdma: Make qemu_rdma_buffer_mergable() return bool
  2023-09-18 14:41 ` [PATCH 13/52] migration/rdma: Make qemu_rdma_buffer_mergable() return bool Markus Armbruster
@ 2023-09-18 17:36   ` Fabiano Rosas
  2023-09-22  7:51   ` Zhijian Li (Fujitsu)
  2023-09-27 16:26   ` Eric Blake
  2 siblings, 0 replies; 174+ messages in thread
From: Fabiano Rosas @ 2023-09-18 17:36 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras

Markus Armbruster <armbru@redhat.com> writes:

> qemu_rdma_buffer_mergable() is semantically a predicate.  It returns
> int 0 or 1.  Return bool instead.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 14/52] migration/rdma: Use bool for two RDMAContext flags
  2023-09-18 14:41 ` [PATCH 14/52] migration/rdma: Use bool for two RDMAContext flags Markus Armbruster
@ 2023-09-18 17:37   ` Fabiano Rosas
  2023-09-22  7:54   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Fabiano Rosas @ 2023-09-18 17:37 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras

Markus Armbruster <armbru@redhat.com> writes:

> @error_reported and @received_error are flags.  The latter is even
> assigned bool true.  Change them from int to bool.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 15/52] migration/rdma: Ditch useless numeric error codes in error messages
  2023-09-18 14:41 ` [PATCH 15/52] migration/rdma: Ditch useless numeric error codes in error messages Markus Armbruster
@ 2023-09-18 18:47   ` Fabiano Rosas
  2023-09-22  8:44   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Fabiano Rosas @ 2023-09-18 18:47 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras

Markus Armbruster <armbru@redhat.com> writes:

> Several error messages include numeric error codes returned by failed
> functions:
>
> * ibv_poll_cq() returns an unspecified negative value.  Useless.
>
> * rdma_accept() and rdma_get_cm_event() return -1.  Useless.
>
> * qemu_rdma_poll() returns either -1 or an unspecified negative
>   value.  Useless.
>
> * qemu_rdma_block_for_wrid(), qemu_rdma_write_flush(),
>   qemu_rdma_exchange_send(), qemu_rdma_exchange_recv(),
>   qemu_rdma_write() return a negative value that may or may not be an
>   errno value.  While reporting human-readable errno
>   information (which a number is not) can be useful, reporting an
>   error code that may or may not be an errno value is useless.
>
> Drop these error codes from the error messages.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 16/52] migration/rdma: Fix io_writev(), io_readv() methods to obey contract
  2023-09-18 14:41 ` [PATCH 16/52] migration/rdma: Fix io_writev(), io_readv() methods to obey contract Markus Armbruster
@ 2023-09-18 18:57   ` Fabiano Rosas
  2023-09-22  8:59   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Fabiano Rosas @ 2023-09-18 18:57 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras

Markus Armbruster <armbru@redhat.com> writes:

> QIOChannelClass methods qio_channel_rdma_readv() and
> qio_channel_rdma_writev() violate their method contract when
> rdma->error_state is non-zero:
>
> 1. They return whatever is in rdma->error_state then.  Only -1 will be
>    fine.  -2 will be misinterpreted as "would block".  Anything less
>    than -2 isn't defined in the contract.  A positive value would be
>    misinterpreted as success, but I believe that's not actually
>    possible.
>
> 2. They neglect to set an error then.  If something up the call stack
>    dereferences the error when failure is returned, it will crash.  If
>    it ignores the return value and checks the error instead, it will
>    miss the error.
>
> Crap like this happens when return statements hide in macros,
> especially when their uses are far away from the definition.
>
> I elected not to investigate how callers are impacted.
>
> Expand the two bad macro uses, so we can set an error and return -1.
> The next commit will then get rid of the macro altogether.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 17/52] migration/rdma: Replace dangerous macro CHECK_ERROR_STATE()
  2023-09-18 14:41 ` [PATCH 17/52] migration/rdma: Replace dangerous macro CHECK_ERROR_STATE() Markus Armbruster
@ 2023-09-18 18:57   ` Fabiano Rosas
  2023-09-22  9:01   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Fabiano Rosas @ 2023-09-18 18:57 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras

Markus Armbruster <armbru@redhat.com> writes:

> Hiding return statements in macros is a bad idea.  Use a function
> instead, and open code the return part.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 18/52] migration/rdma: Fix qemu_rdma_broken_ipv6_kernel() to set error
  2023-09-18 14:41 ` [PATCH 18/52] migration/rdma: Fix qemu_rdma_broken_ipv6_kernel() to set error Markus Armbruster
@ 2023-09-18 19:00   ` Fabiano Rosas
  2023-09-22  9:10   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Fabiano Rosas @ 2023-09-18 19:00 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras

Markus Armbruster <armbru@redhat.com> writes:

> qemu_rdma_resolve_host() and qemu_rdma_dest_init() try addresses until
> they find one that works.  If none works, they return the first Error
> set by qemu_rdma_broken_ipv6_kernel(), or else return a generic one.
>
> qemu_rdma_broken_ipv6_kernel() neglects to set an Error when
> ibv_open_device() fails.  If a later address fails differently, we use
> that Error instead, or else the generic one.  Harmless enough, but
> needs fixing all the same.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>



^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 05/52] migration/rdma: Consistently use uint64_t for work request IDs
  2023-09-18 14:41 ` [PATCH 05/52] migration/rdma: Consistently use uint64_t for work request IDs Markus Armbruster
  2023-09-18 17:03   ` Fabiano Rosas
@ 2023-09-19  5:39   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-19  5:39 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> We use int instead of uint64_t in a few places.  Change them to
> uint64_t.
> 
> This cleans up a comparison of signed qemu_rdma_block_for_wrid()
> parameter @wrid_requested with unsigned @wr_id.  Harmless, because the
> actual arguments are non-negative enumeration constants.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>


Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Thanks


> ---
>   migration/rdma.c       | 7 ++++---
>   migration/trace-events | 8 ++++----
>   2 files changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index cda22be3f7..4328610a4c 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -1599,13 +1599,13 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
>       return rdma->error_state;
>   }
>   
> -static struct ibv_comp_channel *to_channel(RDMAContext *rdma, int wrid)
> +static struct ibv_comp_channel *to_channel(RDMAContext *rdma, uint64_t wrid)
>   {
>       return wrid < RDMA_WRID_RECV_CONTROL ? rdma->send_comp_channel :
>              rdma->recv_comp_channel;
>   }
>   
> -static struct ibv_cq *to_cq(RDMAContext *rdma, int wrid)
> +static struct ibv_cq *to_cq(RDMAContext *rdma, uint64_t wrid)
>   {
>       return wrid < RDMA_WRID_RECV_CONTROL ? rdma->send_cq : rdma->recv_cq;
>   }
> @@ -1623,7 +1623,8 @@ static struct ibv_cq *to_cq(RDMAContext *rdma, int wrid)
>    * completions only need to be recorded, but do not actually
>    * need further processing.
>    */
> -static int qemu_rdma_block_for_wrid(RDMAContext *rdma, int wrid_requested,
> +static int qemu_rdma_block_for_wrid(RDMAContext *rdma,
> +                                    uint64_t wrid_requested,
>                                       uint32_t *byte_len)
>   {
>       int num_cq_events = 0, ret = 0;
> diff --git a/migration/trace-events b/migration/trace-events
> index b78808f28b..d733107ec6 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -207,7 +207,7 @@ qemu_rdma_accept_incoming_migration(void) ""
>   qemu_rdma_accept_incoming_migration_accepted(void) ""
>   qemu_rdma_accept_pin_state(bool pin) "%d"
>   qemu_rdma_accept_pin_verbsc(void *verbs) "Verbs context after listen: %p"
> -qemu_rdma_block_for_wrid_miss(int wcomp, uint64_t req) "A Wanted wrid %d but got %" PRIu64
> +qemu_rdma_block_for_wrid_miss(uint64_t wcomp, uint64_t req) "A Wanted wrid %" PRIu64 " but got %" PRIu64
>   qemu_rdma_cleanup_disconnect(void) ""
>   qemu_rdma_close(void) ""
>   qemu_rdma_connect_pin_all_requested(void) ""
> @@ -221,9 +221,9 @@ qemu_rdma_exchange_send_waiting(const char *desc) "Waiting for response %s"
>   qemu_rdma_exchange_send_received(const char *desc) "Response %s received."
>   qemu_rdma_fill(size_t control_len, size_t size) "RDMA %zd of %zd bytes already in buffer"
>   qemu_rdma_init_ram_blocks(int blocks) "Allocated %d local ram block structures"
> -qemu_rdma_poll_recv(int64_t comp, int64_t id, int sent) "completion %" PRId64 " received (%" PRId64 ") left %d"
> -qemu_rdma_poll_write(int64_t comp, int left, uint64_t block, uint64_t chunk, void *local, void *remote) "completions %" PRId64 " left %d, block %" PRIu64 ", chunk: %" PRIu64 " %p %p"
> -qemu_rdma_poll_other(int64_t comp, int left) "other completion %" PRId64 " received left %d"
> +qemu_rdma_poll_recv(uint64_t comp, int64_t id, int sent) "completion %" PRIu64 " received (%" PRId64 ") left %d"
> +qemu_rdma_poll_write(uint64_t comp, int left, uint64_t block, uint64_t chunk, void *local, void *remote) "completions %" PRIu64 " left %d, block %" PRIu64 ", chunk: %" PRIu64 " %p %p"
> +qemu_rdma_poll_other(uint64_t comp, int left) "other completion %" PRIu64 " received left %d"
>   qemu_rdma_post_send_control(const char *desc) "CONTROL: sending %s.."
>   qemu_rdma_register_and_get_keys(uint64_t len, void *start) "Registering %" PRIu64 " bytes @ %p"
>   qemu_rdma_register_odp_mr(const char *name) "Try to register On-Demand Paging memory region: %s"

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 20/52] migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port
  2023-09-18 14:41 ` [PATCH 20/52] migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port Markus Armbruster
@ 2023-09-19 16:02   ` Peter Xu
  2023-09-20 13:13     ` Markus Armbruster
  0 siblings, 1 reply; 174+ messages in thread
From: Peter Xu @ 2023-09-19 16:02 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, quintela, leobras

On Mon, Sep 18, 2023 at 04:41:34PM +0200, Markus Armbruster wrote:
> qemu_rdma_data_init() neglects to set an Error when it fails because
> @host_port is null.  Fortunately, no caller passes null, so this is

Indeed they all seem to be non-null.

Before this patch, qemu_rdma_data_init() can still tolerate NULL, not
setting errp but still returning NULL to signal an error.

After this patch, qemu_rdma_data_init() should crash at inet_parse() if
it's null.

Would it be simpler and clearer if we just set ERROR() for !host_port?

Thanks,

> merely a latent bug.  Drop the flawed code handling null argument.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  migration/rdma.c | 29 +++++++++++++----------------
>  1 file changed, 13 insertions(+), 16 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index d3dc162363..cc59155a50 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2716,25 +2716,22 @@ static RDMAContext *qemu_rdma_data_init(const char *host_port, Error **errp)
>      RDMAContext *rdma = NULL;
>      InetSocketAddress *addr;
>  
> -    if (host_port) {
> -        rdma = g_new0(RDMAContext, 1);
> -        rdma->current_index = -1;
> -        rdma->current_chunk = -1;
> +    rdma = g_new0(RDMAContext, 1);
> +    rdma->current_index = -1;
> +    rdma->current_chunk = -1;
>  
> -        addr = g_new(InetSocketAddress, 1);
> -        if (!inet_parse(addr, host_port, NULL)) {
> -            rdma->port = atoi(addr->port);
> -            rdma->host = g_strdup(addr->host);
> -            rdma->host_port = g_strdup(host_port);
> -        } else {
> -            ERROR(errp, "bad RDMA migration address '%s'", host_port);
> -            g_free(rdma);
> -            rdma = NULL;
> -        }
> -
> -        qapi_free_InetSocketAddress(addr);
> +    addr = g_new(InetSocketAddress, 1);
> +    if (!inet_parse(addr, host_port, NULL)) {
> +        rdma->port = atoi(addr->port);
> +        rdma->host = g_strdup(addr->host);
> +        rdma->host_port = g_strdup(host_port);
> +    } else {
> +        ERROR(errp, "bad RDMA migration address '%s'", host_port);
> +        g_free(rdma);
> +        rdma = NULL;
>      }
>  
> +    qapi_free_InetSocketAddress(addr);
>      return rdma;
>  }
>  
> -- 
> 2.41.0
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 19/52] migration/rdma: Fix qemu_get_cm_event_timeout() to always set error
  2023-09-18 14:41 ` [PATCH 19/52] migration/rdma: Fix qemu_get_cm_event_timeout() to always " Markus Armbruster
@ 2023-09-19 16:02   ` Peter Xu
  2023-09-22  9:12   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Peter Xu @ 2023-09-19 16:02 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, quintela, leobras

On Mon, Sep 18, 2023 at 04:41:33PM +0200, Markus Armbruster wrote:
> qemu_get_cm_event_timeout() neglects to set an error when it fails
> because rdma_get_cm_event() fails.  Harmless, as its caller
> qemu_rdma_connect() substitutes a generic error then.  Fix it anyway.
> 
> qemu_rdma_connect() also sets the generic error when its own call of
> rdma_get_cm_event() fails.  Make the error handling more obvious: set
> a specific error right after rdma_get_cm_event() fails.  Delete the
> generic error.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 00/52] migration/rdma: Error handling fixes
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (51 preceding siblings ...)
  2023-09-18 14:42 ` [PATCH 52/52] migration/rdma: Fix how we show device details on open Markus Armbruster
@ 2023-09-19 16:49 ` Peter Xu
  2023-09-19 18:29   ` Daniel P. Berrangé
  2023-09-21  8:27   ` Zhijian Li (Fujitsu)
  2023-09-28 11:08 ` Markus Armbruster
  53 siblings, 2 replies; 174+ messages in thread
From: Peter Xu @ 2023-09-19 16:49 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, quintela, leobras, Li Zhijian

On Mon, Sep 18, 2023 at 04:41:14PM +0200, Markus Armbruster wrote:
> Oh dear, where to start.  There's so much wrong, and in pretty obvious
> ways.  This code should never have passed review.  I'm refraining from
> saying more; see the commit messages instead.
> 
> Issues remaining after this series include:
> 
> * Terrible error messages
> 
> * Some error message cascades remain
> 
> * There is no written contract for QEMUFileHooks, and the
>   responsibility for reporting errors is unclear

Even being removed.. because no one is really extending that..

https://lore.kernel.org/all/20230509120700.78359-1-quintela@redhat.com/#t

> 
> * There seem to be no tests whatsoever

I always see rdma as "odd fixes" stage.. for a long time.  But maybe I was
wrong.

Copying Zhijian for status of rdma; Zhijian, I saw that you just replied to
the hwpoison issue.  Maybe we should have one entry for rdma too, just like
colo?

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 00/52] migration/rdma: Error handling fixes
  2023-09-19 16:49 ` [PATCH 00/52] migration/rdma: Error handling fixes Peter Xu
@ 2023-09-19 18:29   ` Daniel P. Berrangé
  2023-09-19 18:57     ` Fabiano Rosas
                       ` (2 more replies)
  2023-09-21  8:27   ` Zhijian Li (Fujitsu)
  1 sibling, 3 replies; 174+ messages in thread
From: Daniel P. Berrangé @ 2023-09-19 18:29 UTC (permalink / raw)
  To: Peter Xu; +Cc: Markus Armbruster, qemu-devel, quintela, leobras, Li Zhijian

On Tue, Sep 19, 2023 at 12:49:46PM -0400, Peter Xu wrote:
> On Mon, Sep 18, 2023 at 04:41:14PM +0200, Markus Armbruster wrote:
> > Oh dear, where to start.  There's so much wrong, and in pretty obvious
> > ways.  This code should never have passed review.  I'm refraining from
> > saying more; see the commit messages instead.
> > 
> > Issues remaining after this series include:
> > 
> > * Terrible error messages
> > 
> > * Some error message cascades remain
> > 
> > * There is no written contract for QEMUFileHooks, and the
> >   responsibility for reporting errors is unclear
> 
> Even being removed.. because no one is really extending that..
> 
> https://lore.kernel.org/all/20230509120700.78359-1-quintela@redhat.com/#t

One day (in another 5-10 years) I still hope we'll get to
the point where QEMUFile itself is obsolete :-) Getting
rid of QEMUFileHooks is a great step in that direction.
Me finishing a old PoC to bring buffering to QIOChannel
would be another big step.

The data rate limiting would be the biggest missing piece
to enable migration/vmstate logic to directly consume
a QIOChannel.

Eliminating QEMUFile would help to bring Error **errp
to all the vmstate codepaths.

> > * There seem to be no tests whatsoever
> 
> I always see rdma as "odd fixes" stage.. for a long time.  But maybe I was
> wrong.

In the MAINTAINERS file RDMA still gets classified as formally
supported under the migration maintainers.  I'm not convinced
that is an accurate description of its status.  I tend to agree
with you that it is 'odd fixes' at the very best.

Dave Gilbert had previously speculated about whether we should
even consider deprecating it, on the basis that the latest non-RDMA
migration is so much better than in the past, with multifd and
zerocopy, that RDMA might not even offer a significant enough
performance win to justify it.

> Copying Zhijian for status of rdma; Zhijian, I saw that you just replied to
> the hwpoison issue.  Maybe we should have one entry for rdma too, just like
> colo?

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 00/52] migration/rdma: Error handling fixes
  2023-09-19 18:29   ` Daniel P. Berrangé
@ 2023-09-19 18:57     ` Fabiano Rosas
  2023-09-20 13:22     ` Markus Armbruster
  2023-10-04 18:00     ` Juan Quintela
  2 siblings, 0 replies; 174+ messages in thread
From: Fabiano Rosas @ 2023-09-19 18:57 UTC (permalink / raw)
  To: Daniel P. Berrangé, Peter Xu
  Cc: Markus Armbruster, qemu-devel, quintela, leobras, Li Zhijian

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Tue, Sep 19, 2023 at 12:49:46PM -0400, Peter Xu wrote:
>> On Mon, Sep 18, 2023 at 04:41:14PM +0200, Markus Armbruster wrote:
>> > Oh dear, where to start.  There's so much wrong, and in pretty obvious
>> > ways.  This code should never have passed review.  I'm refraining from
>> > saying more; see the commit messages instead.
>> > 
>> > Issues remaining after this series include:
>> > 
>> > * Terrible error messages
>> > 
>> > * Some error message cascades remain
>> > 
>> > * There is no written contract for QEMUFileHooks, and the
>> >   responsibility for reporting errors is unclear
>> 
>> Even being removed.. because no one is really extending that..
>> 
>> https://lore.kernel.org/all/20230509120700.78359-1-quintela@redhat.com/#t
>
> One day (in another 5-10 years) I still hope we'll get to
> the point where QEMUFile itself is obsolete :-) Getting
> rid of QEMUFileHooks is a great step in that direction.
> Me finishing a old PoC to bring buffering to QIOChannel
> would be another big step.
>

If you need any help with that let me know. I've been tripping over
QEMUFile weirdness on a daily basis.

Just last week I was looking into restricting the usage of
qemu_file_set_error() to qemu-file.c so we can get rid of this situation
where any piece of code that has a pointer to the QEMUFile can put
whatever it wants in f->last_error* and the rest of the code has to
guess when to call qemu_file_get_error().

*last_error actually stores the first error
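
Illustrative sketch of what I mean (made-up snippet, not actual
migration code):

    /* any code that holds the QEMUFile can stash an error... */
    qemu_file_set_error(f, -EIO);

    /* ...and everybody else has to remember to poll for it later */
    if (qemu_file_get_error(f)) {
        /* give up */
    }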

Moving all the interesting parts into the channel and removing QEMUFile
would of course be the better solution. Multifd already ignores it
completly, so there's probably more code that could be made generic
after that change.

Also, looking at what people do with iovs in the block layer, it seems
the migration code is a little behind.

> The data rate limiting would be the biggest missing piece
> to enable migration/vmstate logic to directly consume
> a QIOChannel.
>
> Eliminating QEMUFile would help to bring Error **errp
> to all the vmstate codepaths.
>
>> > * There seem to be no tests whatsoever
>> 
>> I always see rdma as "odd fixes" stage.. for a long time.  But maybe I was
>> wrong.
>
> In the MAINTAINERS file RDMA still get classified as formally
> supported under the migration maintainers.  I'm not convinced
> that is an accurate description of its status.  I tend to agree
> with you that it is 'odd fixes' at the very best.
>
> Dave Gilbert had previously speculated about whether we should
> even consider deprecating it on the basis that latest non-RDMA
> migration is too much better than in the past, with multifd
> and zerocopy, that RDMA might not even offer a significant
> enough peformance win to justify.
>
>> Copying Zhijian for status of rdma; Zhijian, I saw that you just replied to
>> the hwpoison issue.  Maybe we should have one entry for rdma too, just like
>> colo?
>
> With regards,
> Daniel


^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 06/52] migration/rdma: Clean up two more harmless signed vs. unsigned issues
  2023-09-18 17:10   ` Fabiano Rosas
@ 2023-09-20 13:11     ` Markus Armbruster
  0 siblings, 0 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-20 13:11 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, quintela, peterx, leobras

Fabiano Rosas <farosas@suse.de> writes:

> Markus Armbruster <armbru@redhat.com> writes:
>
>> qemu_rdma_exchange_get_response() compares int parameter @expecting
>> with uint32_t head->type.  Actual arguments are non-negative
>> enumeration constants, RDMAControlHeader uint32_t member type, or
>> qemu_rdma_exchange_recv() int parameter expecting.  Actual arguments
>> for the latter are non-negative enumeration constants.  Change both
>> parameters to uint32_t.
>>
>> In qio_channel_rdma_readv(), loop control variable @i is ssize_t, and
>> counts from 0 up to @niov, which is size_t.  Change @i to size_t.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
>
> just a comment...
>
>>  static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
>> -                                int expecting)
>> +                                   uint32_t expecting)
>>  {
>>      RDMAControlHeader ready = {
>>                                  .len = 0,
>> @@ -2851,7 +2851,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>>      RDMAContext *rdma;
>>      RDMAControlHeader head;
>>      int ret = 0;
>> -    ssize_t i;
>> +    size_t i;
>>      size_t done = 0;
>
> It seems the idea was for 'done' to be ssize_t like in
> qio_channel_rdma_writev()

You're right, the two functions are still inconsistent: @done is size_t
here and ssize_t there.

Hmm, there's yet another mess:

        ret = qemu_rdma_fill(rdma, data, want, 0);
        done += ret;
        want -= ret;

qemu_rdma_fill() returns size_t, @done is size_t or ssize_t, @want is
size_t, but @ret is int.  Unwanted truncation is theoretically
possible.
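
A contrived illustration (made-up values, assuming a 64-bit build where int
is 32 bits):

    size_t want = 3ULL * 1024 * 1024 * 1024;         /* 3 GiB still wanted */
    int ret = qemu_rdma_fill(rdma, data, want, 0);   /* returns size_t */
    /* a return value above INT_MAX wraps to a negative int here, and the
     * bogus negative value then corrupts both counters below */
    done += ret;
    want -= ret;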

Separate patch.

Thanks!



^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 12/52] migration/rdma: Drop qemu_rdma_search_ram_block() error handling
  2023-09-18 17:35   ` Fabiano Rosas
@ 2023-09-20 13:11     ` Markus Armbruster
  0 siblings, 0 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-20 13:11 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, quintela, peterx, leobras

Fabiano Rosas <farosas@suse.de> writes:

> Markus Armbruster <armbru@redhat.com> writes:
>
>> qemu_rdma_search_ram_block() can't fail.  Return void, and drop the
>> unreachable error handling.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>  migration/rdma.c | 22 ++++++++--------------
>>  1 file changed, 8 insertions(+), 14 deletions(-)
>>
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index 2b0f9d52d8..98520a42b4 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -1234,12 +1234,12 @@ static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
>>   *
>>   * This search cannot fail or the migration will fail.
>>   */
>
> This comment can be removed as well.

Will do, thanks!

[...]



^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 20/52] migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port
  2023-09-19 16:02   ` Peter Xu
@ 2023-09-20 13:13     ` Markus Armbruster
  0 siblings, 0 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-20 13:13 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, quintela, leobras

Peter Xu <peterx@redhat.com> writes:

> On Mon, Sep 18, 2023 at 04:41:34PM +0200, Markus Armbruster wrote:
>> qemu_rdma_data_init() neglects to set an Error when it fails because
>> @host_port is null.  Fortunately, no caller passes null, so this is
>
> Indeed they all seem to be non-null.
>
> Before this patch, qemu_rdma_data_init() can still tolerate NULL, not
> setting errp but still returning NULL to indicate an error.

Returning failure without setting an error is wrong :)

> After this patch, qemu_rdma_data_init() should crash at inet_parse() if
> it's null.

Yes.

> Would it be simpler and clearer if we just set ERROR() for !host_port?

I dislike impossible error paths, because they are untestable.
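
For the record, the contract I want to hold functions to is roughly this
(sketch only; parsing_failed is a placeholder and the message text is
illustrative, not what the code actually says):

    RDMAContext *qemu_rdma_data_init(const char *host_port, Error **errp)
    {
        ...
        if (parsing_failed) {
            error_setg(errp, "bad RDMA migration address '%s'", host_port);
            return NULL;   /* failure value and *errp set, always together */
        }
        return rdma;
    }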

> Thanks,
>
>> merely a latent bug.  Drop the flawed code handling null argument.
>> 
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>



^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 00/52] migration/rdma: Error handling fixes
  2023-09-19 18:29   ` Daniel P. Berrangé
  2023-09-19 18:57     ` Fabiano Rosas
@ 2023-09-20 13:22     ` Markus Armbruster
  2023-10-04 18:00     ` Juan Quintela
  2 siblings, 0 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-20 13:22 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Peter Xu, qemu-devel, quintela, leobras, Li Zhijian

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Tue, Sep 19, 2023 at 12:49:46PM -0400, Peter Xu wrote:
>> On Mon, Sep 18, 2023 at 04:41:14PM +0200, Markus Armbruster wrote:
>> > Oh dear, where to start.  There's so much wrong, and in pretty obvious
>> > ways.  This code should never have passed review.  I'm refraining from
>> > saying more; see the commit messages instead.
>> > 
>> > Issues remaining after this series include:
>> > 
>> > * Terrible error messages
>> > 
>> > * Some error message cascades remain
>> > 
>> > * There is no written contract for QEMUFileHooks, and the
>> >   responsibility for reporting errors is unclear
>> 
>> Even being removed.. because no one is really extending that..
>> 
>> https://lore.kernel.org/all/20230509120700.78359-1-quintela@redhat.com/#t
>
> One day (in another 5-10 years) I still hope we'll get to
> the point where QEMUFile itself is obsolete :-) Getting
> rid of QEMUFileHooks is a great step in that direction.
> Me finishing an old PoC to bring buffering to QIOChannel
> would be another big step.
>
> The data rate limiting would be the biggest missing piece
> to enable migration/vmstate logic to directly consume
> a QIOChannel.
>
> Eliminating QEMUFile would help to bring Error **errp
> to all the vmstate codepaths.

Sounds like an improvement to me.

>> > * There seem to be no tests whatsoever
>> 
>> I always see rdma as "odd fixes" stage.. for a long time.  But maybe I was
>> wrong.

To be honest, it doesn't look or smell maintained to me.  More like
thrown over the fence and left to rot.  Given the shape it is in, I
wouldn't let friends use it in production.

> In the MAINTAINERS file RDMA still gets classified as formally
> supported under the migration maintainers.  I'm not convinced
> that is an accurate description of its status.  I tend to agree
> with you that it is 'odd fixes' at the very best.

Let's fix MAINTAINERS not to raise unrealistic expectations.

> Dave Gilbert had previously speculated about whether we should
> even consider deprecating it, on the basis that the latest non-RDMA
> migration is so much better than in the past, with multifd
> and zerocopy, that RDMA might not even offer a significant
> enough performance win to justify it.

I provided approximately 52 additional arguments for deprecating it :)

>> Copying Zhijian for status of rdma; Zhijian, I saw that you just replied to
>> the hwpoison issue.  Maybe we should have one entry for rdma too, just like
>> colo?
>
> With regards,
> Daniel



^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 00/52] migration/rdma: Error handling fixes
  2023-09-19 16:49 ` [PATCH 00/52] migration/rdma: Error handling fixes Peter Xu
  2023-09-19 18:29   ` Daniel P. Berrangé
@ 2023-09-21  8:27   ` Zhijian Li (Fujitsu)
  2023-09-22 15:21     ` Peter Xu
  1 sibling, 1 reply; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-21  8:27 UTC (permalink / raw)
  To: Peter Xu, Markus Armbruster; +Cc: qemu-devel, quintela, leobras

Peter,


On 20/09/2023 00:49, Peter Xu wrote:
> On Mon, Sep 18, 2023 at 04:41:14PM +0200, Markus Armbruster wrote:
>> Oh dear, where to start.  There's so much wrong, and in pretty obvious
>> ways.  This code should never have passed review.  I'm refraining from
>> saying more; see the commit messages instead.
>>
>> Issues remaining after this series include:
>>
>> * Terrible error messages
>>
>> * Some error message cascades remain
>>
>> * There is no written contract for QEMUFileHooks, and the
>>    responsibility for reporting errors is unclear
> 
> Even being removed.. because no one is really extending that..
> 
> https://lore.kernel.org/all/20230509120700.78359-1-quintela@redhat.com/#t
> 
>>
>> * There seem to be no tests whatsoever
> 
> I always see rdma as "odd fixes" stage.. for a long time.  But maybe I was
> wrong.
> 
> Copying Zhijian for status of rdma; 

Thanks,

Yeah, I sometimes pay attention to migration, especially patches related
to RDMA and COLO. I just realized I have missed so many RDMA patches; most of
them had got their Reviewed-by, but were dropped at the PULL phase in the end.
What a pity.


> Zhijian, I saw that you just replied to
> the hwpoison issue.  Maybe we should have one entry for rdma too, just like
> colo?

I'm worried that I may not have enough time, ability, or environment to review/test
the RDMA patches, but for this patch set, I will take a look later.


Thanks
Zhijian

> Thanks,
> 

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 01/52] migration/rdma: Clean up qemu_rdma_poll()'s return type
  2023-09-18 14:41 ` [PATCH 01/52] migration/rdma: Clean up qemu_rdma_poll()'s return type Markus Armbruster
  2023-09-18 16:50   ` Fabiano Rosas
@ 2023-09-21  8:52   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-21  8:52 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> qemu_rdma_poll()'s return type is uint64_t, even though it returns 0,
> -1, or @ret, which is int.  Its callers assign the return value to int
> variables, then check whether it's negative.  Unclean.
> 
> Return int instead.
> 
> Signed-off-by: Markus Armbruster<armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 02/52] migration/rdma: Clean up qemu_rdma_data_init()'s return type
  2023-09-18 14:41 ` [PATCH 02/52] migration/rdma: Clean up qemu_rdma_data_init()'s " Markus Armbruster
  2023-09-18 16:51   ` Fabiano Rosas
@ 2023-09-21  8:52   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-21  8:52 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> qemu_rdma_data_init() return type is void *.  It actually returns
> RDMAContext *, and all its callers assign the value to an
> RDMAContext *.  Unclean.
> 
> Return RDMAContext * instead.
> 
> Signed-off-by: Markus Armbruster<armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 03/52] migration/rdma: Clean up rdma_delete_block()'s return type
  2023-09-18 14:41 ` [PATCH 03/52] migration/rdma: Clean up rdma_delete_block()'s " Markus Armbruster
  2023-09-18 16:53   ` Fabiano Rosas
@ 2023-09-21  8:53   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-21  8:53 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> rdma_delete_block() always returns 0, which its only caller ignores.
> Return void instead.
> 
> Signed-off-by: Markus Armbruster<armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 04/52] migration/rdma: Drop fragile wr_id formatting
  2023-09-18 14:41 ` [PATCH 04/52] migration/rdma: Drop fragile wr_id formatting Markus Armbruster
  2023-09-18 17:01   ` Fabiano Rosas
@ 2023-09-21  8:54   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-21  8:54 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> wrid_desc[] uses 4001 pointers to map four integer values to strings.
> 
> print_wrid() accesses wrid_desc[] out of bounds when passed a negative
> argument.  It returns null for values 2..1999 and 2001..3999.
> 
> qemu_rdma_poll() and qemu_rdma_block_for_wrid() print wrid_desc[wr_id]
> and passes print_wrid(wr_id) to tracepoints.  Could conceivably crash
> trying to format a null string.  I believe access out of bounds is not
> possible.
> 
> Not worth cleaning up.  Dumb down to show just numeric wr_id.

Yeah, a numeric wr_id is enough
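
For anyone skimming, the shape of the problem is roughly this (simplified;
the tracepoint names are made up):

    static const char *wrid_desc[4001];   /* only a handful of slots are set */

    /* fragile: a negative wr_id indexes out of bounds, and most in-range
     * values yield NULL, which the tracepoint may then try to format */
    trace_wrid(print_wrid(wr_id));

    /* dumbed down: just print the number */
    trace_wrid_numeric(wr_id);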


> 
> Signed-off-by: Markus Armbruster<armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 07/52] migration/rdma: Give qio_channel_rdma_source_funcs internal linkage
  2023-09-18 14:41 ` [PATCH 07/52] migration/rdma: Give qio_channel_rdma_source_funcs internal linkage Markus Armbruster
  2023-09-18 17:11   ` Fabiano Rosas
@ 2023-09-21  9:00   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-21  9:00 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 08/52] migration/rdma: Fix qemu_rdma_accept() to return failure on errors
  2023-09-18 14:41 ` [PATCH 08/52] migration/rdma: Fix qemu_rdma_accept() to return failure on errors Markus Armbruster
  2023-09-18 17:15   ` Fabiano Rosas
@ 2023-09-21  9:07   ` Zhijian Li (Fujitsu)
  2023-09-28 10:53     ` Markus Armbruster
  1 sibling, 1 reply; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-21  9:07 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> qemu_rdma_accept() returns 0 in some cases even when it didn't
> complete its job due to errors.  Impact is not obvious.  I figure the
> caller will soon fail again with a misleading error message.
> 
> Fix it to return -1 on any failure.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

I noticed that the initialization of ret is also pointless in qemu_rdma_accept():

3354 static int qemu_rdma_accept(RDMAContext *rdma)
3355 {
3356     RDMACapabilities cap;
3357     struct rdma_conn_param conn_param = {
3358                                             .responder_resources = 2,
3359                                             .private_data = &cap,
3360                                             .private_data_len = sizeof(cap),
3361                                          };
3362     RDMAContext *rdma_return_path = NULL;
3363     struct rdma_cm_event *cm_event;
3364     struct ibv_context *verbs;
3365     int ret = -EINVAL;     <<<<< drop it ?
3366     int idx;


Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 09/52] migration/rdma: Put @errp parameter last
  2023-09-18 14:41 ` [PATCH 09/52] migration/rdma: Put @errp parameter last Markus Armbruster
  2023-09-18 17:17   ` Fabiano Rosas
@ 2023-09-21  9:08   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-21  9:08 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> include/qapi/error.h demands:
> 
>   * - Functions that use Error to report errors have an Error **errp
>   *   parameter.  It should be the last parameter, except for functions
>   *   taking variable arguments.
> 
> qemu_rdma_connect() does not conform.  Clean it up.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 10/52] migration/rdma: Eliminate error_propagate()
  2023-09-18 14:41 ` [PATCH 10/52] migration/rdma: Eliminate error_propagate() Markus Armbruster
  2023-09-18 17:20   ` Fabiano Rosas
@ 2023-09-21  9:31   ` Zhijian Li (Fujitsu)
  2023-09-27 16:20   ` Eric Blake
  2 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-21  9:31 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> When all we do with an Error we receive into a local variable is
> propagating to somewhere else, we can just as well receive it there
> right away.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>


> ---
>   migration/rdma.c | 19 +++++++------------
>   1 file changed, 7 insertions(+), 12 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 2b40bbcbb0..960fff5860 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2445,7 +2445,6 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>   static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
>   {
>       int ret, idx;
> -    Error *local_err = NULL, **temp = &local_err;
>   
>       /*
>        * Will be validated against destination's actual capabilities
> @@ -2453,14 +2452,14 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
>        */
>       rdma->pin_all = pin_all;
>   
> -    ret = qemu_rdma_resolve_host(rdma, temp);
> +    ret = qemu_rdma_resolve_host(rdma, errp);
>       if (ret) {
>           goto err_rdma_source_init;
>       }
>   
>       ret = qemu_rdma_alloc_pd_cq(rdma);
>       if (ret) {
> -        ERROR(temp, "rdma migration: error allocating pd and cq! Your mlock()"
> +        ERROR(errp, "rdma migration: error allocating pd and cq! Your mlock()"
>                       " limits may be too low. Please check $ ulimit -a # and "
>                       "search for 'ulimit -l' in the output");
>           goto err_rdma_source_init;
> @@ -2468,13 +2467,13 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
>   
>       ret = qemu_rdma_alloc_qp(rdma);
>       if (ret) {
> -        ERROR(temp, "rdma migration: error allocating qp!");
> +        ERROR(errp, "rdma migration: error allocating qp!");
>           goto err_rdma_source_init;
>       }
>   
>       ret = qemu_rdma_init_ram_blocks(rdma);
>       if (ret) {
> -        ERROR(temp, "rdma migration: error initializing ram blocks!");
> +        ERROR(errp, "rdma migration: error initializing ram blocks!");
>           goto err_rdma_source_init;
>       }
>   
> @@ -2489,7 +2488,7 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
>       for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
>           ret = qemu_rdma_reg_control(rdma, idx);
>           if (ret) {
> -            ERROR(temp, "rdma migration: error registering %d control!",
> +            ERROR(errp, "rdma migration: error registering %d control!",
>                                                               idx);
>               goto err_rdma_source_init;
>           }
> @@ -2498,7 +2497,6 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
>       return 0;
>   
>   err_rdma_source_init:
> -    error_propagate(errp, local_err);
>       qemu_rdma_cleanup(rdma);
>       return -1;
>   }
> @@ -4103,7 +4101,6 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
>   {
>       int ret;
>       RDMAContext *rdma;
> -    Error *local_err = NULL;
>   
>       trace_rdma_start_incoming_migration();
>   
> @@ -4113,13 +4110,12 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
>           return;
>       }
>   
> -    rdma = qemu_rdma_data_init(host_port, &local_err);
> +    rdma = qemu_rdma_data_init(host_port, errp);
>       if (rdma == NULL) {
>           goto err;
>       }
>   
> -    ret = qemu_rdma_dest_init(rdma, &local_err);
> -
> +    ret = qemu_rdma_dest_init(rdma, errp);
>       if (ret) {
>           goto err;
>       }
> @@ -4142,7 +4138,6 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
>   cleanup_rdma:
>       qemu_rdma_cleanup(rdma);
>   err:
> -    error_propagate(errp, local_err);
>       if (rdma) {
>           g_free(rdma->host);
>           g_free(rdma->host_port);

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 11/52] migration/rdma: Drop rdma_add_block() error handling
  2023-09-18 14:41 ` [PATCH 11/52] migration/rdma: Drop rdma_add_block() error handling Markus Armbruster
  2023-09-18 17:32   ` Fabiano Rosas
@ 2023-09-21  9:39   ` Zhijian Li (Fujitsu)
  2023-09-21 11:15     ` Markus Armbruster
  1 sibling, 1 reply; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-21  9:39 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> rdma_add_block() can't fail.  Return void, and drop the unreachable
> error handling.
> 
> Signed-off-by: Markus Armbruster<armbru@redhat.com>
> ---
>   migration/rdma.c | 30 +++++++++---------------------
>   1 file changed, 9 insertions(+), 21 deletions(-)
> 

[...]

>    * during dynamic page registration.
>    */
> -static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
> +static void qemu_rdma_init_ram_blocks(RDMAContext *rdma)
>   {
>       RDMALocalBlocks *local = &rdma->local_ram_blocks;
>       int ret;
> @@ -646,14 +645,11 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
>       assert(rdma->blockmap == NULL);
>       memset(local, 0, sizeof *local);
>       ret = foreach_not_ignored_block(qemu_rdma_init_one_block, rdma);
> -    if (ret) {
> -        return ret;
> -    }
> +    assert(!ret);

Why do we still need a new assert()?  Can't we remove ret altogether?

     foreach_not_ignored_block(qemu_rdma_init_one_block, rdma);
     trace_qemu_rdma_init_ram_blocks(local->nb_blocks);


Thanks
Zhijian

>       trace_qemu_rdma_init_ram_blocks(local->nb_blocks);

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 11/52] migration/rdma: Drop rdma_add_block() error handling
  2023-09-21  9:39   ` Zhijian Li (Fujitsu)
@ 2023-09-21 11:15     ` Markus Armbruster
  2023-09-22  7:49       ` Zhijian Li (Fujitsu)
  0 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-21 11:15 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu); +Cc: qemu-devel, quintela, peterx, leobras

"Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:

> On 18/09/2023 22:41, Markus Armbruster wrote:
>> rdma_add_block() can't fail.  Return void, and drop the unreachable
>> error handling.
>> 
>> Signed-off-by: Markus Armbruster<armbru@redhat.com>
>> ---
>>   migration/rdma.c | 30 +++++++++---------------------
>>   1 file changed, 9 insertions(+), 21 deletions(-)
>> 
>
> [...]
>
>>    * during dynamic page registration.
>>    */
>> -static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
>> +static void qemu_rdma_init_ram_blocks(RDMAContext *rdma)
>>   {
>>       RDMALocalBlocks *local = &rdma->local_ram_blocks;
>>       int ret;
>> @@ -646,14 +645,11 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
>>       assert(rdma->blockmap == NULL);
>>       memset(local, 0, sizeof *local);
>>       ret = foreach_not_ignored_block(qemu_rdma_init_one_block, rdma);
>> -    if (ret) {
>> -        return ret;
>> -    }
>> +    assert(!ret);
>
> Why do we still need a new assert()?  Can't we remove ret altogether?
>
>      foreach_not_ignored_block(qemu_rdma_init_one_block, rdma);
>      trace_qemu_rdma_init_ram_blocks(local->nb_blocks);

The "the callback doesn't fail" is a non-local argument.  The assertion
checks it.  I'd be fine with dropping it, since the argument is
straightforward enough.  Thoughts?
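
In code, the two options on the table (taken from the hunk above):

    /* option A: keep a cheap check of the non-local "can't fail" claim */
    ret = foreach_not_ignored_block(qemu_rdma_init_one_block, rdma);
    assert(!ret);

    /* option B: trust the claim and drop @ret entirely */
    foreach_not_ignored_block(qemu_rdma_init_one_block, rdma);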



^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 11/52] migration/rdma: Drop rdma_add_block() error handling
  2023-09-21 11:15     ` Markus Armbruster
@ 2023-09-22  7:49       ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-22  7:49 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, quintela, peterx, leobras



On 21/09/2023 19:15, Markus Armbruster wrote:
> "Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:
> 
>> On 18/09/2023 22:41, Markus Armbruster wrote:
>>> rdma_add_block() can't fail.  Return void, and drop the unreachable
>>> error handling.
>>>
>>> Signed-off-by: Markus Armbruster<armbru@redhat.com>
>>> ---
>>>    migration/rdma.c | 30 +++++++++---------------------
>>>    1 file changed, 9 insertions(+), 21 deletions(-)
>>>
>>
>> [...]
>>
>>>     * during dynamic page registration.
>>>     */
>>> -static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
>>> +static void qemu_rdma_init_ram_blocks(RDMAContext *rdma)
>>>    {
>>>        RDMALocalBlocks *local = &rdma->local_ram_blocks;
>>>        int ret;
>>> @@ -646,14 +645,11 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
>>>        assert(rdma->blockmap == NULL);
>>>        memset(local, 0, sizeof *local);
>>>        ret = foreach_not_ignored_block(qemu_rdma_init_one_block, rdma);
>>> -    if (ret) {
>>> -        return ret;
>>> -    }
>>> +    assert(!ret);
>>
>> Why do we still need a new assert()?  Can't we remove ret altogether?
>>
>>       foreach_not_ignored_block(qemu_rdma_init_one_block, rdma);
>>       trace_qemu_rdma_init_ram_blocks(local->nb_blocks);
> 
> The "the callback doesn't fail" is a non-local argument.  The assertion
> checks it.  I'd be fine with dropping it, since the argument is
> straightforward enough.  Thoughts?
> 

Both are fine, I prefer to drop it personally. :)

Anyway,

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 12/52] migration/rdma: Drop qemu_rdma_search_ram_block() error handling
  2023-09-18 14:41 ` [PATCH 12/52] migration/rdma: Drop qemu_rdma_search_ram_block() " Markus Armbruster
  2023-09-18 17:35   ` Fabiano Rosas
@ 2023-09-22  7:50   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-22  7:50 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> qemu_rdma_search_ram_block() can't fail.  Return void, and drop the
> unreachable error handling.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 13/52] migration/rdma: Make qemu_rdma_buffer_mergable() return bool
  2023-09-18 14:41 ` [PATCH 13/52] migration/rdma: Make qemu_rdma_buffer_mergable() return bool Markus Armbruster
  2023-09-18 17:36   ` Fabiano Rosas
@ 2023-09-22  7:51   ` Zhijian Li (Fujitsu)
  2023-09-27 16:26   ` Eric Blake
  2 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-22  7:51 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> qemu_rdma_buffer_mergable() is semantically a predicate.  It returns
> int 0 or 1.  Return bool instead.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 14/52] migration/rdma: Use bool for two RDMAContext flags
  2023-09-18 14:41 ` [PATCH 14/52] migration/rdma: Use bool for two RDMAContext flags Markus Armbruster
  2023-09-18 17:37   ` Fabiano Rosas
@ 2023-09-22  7:54   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-22  7:54 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> @error_reported and @received_error are flags.  The latter is even
> assigned bool true.  Change them from int to bool.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

> ---
>   migration/rdma.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 97715dbd78..c02a1c83b2 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -91,7 +91,7 @@ static uint32_t known_capabilities = RDMA_CAPABILITY_PIN_ALL;
>               if (!rdma->error_reported) { \
>                   error_report("RDMA is in an error state waiting migration" \
>                                   " to abort!"); \
> -                rdma->error_reported = 1; \
> +                rdma->error_reported = true; \
>               } \
>               return rdma->error_state; \
>           } \
> @@ -365,8 +365,8 @@ typedef struct RDMAContext {
>        * and remember the error state.
>        */
>       int error_state;
> -    int error_reported;
> -    int received_error;
> +    bool error_reported;
> +    bool received_error;
>   
>       /*
>        * Description of ram blocks used throughout the code.

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 15/52] migration/rdma: Ditch useless numeric error codes in error messages
  2023-09-18 14:41 ` [PATCH 15/52] migration/rdma: Ditch useless numeric error codes in error messages Markus Armbruster
  2023-09-18 18:47   ` Fabiano Rosas
@ 2023-09-22  8:44   ` Zhijian Li (Fujitsu)
  2023-09-22  9:43     ` Markus Armbruster
  1 sibling, 1 reply; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-22  8:44 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> Several error messages include numeric error codes returned by failed
> functions:
> 
> * ibv_poll_cq() returns an unspecified negative value.  Useless.
> 
> * rdma_accept and rmda_get_cm_event() return -1.  Useless.


s/rmda_get_cm_event/rdma_get_cm_event

Otherwise,
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>


> 
> * qemu_rdma_poll() returns either -1 or an unspecified negative
>    value.  Useless.
> 
> * qemu_rdma_block_for_wrid(), qemu_rdma_write_flush(),
>    qemu_rdma_exchange_send(), qemu_rdma_exchange_recv(),
>    qemu_rdma_write() return a negative value that may or may not be an
>    errno value.  While reporting human-readable errno
>    information (which a number is not) can be useful, reporting an
>    error code that may or may not be an errno value is useless.
> 
> Drop these error codes from the error messages.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   migration/rdma.c | 20 ++++++++++----------
>   1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index c02a1c83b2..2173cb076f 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -1460,7 +1460,7 @@ static int qemu_rdma_poll(RDMAContext *rdma, struct ibv_cq *cq,
>       }
>   
>       if (ret < 0) {
> -        error_report("ibv_poll_cq return %d", ret);
> +        error_report("ibv_poll_cq failed");
>           return ret;
>       }
>   
> @@ -2194,7 +2194,7 @@ retry:
>           ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RDMA_WRITE, NULL);
>           if (ret < 0) {
>               error_report("rdma migration: failed to make "
> -                         "room in full send queue! %d", ret);
> +                         "room in full send queue!");
>               return ret;
>           }
>   
> @@ -2770,7 +2770,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>       ret = qemu_rdma_write_flush(f, rdma);
>       if (ret < 0) {
>           rdma->error_state = ret;
> -        error_setg(errp, "qemu_rdma_write_flush returned %d", ret);
> +        error_setg(errp, "qemu_rdma_write_flush failed");
>           return -1;
>       }
>   
> @@ -2790,7 +2790,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>   
>               if (ret < 0) {
>                   rdma->error_state = ret;
> -                error_setg(errp, "qemu_rdma_exchange_send returned %d", ret);
> +                error_setg(errp, "qemu_rdma_exchange_send failed");
>                   return -1;
>               }
>   
> @@ -2880,7 +2880,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>   
>           if (ret < 0) {
>               rdma->error_state = ret;
> -            error_setg(errp, "qemu_rdma_exchange_recv returned %d", ret);
> +            error_setg(errp, "qemu_rdma_exchange_recv failed");
>               return -1;
>           }
>   
> @@ -3222,7 +3222,7 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
>        */
>       ret = qemu_rdma_write(f, rdma, block_offset, offset, size);
>       if (ret < 0) {
> -        error_report("rdma migration: write error! %d", ret);
> +        error_report("rdma migration: write error");
>           goto err;
>       }
>   
> @@ -3249,7 +3249,7 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
>           uint64_t wr_id, wr_id_in;
>           int ret = qemu_rdma_poll(rdma, rdma->recv_cq, &wr_id_in, NULL);
>           if (ret < 0) {
> -            error_report("rdma migration: polling error! %d", ret);
> +            error_report("rdma migration: polling error");
>               goto err;
>           }
>   
> @@ -3264,7 +3264,7 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
>           uint64_t wr_id, wr_id_in;
>           int ret = qemu_rdma_poll(rdma, rdma->send_cq, &wr_id_in, NULL);
>           if (ret < 0) {
> -            error_report("rdma migration: polling error! %d", ret);
> +            error_report("rdma migration: polling error");
>               goto err;
>           }
>   
> @@ -3439,13 +3439,13 @@ static int qemu_rdma_accept(RDMAContext *rdma)
>   
>       ret = rdma_accept(rdma->cm_id, &conn_param);
>       if (ret) {
> -        error_report("rdma_accept returns %d", ret);
> +        error_report("rdma_accept failed");
>           goto err_rdma_dest_wait;
>       }
>   
>       ret = rdma_get_cm_event(rdma->channel, &cm_event);
>       if (ret) {
> -        error_report("rdma_accept get_cm_event failed %d", ret);
> +        error_report("rdma_accept get_cm_event failed");
>           goto err_rdma_dest_wait;
>       }
>   

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 16/52] migration/rdma: Fix io_writev(), io_readv() methods to obey contract
  2023-09-18 14:41 ` [PATCH 16/52] migration/rdma: Fix io_writev(), io_readv() methods to obey contract Markus Armbruster
  2023-09-18 18:57   ` Fabiano Rosas
@ 2023-09-22  8:59   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-22  8:59 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> QIOChannelClass methods qio_channel_rdma_readv() and
> qio_channel_rdma_writev() violate their method contract when
> rdma->error_state is non-zero:
> 
> 1. They return whatever is in rdma->error_state then.  Only -1 will be
>     fine.  -2 will be misinterpreted as "would block".  Anything less
>     than -2 isn't defined in the contract.  A positive value would be
>     misinterpreted as success, but I believe that's not actually
>     possible.
> 
> 2. They neglect to set an error then.  If something up the call stack
>     dereferences the error when failure is returned, it will crash.  If
>     it ignores the return value and checks the error instead, it will
>     miss the error.
> 
> Crap like this happens when return statements hide in macros,
> especially when their uses are far away from the definition.
> 
> I elected not to investigate how callers are impacted.
> 
> Expand the two bad macro uses, so we can set an error and return -1.
> The next commit will then get rid of the macro altogether.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>


Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>


> ---
>   migration/rdma.c | 12 ++++++++++--
>   1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 2173cb076f..30e6dff875 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2761,7 +2761,11 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>           return -1;
>       }
>   
> -    CHECK_ERROR_STATE();
> +    if (rdma->error_state) {
> +        error_setg(errp,
> +                   "RDMA is in an error state waiting migration to abort!");
> +        return -1;
> +    }
>   
>       /*
>        * Push out any writes that
> @@ -2847,7 +2851,11 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>           return -1;
>       }
>   
> -    CHECK_ERROR_STATE();
> +    if (rdma->error_state) {
> +        error_setg(errp,
> +                   "RDMA is in an error state waiting migration to abort!");
> +        return -1;
> +    }
>   
>       for (i = 0; i < niov; i++) {
>           size_t want = iov[i].iov_len;

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 17/52] migration/rdma: Replace dangerous macro CHECK_ERROR_STATE()
  2023-09-18 14:41 ` [PATCH 17/52] migration/rdma: Replace dangerous macro CHECK_ERROR_STATE() Markus Armbruster
  2023-09-18 18:57   ` Fabiano Rosas
@ 2023-09-22  9:01   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-22  9:01 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> Hiding return statements in macros is a bad idea.  Use a function
> instead, and open code the return part.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>


> ---
>   migration/rdma.c | 43 +++++++++++++++++++++++++++----------------
>   1 file changed, 27 insertions(+), 16 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 30e6dff875..be66f53489 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -85,18 +85,6 @@
>    */
>   static uint32_t known_capabilities = RDMA_CAPABILITY_PIN_ALL;
>   
> -#define CHECK_ERROR_STATE() \
> -    do { \
> -        if (rdma->error_state) { \
> -            if (!rdma->error_reported) { \
> -                error_report("RDMA is in an error state waiting migration" \
> -                                " to abort!"); \
> -                rdma->error_reported = true; \
> -            } \
> -            return rdma->error_state; \
> -        } \
> -    } while (0)
> -
>   /*
>    * A work request ID is 64-bits and we split up these bits
>    * into 3 parts:
> @@ -451,6 +439,16 @@ typedef struct QEMU_PACKED {
>       uint64_t chunks;            /* how many sequential chunks to register */
>   } RDMARegister;
>   
> +static int check_error_state(RDMAContext *rdma)
> +{
> +    if (rdma->error_state && !rdma->error_reported) {
> +        error_report("RDMA is in an error state waiting migration"
> +                     " to abort!");
> +        rdma->error_reported = true;
> +    }
> +    return rdma->error_state;
> +}
> +
>   static void register_to_network(RDMAContext *rdma, RDMARegister *reg)
>   {
>       RDMALocalBlock *local_block;
> @@ -3219,7 +3217,10 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
>           return -EIO;
>       }
>   
> -    CHECK_ERROR_STATE();
> +    ret = check_error_state(rdma);
> +    if (ret) {
> +        return ret;
> +    }
>   
>       qemu_fflush(f);
>   
> @@ -3535,7 +3536,10 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>           return -EIO;
>       }
>   
> -    CHECK_ERROR_STATE();
> +    ret = check_error_state(rdma);
> +    if (ret) {
> +        return ret;
> +    }
>   
>       local = &rdma->local_ram_blocks;
>       do {
> @@ -3839,6 +3843,7 @@ static int qemu_rdma_registration_start(QEMUFile *f,
>   {
>       QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
>       RDMAContext *rdma;
> +    int ret;
>   
>       if (migration_in_postcopy()) {
>           return 0;
> @@ -3850,7 +3855,10 @@ static int qemu_rdma_registration_start(QEMUFile *f,
>           return -EIO;
>       }
>   
> -    CHECK_ERROR_STATE();
> +    ret = check_error_state(rdma);
> +    if (ret) {
> +        return ret;
> +    }
>   
>       trace_qemu_rdma_registration_start(flags);
>       qemu_put_be64(f, RAM_SAVE_FLAG_HOOK);
> @@ -3881,7 +3889,10 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
>           return -EIO;
>       }
>   
> -    CHECK_ERROR_STATE();
> +    ret = check_error_state(rdma);
> +    if (ret) {
> +        return ret;
> +    }
>   
>       qemu_fflush(f);
>       ret = qemu_rdma_drain_cq(f, rdma);

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 18/52] migration/rdma: Fix qemu_rdma_broken_ipv6_kernel() to set error
  2023-09-18 14:41 ` [PATCH 18/52] migration/rdma: Fix qemu_rdma_broken_ipv6_kernel() to set error Markus Armbruster
  2023-09-18 19:00   ` Fabiano Rosas
@ 2023-09-22  9:10   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-22  9:10 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> qemu_rdma_resolve_host() and qemu_rdma_dest_init() try addresses until
> they find one that works.  If none works, they return the first Error
> set by qemu_rdma_broken_ipv6_kernel(), or else return a generic one.
> 
> qemu_rdma_broken_ipv6_kernel() neglects to set an Error when
> ibv_open_device() fails.  If a later address fails differently, we use
> that Error instead, or else the generic one.  Harmless enough, but
> needs fixing all the same.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Wow... IPv6 + RDMA, I have never used that combination, though

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>



> ---
>   migration/rdma.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index be66f53489..08cd186385 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -855,6 +855,8 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
>                   if (errno == EPERM) {
>                       continue;
>                   } else {
> +                    error_setg_errno(errp, errno,
> +                                     "could not open RDMA device context");
>                       return -EINVAL;
>                   }
>               }

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 19/52] migration/rdma: Fix qemu_get_cm_event_timeout() to always set error
  2023-09-18 14:41 ` [PATCH 19/52] migration/rdma: Fix qemu_get_cm_event_timeout() to always " Markus Armbruster
  2023-09-19 16:02   ` Peter Xu
@ 2023-09-22  9:12   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-22  9:12 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> qemu_get_cm_event_timeout() neglects to set an error when it fails
> because rdma_get_cm_event() fails.  Harmless, as its caller
> qemu_rdma_connect() substitutes a generic error then.  Fix it anyway.
> 
> qemu_rdma_connect() also sets the generic error when its own call of
> rdma_get_cm_event() fails.  Make the error handling more obvious: set
> a specific error right after rdma_get_cm_event() fails.  Delete the
> generic error.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>


Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>



> ---
>   migration/rdma.c | 10 ++++++++--
>   1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 08cd186385..d3dc162363 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2509,7 +2509,11 @@ static int qemu_get_cm_event_timeout(RDMAContext *rdma,
>           ERROR(errp, "failed to poll cm event, errno=%i", errno);
>           return -1;
>       } else if (poll_fd.revents & POLLIN) {
> -        return rdma_get_cm_event(rdma->channel, cm_event);
> +        if (rdma_get_cm_event(rdma->channel, cm_event) < 0) {
> +            ERROR(errp, "failed to get cm event");
> +            return -1;
> +        }
> +        return 0;
>       } else {
>           ERROR(errp, "no POLLIN event, revent=%x", poll_fd.revents);
>           return -1;
> @@ -2559,10 +2563,12 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
>           ret = qemu_get_cm_event_timeout(rdma, &cm_event, 5000, errp);
>       } else {
>           ret = rdma_get_cm_event(rdma->channel, &cm_event);
> +        if (ret < 0) {
> +            ERROR(errp, "failed to get cm event");
> +        }
>       }
>       if (ret) {
>           perror("rdma_get_cm_event after rdma_connect");
> -        ERROR(errp, "connecting to destination!");
>           goto err_rdma_source_connect;
>       }
>   

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 15/52] migration/rdma: Ditch useless numeric error codes in error messages
  2023-09-22  8:44   ` Zhijian Li (Fujitsu)
@ 2023-09-22  9:43     ` Markus Armbruster
  0 siblings, 0 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-22  9:43 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu); +Cc: qemu-devel, quintela, peterx, leobras

"Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:

> On 18/09/2023 22:41, Markus Armbruster wrote:
>> Several error messages include numeric error codes returned by failed
>> functions:
>> 
>> * ibv_poll_cq() returns an unspecified negative value.  Useless.
>> 
>> * rdma_accept and rmda_get_cm_event() return -1.  Useless.
>
>
> s/rmda_get_cm_event/rdma_get_cm_event

Sharp eyes!  Will fix.

> Otherwise,
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Thanks!



^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 00/52] migration/rdma: Error handling fixes
  2023-09-21  8:27   ` Zhijian Li (Fujitsu)
@ 2023-09-22 15:21     ` Peter Xu
  2023-09-25  8:06       ` Zhijian Li (Fujitsu)
  0 siblings, 1 reply; 174+ messages in thread
From: Peter Xu @ 2023-09-22 15:21 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu); +Cc: Markus Armbruster, qemu-devel, quintela, leobras

On Thu, Sep 21, 2023 at 08:27:24AM +0000, Zhijian Li (Fujitsu) wrote:
> I'm worried that I may not have enough time, ability, or environment to review/test
> the RDMA patches, but for this patch set, I will take a look later.

That'll be helpful, thanks!

So it seems that, at the very least, we should have an entry for rdma
migration to reflect the state of the code there.  AFAIU we don't strictly
need a maintainer for such entries; an empty entry should suffice, and I can
prepare a patch for it:

RDMA Migration
S: Odd Fixes
F: migration/rdma*

Zhijian, if you still want to get emails when someone changes the code,
maybe you want to be listed as a reviewer (even if not a maintainer)?
That way you don't necessarily need to review every change, but you still
get notified whenever someone touches the code.  That means one line added
to the above:

R: Li Zhijian <lizhijian@fujitsu.com>

Would you like me to add that R: line for you when I send the MAINTAINERS
update?
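
Combined, the entry would then look something like this:

RDMA Migration
R: Li Zhijian <lizhijian@fujitsu.com>
S: Odd Fixes
F: migration/rdma*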

I'm curious whether Fujitsu is using this code in production; if so, it
would be great if maintaining the rdma code could be supported as, perhaps,
part of your job.  But maybe that's not the case.

In any case, thanks a lot for the help already.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 21/52] migration/rdma: Fix QEMUFileHooks method return values
  2023-09-18 14:41 ` [PATCH 21/52] migration/rdma: Fix QEMUFileHooks method return values Markus Armbruster
@ 2023-09-25  4:08   ` Zhijian Li (Fujitsu)
  2023-09-25  6:36     ` Markus Armbruster
  0 siblings, 1 reply; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-25  4:08 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> The QEMUFileHooks methods don't come with a written contract.  Digging
> through the code calling them, we find:
> 
> * save_page():

I'm fine with this

> 
>    Negative values RAM_SAVE_CONTROL_DELAYED and
>    RAM_SAVE_CONTROL_NOT_SUPP are special.  Any other negative value is
>    an unspecified error.
> 
>    qemu_rdma_save_page() returns -EIO or rdma->error_state on error.  I
>    believe the latter is always negative.  Nothing stops either of them
>    to clash with the special values, though.  Feels unlikely, but fix
>    it anyway to return only the special values and -1.
> 
> * before_ram_iterate(), after_ram_iterate():

The error codes returned by before_ram_iterate() and after_ram_iterate() are
assigned to the QEMUFile for the upper layer.
But it seems that no caller takes care of that error?  Shouldn't we let the
callers check the error instead?



> 
>    Negative value means error.  qemu_rdma_registration_start() and
>    qemu_rdma_registration_stop() comply as far as I can tell.  Make
>    them comply *obviously*, by returning -1 on error.
> 
> * hook_ram_load:
> 
>    Negative value means error.  rdma_load_hook() already returns -1 on
>    error.  Leave it alone.

see inline reply below,

> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   migration/rdma.c | 79 +++++++++++++++++++++++-------------------------
>   1 file changed, 37 insertions(+), 42 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index cc59155a50..46b5859268 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -3219,12 +3219,11 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
>       rdma = qatomic_rcu_read(&rioc->rdmaout);
>   
>       if (!rdma) {
> -        return -EIO;
> +        return -1;
>       }
>   
> -    ret = check_error_state(rdma);
> -    if (ret) {
> -        return ret;
> +    if (check_error_state(rdma)) {
> +        return -1;
>       }
>   
>       qemu_fflush(f);
> @@ -3290,9 +3289,10 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
>       }
>   
>       return RAM_SAVE_CONTROL_DELAYED;
> +
>   err:
>       rdma->error_state = ret;
> -    return ret;
> +    return -1;
>   }
>   
>   static void rdma_accept_incoming_migration(void *opaque);
> @@ -3538,12 +3538,11 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>       rdma = qatomic_rcu_read(&rioc->rdmain);
>   
>       if (!rdma) {
> -        return -EIO;
> +        return -1;

Is that because EIO is not accurate here?



>       }
>   
> -    ret = check_error_state(rdma);
> -    if (ret) {
> -        return ret;

Ditto


Thanks
Zhijian

> +    if (check_error_state(rdma)) {
> +        return -1;
>       }
>   
>       local = &rdma->local_ram_blocks;
> @@ -3576,7 +3575,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>                                (unsigned int)comp->block_idx,
>                                rdma->local_ram_blocks.nb_blocks);
>                   ret = -EIO;
> -                goto out;
> +                goto err;
>               }
>               block = &(rdma->local_ram_blocks.block[comp->block_idx]);
>   
> @@ -3588,7 +3587,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>   
>           case RDMA_CONTROL_REGISTER_FINISHED:
>               trace_qemu_rdma_registration_handle_finished();
> -            goto out;
> +            return 0;
>   
>           case RDMA_CONTROL_RAM_BLOCKS_REQUEST:
>               trace_qemu_rdma_registration_handle_ram_blocks();
> @@ -3609,7 +3608,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>                   if (ret) {
>                       error_report("rdma migration: error dest "
>                                       "registering ram blocks");
> -                    goto out;
> +                    goto err;
>                   }
>               }
>   
> @@ -3648,7 +3647,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>   
>               if (ret < 0) {
>                   error_report("rdma migration: error sending remote info");
> -                goto out;
> +                goto err;
>               }
>   
>               break;
> @@ -3675,7 +3674,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>                                    (unsigned int)reg->current_index,
>                                    rdma->local_ram_blocks.nb_blocks);
>                       ret = -ENOENT;
> -                    goto out;
> +                    goto err;
>                   }
>                   block = &(rdma->local_ram_blocks.block[reg->current_index]);
>                   if (block->is_ram_block) {
> @@ -3685,7 +3684,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>                               block->block_name, block->offset,
>                               reg->key.current_addr);
>                           ret = -ERANGE;
> -                        goto out;
> +                        goto err;
>                       }
>                       host_addr = (block->local_host_addr +
>                                   (reg->key.current_addr - block->offset));
> @@ -3701,7 +3700,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>                               " chunk: %" PRIx64,
>                               block->block_name, reg->key.chunk);
>                           ret = -ERANGE;
> -                        goto out;
> +                        goto err;
>                       }
>                   }
>                   chunk_start = ram_chunk_start(block, chunk);
> @@ -3713,7 +3712,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>                               chunk, chunk_start, chunk_end)) {
>                       error_report("cannot get rkey");
>                       ret = -EINVAL;
> -                    goto out;
> +                    goto err;
>                   }
>                   reg_result->rkey = tmp_rkey;
>   
> @@ -3730,7 +3729,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>   
>               if (ret < 0) {
>                   error_report("Failed to send control buffer");
> -                goto out;
> +                goto err;
>               }
>               break;
>           case RDMA_CONTROL_UNREGISTER_REQUEST:
> @@ -3753,7 +3752,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>                   if (ret != 0) {
>                       perror("rdma unregistration chunk failed");
>                       ret = -ret;
> -                    goto out;
> +                    goto err;
>                   }
>   
>                   rdma->total_registrations--;
> @@ -3766,24 +3765,23 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>   
>               if (ret < 0) {
>                   error_report("Failed to send control buffer");
> -                goto out;
> +                goto err;
>               }
>               break;
>           case RDMA_CONTROL_REGISTER_RESULT:
>               error_report("Invalid RESULT message at dest.");
>               ret = -EIO;
> -            goto out;
> +            goto err;
>           default:
>               error_report("Unknown control message %s", control_desc(head.type));
>               ret = -EIO;
> -            goto out;
> +            goto err;
>           }
>       } while (1);
> -out:
> -    if (ret < 0) {
> -        rdma->error_state = ret;
> -    }
> -    return ret;
> +
> +err:
> +    rdma->error_state = ret;
> +    return -1;
>   }
>   
>   /* Destination:
> @@ -3805,7 +3803,7 @@ rdma_block_notification_handle(QEMUFile *f, const char *name)
>       rdma = qatomic_rcu_read(&rioc->rdmain);
>   
>       if (!rdma) {
> -        return -EIO;
> +        return -1;
>       }
>   
>       /* Find the matching RAMBlock in our local list */
> @@ -3818,7 +3816,7 @@ rdma_block_notification_handle(QEMUFile *f, const char *name)
>   
>       if (found == -1) {
>           error_report("RAMBlock '%s' not found on destination", name);
> -        return -ENOENT;
> +        return -1;
>       }
>   
>       rdma->local_ram_blocks.block[curr].src_index = rdma->next_src_index;
> @@ -3848,7 +3846,6 @@ static int qemu_rdma_registration_start(QEMUFile *f,
>   {
>       QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
>       RDMAContext *rdma;
> -    int ret;
>   
>       if (migration_in_postcopy()) {
>           return 0;
> @@ -3857,12 +3854,11 @@ static int qemu_rdma_registration_start(QEMUFile *f,
>       RCU_READ_LOCK_GUARD();
>       rdma = qatomic_rcu_read(&rioc->rdmaout);
>       if (!rdma) {
> -        return -EIO;
> +        return -1;
>       }
>   
> -    ret = check_error_state(rdma);
> -    if (ret) {
> -        return ret;
> +    if (check_error_state(rdma)) {
> +        return -1;
>       }
>   
>       trace_qemu_rdma_registration_start(flags);
> @@ -3891,12 +3887,11 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
>       RCU_READ_LOCK_GUARD();
>       rdma = qatomic_rcu_read(&rioc->rdmaout);
>       if (!rdma) {
> -        return -EIO;
> +        return -1;
>       }
>   
> -    ret = check_error_state(rdma);
> -    if (ret) {
> -        return ret;
> +    if (check_error_state(rdma)) {
> +        return -1;
>       }
>   
>       qemu_fflush(f);
> @@ -3927,7 +3922,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
>                       qemu_rdma_reg_whole_ram_blocks : NULL);
>           if (ret < 0) {
>               fprintf(stderr, "receiving remote info!");
> -            return ret;
> +            return -1;
>           }
>   
>           nb_dest_blocks = resp.len / sizeof(RDMADestBlock);
> @@ -3950,7 +3945,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
>                       "not identical on both the source and destination.",
>                       local->nb_blocks, nb_dest_blocks);
>               rdma->error_state = -EINVAL;
> -            return -EINVAL;
> +            return -1;
>           }
>   
>           qemu_rdma_move_header(rdma, reg_result_idx, &resp);
> @@ -3966,7 +3961,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
>                           local->block[i].length,
>                           rdma->dest_blocks[i].length);
>                   rdma->error_state = -EINVAL;
> -                return -EINVAL;
> +                return -1;
>               }
>               local->block[i].remote_host_addr =
>                       rdma->dest_blocks[i].remote_host_addr;
> @@ -3986,7 +3981,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
>       return 0;
>   err:
>       rdma->error_state = ret;
> -    return ret;
> +    return -1;
>   }
>   
>   static const QEMUFileHooks rdma_read_hooks = {


* Re: [PATCH 22/52] migration/rdma: Fix rdma_getaddrinfo() error checking
  2023-09-18 14:41 ` [PATCH 22/52] migration/rdma: Fix rdma_getaddrinfo() error checking Markus Armbruster
@ 2023-09-25  5:21   ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-25  5:21 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> rdma_getaddrinfo() returns 0 on success.  On error, it returns one of
> the EAI_ error codes like getaddrinfo() does,


Good catch. It used to be -1 on error; rdma_getaddrinfo(3) was updated in 2021.



> or -1 with errno set.
> This is broken by design: POSIX implicitly specifies the EAI_ error
> codes to be non-zero, no more.  They could clash with -1.  Nothing we
> can do about this design flaw.
> 
> Both callers of rdma_getaddrinfo() only recognize negative values as
> error.  Works only because systems elect to make the EAI_ error codes
> negative.
> 
> Best not to rely on that: change the callers to treat any non-zero
> value as failure.  Also change them to return -1 instead of the value
> received from getaddrinfo() on failure, to avoid positive error
> values.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>


Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>


> ---
>   migration/rdma.c | 14 ++++++--------
>   1 file changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 46b5859268..3421ae0796 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -935,14 +935,14 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
>   
>       if (rdma->host == NULL || !strcmp(rdma->host, "")) {
>           ERROR(errp, "RDMA hostname has not been set");
> -        return -EINVAL;
> +        return -1;
>       }
>   
>       /* create CM channel */
>       rdma->channel = rdma_create_event_channel();
>       if (!rdma->channel) {
>           ERROR(errp, "could not create CM channel");
> -        return -EINVAL;
> +        return -1;
>       }
>   
>       /* create CM id */
> @@ -956,7 +956,7 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
>       port_str[15] = '\0';
>   
>       ret = rdma_getaddrinfo(rdma->host, port_str, NULL, &res);
> -    if (ret < 0) {
> +    if (ret) {
>           ERROR(errp, "could not rdma_getaddrinfo address %s", rdma->host);
>           goto err_resolve_get_addr;
>       }
> @@ -998,7 +998,6 @@ route:
>                   rdma_event_str(cm_event->event));
>           error_report("rdma_resolve_addr");
>           rdma_ack_cm_event(cm_event);
> -        ret = -EINVAL;
>           goto err_resolve_get_addr;
>       }
>       rdma_ack_cm_event(cm_event);
> @@ -1019,7 +1018,6 @@ route:
>           ERROR(errp, "result not equal to event_route_resolved: %s",
>                           rdma_event_str(cm_event->event));
>           rdma_ack_cm_event(cm_event);
> -        ret = -EINVAL;
>           goto err_resolve_get_addr;
>       }
>       rdma_ack_cm_event(cm_event);
> @@ -1034,7 +1032,7 @@ err_resolve_get_addr:
>   err_resolve_create_id:
>       rdma_destroy_event_channel(rdma->channel);
>       rdma->channel = NULL;
> -    return ret;
> +    return -1;
>   }
>   
>   /*
> @@ -2644,7 +2642,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
>       port_str[15] = '\0';
>   
>       ret = rdma_getaddrinfo(rdma->host, port_str, NULL, &res);
> -    if (ret < 0) {
> +    if (ret) {
>           ERROR(errp, "could not rdma_getaddrinfo address %s", rdma->host);
>           goto err_dest_init_bind_addr;
>       }
> @@ -2688,7 +2686,7 @@ err_dest_init_create_listen_id:
>       rdma_destroy_event_channel(rdma->channel);
>       rdma->channel = NULL;
>       rdma->error_state = ret;
> -    return ret;
> +    return -1;
>   
>   }
>   


* Re: [PATCH 23/52] migration/rdma: Clean up qemu_rdma_wait_comp_channel()'s error value
  2023-09-18 14:41 ` [PATCH 23/52] migration/rdma: Clean up qemu_rdma_wait_comp_channel()'s error value Markus Armbruster
@ 2023-09-25  5:43   ` Zhijian Li (Fujitsu)
  2023-09-25  6:55     ` Markus Armbruster
  0 siblings, 1 reply; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-25  5:43 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> qemu_rdma_wait_comp_channel() returns 0 on success, and either -1 or
> rdma->error_state on failure.  Callers actually expect a negative
> error value. 

I don't see that the only caller expects a negative error code:
migration/rdma.c:1654:        ret = qemu_rdma_wait_comp_channel(rdma, ch);
migration/rdma.c-1655-        if (ret) {
migration/rdma.c-1656-            goto err_block_for_wrid;


> I believe rdma->error_state can't be positive, but let's
> make things more obvious by simply returning -1 on any failure.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   migration/rdma.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 3421ae0796..efbb3c7754 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -1588,7 +1588,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
>       if (rdma->received_error) {
>           return -EPIPE;
>       }
> -    return rdma->error_state;
> +    return -!!rdma->error_state;

And I rarely see such a construct; the following would be clearer:

return rdma->error_state ? -1 : 0;
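
(For reference, a minimal sketch of why the two spellings agree -- purely
illustrative, not code from the series:

    int error_state = 5;            /* any non-zero value */
    int a = -!!error_state;         /* !! maps non-zero to 1, so a == -1 */
    int b = error_state ? -1 : 0;   /* b == -1 as well; 0 stays 0 in both */

Both forms yield 0 when error_state is 0 and -1 otherwise; the ternary
just spells the mapping out.)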


>   }
>   
>   static struct ibv_comp_channel *to_channel(RDMAContext *rdma, uint64_t wrid)


* Re: [PATCH 26/52] migration/rdma: Replace int error_state by bool errored
  2023-09-18 14:41 ` [PATCH 26/52] migration/rdma: Replace int error_state by bool errored Markus Armbruster
@ 2023-09-25  6:17   ` Zhijian Li (Fujitsu)
  2023-09-25  7:09     ` Markus Armbruster
  2023-09-27 17:38   ` Eric Blake
  1 sibling, 1 reply; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-25  6:17 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> All we do with the value of RDMAContext member @error_state is test
> whether it's zero.  Change to bool and rename to @errored.
> 

Makes sense!

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Can we move this patch ahead of "[PATCH 23/52] migration/rdma: Clean up qemu_rdma_wait_comp_channel()'s error value",
so that [23/52], [24/52], and [25/52] will be easier to review?



> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   migration/rdma.c | 66 ++++++++++++++++++++++++------------------------
>   1 file changed, 33 insertions(+), 33 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index ad314cc10a..85f6b274bf 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -352,7 +352,7 @@ typedef struct RDMAContext {
>        * memory registration, then do not attempt any future work
>        * and remember the error state.
>        */
> -    int error_state;
> +    int errored;
>       bool error_reported;
>       bool received_error;
>   
> @@ -439,14 +439,14 @@ typedef struct QEMU_PACKED {
>       uint64_t chunks;            /* how many sequential chunks to register */
>   } RDMARegister;
>   
> -static int check_error_state(RDMAContext *rdma)
> +static bool rdma_errored(RDMAContext *rdma)
>   {
> -    if (rdma->error_state && !rdma->error_reported) {
> +    if (rdma->errored && !rdma->error_reported) {
>           error_report("RDMA is in an error state waiting migration"
>                        " to abort!");
>           rdma->error_reported = true;
>       }
> -    return rdma->error_state;
> +    return rdma->errored;
>   }
>   
>   static void register_to_network(RDMAContext *rdma, RDMARegister *reg)
> @@ -1531,7 +1531,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
>            * But we need to be able to handle 'cancel' or an error
>            * without hanging forever.
>            */
> -        while (!rdma->error_state  && !rdma->received_error) {
> +        while (!rdma->errored && !rdma->received_error) {
>               GPollFD pfds[2];
>               pfds[0].fd = comp_channel->fd;
>               pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
> @@ -1588,7 +1588,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
>       if (rdma->received_error) {
>           return -1;
>       }
> -    return -!!rdma->error_state;
> +    return -rdma->errored;
>   }
>   
>   static struct ibv_comp_channel *to_channel(RDMAContext *rdma, uint64_t wrid)
> @@ -1701,7 +1701,7 @@ err_block_for_wrid:
>           ibv_ack_cq_events(cq, num_cq_events);
>       }
>   
> -    rdma->error_state = ret;
> +    rdma->errored = true;
>       return -1;
>   }
>   
> @@ -2340,7 +2340,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>       int idx;
>   
>       if (rdma->cm_id && rdma->connected) {
> -        if ((rdma->error_state ||
> +        if ((rdma->errored ||
>                migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) &&
>               !rdma->received_error) {
>               RDMAControlHeader head = { .len = 0,
> @@ -2621,14 +2621,14 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
>   
>       if (!rdma->host || !rdma->host[0]) {
>           ERROR(errp, "RDMA host is not set!");
> -        rdma->error_state = -EINVAL;
> +        rdma->errored = true;
>           return -1;
>       }
>       /* create CM channel */
>       rdma->channel = rdma_create_event_channel();
>       if (!rdma->channel) {
>           ERROR(errp, "could not create rdma event channel");
> -        rdma->error_state = -EINVAL;
> +        rdma->errored = true;
>           return -1;
>       }
>   
> @@ -2686,7 +2686,7 @@ err_dest_init_bind_addr:
>   err_dest_init_create_listen_id:
>       rdma_destroy_event_channel(rdma->channel);
>       rdma->channel = NULL;
> -    rdma->error_state = ret;
> +    rdma->errored = true;
>       return -1;
>   
>   }
> @@ -2763,7 +2763,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>           return -1;
>       }
>   
> -    if (rdma->error_state) {
> +    if (rdma->errored) {
>           error_setg(errp,
>                      "RDMA is in an error state waiting migration to abort!");
>           return -1;
> @@ -2775,7 +2775,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>        */
>       ret = qemu_rdma_write_flush(f, rdma);
>       if (ret < 0) {
> -        rdma->error_state = ret;
> +        rdma->errored = true;
>           error_setg(errp, "qemu_rdma_write_flush failed");
>           return -1;
>       }
> @@ -2795,7 +2795,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>               ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);
>   
>               if (ret < 0) {
> -                rdma->error_state = ret;
> +                rdma->errored = true;
>                   error_setg(errp, "qemu_rdma_exchange_send failed");
>                   return -1;
>               }
> @@ -2853,7 +2853,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>           return -1;
>       }
>   
> -    if (rdma->error_state) {
> +    if (rdma->errored) {
>           error_setg(errp,
>                      "RDMA is in an error state waiting migration to abort!");
>           return -1;
> @@ -2889,7 +2889,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>           ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_QEMU_FILE);
>   
>           if (ret < 0) {
> -            rdma->error_state = ret;
> +            rdma->errored = true;
>               error_setg(errp, "qemu_rdma_exchange_recv failed");
>               return -1;
>           }
> @@ -3162,21 +3162,21 @@ qio_channel_rdma_shutdown(QIOChannel *ioc,
>       switch (how) {
>       case QIO_CHANNEL_SHUTDOWN_READ:
>           if (rdmain) {
> -            rdmain->error_state = -1;
> +            rdmain->errored = true;
>           }
>           break;
>       case QIO_CHANNEL_SHUTDOWN_WRITE:
>           if (rdmaout) {
> -            rdmaout->error_state = -1;
> +            rdmaout->errored = true;
>           }
>           break;
>       case QIO_CHANNEL_SHUTDOWN_BOTH:
>       default:
>           if (rdmain) {
> -            rdmain->error_state = -1;
> +            rdmain->errored = true;
>           }
>           if (rdmaout) {
> -            rdmaout->error_state = -1;
> +            rdmaout->errored = true;
>           }
>           break;
>       }
> @@ -3221,7 +3221,7 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
>           return -1;
>       }
>   
> -    if (check_error_state(rdma)) {
> +    if (rdma_errored(rdma)) {
>           return -1;
>       }
>   
> @@ -3290,7 +3290,7 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
>       return RAM_SAVE_CONTROL_DELAYED;
>   
>   err:
> -    rdma->error_state = ret;
> +    rdma->errored = true;
>       return -1;
>   }
>   
> @@ -3311,13 +3311,13 @@ static void rdma_cm_poll_handler(void *opaque)
>   
>       if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
>           cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
> -        if (!rdma->error_state &&
> +        if (!rdma->errored &&
>               migration_incoming_get_current()->state !=
>                 MIGRATION_STATUS_COMPLETED) {
>               error_report("receive cm event, cm event is %d", cm_event->event);
> -            rdma->error_state = -EPIPE;
> +            rdma->errored = true;
>               if (rdma->return_path) {
> -                rdma->return_path->error_state = -EPIPE;
> +                rdma->return_path->errored = true;
>               }
>           }
>           rdma_ack_cm_event(cm_event);
> @@ -3483,7 +3483,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
>       return 0;
>   
>   err_rdma_dest_wait:
> -    rdma->error_state = ret;
> +    rdma->errored = true;
>       qemu_rdma_cleanup(rdma);
>       g_free(rdma_return_path);
>       return -1;
> @@ -3540,7 +3540,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>           return -1;
>       }
>   
> -    if (check_error_state(rdma)) {
> +    if (rdma_errored(rdma)) {
>           return -1;
>       }
>   
> @@ -3779,7 +3779,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>       } while (1);
>   
>   err:
> -    rdma->error_state = ret;
> +    rdma->errored = true;
>       return -1;
>   }
>   
> @@ -3856,7 +3856,7 @@ static int qemu_rdma_registration_start(QEMUFile *f,
>           return -1;
>       }
>   
> -    if (check_error_state(rdma)) {
> +    if (rdma_errored(rdma)) {
>           return -1;
>       }
>   
> @@ -3889,7 +3889,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
>           return -1;
>       }
>   
> -    if (check_error_state(rdma)) {
> +    if (rdma_errored(rdma)) {
>           return -1;
>       }
>   
> @@ -3943,7 +3943,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
>                       "Your QEMU command line parameters are probably "
>                       "not identical on both the source and destination.",
>                       local->nb_blocks, nb_dest_blocks);
> -            rdma->error_state = -EINVAL;
> +            rdma->errored = true;
>               return -1;
>           }
>   
> @@ -3959,7 +3959,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
>                           "vs %" PRIu64, local->block[i].block_name, i,
>                           local->block[i].length,
>                           rdma->dest_blocks[i].length);
> -                rdma->error_state = -EINVAL;
> +                rdma->errored = true;
>                   return -1;
>               }
>               local->block[i].remote_host_addr =
> @@ -3979,7 +3979,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
>   
>       return 0;
>   err:
> -    rdma->error_state = ret;
> +    rdma->errored = true;
>       return -1;
>   }
>   


* Re: [PATCH 27/52] migration/rdma: Drop superfluous assignments to @ret
  2023-09-18 14:41 ` [PATCH 27/52] migration/rdma: Drop superfluous assignments to @ret Markus Armbruster
@ 2023-09-25  6:20   ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-25  6:20 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>


> ---
>   migration/rdma.c | 35 ++++++++++-------------------------
>   1 file changed, 10 insertions(+), 25 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 85f6b274bf..62d95b7d2c 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -1514,7 +1514,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
>                                          struct ibv_comp_channel *comp_channel)
>   {
>       struct rdma_cm_event *cm_event;
> -    int ret = -1;
> +    int ret;
>   
>       /*
>        * Coroutine doesn't start until migration_fd_process_incoming()
> @@ -1619,7 +1619,7 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma,
>                                       uint64_t wrid_requested,
>                                       uint32_t *byte_len)
>   {
> -    int num_cq_events = 0, ret = 0;
> +    int num_cq_events = 0, ret;
>       struct ibv_cq *cq;
>       void *cq_ctx;
>       uint64_t wr_id = RDMA_WRID_NONE, wr_id_in;
> @@ -1664,8 +1664,7 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma,
>   
>           num_cq_events++;
>   
> -        ret = -ibv_req_notify_cq(cq, 0);
> -        if (ret) {
> +        if (ibv_req_notify_cq(cq, 0)) {
>               goto err_block_for_wrid;
>           }
>   
> @@ -1712,7 +1711,7 @@ err_block_for_wrid:
>   static int qemu_rdma_post_send_control(RDMAContext *rdma, uint8_t *buf,
>                                          RDMAControlHeader *head)
>   {
> -    int ret = 0;
> +    int ret;
>       RDMAWorkRequestData *wr = &rdma->wr_data[RDMA_WRID_CONTROL];
>       struct ibv_send_wr *bad_wr;
>       struct ibv_sge sge = {
> @@ -1869,7 +1868,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
>                                      int *resp_idx,
>                                      int (*callback)(RDMAContext *rdma))
>   {
> -    int ret = 0;
> +    int ret;
>   
>       /*
>        * Wait until the dest is ready before attempting to deliver the message
> @@ -2841,7 +2840,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>       QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>       RDMAContext *rdma;
>       RDMAControlHeader head;
> -    int ret = 0;
> +    int ret;
>       size_t i;
>       size_t done = 0;
>   
> @@ -3340,7 +3339,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
>       RDMAContext *rdma_return_path = NULL;
>       struct rdma_cm_event *cm_event;
>       struct ibv_context *verbs;
> -    int ret = -EINVAL;
> +    int ret;
>       int idx;
>   
>       ret = rdma_get_cm_event(rdma->channel, &cm_event);
> @@ -3350,7 +3349,6 @@ static int qemu_rdma_accept(RDMAContext *rdma)
>   
>       if (cm_event->event != RDMA_CM_EVENT_CONNECT_REQUEST) {
>           rdma_ack_cm_event(cm_event);
> -        ret = -1;
>           goto err_rdma_dest_wait;
>       }
>   
> @@ -3363,7 +3361,6 @@ static int qemu_rdma_accept(RDMAContext *rdma)
>           rdma_return_path = qemu_rdma_data_init(rdma->host_port, NULL);
>           if (rdma_return_path == NULL) {
>               rdma_ack_cm_event(cm_event);
> -            ret = -1;
>               goto err_rdma_dest_wait;
>           }
>   
> @@ -3378,7 +3375,6 @@ static int qemu_rdma_accept(RDMAContext *rdma)
>           error_report("Unknown source RDMA version: %d, bailing...",
>                        cap.version);
>           rdma_ack_cm_event(cm_event);
> -        ret = -1;
>           goto err_rdma_dest_wait;
>       }
>   
> @@ -3411,7 +3407,6 @@ static int qemu_rdma_accept(RDMAContext *rdma)
>       } else if (rdma->verbs != verbs) {
>           error_report("ibv context not matching %p, %p!", rdma->verbs,
>                        verbs);
> -        ret = -1;
>           goto err_rdma_dest_wait;
>       }
>   
> @@ -3465,7 +3460,6 @@ static int qemu_rdma_accept(RDMAContext *rdma)
>       if (cm_event->event != RDMA_CM_EVENT_ESTABLISHED) {
>           error_report("rdma_accept not event established");
>           rdma_ack_cm_event(cm_event);
> -        ret = -1;
>           goto err_rdma_dest_wait;
>       }
>   
> @@ -3528,7 +3522,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>       static RDMARegisterResult results[RDMA_CONTROL_MAX_COMMANDS_PER_MESSAGE];
>       RDMALocalBlock *block;
>       void *host_addr;
> -    int ret = 0;
> +    int ret;
>       int idx = 0;
>       int count = 0;
>       int i = 0;
> @@ -3557,7 +3551,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>           if (head.repeat > RDMA_CONTROL_MAX_COMMANDS_PER_MESSAGE) {
>               error_report("rdma: Too many requests in this message (%d)."
>                               "Bailing.", head.repeat);
> -            ret = -EIO;
>               break;
>           }
>   
> @@ -3573,7 +3566,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>                   error_report("rdma: 'compress' bad block index %u (vs %d)",
>                                (unsigned int)comp->block_idx,
>                                rdma->local_ram_blocks.nb_blocks);
> -                ret = -EIO;
>                   goto err;
>               }
>               block = &(rdma->local_ram_blocks.block[comp->block_idx]);
> @@ -3672,7 +3664,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>                       error_report("rdma: 'register' bad block index %u (vs %d)",
>                                    (unsigned int)reg->current_index,
>                                    rdma->local_ram_blocks.nb_blocks);
> -                    ret = -ENOENT;
>                       goto err;
>                   }
>                   block = &(rdma->local_ram_blocks.block[reg->current_index]);
> @@ -3682,7 +3673,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>                               " offset: %" PRIx64 " current_addr: %" PRIx64,
>                               block->block_name, block->offset,
>                               reg->key.current_addr);
> -                        ret = -ERANGE;
>                           goto err;
>                       }
>                       host_addr = (block->local_host_addr +
> @@ -3698,7 +3688,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>                           error_report("rdma: bad chunk for block %s"
>                               " chunk: %" PRIx64,
>                               block->block_name, reg->key.chunk);
> -                        ret = -ERANGE;
>                           goto err;
>                       }
>                   }
> @@ -3710,7 +3699,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>                               (uintptr_t)host_addr, NULL, &tmp_rkey,
>                               chunk, chunk_start, chunk_end)) {
>                       error_report("cannot get rkey");
> -                    ret = -EINVAL;
>                       goto err;
>                   }
>                   reg_result->rkey = tmp_rkey;
> @@ -3750,7 +3738,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>   
>                   if (ret != 0) {
>                       perror("rdma unregistration chunk failed");
> -                    ret = -ret;
>                       goto err;
>                   }
>   
> @@ -3769,11 +3756,9 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>               break;
>           case RDMA_CONTROL_REGISTER_RESULT:
>               error_report("Invalid RESULT message at dest.");
> -            ret = -EIO;
>               goto err;
>           default:
>               error_report("Unknown control message %s", control_desc(head.type));
> -            ret = -EIO;
>               goto err;
>           }
>       } while (1);
> @@ -3877,7 +3862,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
>       QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
>       RDMAContext *rdma;
>       RDMAControlHeader head = { .len = 0, .repeat = 1 };
> -    int ret = 0;
> +    int ret;
>   
>       if (migration_in_postcopy()) {
>           return 0;
> @@ -4151,7 +4136,7 @@ void rdma_start_outgoing_migration(void *opaque,
>       MigrationState *s = opaque;
>       RDMAContext *rdma_return_path = NULL;
>       RDMAContext *rdma;
> -    int ret = 0;
> +    int ret;
>   
>       /* Avoid ram_block_discard_disable(), cannot change during migration. */
>       if (ram_block_discard_is_required()) {


* Re: [PATCH 28/52] migration/rdma: Check negative error values the same way everywhere
  2023-09-18 14:41 ` [PATCH 28/52] migration/rdma: Check negative error values the same way everywhere Markus Armbruster
@ 2023-09-25  6:26   ` Zhijian Li (Fujitsu)
  2023-09-25  7:29     ` Markus Armbruster
  0 siblings, 1 reply; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-25  6:26 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel, quintela, peterx; +Cc: leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> When a function returns 0 on success, negative value on error,
> checking for non-zero suffices, but checking for negative is clearer.
> So do that.
> 

This patch is not my favorite convention.

@Peter, Juan

I'd like to hear your opinions.

Thanks
Zhijian


> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   migration/rdma.c | 82 ++++++++++++++++++++++++------------------------
>   1 file changed, 41 insertions(+), 41 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 62d95b7d2c..6c643a1b30 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -947,7 +947,7 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
>   
>       /* create CM id */
>       ret = rdma_create_id(rdma->channel, &rdma->cm_id, NULL, RDMA_PS_TCP);
> -    if (ret) {
> +    if (ret < 0) {
>           ERROR(errp, "could not create channel id");
>           goto err_resolve_create_id;
>       }
> @@ -968,10 +968,10 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
>   
>           ret = rdma_resolve_addr(rdma->cm_id, NULL, e->ai_dst_addr,
>                   RDMA_RESOLVE_TIMEOUT_MS);
> -        if (!ret) {
> +        if (ret >= 0) {
>               if (e->ai_family == AF_INET6) {
>                   ret = qemu_rdma_broken_ipv6_kernel(rdma->cm_id->verbs, errp);
> -                if (ret) {
> +                if (ret < 0) {
>                       continue;
>                   }
>               }
> @@ -988,7 +988,7 @@ route:
>       qemu_rdma_dump_gid("source_resolve_addr", rdma->cm_id);
>   
>       ret = rdma_get_cm_event(rdma->channel, &cm_event);
> -    if (ret) {
> +    if (ret < 0) {
>           ERROR(errp, "could not perform event_addr_resolved");
>           goto err_resolve_get_addr;
>       }
> @@ -1004,13 +1004,13 @@ route:
>   
>       /* resolve route */
>       ret = rdma_resolve_route(rdma->cm_id, RDMA_RESOLVE_TIMEOUT_MS);
> -    if (ret) {
> +    if (ret < 0) {
>           ERROR(errp, "could not resolve rdma route");
>           goto err_resolve_get_addr;
>       }
>   
>       ret = rdma_get_cm_event(rdma->channel, &cm_event);
> -    if (ret) {
> +    if (ret < 0) {
>           ERROR(errp, "could not perform event_route_resolved");
>           goto err_resolve_get_addr;
>       }
> @@ -1118,7 +1118,7 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma)
>       attr.qp_type = IBV_QPT_RC;
>   
>       ret = rdma_create_qp(rdma->cm_id, rdma->pd, &attr);
> -    if (ret) {
> +    if (ret < 0) {
>           return -1;
>       }
>   
> @@ -1551,7 +1551,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
>   
>                   if (pfds[1].revents) {
>                       ret = rdma_get_cm_event(rdma->channel, &cm_event);
> -                    if (ret) {
> +                    if (ret < 0) {
>                           error_report("failed to get cm event while wait "
>                                        "completion channel");
>                           return -1;
> @@ -1652,12 +1652,12 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma,
>   
>       while (1) {
>           ret = qemu_rdma_wait_comp_channel(rdma, ch);
> -        if (ret) {
> +        if (ret < 0) {
>               goto err_block_for_wrid;
>           }
>   
>           ret = ibv_get_cq_event(ch, &cq, &cq_ctx);
> -        if (ret) {
> +        if (ret < 0) {
>               perror("ibv_get_cq_event");
>               goto err_block_for_wrid;
>           }
> @@ -1888,7 +1888,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
>        */
>       if (resp) {
>           ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_DATA);
> -        if (ret) {
> +        if (ret < 0) {
>               error_report("rdma migration: error posting"
>                       " extra control recv for anticipated result!");
>               return -1;
> @@ -1899,7 +1899,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
>        * Post a WR to replace the one we just consumed for the READY message.
>        */
>       ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
> -    if (ret) {
> +    if (ret < 0) {
>           error_report("rdma migration: error posting first control recv!");
>           return -1;
>       }
> @@ -1986,7 +1986,7 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
>        * Post a new RECV work request to replace the one we just consumed.
>        */
>       ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
> -    if (ret) {
> +    if (ret < 0) {
>           error_report("rdma migration: error posting second control recv!");
>           return -1;
>       }
> @@ -2311,7 +2311,7 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
>       /* If we cannot merge it, we flush the current buffer first. */
>       if (!qemu_rdma_buffer_mergable(rdma, current_addr, len)) {
>           ret = qemu_rdma_write_flush(f, rdma);
> -        if (ret) {
> +        if (ret < 0) {
>               return -1;
>           }
>           rdma->current_length = 0;
> @@ -2441,12 +2441,12 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
>       rdma->pin_all = pin_all;
>   
>       ret = qemu_rdma_resolve_host(rdma, errp);
> -    if (ret) {
> +    if (ret < 0) {
>           goto err_rdma_source_init;
>       }
>   
>       ret = qemu_rdma_alloc_pd_cq(rdma);
> -    if (ret) {
> +    if (ret < 0) {
>           ERROR(errp, "rdma migration: error allocating pd and cq! Your mlock()"
>                       " limits may be too low. Please check $ ulimit -a # and "
>                       "search for 'ulimit -l' in the output");
> @@ -2454,7 +2454,7 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
>       }
>   
>       ret = qemu_rdma_alloc_qp(rdma);
> -    if (ret) {
> +    if (ret < 0) {
>           ERROR(errp, "rdma migration: error allocating qp!");
>           goto err_rdma_source_init;
>       }
> @@ -2471,7 +2471,7 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
>   
>       for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
>           ret = qemu_rdma_reg_control(rdma, idx);
> -        if (ret) {
> +        if (ret < 0) {
>               ERROR(errp, "rdma migration: error registering %d control!",
>                                                               idx);
>               goto err_rdma_source_init;
> @@ -2545,13 +2545,13 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
>       caps_to_network(&cap);
>   
>       ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
> -    if (ret) {
> +    if (ret < 0) {
>           ERROR(errp, "posting second control recv");
>           goto err_rdma_source_connect;
>       }
>   
>       ret = rdma_connect(rdma->cm_id, &conn_param);
> -    if (ret) {
> +    if (ret < 0) {
>           perror("rdma_connect");
>           ERROR(errp, "connecting to destination!");
>           goto err_rdma_source_connect;
> @@ -2565,7 +2565,7 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
>               ERROR(errp, "failed to get cm event");
>           }
>       }
> -    if (ret) {
> +    if (ret < 0) {
>           perror("rdma_get_cm_event after rdma_connect");
>           goto err_rdma_source_connect;
>       }
> @@ -2633,7 +2633,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
>   
>       /* create CM id */
>       ret = rdma_create_id(rdma->channel, &listen_id, NULL, RDMA_PS_TCP);
> -    if (ret) {
> +    if (ret < 0) {
>           ERROR(errp, "could not create cm_id!");
>           goto err_dest_init_create_listen_id;
>       }
> @@ -2649,7 +2649,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
>   
>       ret = rdma_set_option(listen_id, RDMA_OPTION_ID, RDMA_OPTION_ID_REUSEADDR,
>                             &reuse, sizeof reuse);
> -    if (ret) {
> +    if (ret < 0) {
>           ERROR(errp, "Error: could not set REUSEADDR option");
>           goto err_dest_init_bind_addr;
>       }
> @@ -2658,12 +2658,12 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
>               &((struct sockaddr_in *) e->ai_dst_addr)->sin_addr, ip, sizeof ip);
>           trace_qemu_rdma_dest_init_trying(rdma->host, ip);
>           ret = rdma_bind_addr(listen_id, e->ai_dst_addr);
> -        if (ret) {
> +        if (ret < 0) {
>               continue;
>           }
>           if (e->ai_family == AF_INET6) {
>               ret = qemu_rdma_broken_ipv6_kernel(listen_id->verbs, errp);
> -            if (ret) {
> +            if (ret < 0) {
>                   continue;
>               }
>           }
> @@ -3303,7 +3303,7 @@ static void rdma_cm_poll_handler(void *opaque)
>       MigrationIncomingState *mis = migration_incoming_get_current();
>   
>       ret = rdma_get_cm_event(rdma->channel, &cm_event);
> -    if (ret) {
> +    if (ret < 0) {
>           error_report("get_cm_event failed %d", errno);
>           return;
>       }
> @@ -3343,7 +3343,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
>       int idx;
>   
>       ret = rdma_get_cm_event(rdma->channel, &cm_event);
> -    if (ret) {
> +    if (ret < 0) {
>           goto err_rdma_dest_wait;
>       }
>   
> @@ -3413,13 +3413,13 @@ static int qemu_rdma_accept(RDMAContext *rdma)
>       qemu_rdma_dump_id("dest_init", verbs);
>   
>       ret = qemu_rdma_alloc_pd_cq(rdma);
> -    if (ret) {
> +    if (ret < 0) {
>           error_report("rdma migration: error allocating pd and cq!");
>           goto err_rdma_dest_wait;
>       }
>   
>       ret = qemu_rdma_alloc_qp(rdma);
> -    if (ret) {
> +    if (ret < 0) {
>           error_report("rdma migration: error allocating qp!");
>           goto err_rdma_dest_wait;
>       }
> @@ -3428,7 +3428,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
>   
>       for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
>           ret = qemu_rdma_reg_control(rdma, idx);
> -        if (ret) {
> +        if (ret < 0) {
>               error_report("rdma: error registering %d control", idx);
>               goto err_rdma_dest_wait;
>           }
> @@ -3446,13 +3446,13 @@ static int qemu_rdma_accept(RDMAContext *rdma)
>       }
>   
>       ret = rdma_accept(rdma->cm_id, &conn_param);
> -    if (ret) {
> +    if (ret < 0) {
>           error_report("rdma_accept failed");
>           goto err_rdma_dest_wait;
>       }
>   
>       ret = rdma_get_cm_event(rdma->channel, &cm_event);
> -    if (ret) {
> +    if (ret < 0) {
>           error_report("rdma_accept get_cm_event failed");
>           goto err_rdma_dest_wait;
>       }
> @@ -3467,7 +3467,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
>       rdma->connected = true;
>   
>       ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
> -    if (ret) {
> +    if (ret < 0) {
>           error_report("rdma migration: error posting second control recv");
>           goto err_rdma_dest_wait;
>       }
> @@ -3596,7 +3596,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>   
>               if (rdma->pin_all) {
>                   ret = qemu_rdma_reg_whole_ram_blocks(rdma);
> -                if (ret) {
> +                if (ret < 0) {
>                       error_report("rdma migration: error dest "
>                                       "registering ram blocks");
>                       goto err;
> @@ -4057,7 +4057,7 @@ static void rdma_accept_incoming_migration(void *opaque)
>       trace_qemu_rdma_accept_incoming_migration();
>       ret = qemu_rdma_accept(rdma);
>   
> -    if (ret) {
> +    if (ret < 0) {
>           fprintf(stderr, "RDMA ERROR: Migration initialization failed\n");
>           return;
>       }
> @@ -4101,7 +4101,7 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
>       }
>   
>       ret = qemu_rdma_dest_init(rdma, errp);
> -    if (ret) {
> +    if (ret < 0) {
>           goto err;
>       }
>   
> @@ -4109,7 +4109,7 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
>   
>       ret = rdma_listen(rdma->listen_id, 5);
>   
> -    if (ret) {
> +    if (ret < 0) {
>           ERROR(errp, "listening on socket!");
>           goto cleanup_rdma;
>       }
> @@ -4151,14 +4151,14 @@ void rdma_start_outgoing_migration(void *opaque,
>   
>       ret = qemu_rdma_source_init(rdma, migrate_rdma_pin_all(), errp);
>   
> -    if (ret) {
> +    if (ret < 0) {
>           goto err;
>       }
>   
>       trace_rdma_start_outgoing_migration_after_rdma_source_init();
>       ret = qemu_rdma_connect(rdma, false, errp);
>   
> -    if (ret) {
> +    if (ret < 0) {
>           goto err;
>       }
>   
> @@ -4173,13 +4173,13 @@ void rdma_start_outgoing_migration(void *opaque,
>           ret = qemu_rdma_source_init(rdma_return_path,
>                                       migrate_rdma_pin_all(), errp);
>   
> -        if (ret) {
> +        if (ret < 0) {
>               goto return_path_err;
>           }
>   
>           ret = qemu_rdma_connect(rdma_return_path, true, errp);
>   
> -        if (ret) {
> +        if (ret < 0) {
>               goto return_path_err;
>           }
>   


* Re: [PATCH 29/52] migration/rdma: Plug a memory leak and improve a message
  2023-09-18 14:41 ` [PATCH 29/52] migration/rdma: Plug a memory leak and improve a message Markus Armbruster
@ 2023-09-25  6:31   ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-25  6:31 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> When migration capability @rdma-pin-all is true, but the server cannot
> honor it, qemu_rdma_connect() calls macro ERROR(), then returns
> success.
> 
> ERROR() sets an error.  Since qemu_rdma_connect() returns success, its
> caller rdma_start_outgoing_migration() duly assumes @errp is still
> clear.  The Error object leaks.
> 
> ERROR() additionally reports the situation to the user as an error:
> 
>      RDMA ERROR: Server cannot support pinning all memory. Will register memory dynamically.
> 
> Is this an error or not?  It actually isn't; we disable @rdma-pin-all
> and carry on.  "Correcting" the user's configuration decisions that
> way feels problematic, but that's a topic for another day.
> 
> Replace ERROR() by warn_report().  This plugs the memory leak, and
> emits a clearer message to the user.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>


Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

> ---
>   migration/rdma.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 6c643a1b30..d52de857c5 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2586,8 +2586,8 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
>        * and disable them otherwise.
>        */
>       if (rdma->pin_all && !(cap.flags & RDMA_CAPABILITY_PIN_ALL)) {
> -        ERROR(errp, "Server cannot support pinning all memory. "
> -                        "Will register memory dynamically.");
> +        warn_report("RDMA: Server cannot support pinning all memory. "
> +                    "Will register memory dynamically.");
>           rdma->pin_all = false;
>       }
>   
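
As a side note, the leak the commit message describes follows the usual
Error ownership rules -- roughly this pattern (a simplified sketch with a
made-up callee, not the actual call chain):

    Error *err = NULL;

    if (rdma_connect_helper(&err) < 0) {
        error_report_err(err);      /* failure path: report and free */
    } else {
        /* Success path: the caller assumes err is still NULL.  If the
         * callee did error_setg(&err, ...) but returned success anyway,
         * the Error allocated by error_setg() is never freed -- that is
         * the leak. */
    }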


* Re: [PATCH 30/52] migration/rdma: Delete inappropriate error_report() in macro ERROR()
  2023-09-18 14:41 ` [PATCH 30/52] migration/rdma: Delete inappropriate error_report() in macro ERROR() Markus Armbruster
@ 2023-09-25  6:35   ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-25  6:35 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
> 
> Macro ERROR() violates this principle.  Delete the error_report()
> there.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

And

Tested-by: Li Zhijian <lizhijian@fujitsu.com>

> ---
>   migration/rdma.c | 4 ----
>   1 file changed, 4 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index d52de857c5..be31694d4f 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -40,12 +40,8 @@
>   #include "options.h"
>   #include <poll.h>
>   
> -/*
> - * Print and error on both the Monitor and the Log file.
> - */
>   #define ERROR(errp, fmt, ...) \
>       do { \
> -        fprintf(stderr, "RDMA ERROR: " fmt "\n", ## __VA_ARGS__); \
>           if (errp && (*(errp) == NULL)) { \
>               error_setg(errp, "RDMA ERROR: " fmt, ## __VA_ARGS__); \
>           } \
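
For readers less familiar with the Error API, the convention the commit
message relies on looks roughly like this (a generic sketch; do_thing()
and something_failed() are stand-ins, not functions from this series):

    static int do_thing(Error **errp)
    {
        if (something_failed()) {
            error_setg(errp, "doing the thing failed");
            return -1;              /* set the error, do NOT report it */
        }
        return 0;
    }

    void caller(void)
    {
        Error *err = NULL;

        if (do_thing(&err) < 0) {
            error_report_err(err);  /* reporting happens exactly once, here */
        }
    }

If do_thing() also called error_report() itself, the failure would be
printed twice when the caller reports it, and once even when the caller
recovers.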


* Re: [PATCH 21/52] migration/rdma: Fix QEMUFileHooks method return values
  2023-09-25  4:08   ` Zhijian Li (Fujitsu)
@ 2023-09-25  6:36     ` Markus Armbruster
  2023-09-25  7:03       ` Zhijian Li (Fujitsu)
  0 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-25  6:36 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu)
  Cc: Markus Armbruster, qemu-devel, quintela, peterx, leobras

"Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:

> On 18/09/2023 22:41, Markus Armbruster wrote:
>> The QEMUFileHooks methods don't come with a written contract.  Digging
>> through the code calling them, we find:
>> 
>> * save_page():
>
> I'm fine with this
>
>> 
>>    Negative values RAM_SAVE_CONTROL_DELAYED and
>>    RAM_SAVE_CONTROL_NOT_SUPP are special.  Any other negative value is
>>    an unspecified error.
>> 
>>    qemu_rdma_save_page() returns -EIO or rdma->error_state on error.  I
>>    believe the latter is always negative.  Nothing stops either of them
>>    to clash with the special values, though.  Feels unlikely, but fix
>>    it anyway to return only the special values and -1.
>> 
>> * before_ram_iterate(), before_ram_iterate():
>
> The error code returned by before_ram_iterate() and after_ram_iterate() will
> be stored in the QEMUFile for the upper layer.
> But it seems that no callers check that error?  Shouldn't we let the callers
> check the error instead?

The error values returned by qemu_rdma_registration_start() and
qemu_rdma_registration_stop() carry no additional information a caller
could check.

Both return either -EIO or rdma->error_state on error.  The latter is
*not* a negative errno code.  Evidence: qio_channel_rdma_shutdown()
assigns -1, qemu_rdma_block_for_wrid() assigns the error value of
qemu_rdma_poll(), which can be the error value of ibv_poll_cq(), which
is an unspecified negative value, ...

I decided not to investigate what qemu-file.c does with the error values
after one quick glance at the code.  It's confusing, and quite possibly
confused.  But I'm already at 50+ patches, and am neither inclined nor
able to take on more migration cleanup at this time.

>>    Negative value means error.  qemu_rdma_registration_start() and
>>    qemu_rdma_registration_stop() comply as far as I can tell.  Make
>>    them comply *obviously*, by returning -1 on error.
>> 
>> * hook_ram_load:
>> 
>>    Negative value means error.  rdma_load_hook() already returns -1 on
>>    error.  Leave it alone.
>
> see inline reply below,
>
>> 
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   migration/rdma.c | 79 +++++++++++++++++++++++-------------------------
>>   1 file changed, 37 insertions(+), 42 deletions(-)
>> 
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index cc59155a50..46b5859268 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -3219,12 +3219,11 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
>>       rdma = qatomic_rcu_read(&rioc->rdmaout);
>>   
>>       if (!rdma) {
>> -        return -EIO;
>> +        return -1;
>>       }
>>   
>> -    ret = check_error_state(rdma);
>> -    if (ret) {
>> -        return ret;
>> +    if (check_error_state(rdma)) {
>> +        return -1;
>>       }
>>   
>>       qemu_fflush(f);
>> @@ -3290,9 +3289,10 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
>>       }
>>   
>>       return RAM_SAVE_CONTROL_DELAYED;
>> +
>>   err:
>>       rdma->error_state = ret;
>> -    return ret;
>> +    return -1;
>>   }
>>   
>>   static void rdma_accept_incoming_migration(void *opaque);
>> @@ -3538,12 +3538,11 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>>       rdma = qatomic_rcu_read(&rioc->rdmain);
>>   
>>       if (!rdma) {
>> -        return -EIO;
>> +        return -1;
>
> That's because EIO is not accurate here?

It's because the function does not return a negative errno code, and
returning -EIO is misleading readers into assuming it does.

>>       }
>>   
>> -    ret = check_error_state(rdma);
>> -    if (ret) {
>> -        return ret;
>
> Ditto

Likewise.

[...]




* Re: [PATCH 23/52] migration/rdma: Clean up qemu_rdma_wait_comp_channel()'s error value
  2023-09-25  5:43   ` Zhijian Li (Fujitsu)
@ 2023-09-25  6:55     ` Markus Armbruster
  0 siblings, 0 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-25  6:55 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu); +Cc: qemu-devel, quintela, peterx, leobras

"Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:

> On 18/09/2023 22:41, Markus Armbruster wrote:
>> qemu_rdma_wait_comp_channel() returns 0 on success, and either -1 or
>> rdma->error_state on failure.  Callers actually expect a negative
>> error value. 
>
> I don't see that the only caller expects a negative error code:
> migration/rdma.c:1654:        ret = qemu_rdma_wait_comp_channel(rdma, ch);
> migration/rdma.c-1655-        if (ret) {
> migration/rdma.c-1656-            goto err_block_for_wrid;

You're right.

I want the change anyway, to let me simplify the code some.  I'll adjust
the commit message.

>> I believe rdma->error_state can't be positive, but let's
>> make things more obvious by simply returning -1 on any failure.
>> 
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   migration/rdma.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index 3421ae0796..efbb3c7754 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -1588,7 +1588,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
>>       if (rdma->received_error) {
>>           return -EPIPE;
>>       }
>> -    return rdma->error_state;
>> +    return -!!rdma->error_state;
>
> And I rarely see such a construct; the following would be clearer:
>
> return rdma->error_state ? -1 : 0;

Goes away in PATCH 26:

   @@ -1588,7 +1588,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
        if (rdma->received_error) {
            return -1;
        }
   -    return -!!rdma->error_state;
   +    return -rdma->errored;
    }

    static struct ibv_comp_channel *to_channel(RDMAContext *rdma, uint64_t wrid)

>
>>   }
>>   
>>   static struct ibv_comp_channel *to_channel(RDMAContext *rdma, uint64_t wrid)




* Re: [PATCH 21/52] migration/rdma: Fix QEMUFileHooks method return values
  2023-09-25  6:36     ` Markus Armbruster
@ 2023-09-25  7:03       ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-25  7:03 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, quintela, peterx, leobras



On 25/09/2023 14:36, Markus Armbruster wrote:
> "Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:
> 
>> On 18/09/2023 22:41, Markus Armbruster wrote:
>>> The QEMUFileHooks methods don't come with a written contract.  Digging
>>> through the code calling them, we find:
>>>
>>> * save_page():
>>
>> I'm fine with this
>>
>>>
>>>     Negative values RAM_SAVE_CONTROL_DELAYED and
>>>     RAM_SAVE_CONTROL_NOT_SUPP are special.  Any other negative value is
>>>     an unspecified error.
>>>
>>>     qemu_rdma_save_page() returns -EIO or rdma->error_state on error.  I
>>>     believe the latter is always negative.  Nothing stops either of them
>>>     to clash with the special values, though.  Feels unlikely, but fix
>>>     it anyway to return only the special values and -1.
>>>
>>> * before_ram_iterate(), before_ram_iterate():
>>
>> The error code returned by before_ram_iterate() and after_ram_iterate() will
>> be stored in the QEMUFile for the upper layer.
>> But it seems that no callers check that error?  Shouldn't we let the callers
>> check the error instead?
> 
> The error values returned by qemu_rdma_registration_start() and
> qemu_rdma_registration_stop() carry no additional information a caller
> could check.


I think qemu_file_get_error(f) can be used for callers to check the error code.
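
Something along these lines (just a sketch of the caller side, not code
from the series):

    int ret = qemu_file_get_error(f);
    if (ret < 0) {
        /* the hook stored its error in the QEMUFile earlier */
    }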



> 
> Both return either -EIO or rdma->error_state on error.  The latter is
> *not* a negative errno code.  Evidence: qio_channel_rdma_shutdown()
> assigns -1, qemu_rdma_block_for_wrid() assigns the error value of
> qemu_rdma_poll(), which can be the error value of ibv_poll_cq(), which
> is an unspecified negative value, ...
> 
You are right.


> I decided not to investigate what qemu-file.c does with the error values
> after one quick glance at the code.  It's confusing, and quite possibly
> confused.  But I'm already at 50+ patches, and am neither inclined nor
> able to take on more migration cleanup at this time.

Yeah, it's already a big enough patch set.

Thanks very much!

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>


> 
>>>     Negative value means error.  qemu_rdma_registration_start() and
>>>     qemu_rdma_registration_stop() comply as far as I can tell.  Make
>>>     them comply *obviously*, by returning -1 on error.
>>>
>>> * hook_ram_load:
>>>
>>>     Negative value means error.  rdma_load_hook() already returns -1 on
>>>     error.  Leave it alone.
>>
>> see inline reply below,
>>
>>>
>>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>>> ---
>>>    migration/rdma.c | 79 +++++++++++++++++++++++-------------------------
>>>    1 file changed, 37 insertions(+), 42 deletions(-)
>>>
>>> diff --git a/migration/rdma.c b/migration/rdma.c
>>> index cc59155a50..46b5859268 100644
>>> --- a/migration/rdma.c
>>> +++ b/migration/rdma.c
>>> @@ -3219,12 +3219,11 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
>>>        rdma = qatomic_rcu_read(&rioc->rdmaout);
>>>    
>>>        if (!rdma) {
>>> -        return -EIO;
>>> +        return -1;
>>>        }
>>>    
>>> -    ret = check_error_state(rdma);
>>> -    if (ret) {
>>> -        return ret;
>>> +    if (check_error_state(rdma)) {
>>> +        return -1;
>>>        }
>>>    
>>>        qemu_fflush(f);
>>> @@ -3290,9 +3289,10 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
>>>        }
>>>    
>>>        return RAM_SAVE_CONTROL_DELAYED;
>>> +
>>>    err:
>>>        rdma->error_state = ret;
>>> -    return ret;
>>> +    return -1;
>>>    }
>>>    
>>>    static void rdma_accept_incoming_migration(void *opaque);
>>> @@ -3538,12 +3538,11 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>>>        rdma = qatomic_rcu_read(&rioc->rdmain);
>>>    
>>>        if (!rdma) {
>>> -        return -EIO;
>>> +        return -1;
>>
>> That's because EIO is not accurate here?
> 
> It's because the function does not return a negative errno code, and
> returning -EIO is misleading readers into assuming it does.
> 
>>>        }
>>>    
>>> -    ret = check_error_state(rdma);
>>> -    if (ret) {
>>> -        return ret;
>>
>> Ditto
> 
> Likewise.
> 
> [...]
> 


* Re: [PATCH 26/52] migration/rdma: Replace int error_state by bool errored
  2023-09-25  6:17   ` Zhijian Li (Fujitsu)
@ 2023-09-25  7:09     ` Markus Armbruster
  2023-09-26 10:18       ` Zhijian Li (Fujitsu)
  0 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-25  7:09 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu)
  Cc: Markus Armbruster, qemu-devel, quintela, peterx, leobras

"Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:

> On 18/09/2023 22:41, Markus Armbruster wrote:
>> All we do with the value of RDMAContext member @error_state is test
>> whether it's zero.  Change to bool and rename to @errored.
>> 
>
> make sense!
>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
>
> Can we move this patch ahead "[PATCH 23/52] migration/rdma: Clean up qemu_rdma_wait_comp_channel()'s error value",
> so that [23/52] [24/52] [25/52] will be more easy to review.

I think I could squash PATCH 23 into "[PATCH 25/52] migration/rdma: Dumb
down remaining int error values to -1".  Would that work for you?




* Re: [PATCH 28/52] migration/rdma: Check negative error values the same way everywhere
  2023-09-25  6:26   ` Zhijian Li (Fujitsu)
@ 2023-09-25  7:29     ` Markus Armbruster
  2023-10-04 16:32       ` Juan Quintela
  0 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-25  7:29 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu)
  Cc: Markus Armbruster, qemu-devel, quintela, peterx, leobras

"Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:

> On 18/09/2023 22:41, Markus Armbruster wrote:
>> When a function returns 0 on success, negative value on error,
>> checking for non-zero suffices, but checking for negative is clearer.
>> So do that.
>> 
>
> The convention in this patch is not my favorite.

Certainly a matter of taste, which means maintainers get to decide, not
me.

Failure checks can be confusing in C.  Is

    if (foo(...))

checking for success or for failure?  Impossible to tell.  If foo()
returns a pointer, it almost certainly checks for success.  If it
returns bool, likewise.  But if it returns an integer, it probably
checks for failure.

Getting a condition backwards is a common coding mistake.  Consider
patch review of

    if (condition) {
        obviously the error case
    }

Patch review is more likely to catch a backward condition when the
condition's sense is locally obvious.

Conventions can help.  Here's the one I like:

* Check for a function's failure the same way everywhere.

* When a function returns something "truthy" on success, and something
  "falsy" on failure, use

    if (!fun(...))

  Special cases:

  - bool true on success, false on failure

  - non-null pointer on success, null pointer on failure

* When a function returns non-negative value on success, negative value
  on failure, use

    if (fun(...) < 0)

* Avoid non-negative integer error values.

* Avoid if (fun(...)), because it's locally ambiguous.
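
To illustrate, a sketch only (foo_lookup() and foo_write() are made up;
assume the former returns a pointer or null, the latter 0 or negative):

    FooState *s = foo_lookup(name);
    if (!s) {                            /* truthy/falsy: !fun(...) */
        error_setg(errp, "unknown foo '%s'", name);
        return -1;
    }
    if (foo_write(s, buf, len) < 0) {    /* negative on failure: fun(...) < 0 */
        error_setg(errp, "writing foo '%s' failed", name);
        return -1;
    }
    return 0;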

> @Peter, Juan
>
> I'd like to hear your opinions.

Yes, please.



^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 31/52] migration/rdma: Retire macro ERROR()
  2023-09-18 14:41 ` [PATCH 31/52] migration/rdma: Retire " Markus Armbruster
@ 2023-09-25  7:31   ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-25  7:31 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> ERROR() has become "error_setg() unless an error has been set
> already".  Hiding the conditional in the macro is in the way of
> further work.  Replace the macro uses by their expansion, and delete
> the macro.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>


Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>



> ---
>   migration/rdma.c | 168 +++++++++++++++++++++++++++++++++--------------
>   1 file changed, 120 insertions(+), 48 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index be31694d4f..df5b3a8e2c 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -40,13 +40,6 @@
>   #include "options.h"
>   #include <poll.h>
>   
> -#define ERROR(errp, fmt, ...) \
> -    do { \
> -        if (errp && (*(errp) == NULL)) { \
> -            error_setg(errp, "RDMA ERROR: " fmt, ## __VA_ARGS__); \
> -        } \
> -    } while (0)
> -
>   #define RDMA_RESOLVE_TIMEOUT_MS 10000
>   
>   /* Do not merge data if larger than this. */
> @@ -859,7 +852,10 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
>   
>               if (ibv_query_port(verbs, 1, &port_attr)) {
>                   ibv_close_device(verbs);
> -                ERROR(errp, "Could not query initial IB port");
> +                if (errp && !*errp) {
> +                    error_setg(errp,
> +                               "RDMA ERROR: Could not query initial IB port");
> +                }
>                   return -1;
>               }
>   
> @@ -882,9 +878,12 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
>                                   " migrate over the IB fabric until the kernel "
>                                   " fixes the bug.\n");
>               } else {
> -                ERROR(errp, "You only have RoCE / iWARP devices in your systems"
> -                            " and your management software has specified '[::]'"
> -                            ", but IPv6 over RoCE / iWARP is not supported in Linux.");
> +                if (errp && !*errp) {
> +                    error_setg(errp, "RDMA ERROR: "
> +                               "You only have RoCE / iWARP devices in your systems"
> +                               " and your management software has specified '[::]'"
> +                               ", but IPv6 over RoCE / iWARP is not supported in Linux.");
> +                }
>                   return -1;
>               }
>           }
> @@ -900,13 +899,18 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
>   
>       /* IB ports start with 1, not 0 */
>       if (ibv_query_port(verbs, 1, &port_attr)) {
> -        ERROR(errp, "Could not query initial IB port");
> +        if (errp && !*errp) {
> +            error_setg(errp, "RDMA ERROR: Could not query initial IB port");
> +        }
>           return -1;
>       }
>   
>       if (port_attr.link_layer == IBV_LINK_LAYER_ETHERNET) {
> -        ERROR(errp, "Linux kernel's RoCE / iWARP does not support IPv6 "
> -                    "(but patches on linux-rdma in progress)");
> +        if (errp && !*errp) {
> +            error_setg(errp, "RDMA ERROR: "
> +                       "Linux kernel's RoCE / iWARP does not support IPv6 "
> +                       "(but patches on linux-rdma in progress)");
> +        }
>           return -1;
>       }
>   
> @@ -930,21 +934,27 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
>       struct rdma_addrinfo *e;
>   
>       if (rdma->host == NULL || !strcmp(rdma->host, "")) {
> -        ERROR(errp, "RDMA hostname has not been set");
> +        if (errp && !*errp) {
> +            error_setg(errp, "RDMA ERROR: RDMA hostname has not been set");
> +        }
>           return -1;
>       }
>   
>       /* create CM channel */
>       rdma->channel = rdma_create_event_channel();
>       if (!rdma->channel) {
> -        ERROR(errp, "could not create CM channel");
> +        if (errp && !*errp) {
> +            error_setg(errp, "RDMA ERROR: could not create CM channel");
> +        }
>           return -1;
>       }
>   
>       /* create CM id */
>       ret = rdma_create_id(rdma->channel, &rdma->cm_id, NULL, RDMA_PS_TCP);
>       if (ret < 0) {
> -        ERROR(errp, "could not create channel id");
> +        if (errp && !*errp) {
> +            error_setg(errp, "RDMA ERROR: could not create channel id");
> +        }
>           goto err_resolve_create_id;
>       }
>   
> @@ -953,7 +963,10 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
>   
>       ret = rdma_getaddrinfo(rdma->host, port_str, NULL, &res);
>       if (ret) {
> -        ERROR(errp, "could not rdma_getaddrinfo address %s", rdma->host);
> +        if (errp && !*errp) {
> +            error_setg(errp, "RDMA ERROR: could not rdma_getaddrinfo address %s",
> +                       rdma->host);
> +        }
>           goto err_resolve_get_addr;
>       }
>   
> @@ -976,7 +989,10 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
>       }
>   
>       rdma_freeaddrinfo(res);
> -    ERROR(errp, "could not resolve address %s", rdma->host);
> +    if (errp && !*errp) {
> +        error_setg(errp, "RDMA ERROR: could not resolve address %s",
> +                   rdma->host);
> +    }
>       goto err_resolve_get_addr;
>   
>   route:
> @@ -985,13 +1001,18 @@ route:
>   
>       ret = rdma_get_cm_event(rdma->channel, &cm_event);
>       if (ret < 0) {
> -        ERROR(errp, "could not perform event_addr_resolved");
> +        if (errp && !*errp) {
> +            error_setg(errp, "RDMA ERROR: could not perform event_addr_resolved");
> +        }
>           goto err_resolve_get_addr;
>       }
>   
>       if (cm_event->event != RDMA_CM_EVENT_ADDR_RESOLVED) {
> -        ERROR(errp, "result not equal to event_addr_resolved %s",
> -                rdma_event_str(cm_event->event));
> +        if (errp && !*errp) {
> +            error_setg(errp,
> +                       "RDMA ERROR: result not equal to event_addr_resolved %s",
> +                       rdma_event_str(cm_event->event));
> +        }
>           error_report("rdma_resolve_addr");
>           rdma_ack_cm_event(cm_event);
>           goto err_resolve_get_addr;
> @@ -1001,18 +1022,25 @@ route:
>       /* resolve route */
>       ret = rdma_resolve_route(rdma->cm_id, RDMA_RESOLVE_TIMEOUT_MS);
>       if (ret < 0) {
> -        ERROR(errp, "could not resolve rdma route");
> +        if (errp && !*errp) {
> +            error_setg(errp, "RDMA ERROR: could not resolve rdma route");
> +        }
>           goto err_resolve_get_addr;
>       }
>   
>       ret = rdma_get_cm_event(rdma->channel, &cm_event);
>       if (ret < 0) {
> -        ERROR(errp, "could not perform event_route_resolved");
> +        if (errp && !*errp) {
> +            error_setg(errp, "RDMA ERROR: could not perform event_route_resolved");
> +        }
>           goto err_resolve_get_addr;
>       }
>       if (cm_event->event != RDMA_CM_EVENT_ROUTE_RESOLVED) {
> -        ERROR(errp, "result not equal to event_route_resolved: %s",
> -                        rdma_event_str(cm_event->event));
> +        if (errp && !*errp) {
> +            error_setg(errp, "RDMA ERROR: "
> +                       "result not equal to event_route_resolved: %s",
> +                       rdma_event_str(cm_event->event));
> +        }
>           rdma_ack_cm_event(cm_event);
>           goto err_resolve_get_addr;
>       }
> @@ -2443,15 +2471,20 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
>   
>       ret = qemu_rdma_alloc_pd_cq(rdma);
>       if (ret < 0) {
> -        ERROR(errp, "rdma migration: error allocating pd and cq! Your mlock()"
> -                    " limits may be too low. Please check $ ulimit -a # and "
> -                    "search for 'ulimit -l' in the output");
> +        if (errp && !*errp) {
> +            error_setg(errp, "RDMA ERROR: "
> +                       "rdma migration: error allocating pd and cq! Your mlock()"
> +                       " limits may be too low. Please check $ ulimit -a # and "
> +                       "search for 'ulimit -l' in the output");
> +        }
>           goto err_rdma_source_init;
>       }
>   
>       ret = qemu_rdma_alloc_qp(rdma);
>       if (ret < 0) {
> -        ERROR(errp, "rdma migration: error allocating qp!");
> +        if (errp && !*errp) {
> +            error_setg(errp, "RDMA ERROR: rdma migration: error allocating qp!");
> +        }
>           goto err_rdma_source_init;
>       }
>   
> @@ -2468,8 +2501,11 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
>       for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
>           ret = qemu_rdma_reg_control(rdma, idx);
>           if (ret < 0) {
> -            ERROR(errp, "rdma migration: error registering %d control!",
> -                                                            idx);
> +            if (errp && !*errp) {
> +                error_setg(errp,
> +                           "RDMA ERROR: rdma migration: error registering %d control!",
> +                           idx);
> +            }
>               goto err_rdma_source_init;
>           }
>       }
> @@ -2497,19 +2533,29 @@ static int qemu_get_cm_event_timeout(RDMAContext *rdma,
>       } while (ret < 0 && errno == EINTR);
>   
>       if (ret == 0) {
> -        ERROR(errp, "poll cm event timeout");
> +        if (errp && !*errp) {
> +            error_setg(errp, "RDMA ERROR: poll cm event timeout");
> +        }
>           return -1;
>       } else if (ret < 0) {
> -        ERROR(errp, "failed to poll cm event, errno=%i", errno);
> +        if (errp && !*errp) {
> +            error_setg(errp, "RDMA ERROR: failed to poll cm event, errno=%i",
> +                       errno);
> +        }
>           return -1;
>       } else if (poll_fd.revents & POLLIN) {
>           if (rdma_get_cm_event(rdma->channel, cm_event) < 0) {
> -            ERROR(errp, "failed to get cm event");
> +            if (errp && !*errp) {
> +                error_setg(errp, "RDMA ERROR: failed to get cm event");
> +            }
>               return -1;
>           }
>           return 0;
>       } else {
> -        ERROR(errp, "no POLLIN event, revent=%x", poll_fd.revents);
> +        if (errp && !*errp) {
> +            error_setg(errp, "RDMA ERROR: no POLLIN event, revent=%x",
> +                       poll_fd.revents);
> +        }
>           return -1;
>       }
>   }
> @@ -2542,14 +2588,18 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
>   
>       ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
>       if (ret < 0) {
> -        ERROR(errp, "posting second control recv");
> +        if (errp && !*errp) {
> +            error_setg(errp, "RDMA ERROR: posting second control recv");
> +        }
>           goto err_rdma_source_connect;
>       }
>   
>       ret = rdma_connect(rdma->cm_id, &conn_param);
>       if (ret < 0) {
>           perror("rdma_connect");
> -        ERROR(errp, "connecting to destination!");
> +        if (errp && !*errp) {
> +            error_setg(errp, "RDMA ERROR: connecting to destination!");
> +        }
>           goto err_rdma_source_connect;
>       }
>   
> @@ -2558,7 +2608,9 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
>       } else {
>           ret = rdma_get_cm_event(rdma->channel, &cm_event);
>           if (ret < 0) {
> -            ERROR(errp, "failed to get cm event");
> +            if (errp && !*errp) {
> +                error_setg(errp, "RDMA ERROR: failed to get cm event");
> +            }
>           }
>       }
>       if (ret < 0) {
> @@ -2568,7 +2620,9 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
>   
>       if (cm_event->event != RDMA_CM_EVENT_ESTABLISHED) {
>           error_report("rdma_get_cm_event != EVENT_ESTABLISHED after rdma_connect");
> -        ERROR(errp, "connecting to destination!");
> +        if (errp && !*errp) {
> +            error_setg(errp, "RDMA ERROR: connecting to destination!");
> +        }
>           rdma_ack_cm_event(cm_event);
>           goto err_rdma_source_connect;
>       }
> @@ -2615,14 +2669,18 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
>       }
>   
>       if (!rdma->host || !rdma->host[0]) {
> -        ERROR(errp, "RDMA host is not set!");
> +        if (errp && !*errp) {
> +            error_setg(errp, "RDMA ERROR: RDMA host is not set!");
> +        }
>           rdma->errored = true;
>           return -1;
>       }
>       /* create CM channel */
>       rdma->channel = rdma_create_event_channel();
>       if (!rdma->channel) {
> -        ERROR(errp, "could not create rdma event channel");
> +        if (errp && !*errp) {
> +            error_setg(errp, "RDMA ERROR: could not create rdma event channel");
> +        }
>           rdma->errored = true;
>           return -1;
>       }
> @@ -2630,7 +2688,9 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
>       /* create CM id */
>       ret = rdma_create_id(rdma->channel, &listen_id, NULL, RDMA_PS_TCP);
>       if (ret < 0) {
> -        ERROR(errp, "could not create cm_id!");
> +        if (errp && !*errp) {
> +            error_setg(errp, "RDMA ERROR: could not create cm_id!");
> +        }
>           goto err_dest_init_create_listen_id;
>       }
>   
> @@ -2639,14 +2699,19 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
>   
>       ret = rdma_getaddrinfo(rdma->host, port_str, NULL, &res);
>       if (ret) {
> -        ERROR(errp, "could not rdma_getaddrinfo address %s", rdma->host);
> +        if (errp && !*errp) {
> +            error_setg(errp, "RDMA ERROR: could not rdma_getaddrinfo address %s",
> +                       rdma->host);
> +        }
>           goto err_dest_init_bind_addr;
>       }
>   
>       ret = rdma_set_option(listen_id, RDMA_OPTION_ID, RDMA_OPTION_ID_REUSEADDR,
>                             &reuse, sizeof reuse);
>       if (ret < 0) {
> -        ERROR(errp, "Error: could not set REUSEADDR option");
> +        if (errp && !*errp) {
> +            error_setg(errp, "RDMA ERROR: Error: could not set REUSEADDR option");
> +        }
>           goto err_dest_init_bind_addr;
>       }
>       for (e = res; e != NULL; e = e->ai_next) {
> @@ -2668,7 +2733,9 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
>   
>       rdma_freeaddrinfo(res);
>       if (!e) {
> -        ERROR(errp, "Error: could not rdma_bind_addr!");
> +        if (errp && !*errp) {
> +            error_setg(errp, "RDMA ERROR: Error: could not rdma_bind_addr!");
> +        }
>           goto err_dest_init_bind_addr;
>       }
>   
> @@ -2720,7 +2787,10 @@ static RDMAContext *qemu_rdma_data_init(const char *host_port, Error **errp)
>           rdma->host = g_strdup(addr->host);
>           rdma->host_port = g_strdup(host_port);
>       } else {
> -        ERROR(errp, "bad RDMA migration address '%s'", host_port);
> +        if (errp && !*errp) {
> +            error_setg(errp, "RDMA ERROR: bad RDMA migration address '%s'",
> +                       host_port);
> +        }
>           g_free(rdma);
>           rdma = NULL;
>       }
> @@ -4106,7 +4176,9 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
>       ret = rdma_listen(rdma->listen_id, 5);
>   
>       if (ret < 0) {
> -        ERROR(errp, "listening on socket!");
> +        if (errp && !*errp) {
> +            error_setg(errp, "RDMA ERROR: listening on socket!");
> +        }
>           goto cleanup_rdma;
>       }
>   

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 32/52] migration/rdma: Fix error handling around rdma_getaddrinfo()
  2023-09-18 14:41 ` [PATCH 32/52] migration/rdma: Fix error handling around rdma_getaddrinfo() Markus Armbruster
@ 2023-09-25  7:32   ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-25  7:32 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> qemu_rdma_resolve_host() and qemu_rdma_dest_init() iterate over
> addresses to find one that works, holding onto the first Error from
> qemu_rdma_broken_ipv6_kernel() for use when no address works.  Issues:
> 
> 1. If @errp was &error_abort or &error_fatal, we'd terminate instead
>     of trying the next address.  Can't actually happen, since no caller
>     passes these arguments.
> 
> 2. When @errp is a pointer to a variable containing NULL, and
>     qemu_rdma_broken_ipv6_kernel() fails, the variable no longer
>     contains NULL.  Subsequent iterations pass it again, violating
>     Error usage rules.  Dangerous, as setting an error would then trip
>     error_setv()'s assertion.  Works only because
>     qemu_rdma_broken_ipv6_kernel() and the code following the loops
>     carefully avoids setting a second error.
> 
> 3. If qemu_rdma_broken_ipv6_kernel() fails, and then a later iteration
>     finds a working address, @errp still holds the first error from
>     qemu_rdma_broken_ipv6_kernel().  If we then run into another error,
>     we report the qemu_rdma_broken_ipv6_kernel() failure instead.
> 
> 4. If we don't run into another error, we leak the Error object.
> 
> Use a local error variable, and propagate to @errp.  This fixes 3. and
> also cleans up 1 and partly 2.
> 
> Free this error when we have a working address.  This fixes 4.
> 
> Pass the local error variable to qemu_rdma_broken_ipv6_kernel() only
> until it fails.  Pass null on any later iterations.  This cleans up
> the remainder of 2.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
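
A minimal sketch of the pattern the commit message describes (not the
actual patch; try_address() and the loop shape are placeholders for
illustration):

    Error *err = NULL;
    bool ok = false;

    for (e = res; e != NULL; e = e->ai_next) {
        /* pass &err only until the first failure, null afterwards */
        if (try_address(e, err ? NULL : &err) == 0) {
            ok = true;
            break;
        }
    }
    if (!ok) {
        /* no address worked: hand the first failure to the caller */
        error_propagate(errp, err);
        return -1;
    }
    /* a later address worked: drop any stashed error */
    error_free(err);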


Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 33/52] migration/rdma: Drop "@errp is clear" guards around error_setg()
  2023-09-18 14:41 ` [PATCH 33/52] migration/rdma: Drop "@errp is clear" guards around error_setg() Markus Armbruster
@ 2023-09-25  7:32   ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-25  7:32 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> These guards are all redundant now.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>


Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>


> ---
>   migration/rdma.c | 164 +++++++++++++++--------------------------------
>   1 file changed, 51 insertions(+), 113 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index d29affe410..c88cd1f468 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -852,10 +852,8 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
>   
>               if (ibv_query_port(verbs, 1, &port_attr)) {
>                   ibv_close_device(verbs);
> -                if (errp && !*errp) {
> -                    error_setg(errp,
> -                               "RDMA ERROR: Could not query initial IB port");
> -                }
> +                error_setg(errp,
> +                           "RDMA ERROR: Could not query initial IB port");
>                   return -1;
>               }
>   
> @@ -878,12 +876,10 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
>                                   " migrate over the IB fabric until the kernel "
>                                   " fixes the bug.\n");
>               } else {
> -                if (errp && !*errp) {
> -                    error_setg(errp, "RDMA ERROR: "
> -                               "You only have RoCE / iWARP devices in your systems"
> -                               " and your management software has specified '[::]'"
> -                               ", but IPv6 over RoCE / iWARP is not supported in Linux.");
> -                }
> +                error_setg(errp, "RDMA ERROR: "
> +                           "You only have RoCE / iWARP devices in your systems"
> +                           " and your management software has specified '[::]'"
> +                           ", but IPv6 over RoCE / iWARP is not supported in Linux.");
>                   return -1;
>               }
>           }
> @@ -899,18 +895,14 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
>   
>       /* IB ports start with 1, not 0 */
>       if (ibv_query_port(verbs, 1, &port_attr)) {
> -        if (errp && !*errp) {
> -            error_setg(errp, "RDMA ERROR: Could not query initial IB port");
> -        }
> +        error_setg(errp, "RDMA ERROR: Could not query initial IB port");
>           return -1;
>       }
>   
>       if (port_attr.link_layer == IBV_LINK_LAYER_ETHERNET) {
> -        if (errp && !*errp) {
> -            error_setg(errp, "RDMA ERROR: "
> -                       "Linux kernel's RoCE / iWARP does not support IPv6 "
> -                       "(but patches on linux-rdma in progress)");
> -        }
> +        error_setg(errp, "RDMA ERROR: "
> +                   "Linux kernel's RoCE / iWARP does not support IPv6 "
> +                   "(but patches on linux-rdma in progress)");
>           return -1;
>       }
>   
> @@ -935,27 +927,21 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
>       struct rdma_addrinfo *e;
>   
>       if (rdma->host == NULL || !strcmp(rdma->host, "")) {
> -        if (errp && !*errp) {
> -            error_setg(errp, "RDMA ERROR: RDMA hostname has not been set");
> -        }
> +        error_setg(errp, "RDMA ERROR: RDMA hostname has not been set");
>           return -1;
>       }
>   
>       /* create CM channel */
>       rdma->channel = rdma_create_event_channel();
>       if (!rdma->channel) {
> -        if (errp && !*errp) {
> -            error_setg(errp, "RDMA ERROR: could not create CM channel");
> -        }
> +        error_setg(errp, "RDMA ERROR: could not create CM channel");
>           return -1;
>       }
>   
>       /* create CM id */
>       ret = rdma_create_id(rdma->channel, &rdma->cm_id, NULL, RDMA_PS_TCP);
>       if (ret < 0) {
> -        if (errp && !*errp) {
> -            error_setg(errp, "RDMA ERROR: could not create channel id");
> -        }
> +        error_setg(errp, "RDMA ERROR: could not create channel id");
>           goto err_resolve_create_id;
>       }
>   
> @@ -964,10 +950,8 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
>   
>       ret = rdma_getaddrinfo(rdma->host, port_str, NULL, &res);
>       if (ret) {
> -        if (errp && !*errp) {
> -            error_setg(errp, "RDMA ERROR: could not rdma_getaddrinfo address %s",
> -                       rdma->host);
> -        }
> +        error_setg(errp, "RDMA ERROR: could not rdma_getaddrinfo address %s",
> +                   rdma->host);
>           goto err_resolve_get_addr;
>       }
>   
> @@ -1009,18 +993,14 @@ route:
>   
>       ret = rdma_get_cm_event(rdma->channel, &cm_event);
>       if (ret < 0) {
> -        if (errp && !*errp) {
> -            error_setg(errp, "RDMA ERROR: could not perform event_addr_resolved");
> -        }
> +        error_setg(errp, "RDMA ERROR: could not perform event_addr_resolved");
>           goto err_resolve_get_addr;
>       }
>   
>       if (cm_event->event != RDMA_CM_EVENT_ADDR_RESOLVED) {
> -        if (errp && !*errp) {
> -            error_setg(errp,
> -                       "RDMA ERROR: result not equal to event_addr_resolved %s",
> -                       rdma_event_str(cm_event->event));
> -        }
> +        error_setg(errp,
> +                   "RDMA ERROR: result not equal to event_addr_resolved %s",
> +                   rdma_event_str(cm_event->event));
>           error_report("rdma_resolve_addr");
>           rdma_ack_cm_event(cm_event);
>           goto err_resolve_get_addr;
> @@ -1030,25 +1010,19 @@ route:
>       /* resolve route */
>       ret = rdma_resolve_route(rdma->cm_id, RDMA_RESOLVE_TIMEOUT_MS);
>       if (ret < 0) {
> -        if (errp && !*errp) {
> -            error_setg(errp, "RDMA ERROR: could not resolve rdma route");
> -        }
> +        error_setg(errp, "RDMA ERROR: could not resolve rdma route");
>           goto err_resolve_get_addr;
>       }
>   
>       ret = rdma_get_cm_event(rdma->channel, &cm_event);
>       if (ret < 0) {
> -        if (errp && !*errp) {
> -            error_setg(errp, "RDMA ERROR: could not perform event_route_resolved");
> -        }
> +        error_setg(errp, "RDMA ERROR: could not perform event_route_resolved");
>           goto err_resolve_get_addr;
>       }
>       if (cm_event->event != RDMA_CM_EVENT_ROUTE_RESOLVED) {
> -        if (errp && !*errp) {
> -            error_setg(errp, "RDMA ERROR: "
> -                       "result not equal to event_route_resolved: %s",
> -                       rdma_event_str(cm_event->event));
> -        }
> +        error_setg(errp, "RDMA ERROR: "
> +                   "result not equal to event_route_resolved: %s",
> +                   rdma_event_str(cm_event->event));
>           rdma_ack_cm_event(cm_event);
>           goto err_resolve_get_addr;
>       }
> @@ -2479,20 +2453,16 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
>   
>       ret = qemu_rdma_alloc_pd_cq(rdma);
>       if (ret < 0) {
> -        if (errp && !*errp) {
> -            error_setg(errp, "RDMA ERROR: "
> -                       "rdma migration: error allocating pd and cq! Your mlock()"
> -                       " limits may be too low. Please check $ ulimit -a # and "
> -                       "search for 'ulimit -l' in the output");
> -        }
> +        error_setg(errp, "RDMA ERROR: "
> +                   "rdma migration: error allocating pd and cq! Your mlock()"
> +                   " limits may be too low. Please check $ ulimit -a # and "
> +                   "search for 'ulimit -l' in the output");
>           goto err_rdma_source_init;
>       }
>   
>       ret = qemu_rdma_alloc_qp(rdma);
>       if (ret < 0) {
> -        if (errp && !*errp) {
> -            error_setg(errp, "RDMA ERROR: rdma migration: error allocating qp!");
> -        }
> +        error_setg(errp, "RDMA ERROR: rdma migration: error allocating qp!");
>           goto err_rdma_source_init;
>       }
>   
> @@ -2509,11 +2479,9 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
>       for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
>           ret = qemu_rdma_reg_control(rdma, idx);
>           if (ret < 0) {
> -            if (errp && !*errp) {
> -                error_setg(errp,
> -                           "RDMA ERROR: rdma migration: error registering %d control!",
> -                           idx);
> -            }
> +            error_setg(errp,
> +                       "RDMA ERROR: rdma migration: error registering %d control!",
> +                       idx);
>               goto err_rdma_source_init;
>           }
>       }
> @@ -2541,29 +2509,21 @@ static int qemu_get_cm_event_timeout(RDMAContext *rdma,
>       } while (ret < 0 && errno == EINTR);
>   
>       if (ret == 0) {
> -        if (errp && !*errp) {
> -            error_setg(errp, "RDMA ERROR: poll cm event timeout");
> -        }
> +        error_setg(errp, "RDMA ERROR: poll cm event timeout");
>           return -1;
>       } else if (ret < 0) {
> -        if (errp && !*errp) {
> -            error_setg(errp, "RDMA ERROR: failed to poll cm event, errno=%i",
> -                       errno);
> -        }
> +        error_setg(errp, "RDMA ERROR: failed to poll cm event, errno=%i",
> +                   errno);
>           return -1;
>       } else if (poll_fd.revents & POLLIN) {
>           if (rdma_get_cm_event(rdma->channel, cm_event) < 0) {
> -            if (errp && !*errp) {
> -                error_setg(errp, "RDMA ERROR: failed to get cm event");
> -            }
> +            error_setg(errp, "RDMA ERROR: failed to get cm event");
>               return -1;
>           }
>           return 0;
>       } else {
> -        if (errp && !*errp) {
> -            error_setg(errp, "RDMA ERROR: no POLLIN event, revent=%x",
> -                       poll_fd.revents);
> -        }
> +        error_setg(errp, "RDMA ERROR: no POLLIN event, revent=%x",
> +                   poll_fd.revents);
>           return -1;
>       }
>   }
> @@ -2596,18 +2556,14 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
>   
>       ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
>       if (ret < 0) {
> -        if (errp && !*errp) {
> -            error_setg(errp, "RDMA ERROR: posting second control recv");
> -        }
> +        error_setg(errp, "RDMA ERROR: posting second control recv");
>           goto err_rdma_source_connect;
>       }
>   
>       ret = rdma_connect(rdma->cm_id, &conn_param);
>       if (ret < 0) {
>           perror("rdma_connect");
> -        if (errp && !*errp) {
> -            error_setg(errp, "RDMA ERROR: connecting to destination!");
> -        }
> +        error_setg(errp, "RDMA ERROR: connecting to destination!");
>           goto err_rdma_source_connect;
>       }
>   
> @@ -2616,9 +2572,7 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
>       } else {
>           ret = rdma_get_cm_event(rdma->channel, &cm_event);
>           if (ret < 0) {
> -            if (errp && !*errp) {
> -                error_setg(errp, "RDMA ERROR: failed to get cm event");
> -            }
> +            error_setg(errp, "RDMA ERROR: failed to get cm event");
>           }
>       }
>       if (ret < 0) {
> @@ -2628,9 +2582,7 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
>   
>       if (cm_event->event != RDMA_CM_EVENT_ESTABLISHED) {
>           error_report("rdma_get_cm_event != EVENT_ESTABLISHED after rdma_connect");
> -        if (errp && !*errp) {
> -            error_setg(errp, "RDMA ERROR: connecting to destination!");
> -        }
> +        error_setg(errp, "RDMA ERROR: connecting to destination!");
>           rdma_ack_cm_event(cm_event);
>           goto err_rdma_source_connect;
>       }
> @@ -2678,18 +2630,14 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
>       }
>   
>       if (!rdma->host || !rdma->host[0]) {
> -        if (errp && !*errp) {
> -            error_setg(errp, "RDMA ERROR: RDMA host is not set!");
> -        }
> +        error_setg(errp, "RDMA ERROR: RDMA host is not set!");
>           rdma->errored = true;
>           return -1;
>       }
>       /* create CM channel */
>       rdma->channel = rdma_create_event_channel();
>       if (!rdma->channel) {
> -        if (errp && !*errp) {
> -            error_setg(errp, "RDMA ERROR: could not create rdma event channel");
> -        }
> +        error_setg(errp, "RDMA ERROR: could not create rdma event channel");
>           rdma->errored = true;
>           return -1;
>       }
> @@ -2697,9 +2645,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
>       /* create CM id */
>       ret = rdma_create_id(rdma->channel, &listen_id, NULL, RDMA_PS_TCP);
>       if (ret < 0) {
> -        if (errp && !*errp) {
> -            error_setg(errp, "RDMA ERROR: could not create cm_id!");
> -        }
> +        error_setg(errp, "RDMA ERROR: could not create cm_id!");
>           goto err_dest_init_create_listen_id;
>       }
>   
> @@ -2708,19 +2654,15 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
>   
>       ret = rdma_getaddrinfo(rdma->host, port_str, NULL, &res);
>       if (ret) {
> -        if (errp && !*errp) {
> -            error_setg(errp, "RDMA ERROR: could not rdma_getaddrinfo address %s",
> -                       rdma->host);
> -        }
> +        error_setg(errp, "RDMA ERROR: could not rdma_getaddrinfo address %s",
> +                   rdma->host);
>           goto err_dest_init_bind_addr;
>       }
>   
>       ret = rdma_set_option(listen_id, RDMA_OPTION_ID, RDMA_OPTION_ID_REUSEADDR,
>                             &reuse, sizeof reuse);
>       if (ret < 0) {
> -        if (errp && !*errp) {
> -            error_setg(errp, "RDMA ERROR: Error: could not set REUSEADDR option");
> -        }
> +        error_setg(errp, "RDMA ERROR: Error: could not set REUSEADDR option");
>           goto err_dest_init_bind_addr;
>       }
>   
> @@ -2804,10 +2746,8 @@ static RDMAContext *qemu_rdma_data_init(const char *host_port, Error **errp)
>           rdma->host = g_strdup(addr->host);
>           rdma->host_port = g_strdup(host_port);
>       } else {
> -        if (errp && !*errp) {
> -            error_setg(errp, "RDMA ERROR: bad RDMA migration address '%s'",
> -                       host_port);
> -        }
> +        error_setg(errp, "RDMA ERROR: bad RDMA migration address '%s'",
> +                   host_port);
>           g_free(rdma);
>           rdma = NULL;
>       }
> @@ -4193,9 +4133,7 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
>       ret = rdma_listen(rdma->listen_id, 5);
>   
>       if (ret < 0) {
> -        if (errp && !*errp) {
> -            error_setg(errp, "RDMA ERROR: listening on socket!");
> -        }
> +        error_setg(errp, "RDMA ERROR: listening on socket!");
>           goto cleanup_rdma;
>       }
>   

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 00/52] migration/rdma: Error handling fixes
  2023-09-22 15:21     ` Peter Xu
@ 2023-09-25  8:06       ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-25  8:06 UTC (permalink / raw)
  To: Peter Xu; +Cc: Markus Armbruster, qemu-devel, quintela, leobras



On 22/09/2023 23:21, Peter Xu wrote:
> On Thu, Sep 21, 2023 at 08:27:24AM +0000, Zhijian Li (Fujitsu) wrote:
>> I'm worried that I may not have enough time, ability, or environment to review/test
>> the RDMA patches.  But for this patch set, I will take a look later.
> 
> That'll be helpful, thanks!
> 
> So it seems maybe at least we should have an entry for rdma migration to
> reflect the state of the code there.  AFAIU we don't strictly need a
> maintainer for the entries; an empty entry should suffice, which I can
> prepare a patch for:
> 
> RDMA Migration
> S: Odd Fixes
> F: migration/rdma*
> 
> Zhijian, if you still want to get emails when someone changes the code,
> maybe you still want to be listed as a reviewer (even if not a maintainer)?
> That way you don't necessarily need to review every time, but you still get
> notified whenever someone changes it; that means one line added onto the
> above:
> 
> R: Li Zhijian <lizhijian@fujitsu.com>
> 
> Do you want me to add that R: line for you when I send the MAINTAINERS file
> update?

Yes, thanks. My pleasure :)



> 
> I'm curious whether Fujitsu is using this code in production; if so, it'll
> be great if you can be supported as, perhaps part of, your job to maintain
> the rdma code.  But maybe that's not the case.

This depends on Fujitsu's customer requirements; I don't really know either :)


Thanks
Zhijian

> In all cases, thanks a lot for the help already.
> 

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 34/52] migration/rdma: Convert qemu_rdma_exchange_recv() to Error
  2023-09-18 14:41 ` [PATCH 34/52] migration/rdma: Convert qemu_rdma_exchange_recv() to Error Markus Armbruster
@ 2023-09-26  1:37   ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-26  1:37 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
> 
> qio_channel_rdma_readv() violates this principle: it calls
> error_report() via qemu_rdma_exchange_recv().  I elected not to
> investigate how callers handle the error, i.e. precise impact is not
> known.
> 
> Clean this up by converting qemu_rdma_exchange_recv() to Error.
> 
> Necessitates setting an error when qemu_rdma_exchange_get_response()
> failed.  Since this error will go away later in this series, simply
> use "FIXME temporary error message" there.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>


Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>


> ---
>   migration/rdma.c | 15 +++++++++------
>   1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index c88cd1f468..50546b3a27 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -1957,7 +1957,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
>    * control-channel message.
>    */
>   static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
> -                                   uint32_t expecting)
> +                                   uint32_t expecting, Error **errp)
>   {
>       RDMAControlHeader ready = {
>                                   .len = 0,
> @@ -1972,7 +1972,7 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
>       ret = qemu_rdma_post_send_control(rdma, NULL, &ready);
>   
>       if (ret < 0) {
> -        error_report("Failed to send control buffer!");
> +        error_setg(errp, "Failed to send control buffer!");
>           return -1;
>       }
>   
> @@ -1983,6 +1983,7 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
>                                             expecting, RDMA_WRID_READY);
>   
>       if (ret < 0) {
> +        error_setg(errp, "FIXME temporary error message");
>           return -1;
>       }
>   
> @@ -1993,7 +1994,7 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
>        */
>       ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
>       if (ret < 0) {
> -        error_report("rdma migration: error posting second control recv!");
> +        error_setg(errp, "rdma migration: error posting second control recv!");
>           return -1;
>       }
>   
> @@ -2908,11 +2909,11 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>           /* We've got nothing at all, so lets wait for
>            * more to arrive
>            */
> -        ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_QEMU_FILE);
> +        ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_QEMU_FILE,
> +                                      errp);
>   
>           if (ret < 0) {
>               rdma->errored = true;
> -            error_setg(errp, "qemu_rdma_exchange_recv failed");
>               return -1;
>           }
>   
> @@ -3536,6 +3537,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>       RDMAControlHeader blocks = { .type = RDMA_CONTROL_RAM_BLOCKS_RESULT,
>                                    .repeat = 1 };
>       QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
> +    Error *err = NULL;
>       RDMAContext *rdma;
>       RDMALocalBlocks *local;
>       RDMAControlHeader head;
> @@ -3565,9 +3567,10 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>       do {
>           trace_qemu_rdma_registration_handle_wait();
>   
> -        ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_NONE);
> +        ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_NONE, &err);
>   
>           if (ret < 0) {
> +            error_report_err(err);
>               break;
>           }
>   

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 35/52] migration/rdma: Convert qemu_rdma_exchange_send() to Error
  2023-09-18 14:41 ` [PATCH 35/52] migration/rdma: Convert qemu_rdma_exchange_send() " Markus Armbruster
@ 2023-09-26  1:42   ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-26  1:42 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
> 
> qio_channel_rdma_writev() violates this principle: it calls
> error_report() via qemu_rdma_exchange_send().  I elected not to
> investigate how callers handle the error, i.e. precise impact is not
> known.
> 
> Clean this up by converting qemu_rdma_exchange_send() to Error.
> 
> Necessitates setting an error when qemu_rdma_post_recv_control(),
> callback(), or qemu_rdma_exchange_get_response() failed.  Since these
> errors will go away later in this series, simply use "FIXME temporary
> error message" there.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 36/52] migration/rdma: Convert qemu_rdma_exchange_get_response() to Error
  2023-09-18 14:41 ` [PATCH 36/52] migration/rdma: Convert qemu_rdma_exchange_get_response() " Markus Armbruster
@ 2023-09-26  1:45   ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-26  1:45 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
> 
> qemu_rdma_exchange_send() and qemu_rdma_exchange_recv() violate this
> principle: they call error_report() via
> qemu_rdma_exchange_get_response().  I elected not to investigate how
> callers handle the error, i.e. precise impact is not known.
> 
> Clean this up by converting qemu_rdma_exchange_get_response() to
> Error.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 37/52] migration/rdma: Convert qemu_rdma_reg_whole_ram_blocks() to Error
  2023-09-18 14:41 ` [PATCH 37/52] migration/rdma: Convert qemu_rdma_reg_whole_ram_blocks() " Markus Armbruster
@ 2023-09-26  1:51   ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-26  1:51 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
> 
> qemu_rdma_exchange_send() violates this principle: it calls
> error_report() via callback qemu_rdma_reg_whole_ram_blocks().  I
> elected not to investigate how callers handle the error, i.e. precise
> impact is not known.
> 
> Clean this up by converting the callback to Error.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 38/52] migration/rdma: Convert qemu_rdma_write_flush() to Error
  2023-09-18 14:41 ` [PATCH 38/52] migration/rdma: Convert qemu_rdma_write_flush() " Markus Armbruster
@ 2023-09-26  1:56   ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-26  1:56 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
> 
> qio_channel_rdma_writev() violates this principle: it calls
> error_report() via qemu_rdma_write_flush().  I elected not to
> investigate how callers handle the error, i.e. precise impact is not
> known.
> 
> Clean this up by converting qemu_rdma_write_flush() to Error.
> 
> Necessitates setting an error when qemu_rdma_write_one() failed.
> Since this error will go away later in this series, simply use "FIXME
> temporary error message" there.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() to Error
  2023-09-18 14:41 ` [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() " Markus Armbruster
@ 2023-09-26  5:24   ` Zhijian Li (Fujitsu)
  2023-09-26  5:50   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-26  5:24 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
> 
> qemu_rdma_write_flush() violates this principle: it calls
> error_report() via qemu_rdma_write_one().  I elected not to
> investigate how callers handle the error, i.e. precise impact is not
> known.
> 
> Clean this up by converting qemu_rdma_write_one() to Error.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>


Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 40/52] migration/rdma: Convert qemu_rdma_write() to Error
  2023-09-18 14:41 ` [PATCH 40/52] migration/rdma: Convert qemu_rdma_write() " Markus Armbruster
@ 2023-09-26  5:25   ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-26  5:25 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> Just for consistency with qemu_rdma_write_one() and
> qemu_rdma_write_flush(), and for slightly simpler code.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 41/52] migration/rdma: Convert qemu_rdma_post_send_control() to Error
  2023-09-18 14:41 ` [PATCH 41/52] migration/rdma: Convert qemu_rdma_post_send_control() " Markus Armbruster
@ 2023-09-26  5:29   ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-26  5:29 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
> 
> qemu_rdma_exchange_send() violates this principle: it calls
> error_report() via qemu_rdma_post_send_control().  I elected not to
> investigate how callers handle the error, i.e. precise impact is not
> known.
> 
> Clean this up by converting qemu_rdma_post_send_control() to Error.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 42/52] migration/rdma: Convert qemu_rdma_post_recv_control() to Error
  2023-09-18 14:41 ` [PATCH 42/52] migration/rdma: Convert qemu_rdma_post_recv_control() " Markus Armbruster
@ 2023-09-26  5:31   ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-26  5:31 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> Just for symmetry with qemu_rdma_post_send_control().  Error messages
> lose detail I consider of no use to users.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 43/52] migration/rdma: Convert qemu_rdma_alloc_pd_cq() to Error
  2023-09-18 14:41 ` [PATCH 43/52] migration/rdma: Convert qemu_rdma_alloc_pd_cq() " Markus Armbruster
@ 2023-09-26  5:43   ` Zhijian Li (Fujitsu)
  2023-09-26  6:41     ` Markus Armbruster
  0 siblings, 1 reply; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-26  5:43 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
> 
> qemu_rdma_source_init() violates this principle: it calls
> error_report() via qemu_rdma_alloc_pd_cq().  I elected not to
> investigate how callers handle the error, i.e. precise impact is not
> known.
> 
> Clean this up by converting qemu_rdma_alloc_pd_cq() to Error.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   migration/rdma.c | 27 ++++++++++++++-------------
>   1 file changed, 14 insertions(+), 13 deletions(-)

[...]


> @@ -2451,6 +2451,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>   
>   static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
>   {
> +    ERRP_GUARD();
>       int ret, idx;
>   
>       /*
> @@ -2464,12 +2465,12 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
>           goto err_rdma_source_init;
>       }
>   
> -    ret = qemu_rdma_alloc_pd_cq(rdma);
> +    ret = qemu_rdma_alloc_pd_cq(rdma, errp);
>       if (ret < 0) {
> -        error_setg(errp, "RDMA ERROR: "
> -                   "rdma migration: error allocating pd and cq! Your mlock()"
> -                   " limits may be too low. Please check $ ulimit -a # and "
> -                   "search for 'ulimit -l' in the output");
> +        error_append_hint(errp,
> +                          "Your mlock() limits may be too low. "
> +                          "Please check $ ulimit -a # and "
> +                          "search for 'ulimit -l' in the output\n");


I think we could drop this hint as well; it is neither an exact solution nor
something anyone will act on.  Just report the error qemu_rdma_alloc_pd_cq()
tells us.

Anyway

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>


>           goto err_rdma_source_init;
>       }

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 44/52] migration/rdma: Silence qemu_rdma_resolve_host()
  2023-09-18 14:41 ` [PATCH 44/52] migration/rdma: Silence qemu_rdma_resolve_host() Markus Armbruster
@ 2023-09-26  5:44   ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-26  5:44 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
> 
> qemu_rdma_resolve_host() violates this principle: it calls
> error_report().
> 
> Clean this up: drop error_report().
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>


> ---
>   migration/rdma.c | 1 -
>   1 file changed, 1 deletion(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 41f0ae4ddb..0e365db06a 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -1003,7 +1003,6 @@ route:
>           error_setg(errp,
>                      "RDMA ERROR: result not equal to event_addr_resolved %s",
>                      rdma_event_str(cm_event->event));
> -        error_report("rdma_resolve_addr");
>           rdma_ack_cm_event(cm_event);
>           goto err_resolve_get_addr;
>       }

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() to Error
  2023-09-18 14:41 ` [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() " Markus Armbruster
  2023-09-26  5:24   ` Zhijian Li (Fujitsu)
@ 2023-09-26  5:50   ` Zhijian Li (Fujitsu)
  2023-09-26  5:55     ` Zhijian Li (Fujitsu)
  2023-09-26  6:40     ` Markus Armbruster
  1 sibling, 2 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-26  5:50 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
> 
> qemu_rdma_write_flush() violates this principle: it calls
> error_report() via qemu_rdma_write_one().  I elected not to
> investigate how callers handle the error, i.e. precise impact is not
> known.
> 
> Clean this up by converting qemu_rdma_write_one() to Error.
> 
> Signed-off-by: Markus Armbruster<armbru@redhat.com>
> ---
>   migration/rdma.c | 25 +++++++++++--------------
>   1 file changed, 11 insertions(+), 14 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index c3c33fe242..9b8cbadfcd 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2019,9 +2019,8 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
>    */
>   static int qemu_rdma_write_one(QEMUFile *f, RDMAContext *rdma,
>                                  int current_index, uint64_t current_addr,
> -                               uint64_t length)
> +                               uint64_t length, Error **errp)
>   {
> -    Error *err = NULL;
>       struct ibv_sge sge;
>       struct ibv_send_wr send_wr = { 0 };
>       struct ibv_send_wr *bad_wr;

[...]

>           }
> @@ -2219,7 +2216,7 @@ retry:
>           goto retry;
>   
>       } else if (ret > 0) {
> -        perror("rdma migration: post rdma write failed");
> +        error_setg(errp, "rdma migration: post rdma write failed");

This reminds me: did you mean to use error_setg_errno() here instead?





>           return -1;
>       }

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() to Error
  2023-09-26  5:50   ` Zhijian Li (Fujitsu)
@ 2023-09-26  5:55     ` Zhijian Li (Fujitsu)
  2023-09-26  9:26       ` Markus Armbruster
  2023-09-26  6:40     ` Markus Armbruster
  1 sibling, 1 reply; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-26  5:55 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 26/09/2023 13:50, Li Zhijian wrote:
> 
> 
> On 18/09/2023 22:41, Markus Armbruster wrote:
>> Functions that use an Error **errp parameter to return errors should
>> not also report them to the user, because reporting is the caller's
>> job.  When the caller does, the error is reported twice.  When it
>> doesn't (because it recovered from the error), there is no error to
>> report, i.e. the report is bogus.
>>
>> qemu_rdma_write_flush() violates this principle: it calls
>> error_report() via qemu_rdma_write_one().  I elected not to
>> investigate how callers handle the error, i.e. precise impact is not
>> known.
>>
>> Clean this up by converting qemu_rdma_write_one() to Error.
>>
>> Signed-off-by: Markus Armbruster<armbru@redhat.com>
>> ---
>>   migration/rdma.c | 25 +++++++++++--------------
>>   1 file changed, 11 insertions(+), 14 deletions(-)
>>
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index c3c33fe242..9b8cbadfcd 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -2019,9 +2019,8 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
>>    */
>>   static int qemu_rdma_write_one(QEMUFile *f, RDMAContext *rdma,
>>                                  int current_index, uint64_t current_addr,
>> -                               uint64_t length)
>> +                               uint64_t length, Error **errp)
>>   {
>> -    Error *err = NULL;
>>       struct ibv_sge sge;
>>       struct ibv_send_wr send_wr = { 0 };
>>       struct ibv_send_wr *bad_wr;
> 
> [...]
> 
>>           }
>> @@ -2219,7 +2216,7 @@ retry:
>>           goto retry;
>>       } else if (ret > 0) {
>> -        perror("rdma migration: post rdma write failed");
>> +        error_setg(errp, "rdma migration: post rdma write failed");
> 
> This reminds me: did you mean to use error_setg_errno() here instead?
> 

Answering my own question:
ibv_post_send(3) says:

RETURN VALUE
        ibv_post_send() returns 0 on success, or the value of errno on failure (which indicates the failure reason).


So the global errno is not defined here.
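
If we did want to keep the failure reason, one option (untested sketch,
not part of this patch) would be to feed ibv_post_send()'s return value
to error_setg_errno() instead of relying on the global errno:

    ret = ibv_post_send(rdma->qp, &send_wr, &bad_wr);
    if (ret > 0) {
        /*
         * ibv_post_send() returns the error code directly; the global
         * errno is not reliably set, so pass ret rather than errno.
         */
        error_setg_errno(errp, ret,
                         "rdma migration: post rdma write failed");
        return -1;
    }

Whether the returned value is always a meaningful errno code is a
separate question, of course.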



> 
> 
> 
> 
>>           return -1;
>>       }

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 45/52] migration/rdma: Silence qemu_rdma_connect()
  2023-09-18 14:41 ` [PATCH 45/52] migration/rdma: Silence qemu_rdma_connect() Markus Armbruster
@ 2023-09-26  6:00   ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-26  6:00 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
> 
> qemu_rdma_connect() violates this principle: it calls error_report()
> and perror().  I elected not to investigate how callers handle the
> error, i.e. precise impact is not known.
> 
> Clean this up: replace perror() by changing error_setg() to
> error_setg_errno(), and drop error_report().  I believe the callers'
> error reports suffice then.  If they don't, we need to convert to
> Error instead.
> 
> Signed-off-by: Markus Armbruster<armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 46/52] migration/rdma: Silence qemu_rdma_reg_control()
  2023-09-18 14:42 ` [PATCH 46/52] migration/rdma: Silence qemu_rdma_reg_control() Markus Armbruster
@ 2023-09-26  6:00   ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-26  6:00 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:42, Markus Armbruster wrote:
> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
> 
> qemu_rdma_source_init() and qemu_rdma_accept() violate this principle:
> they call error_report() via qemu_rdma_reg_control().  I elected not
> to investigate how callers handle the error, i.e. precise impact is
> not known.
> 
> Clean this up by dropping the error reporting from
> qemu_rdma_reg_control().  I believe the callers' error reports
> suffice.  If they don't, we need to convert to Error instead.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>


Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>



> ---
>   migration/rdma.c | 1 -
>   1 file changed, 1 deletion(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index bf4e67d68d..29ad8ae832 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -1349,7 +1349,6 @@ static int qemu_rdma_reg_control(RDMAContext *rdma, int idx)
>           rdma->total_registrations++;
>           return 0;
>       }
> -    error_report("qemu_rdma_reg_control failed");
>       return -1;
>   }
>   

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 47/52] migration/rdma: Don't report received completion events as error
  2023-09-18 14:42 ` [PATCH 47/52] migration/rdma: Don't report received completion events as error Markus Armbruster
@ 2023-09-26  6:06   ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-26  6:06 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:42, Markus Armbruster wrote:
> When qemu_rdma_wait_comp_channel() receives an event from the
> completion channel, it reports an error "receive cm event while wait
> comp channel,cm event is T", where T is the numeric event type.
> However, the function fails only when T is a disconnect or device
> removal.  Events other than these two are not actually an error, and
> reporting them as an error is wrong.  If we need to report them to the
> user, we should use something else, and what to use depends on why we
> need to report them to the user.
> 
> For now, report this error only when the function actually fails.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>


> ---
>   migration/rdma.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 29ad8ae832..cbf5e6b9a8 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -1566,11 +1566,11 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
>                           return -1;
>                       }
>   
> -                    error_report("receive cm event while wait comp channel,"
> -                                 "cm event is %d", cm_event->event);
>                       if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
>                           cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
>                           rdma_ack_cm_event(cm_event);
> +                        error_report("receive cm event while wait comp channel,"
> +                                     "cm event is %d", cm_event->event);
>                           return -1;
>                       }
>                       rdma_ack_cm_event(cm_event);

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 48/52] migration/rdma: Silence qemu_rdma_block_for_wrid()
  2023-09-18 14:42 ` [PATCH 48/52] migration/rdma: Silence qemu_rdma_block_for_wrid() Markus Armbruster
@ 2023-09-26  6:17   ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-26  6:17 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:42, Markus Armbruster wrote:
> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
> 
> qemu_rdma_post_send_control(), qemu_rdma_exchange_get_response(), and
> qemu_rdma_write_one() violate this principle: they call
> error_report(), fprintf(stderr, ...), and perror() via
> qemu_rdma_block_for_wrid(), qemu_rdma_poll(), and
> qemu_rdma_wait_comp_channel().  I elected not to investigate how
> callers handle the error, i.e. precise impact is not known.
> 
> Clean this up by dropping the error reporting from qemu_rdma_poll(),
> qemu_rdma_wait_comp_channel(), and qemu_rdma_block_for_wrid().  I
> believe the callers' error reports suffice.  If they don't, we need to
> convert to Error instead.
> 
> Signed-off-by: Markus Armbruster<armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 49/52] migration/rdma: Silence qemu_rdma_register_and_get_keys()
  2023-09-18 14:42 ` [PATCH 49/52] migration/rdma: Silence qemu_rdma_register_and_get_keys() Markus Armbruster
@ 2023-09-26  6:21   ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-26  6:21 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:42, Markus Armbruster wrote:
> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
> 
> qemu_rdma_write_one() violates this principle: it reports errors to
> stderr via qemu_rdma_register_and_get_keys().  I elected not to
> investigate how callers handle the error, i.e. precise impact is not
> known.
> 
> Clean this up: silence qemu_rdma_register_and_get_keys().  I believe
> the caller's error reports suffice.  If they don't, we need to convert
> to Error instead.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>


Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>


> ---
>   migration/rdma.c | 9 ---------
>   1 file changed, 9 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 99dccdeae5..d9f80ef390 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -1314,15 +1314,6 @@ static int qemu_rdma_register_and_get_keys(RDMAContext *rdma,
>           }
>       }
>       if (!block->pmr[chunk]) {
> -        perror("Failed to register chunk!");
> -        fprintf(stderr, "Chunk details: block: %d chunk index %d"
> -                        " start %" PRIuPTR " end %" PRIuPTR
> -                        " host %" PRIuPTR
> -                        " local %" PRIuPTR " registrations: %d\n",
> -                        block->index, chunk, (uintptr_t)chunk_start,
> -                        (uintptr_t)chunk_end, host_addr,
> -                        (uintptr_t)block->local_host_addr,
> -                        rdma->total_registrations);
>           return -1;
>       }
>       rdma->total_registrations++;

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 51/52] migration/rdma: Use error_report() & friends instead of stderr
  2023-09-18 14:42 ` [PATCH 51/52] migration/rdma: Use error_report() & friends instead of stderr Markus Armbruster
@ 2023-09-26  6:35   ` Zhijian Li (Fujitsu)
  2023-09-27 12:16     ` Markus Armbruster
  0 siblings, 1 reply; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-26  6:35 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:42, Markus Armbruster wrote:
>           if (local->nb_blocks != nb_dest_blocks) {
> -            fprintf(stderr, "ram blocks mismatch (Number of blocks %d vs %d) "
> -                    "Your QEMU command line parameters are probably "
> -                    "not identical on both the source and destination.",
> -                    local->nb_blocks, nb_dest_blocks);
> +            error_report("ram blocks mismatch (Number of blocks %d vs %d)",
> +                         local->nb_blocks, nb_dest_blocks);
> +            error_printf("Your QEMU command line parameters are probably "
> +                         "not identical on both the source and destination.");


Why is this one handled specially? It seems like it would be fine with error_report() alone.





^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() to Error
  2023-09-26  5:50   ` Zhijian Li (Fujitsu)
  2023-09-26  5:55     ` Zhijian Li (Fujitsu)
@ 2023-09-26  6:40     ` Markus Armbruster
  1 sibling, 0 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-26  6:40 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu); +Cc: qemu-devel, quintela, peterx, leobras

"Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:

> On 18/09/2023 22:41, Markus Armbruster wrote:
>> Functions that use an Error **errp parameter to return errors should
>> not also report them to the user, because reporting is the caller's
>> job.  When the caller does, the error is reported twice.  When it
>> doesn't (because it recovered from the error), there is no error to
>> report, i.e. the report is bogus.
>> 
>> qemu_rdma_write_flush() violates this principle: it calls
>> error_report() via qemu_rdma_write_one().  I elected not to
>> investigate how callers handle the error, i.e. precise impact is not
>> known.
>> 
>> Clean this up by converting qemu_rdma_write_one() to Error.
>> 
>> Signed-off-by: Markus Armbruster<armbru@redhat.com>
>> ---
>>   migration/rdma.c | 25 +++++++++++--------------
>>   1 file changed, 11 insertions(+), 14 deletions(-)
>> 
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index c3c33fe242..9b8cbadfcd 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -2019,9 +2019,8 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
>>    */
>>   static int qemu_rdma_write_one(QEMUFile *f, RDMAContext *rdma,
>>                                  int current_index, uint64_t current_addr,
>> -                               uint64_t length)
>> +                               uint64_t length, Error **errp)
>>   {
>> -    Error *err = NULL;
>>       struct ibv_sge sge;
>>       struct ibv_send_wr send_wr = { 0 };
>>       struct ibv_send_wr *bad_wr;
>
> [...]
>
>>           }
>> @@ -2219,7 +2216,7 @@ retry:
>>           goto retry;
>>   
>>       } else if (ret > 0) {
>> -        perror("rdma migration: post rdma write failed");
>> +        error_setg(errp, "rdma migration: post rdma write failed");
>
> This reminds me: did you mean to use error_setg_errno() here instead?

Indeed!

I'll adjust the commit message as well.

>>           return -1;
>>       }



^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 43/52] migration/rdma: Convert qemu_rdma_alloc_pd_cq() to Error
  2023-09-26  5:43   ` Zhijian Li (Fujitsu)
@ 2023-09-26  6:41     ` Markus Armbruster
  2023-09-26  6:55       ` Zhijian Li (Fujitsu)
  0 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-26  6:41 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu); +Cc: qemu-devel, quintela, peterx, leobras

"Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:

> On 18/09/2023 22:41, Markus Armbruster wrote:
>> Functions that use an Error **errp parameter to return errors should
>> not also report them to the user, because reporting is the caller's
>> job.  When the caller does, the error is reported twice.  When it
>> doesn't (because it recovered from the error), there is no error to
>> report, i.e. the report is bogus.
>> 
>> qemu_rdma_source_init() violates this principle: it calls
>> error_report() via qemu_rdma_alloc_pd_cq().  I elected not to
>> investigate how callers handle the error, i.e. precise impact is not
>> known.
>> 
>> Clean this up by converting qemu_rdma_alloc_pd_cq() to Error.
>> 
>> Signed-off-by: Markus Armbruster<armbru@redhat.com>
>> ---
>>   migration/rdma.c | 27 ++++++++++++++-------------
>>   1 file changed, 14 insertions(+), 13 deletions(-)
>
> [...]
>
>
>> @@ -2451,6 +2451,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>>   
>>   static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
>>   {
>> +    ERRP_GUARD();
>>       int ret, idx;
>>   
>>       /*
>> @@ -2464,12 +2465,12 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
>>           goto err_rdma_source_init;
>>       }
>>   
>> -    ret = qemu_rdma_alloc_pd_cq(rdma);
>> +    ret = qemu_rdma_alloc_pd_cq(rdma, errp);
>>       if (ret < 0) {
>> -        error_setg(errp, "RDMA ERROR: "
>> -                   "rdma migration: error allocating pd and cq! Your mlock()"
>> -                   " limits may be too low. Please check $ ulimit -a # and "
>> -                   "search for 'ulimit -l' in the output");
>> +        error_append_hint(errp,
>> +                          "Your mlock() limits may be too low. "
>> +                          "Please check $ ulimit -a # and "
>> +                          "search for 'ulimit -l' in the output\n");
>
>
> I think we could drop this error message as well; it is neither an exact
> resolution nor something anyone will pay attention to.  Just report the
> error qemu_rdma_alloc_pd_cq() tells us.

Double-checking: you recommend dropping error_append_hint()?

> Anyway
>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
>
>
>>           goto err_rdma_source_init;
>>       }

Thanks!



^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 52/52] migration/rdma: Fix how we show device details on open
  2023-09-18 14:42 ` [PATCH 52/52] migration/rdma: Fix how we show device details on open Markus Armbruster
@ 2023-09-26  6:49   ` Zhijian Li (Fujitsu)
  2023-09-26  9:19     ` Markus Armbruster
  0 siblings, 1 reply; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-26  6:49 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:42, Markus Armbruster wrote:
> qemu_rdma_dump_id() dumps RDMA device details to stdout.
> 
> rdma_start_outgoing_migration() calls it via qemu_rdma_source_init()
> and qemu_rdma_resolve_host() to show source device details.
> rdma_start_incoming_migration() arranges its call via
> rdma_accept_incoming_migration() and qemu_rdma_accept() to show
> destination device details.
> 
> Two issues:
> 
> 1. rdma_start_outgoing_migration() can run in HMP context.  The
>     information should arguably go to the monitor, not stdout.
> 
> 2. ibv_query_port() failure is reported as error.  Its callers remain
>     unaware of this failure (qemu_rdma_dump_id() can't fail), so
>     reporting this to the user as an error is problematic.
> 
> Use qemu_printf() instead of printf() and error_report().
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   migration/rdma.c | 20 +++++++++++---------
>   1 file changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 9e9904984e..8c84fbab7a 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -30,6 +30,7 @@
>   #include "qemu/sockets.h"
>   #include "qemu/bitmap.h"
>   #include "qemu/coroutine.h"
> +#include "qemu/qemu-print.h"
>   #include "exec/memory.h"
>   #include <sys/socket.h>
>   #include <netdb.h>
> @@ -742,24 +743,25 @@ static void qemu_rdma_dump_id(const char *who, struct ibv_context *verbs)
>       struct ibv_port_attr port;
>   
>       if (ibv_query_port(verbs, 1, &port)) {
> -        error_report("Failed to query port information");
> +        qemu_printf("%s RDMA Device opened, but can't query port information",
> +                    who);


A '\n' newline is missing?


>           return;
>       }
>   
> -    printf("%s RDMA Device opened: kernel name %s "
> -           "uverbs device name %s, "
> -           "infiniband_verbs class device path %s, "
> -           "infiniband class device path %s, "
> -           "transport: (%d) %s\n",
> +    qemu_printf("%s RDMA Device opened: kernel name %s "
> +                "uverbs device name %s, "
> +                "infiniband_verbs class device path %s, "
> +                "infiniband class device path %s, "
> +                "transport: (%d) %s\n",
>                   who,
>                   verbs->device->name,
>                   verbs->device->dev_name,
>                   verbs->device->dev_path,
>                   verbs->device->ibdev_path,
>                   port.link_layer,
> -                (port.link_layer == IBV_LINK_LAYER_INFINIBAND) ? "Infiniband" :
> -                 ((port.link_layer == IBV_LINK_LAYER_ETHERNET)
> -                    ? "Ethernet" : "Unknown"));
> +                port.link_layer == IBV_LINK_LAYER_INFINIBAND ? "Infiniband"
> +                : port.link_layer == IBV_LINK_LAYER_ETHERNET ? "Ethernet"
> +                : "Unknown");


Most of the time these messages are not needed, so I would prefer to turn them into trace points instead.
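
For example (untested; the trace point name and arguments are only
illustrative), an entry in migration/trace-events:

    qemu_rdma_dump_id(const char *who, const char *dev_name) "%s RDMA Device opened: kernel name %s"

and a matching call in qemu_rdma_dump_id():

    trace_qemu_rdma_dump_id(who, verbs->device->name);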



>   }
>   
>   /*

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 43/52] migration/rdma: Convert qemu_rdma_alloc_pd_cq() to Error
  2023-09-26  6:41     ` Markus Armbruster
@ 2023-09-26  6:55       ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-26  6:55 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, quintela, peterx, leobras



On 26/09/2023 14:41, Markus Armbruster wrote:
> "Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:
> 
>> On 18/09/2023 22:41, Markus Armbruster wrote:
>>> Functions that use an Error **errp parameter to return errors should
>>> not also report them to the user, because reporting is the caller's
>>> job.  When the caller does, the error is reported twice.  When it
>>> doesn't (because it recovered from the error), there is no error to
>>> report, i.e. the report is bogus.
>>>
>>> qemu_rdma_source_init() violates this principle: it calls
>>> error_report() via qemu_rdma_alloc_pd_cq().  I elected not to
>>> investigate how callers handle the error, i.e. precise impact is not
>>> known.
>>>
>>> Clean this up by converting qemu_rdma_alloc_pd_cq() to Error.
>>>
>>> Signed-off-by: Markus Armbruster<armbru@redhat.com>
>>> ---
>>>    migration/rdma.c | 27 ++++++++++++++-------------
>>>    1 file changed, 14 insertions(+), 13 deletions(-)
>>
>> [...]
>>
>>
>>> @@ -2451,6 +2451,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>>>    
>>>    static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
>>>    {
>>> +    ERRP_GUARD();
>>>        int ret, idx;
>>>    
>>>        /*
>>> @@ -2464,12 +2465,12 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
>>>            goto err_rdma_source_init;
>>>        }
>>>    
>>> -    ret = qemu_rdma_alloc_pd_cq(rdma);
>>> +    ret = qemu_rdma_alloc_pd_cq(rdma, errp);
>>>        if (ret < 0) {
>>> -        error_setg(errp, "RDMA ERROR: "
>>> -                   "rdma migration: error allocating pd and cq! Your mlock()"
>>> -                   " limits may be too low. Please check $ ulimit -a # and "
>>> -                   "search for 'ulimit -l' in the output");
>>> +        error_append_hint(errp,
>>> +                          "Your mlock() limits may be too low. "
>>> +                          "Please check $ ulimit -a # and "
>>> +                          "search for 'ulimit -l' in the output\n");
>>
>>
>> I think we could drop this error message as well; it is neither an exact
>> resolution nor something anyone will pay attention to.  Just report the
>> error qemu_rdma_alloc_pd_cq() tells us.
> 
> Double-checking: you recommend dropping error_append_hint()?


Yes





> 
>> Anyway
>>
>> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
>>
>>
>>>            goto err_rdma_source_init;
>>>        }
> 
> Thanks!
> 

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 52/52] migration/rdma: Fix how we show device details on open
  2023-09-26  6:49   ` Zhijian Li (Fujitsu)
@ 2023-09-26  9:19     ` Markus Armbruster
  0 siblings, 0 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-26  9:19 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu); +Cc: qemu-devel, quintela, peterx, leobras

"Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:

> On 18/09/2023 22:42, Markus Armbruster wrote:
>> qemu_rdma_dump_id() dumps RDMA device details to stdout.
>> 
>> rdma_start_outgoing_migration() calls it via qemu_rdma_source_init()
>> and qemu_rdma_resolve_host() to show source device details.
>> rdma_start_incoming_migration() arranges its call via
>> rdma_accept_incoming_migration() and qemu_rdma_accept() to show
>> destination device details.
>> 
>> Two issues:
>> 
>> 1. rdma_start_outgoing_migration() can run in HMP context.  The
>>     information should arguably go to the monitor, not stdout.
>> 
>> 2. ibv_query_port() failure is reported as error.  Its callers remain
>>     unaware of this failure (qemu_rdma_dump_id() can't fail), so
>>     reporting this to the user as an error is problematic.
>> 
>> Use qemu_printf() instead of printf() and error_report().
>> 
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   migration/rdma.c | 20 +++++++++++---------
>>   1 file changed, 11 insertions(+), 9 deletions(-)
>> 
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index 9e9904984e..8c84fbab7a 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -30,6 +30,7 @@
>>   #include "qemu/sockets.h"
>>   #include "qemu/bitmap.h"
>>   #include "qemu/coroutine.h"
>> +#include "qemu/qemu-print.h"
>>   #include "exec/memory.h"
>>   #include <sys/socket.h>
>>   #include <netdb.h>
>> @@ -742,24 +743,25 @@ static void qemu_rdma_dump_id(const char *who, struct ibv_context *verbs)
>>       struct ibv_port_attr port;
>>   
>>       if (ibv_query_port(verbs, 1, &port)) {
>> -        error_report("Failed to query port information");
>> +        qemu_printf("%s RDMA Device opened, but can't query port information",
>> +                    who);
>
>
> A '\n' newline is missing?

Yes.

>>           return;
>>       }
>>   
>> -    printf("%s RDMA Device opened: kernel name %s "
>> -           "uverbs device name %s, "
>> -           "infiniband_verbs class device path %s, "
>> -           "infiniband class device path %s, "
>> -           "transport: (%d) %s\n",
>> +    qemu_printf("%s RDMA Device opened: kernel name %s "
>> +                "uverbs device name %s, "
>> +                "infiniband_verbs class device path %s, "
>> +                "infiniband class device path %s, "
>> +                "transport: (%d) %s\n",
>>                   who,
>>                   verbs->device->name,
>>                   verbs->device->dev_name,
>>                   verbs->device->dev_path,
>>                   verbs->device->ibdev_path,
>>                   port.link_layer,
>> -                (port.link_layer == IBV_LINK_LAYER_INFINIBAND) ? "Infiniband" :
>> -                 ((port.link_layer == IBV_LINK_LAYER_ETHERNET)
>> -                    ? "Ethernet" : "Unknown"));
>> +                port.link_layer == IBV_LINK_LAYER_INFINIBAND ? "Infiniband"
>> +                : port.link_layer == IBV_LINK_LAYER_ETHERNET ? "Ethernet"
>> +                : "Unknown");
>
>
> Most of the time these messages are not needed, so I would prefer to turn them into trace points instead.

Makes sense.

>>   }
>>   
>>   /*

Thanks!



^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() to Error
  2023-09-26  5:55     ` Zhijian Li (Fujitsu)
@ 2023-09-26  9:26       ` Markus Armbruster
  2023-09-27 11:46         ` Markus Armbruster
  0 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-26  9:26 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu); +Cc: qemu-devel, quintela, peterx, leobras

"Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:

> On 26/09/2023 13:50, Li Zhijian wrote:
>> 
>> 
>> On 18/09/2023 22:41, Markus Armbruster wrote:
>>> Functions that use an Error **errp parameter to return errors should
>>> not also report them to the user, because reporting is the caller's
>>> job.  When the caller does, the error is reported twice.  When it
>>> doesn't (because it recovered from the error), there is no error to
>>> report, i.e. the report is bogus.
>>>
>>> qemu_rdma_write_flush() violates this principle: it calls
>>> error_report() via qemu_rdma_write_one().  I elected not to
>>> investigate how callers handle the error, i.e. precise impact is not
>>> known.
>>>
>>> Clean this up by converting qemu_rdma_write_one() to Error.
>>>
>>> Signed-off-by: Markus Armbruster<armbru@redhat.com>
>>> ---
>>>   migration/rdma.c | 25 +++++++++++--------------
>>>   1 file changed, 11 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/migration/rdma.c b/migration/rdma.c
>>> index c3c33fe242..9b8cbadfcd 100644
>>> --- a/migration/rdma.c
>>> +++ b/migration/rdma.c
>>> @@ -2019,9 +2019,8 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
>>>    */
>>>   static int qemu_rdma_write_one(QEMUFile *f, RDMAContext *rdma,
>>>                                  int current_index, uint64_t current_addr,
>>> -                               uint64_t length)
>>> +                               uint64_t length, Error **errp)
>>>   {
>>> -    Error *err = NULL;
>>>       struct ibv_sge sge;
>>>       struct ibv_send_wr send_wr = { 0 };
>>>       struct ibv_send_wr *bad_wr;
>> 
>> [...]
>> 
>>>           }
>>> @@ -2219,7 +2216,7 @@ retry:
>>>           goto retry;
>>>       } else if (ret > 0) {
>>> -        perror("rdma migration: post rdma write failed");
>>> +        error_setg(errp, "rdma migration: post rdma write failed");
>> 
>> This reminds me: did you mean to use error_setg_errno() here instead?
>> 
>
> Answering my own question:
> ibv_post_send(3) says:
>
> RETURN VALUE
>         ibv_post_send() returns 0 on success, or the value of errno on failure (which indicates the failure reason).

I read this as "assign error code to errno and return it."  But...

> So the global errno is not defined here.

... your assertion made me check the source code, and it looks like it
does *not* assign to errno, at least not reliably.  Which means perror()
prints garbage.

I'll delete the perror() in a separate patch.

>>>           return -1;
>>>       }



^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 50/52] migration/rdma: Silence qemu_rdma_cleanup()
  2023-09-18 14:42 ` [PATCH 50/52] migration/rdma: Silence qemu_rdma_cleanup() Markus Armbruster
@ 2023-09-26 10:12   ` Zhijian Li (Fujitsu)
  2023-09-26 11:52     ` Markus Armbruster
  0 siblings, 1 reply; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-26 10:12 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:42, Markus Armbruster wrote:
> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
> 
> qemu_rdma_source_init(), qemu_rdma_connect(),
> rdma_start_incoming_migration(), and rdma_start_outgoing_migration()
> violate this principle: they call error_report() via
> qemu_rdma_cleanup().
> 
> Moreover, qemu_rdma_cleanup() can't fail.  It is called on error
> paths, and QIOChannel close and finalization.  Are the conditions it
> reports really errors?  I doubt it.

I'm not so sure.  It's fine when it's called from an error path, but when
the caller is migration_cancel from HMP/QMP, shall we report something
even though we know QEMU can recover?

Maybe change it to a warning, etc.




> 
> Clean this up: silence qemu_rdma_cleanup().  I believe that's fine for
> all these callers.  If it isn't, we need to convert to Error instead.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   migration/rdma.c | 6 +-----
>   1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index d9f80ef390..be2db7946d 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2330,7 +2330,6 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
>   
>   static void qemu_rdma_cleanup(RDMAContext *rdma)
>   {
> -    Error *err = NULL;
>       int idx;
>   
>       if (rdma->cm_id && rdma->connected) {
> @@ -2341,10 +2340,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>                                          .type = RDMA_CONTROL_ERROR,
>                                          .repeat = 1,
>                                        };
> -            error_report("Early error. Sending error.");
> -            if (qemu_rdma_post_send_control(rdma, NULL, &head, &err) < 0) {
> -                error_report_err(err);
> -            }
> +            qemu_rdma_post_send_control(rdma, NULL, &head, NULL);
>           }
>   
>           rdma_disconnect(rdma->cm_id);

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 24/52] migration/rdma: Return -1 instead of negative errno code
  2023-09-18 14:41 ` [PATCH 24/52] migration/rdma: Return -1 instead of negative errno code Markus Armbruster
@ 2023-09-26 10:15   ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-26 10:15 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> Several functions return negative errno codes on failure.  Callers
> check for specific codes exactly never.  For some of the functions,
> callers couldn't check even if they wanted to, because the functions
> also return negative values that aren't errno codes, leaving readers
> confused on what the function actually returns.
> 
> Clean up and simplify: return -1 instead of negative errno code.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>


Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>


> ---
>   migration/rdma.c | 44 ++++++++++++++++++++++----------------------
>   1 file changed, 22 insertions(+), 22 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index efbb3c7754..d0af258468 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -857,14 +857,14 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
>                   } else {
>                       error_setg_errno(errp, errno,
>                                        "could not open RDMA device context");
> -                    return -EINVAL;
> +                    return -1;
>                   }
>               }
>   
>               if (ibv_query_port(verbs, 1, &port_attr)) {
>                   ibv_close_device(verbs);
>                   ERROR(errp, "Could not query initial IB port");
> -                return -EINVAL;
> +                return -1;
>               }
>   
>               if (port_attr.link_layer == IBV_LINK_LAYER_INFINIBAND) {
> @@ -889,7 +889,7 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
>                   ERROR(errp, "You only have RoCE / iWARP devices in your systems"
>                               " and your management software has specified '[::]'"
>                               ", but IPv6 over RoCE / iWARP is not supported in Linux.");
> -                return -ENONET;
> +                return -1;
>               }
>           }
>   
> @@ -905,13 +905,13 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
>       /* IB ports start with 1, not 0 */
>       if (ibv_query_port(verbs, 1, &port_attr)) {
>           ERROR(errp, "Could not query initial IB port");
> -        return -EINVAL;
> +        return -1;
>       }
>   
>       if (port_attr.link_layer == IBV_LINK_LAYER_ETHERNET) {
>           ERROR(errp, "Linux kernel's RoCE / iWARP does not support IPv6 "
>                       "(but patches on linux-rdma in progress)");
> -        return -ENONET;
> +        return -1;
>       }
>   
>   #endif
> @@ -1409,7 +1409,7 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
>   
>           if (ret != 0) {
>               perror("unregistration chunk failed");
> -            return -ret;
> +            return -1;
>           }
>           rdma->total_registrations--;
>   
> @@ -1554,7 +1554,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
>                       if (ret) {
>                           error_report("failed to get cm event while wait "
>                                        "completion channel");
> -                        return -EPIPE;
> +                        return -1;
>                       }
>   
>                       error_report("receive cm event while wait comp channel,"
> @@ -1562,7 +1562,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
>                       if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
>                           cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
>                           rdma_ack_cm_event(cm_event);
> -                        return -EPIPE;
> +                        return -1;
>                       }
>                       rdma_ack_cm_event(cm_event);
>                   }
> @@ -1575,18 +1575,18 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
>                         * I don't trust errno from qemu_poll_ns
>                        */
>                   error_report("%s: poll failed", __func__);
> -                return -EPIPE;
> +                return -1;
>               }
>   
>               if (migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) {
>                   /* Bail out and let the cancellation happen */
> -                return -EPIPE;
> +                return -1;
>               }
>           }
>       }
>   
>       if (rdma->received_error) {
> -        return -EPIPE;
> +        return -1;
>       }
>       return -!!rdma->error_state;
>   }
> @@ -1751,7 +1751,7 @@ static int qemu_rdma_post_send_control(RDMAContext *rdma, uint8_t *buf,
>   
>       if (ret > 0) {
>           error_report("Failed to use post IB SEND for control");
> -        return -ret;
> +        return -1;
>       }
>   
>       ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_SEND_CONTROL, NULL);
> @@ -1820,15 +1820,15 @@ static int qemu_rdma_exchange_get_response(RDMAContext *rdma,
>           if (head->type == RDMA_CONTROL_ERROR) {
>               rdma->received_error = true;
>           }
> -        return -EIO;
> +        return -1;
>       }
>       if (head->len > RDMA_CONTROL_MAX_BUFFER - sizeof(*head)) {
>           error_report("too long length: %d", head->len);
> -        return -EINVAL;
> +        return -1;
>       }
>       if (sizeof(*head) + head->len != byte_len) {
>           error_report("Malformed length: %d byte_len %d", head->len, byte_len);
> -        return -EINVAL;
> +        return -1;
>       }
>   
>       return 0;
> @@ -2092,7 +2092,7 @@ retry:
>                                   (uint8_t *) &comp, NULL, NULL, NULL);
>   
>                   if (ret < 0) {
> -                    return -EIO;
> +                    return -1;
>                   }
>   
>                   stat64_add(&mig_stats.zero_pages,
> @@ -2127,7 +2127,7 @@ retry:
>                                                   &sge.lkey, NULL, chunk,
>                                                   chunk_start, chunk_end)) {
>                   error_report("cannot get lkey");
> -                return -EINVAL;
> +                return -1;
>               }
>   
>               reg_result = (RDMARegisterResult *)
> @@ -2146,7 +2146,7 @@ retry:
>                                                   &sge.lkey, NULL, chunk,
>                                                   chunk_start, chunk_end)) {
>                   error_report("cannot get lkey!");
> -                return -EINVAL;
> +                return -1;
>               }
>           }
>   
> @@ -2158,7 +2158,7 @@ retry:
>                                                        &sge.lkey, NULL, chunk,
>                                                        chunk_start, chunk_end)) {
>               error_report("cannot get lkey!");
> -            return -EINVAL;
> +            return -1;
>           }
>       }
>   
> @@ -2200,7 +2200,7 @@ retry:
>   
>       } else if (ret > 0) {
>           perror("rdma migration: post rdma write failed");
> -        return -ret;
> +        return -1;
>       }
>   
>       set_bit(chunk, block->transit_bitmap);
> @@ -2920,14 +2920,14 @@ static int qemu_rdma_drain_cq(QEMUFile *f, RDMAContext *rdma)
>       int ret;
>   
>       if (qemu_rdma_write_flush(f, rdma) < 0) {
> -        return -EIO;
> +        return -1;
>       }
>   
>       while (rdma->nb_sent) {
>           ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RDMA_WRITE, NULL);
>           if (ret < 0) {
>               error_report("rdma migration: complete polling error!");
> -            return -EIO;
> +            return -1;
>           }
>       }
>   

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 25/52] migration/rdma: Dumb down remaining int error values to -1
  2023-09-18 14:41 ` [PATCH 25/52] migration/rdma: Dumb down remaining int error values to -1 Markus Armbruster
@ 2023-09-26 10:16   ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-26 10:16 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras



On 18/09/2023 22:41, Markus Armbruster wrote:
> This is just to make the error value more obvious.  Callers don't
> mind.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 26/52] migration/rdma: Replace int error_state by bool errored
  2023-09-25  7:09     ` Markus Armbruster
@ 2023-09-26 10:18       ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-26 10:18 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, quintela, peterx, leobras



On 25/09/2023 15:09, Markus Armbruster wrote:
> "Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:
> 
>> On 18/09/2023 22:41, Markus Armbruster wrote:
>>> All we do with the value of RDMAContext member @error_state is test
>>> whether it's zero.  Change to bool and rename to @errored.
>>>
>>
>> make sense!
>>
>> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
>>
>> Can we move this patch ahead of "[PATCH 23/52] migration/rdma: Clean up qemu_rdma_wait_comp_channel()'s error value",
>> so that [23/52] [24/52] [25/52] will be easier to review.
> 
> I think I could squash PATCH 23 into "[PATCH 25/52] migration/rdma: Dumb
> down remaining int error values to -1".  Would that work for you?

Yeah~, thank you


> 

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 50/52] migration/rdma: Silence qemu_rdma_cleanup()
  2023-09-26 10:12   ` Zhijian Li (Fujitsu)
@ 2023-09-26 11:52     ` Markus Armbruster
  2023-09-27  1:41       ` Zhijian Li (Fujitsu)
  0 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-09-26 11:52 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu); +Cc: qemu-devel, quintela, peterx, leobras

"Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:

> On 18/09/2023 22:42, Markus Armbruster wrote:
>> Functions that use an Error **errp parameter to return errors should
>> not also report them to the user, because reporting is the caller's
>> job.  When the caller does, the error is reported twice.  When it
>> doesn't (because it recovered from the error), there is no error to
>> report, i.e. the report is bogus.
>> 
>> qemu_rdma_source_init(), qemu_rdma_connect(),
>> rdma_start_incoming_migration(), and rdma_start_outgoing_migration()
>> violate this principle: they call error_report() via
>> qemu_rdma_cleanup().
>> 
>> Moreover, qemu_rdma_cleanup() can't fail.  It is called on error
>> paths, and QIOChannel close and finalization.  Are the conditions it
>> reports really errors?  I doubt it.
>
> I'm not so sure.  It's fine when it's called from an error path, but when
> the caller is migration_cancel from HMP/QMP, shall we report something
> even though we know QEMU can recover?
>
> Maybe change it to a warning, etc.

The part I'm sure about is that reporting an error to the user is wrong
when we actually recover from the error.  Which qemu_rdma_cleanup()
does.

I'm not sure whether the (complicated!) condition that triggers
qemu_rdma_cleanup()'s ill-advised error report needs to be reported in
some other form.  The remainder of the function ignores failure...

If you think we should downgrade the error to a warning, and no
maintainer disagrees, then I'll downgrade.  Do you?
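
For concreteness, the downgrade would be a one-liner along these lines
(sketch only), replacing the error_report() in the hunk below:

    warn_report("RDMA cleanup: early error, sending error to peer");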

>> Clean this up: silence qemu_rdma_cleanup().  I believe that's fine for
>> all these callers.  If it isn't, we need to convert to Error instead.
>> 
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   migration/rdma.c | 6 +-----
>>   1 file changed, 1 insertion(+), 5 deletions(-)
>> 
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index d9f80ef390..be2db7946d 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -2330,7 +2330,6 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
>>   
>>   static void qemu_rdma_cleanup(RDMAContext *rdma)
>>   {
>> -    Error *err = NULL;
>>       int idx;
>>   
>>       if (rdma->cm_id && rdma->connected) {
>> @@ -2341,10 +2340,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>>                                          .type = RDMA_CONTROL_ERROR,
>>                                          .repeat = 1,
>>                                        };
>> -            error_report("Early error. Sending error.");
>> -            if (qemu_rdma_post_send_control(rdma, NULL, &head, &err) < 0) {
>> -                error_report_err(err);
>> -            }
>> +            qemu_rdma_post_send_control(rdma, NULL, &head, NULL);
>>           }
>>   
>>           rdma_disconnect(rdma->cm_id);



^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 50/52] migration/rdma: Silence qemu_rdma_cleanup()
  2023-09-26 11:52     ` Markus Armbruster
@ 2023-09-27  1:41       ` Zhijian Li (Fujitsu)
  2023-09-27  5:30         ` Markus Armbruster
  0 siblings, 1 reply; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-27  1:41 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, quintela, peterx, leobras



On 26/09/2023 19:52, Markus Armbruster wrote:
> "Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:
> 
>> On 18/09/2023 22:42, Markus Armbruster wrote:
>>> Functions that use an Error **errp parameter to return errors should
>>> not also report them to the user, because reporting is the caller's
>>> job.  When the caller does, the error is reported twice.  When it
>>> doesn't (because it recovered from the error), there is no error to
>>> report, i.e. the report is bogus.
>>>
>>> qemu_rdma_source_init(), qemu_rdma_connect(),
>>> rdma_start_incoming_migration(), and rdma_start_outgoing_migration()
>>> violate this principle: they call error_report() via
>>> qemu_rdma_cleanup().
>>>
>>> Moreover, qemu_rdma_cleanup() can't fail.  It is called on error
>>> paths, and QIOChannel close and finalization.  Are the conditions it
>>> reports really errors?  I doubt it.
>>
>> I'm not so sure.  It's fine when it's called from an error path, but when
>> the caller is migration_cancel from HMP/QMP, shall we report something
>> even though we know QEMU can recover?
>>
>> Maybe change it to a warning, etc.
> 
> The part I'm sure about is that reporting an error to the user is wrong
> when we actually recover from the error.  Which qemu_rdma_cleanup()
> does.

Yes, I have no doubt about this.


> 
> I'm not sure whether the (complicated!) condition that triggers
> qemu_rdma_cleanup()'s ill-advised error report needs to be reported in
> some other form.  The remainder of the function ignores failure...
> 
>> If you think we should downgrade the error to a warning, and no
> maintainer disagrees, then I'll downgrade.  Do you?

Yes, I'd like to downgrade the error to a warning.


Thanks
Zhijian

> 
>>> Clean this up: silence qemu_rdma_cleanup().  I believe that's fine for
>>> all these callers.  If it isn't, we need to convert to Error instead.
>>>
>>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>>> ---
>>>    migration/rdma.c | 6 +-----
>>>    1 file changed, 1 insertion(+), 5 deletions(-)
>>>
>>> diff --git a/migration/rdma.c b/migration/rdma.c
>>> index d9f80ef390..be2db7946d 100644
>>> --- a/migration/rdma.c
>>> +++ b/migration/rdma.c
>>> @@ -2330,7 +2330,6 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
>>>    
>>>    static void qemu_rdma_cleanup(RDMAContext *rdma)
>>>    {
>>> -    Error *err = NULL;
>>>        int idx;
>>>    
>>>        if (rdma->cm_id && rdma->connected) {
>>> @@ -2341,10 +2340,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>>>                                           .type = RDMA_CONTROL_ERROR,
>>>                                           .repeat = 1,
>>>                                         };
>>> -            error_report("Early error. Sending error.");
>>> -            if (qemu_rdma_post_send_control(rdma, NULL, &head, &err) < 0) {
>>> -                error_report_err(err);
>>> -            }
>>> +            qemu_rdma_post_send_control(rdma, NULL, &head, NULL);
>>>            }
>>>    
>>>            rdma_disconnect(rdma->cm_id);
> 

^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 50/52] migration/rdma: Silence qemu_rdma_cleanup()
  2023-09-27  1:41       ` Zhijian Li (Fujitsu)
@ 2023-09-27  5:30         ` Markus Armbruster
  0 siblings, 0 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-27  5:30 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu); +Cc: qemu-devel, quintela, peterx, leobras

"Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:

> On 26/09/2023 19:52, Markus Armbruster wrote:
>> "Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:
>> 
>>> On 18/09/2023 22:42, Markus Armbruster wrote:
>>>> Functions that use an Error **errp parameter to return errors should
>>>> not also report them to the user, because reporting is the caller's
>>>> job.  When the caller does, the error is reported twice.  When it
>>>> doesn't (because it recovered from the error), there is no error to
>>>> report, i.e. the report is bogus.
>>>>
>>>> qemu_rdma_source_init(), qemu_rdma_connect(),
>>>> rdma_start_incoming_migration(), and rdma_start_outgoing_migration()
>>>> violate this principle: they call error_report() via
>>>> qemu_rdma_cleanup().
>>>>
>>>> Moreover, qemu_rdma_cleanup() can't fail.  It is called on error
>>>> paths, and QIOChannel close and finalization.  Are the conditions it
>>>> reports really errors?  I doubt it.
>>>
>>> I'm not so sure.  It's fine when it's called from an error path, but when
>>> the caller is migration_cancel from HMP/QMP, shall we report something
>>> even though we know QEMU can recover?
>>>
>>> Maybe change it to a warning, etc.
>> 
>> The part I'm sure about is that reporting an error to the user is wrong
>> when we actually recover from the error.  Which qemu_rdma_cleanup()
>> does.
>
> Yes, I have no doubt about this.
>
>
>> 
>> I'm not sure whether the (complicated!) condition that triggers
>> qemu_rdma_cleanup()'s ill-advised error report needs to be reported in
>> some other form.  The remainder of the function ignores failure...
>> 
>> If you think we should downgrade the error to a warning, and no
>> maintainer disagrees, then I'll downgrade.  Do you?
>
> Yes, I'd like to downgrade the error to a warning.

Got it, thanks!



^ permalink raw reply	[flat|nested] 174+ messages in thread

* Re: [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() to Error
  2023-09-26  9:26       ` Markus Armbruster
@ 2023-09-27 11:46         ` Markus Armbruster
  2023-09-28  6:49           ` Markus Armbruster
  2023-10-07  3:40           ` Zhijian Li (Fujitsu)
  0 siblings, 2 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-27 11:46 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu); +Cc: qemu-devel, quintela, peterx, leobras

migration/rdma.c uses errno directly or via perror() after the following
functions:

* poll()

  POSIX specifies errno is set on error.  Good.

* rdma_get_cm_event(), rdma_connect(), rdma_get_cm_event()

  Manual page promises "if an error occurs, errno will be set".  Good.

* ibv_open_device()

  Manual page does not mention errno.  Using it seems ill-advised.

  qemu_rdma_broken_ipv6_kernel() recovers from EPERM by trying the next
  device.  Wrong if ibv_open_device() doesn't actually set errno.

  What is to be done here?

* ibv_reg_mr()

  Manual page does not mention errno.  Using it seems ill-advised.

  qemu_rdma_reg_whole_ram_blocks() and qemu_rdma_register_and_get_keys()
  recover from errno = ENOTSUP by retrying with modified @access
  argument.  Wrong if ibv_reg_mr() doesn't actually set errno.

  What is to be done here?

* ibv_get_cq_event()

  Manual page does not mention errno.  Using it seems ill-advised.

  qemu_rdma_block_for_wrid() calls perror().  Removed in PATCH 48.  Good
  enough.

* ibv_post_send()

  Manual page has the function return "the value of errno on failure".
  Sounds like it sets errno to the value it returns.  However, the
  rdma-core repository defines it as

    static inline int ibv_post_send(struct ibv_qp *qp, struct ibv_send_wr *wr,
                                    struct ibv_send_wr **bad_wr)
    {
            return qp->context->ops.post_send(qp, wr, bad_wr);
    }

  and at least one of the methods fails without setting errno:

    static int mana_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
                              struct ibv_send_wr **bad)
    {
            /* This version of driver supports RAW QP only.
             * Posting WR is done directly in the application.
             */
            return EOPNOTSUPP;
    }

  qemu_rdma_write_one() calls perror().  PATCH 39 (this one) replaces it
  by error_setg(), not error_setg_errno().  Seems prudent, but should be
  called out in the commit message.

* ibv_advise_mr()

  Manual page has the function return "the value of errno on failure".
  Sounds like it sets errno to the value it returns, but my findings for
  ibv_post_send() make me doubt it.

  qemu_rdma_advise_prefetch_mr() traces strerror(errno).  Could be
  misleading.  Drop that part?

* ibv_dereg_mr()

  Manual page has the function return "the value of errno on failure".
  Sounds like it sets errno to the value it returns, but my findings for
  ibv_post_send() make me doubt it.

  qemu_rdma_unregister_waiting() calls perror().  Removed in PATCH 51.
  Good enough.

* qemu_get_cm_event_timeout()

  Can fail without setting errno.

  qemu_rdma_connect() calls perror().  Removed in PATCH 45.  Good
  enough.

Thoughts?
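
For illustration, the ibv_reg_mr() retry mentioned above is roughly this
pattern (simplified sketch; the exact variables and access-flag handling in
migration/rdma.c may differ):

    mr = ibv_reg_mr(pd, addr, len, access);
    if (!mr && errno == ENOTSUP) {
        /* fragile: nothing guarantees ibv_reg_mr() actually set errno */
        access &= ~IBV_ACCESS_ON_DEMAND;
        mr = ibv_reg_mr(pd, addr, len, access);
    }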


[...]

[*] https://github.com/linux-rdma/rdma-core.git
    commit 55fa316b4b18f258d8ac1ceb4aa5a7a35b094dcf




* Re: [PATCH 51/52] migration/rdma: Use error_report() & friends instead of stderr
  2023-09-26  6:35   ` Zhijian Li (Fujitsu)
@ 2023-09-27 12:16     ` Markus Armbruster
  0 siblings, 0 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-27 12:16 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu); +Cc: qemu-devel, quintela, peterx, leobras

"Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:

> On 18/09/2023 22:42, Markus Armbruster wrote:
>>           if (local->nb_blocks != nb_dest_blocks) {
>> -            fprintf(stderr, "ram blocks mismatch (Number of blocks %d vs %d) "
>> -                    "Your QEMU command line parameters are probably "
>> -                    "not identical on both the source and destination.",
>> -                    local->nb_blocks, nb_dest_blocks);
>> +            error_report("ram blocks mismatch (Number of blocks %d vs %d)",
>> +                         local->nb_blocks, nb_dest_blocks);
>> +            error_printf("Your QEMU command line parameters are probably "
>> +                         "not identical on both the source and destination.");
>
>
> Why is this one handled specially? It seems like it would be fine with error_report().

Error message before the patch:

    ram blocks mismatch (Number of blocks %d vs %d) Your QEMU command line parameters are probably not identical on both the source and destination.

Afterwards:

    <ERROR-MSG-PREFIX>ram blocks mismatch (Number of blocks %d vs %d)
    Your QEMU command line parameters are probably not identical on both the source and destination.

where <ERROR-MSG-PREFIX> shows current time (if enabled), guest name (if
enabled), program name (unless in monitor context), and "location" (if
we have one).

The first line is the error message.  It consists of our common error
message prefix and a single phrase without trailing punctuation.

The second line is a hint.  Repeating the error message prefix there
is unnecessary.  Not repeating it there makes it more obvious that it's
not an error of its own, merely additional information.

Splitting the old message into error message and hint makes the error
message conform to the "single phrase" convention.
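
With defaults (no timestamp, no guest name, not in monitor context) and
made-up block counts, that could come out roughly as:

    qemu-system-x86_64: ram blocks mismatch (Number of blocks 5 vs 7)
    Your QEMU command line parameters are probably not identical on both the source and destination.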

Makes sense?




* Re: [PATCH 10/52] migration/rdma: Eliminate error_propagate()
  2023-09-18 14:41 ` [PATCH 10/52] migration/rdma: Eliminate error_propagate() Markus Armbruster
  2023-09-18 17:20   ` Fabiano Rosas
  2023-09-21  9:31   ` Zhijian Li (Fujitsu)
@ 2023-09-27 16:20   ` Eric Blake
  2023-09-27 19:02     ` Markus Armbruster
  2 siblings, 1 reply; 174+ messages in thread
From: Eric Blake @ 2023-09-27 16:20 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, quintela, peterx, leobras

On Mon, Sep 18, 2023 at 04:41:24PM +0200, Markus Armbruster wrote:
> When all we do with an Error we receive into a local variable is
> propagating to somewhere else, we can just as well receive it there
> right away.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  
>      ret = qemu_rdma_alloc_pd_cq(rdma);
>      if (ret) {
> -        ERROR(temp, "rdma migration: error allocating pd and cq! Your mlock()"
> +        ERROR(errp, "rdma migration: error allocating pd and cq! Your mlock()"
>                      " limits may be too low. Please check $ ulimit -a # and "
>                      "search for 'ulimit -l' in the output");

Not this patch's problem, but noticing it while here:

it would help if we had a consistent style on whether to break long
strings after the space instead of carrying the space to the next
line, rather than using both styles in the same concatenated string.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




* Re: [PATCH 13/52] migration/rdma: Make qemu_rdma_buffer_mergable() return bool
  2023-09-18 14:41 ` [PATCH 13/52] migration/rdma: Make qemu_rdma_buffer_mergable() return bool Markus Armbruster
  2023-09-18 17:36   ` Fabiano Rosas
  2023-09-22  7:51   ` Zhijian Li (Fujitsu)
@ 2023-09-27 16:26   ` Eric Blake
  2023-09-27 19:04     ` Markus Armbruster
  2 siblings, 1 reply; 174+ messages in thread
From: Eric Blake @ 2023-09-27 16:26 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, quintela, peterx, leobras

On Mon, Sep 18, 2023 at 04:41:27PM +0200, Markus Armbruster wrote:
> qemu_rdma_buffer_mergable() is semantically a predicate.  It returns
> int 0 or 1.  Return bool instead.

While at it, this would be a perfect time to s/mergable/mergeable/g

> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  migration/rdma.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




* Re: [PATCH 26/52] migration/rdma: Replace int error_state by bool errored
  2023-09-18 14:41 ` [PATCH 26/52] migration/rdma: Replace int error_state by bool errored Markus Armbruster
  2023-09-25  6:17   ` Zhijian Li (Fujitsu)
@ 2023-09-27 17:38   ` Eric Blake
  2023-09-27 19:04     ` Markus Armbruster
  1 sibling, 1 reply; 174+ messages in thread
From: Eric Blake @ 2023-09-27 17:38 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, quintela, peterx, leobras

On Mon, Sep 18, 2023 at 04:41:40PM +0200, Markus Armbruster wrote:
> All we do with the value of RDMAContext member @error_state is test
> whether it's zero.  Change to bool and rename to @errored.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  migration/rdma.c | 66 ++++++++++++++++++++++++------------------------
>  1 file changed, 33 insertions(+), 33 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index ad314cc10a..85f6b274bf 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -352,7 +352,7 @@ typedef struct RDMAContext {
>       * memory registration, then do not attempt any future work
>       * and remember the error state.
>       */
> -    int error_state;
> +    int errored;

Commit message says 'change to bool', but this is still int.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




* Re: [PATCH 10/52] migration/rdma: Eliminate error_propagate()
  2023-09-27 16:20   ` Eric Blake
@ 2023-09-27 19:02     ` Markus Armbruster
  0 siblings, 0 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-27 19:02 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, quintela, peterx, leobras

Eric Blake <eblake@redhat.com> writes:

> On Mon, Sep 18, 2023 at 04:41:24PM +0200, Markus Armbruster wrote:
>> When all we do with an Error we receive into a local variable is
>> propagating to somewhere else, we can just as well receive it there
>> right away.
>> 
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>  
>>      ret = qemu_rdma_alloc_pd_cq(rdma);
>>      if (ret) {
>> -        ERROR(temp, "rdma migration: error allocating pd and cq! Your mlock()"
>> +        ERROR(errp, "rdma migration: error allocating pd and cq! Your mlock()"
>>                      " limits may be too low. Please check $ ulimit -a # and "
>>                      "search for 'ulimit -l' in the output");
>
> Not this patch's problem, but noticing it while here:
>
> it would help if we had a consistent style on whether to break long
> strings after the space instead of carrying the space to the next
> line, rather than using both styles in the same concatenated string.

Oh yes.  I prefer to break lines before space, because leading space is
more visible than trailing space.
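
(For illustration, the string from the hunk above with the break consistently
before the space:)

    ERROR(errp, "rdma migration: error allocating pd and cq! Your mlock()"
                " limits may be too low. Please check $ ulimit -a # and"
                " search for 'ulimit -l' in the output");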

However, I elected to refrain from touching error messages in this
series.  It's long enough as it is.




* Re: [PATCH 13/52] migration/rdma: Make qemu_rdma_buffer_mergable() return bool
  2023-09-27 16:26   ` Eric Blake
@ 2023-09-27 19:04     ` Markus Armbruster
  0 siblings, 0 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-27 19:04 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, quintela, peterx, leobras

Eric Blake <eblake@redhat.com> writes:

> On Mon, Sep 18, 2023 at 04:41:27PM +0200, Markus Armbruster wrote:
>> qemu_rdma_buffer_mergable() is semantically a predicate.  It returns
>> int 0 or 1.  Return bool instead.
>
> While at it, this would be a perfect time to s/mergable/mergeable/g

Will do, thanks!




* Re: [PATCH 26/52] migration/rdma: Replace int error_state by bool errored
  2023-09-27 17:38   ` Eric Blake
@ 2023-09-27 19:04     ` Markus Armbruster
  0 siblings, 0 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-27 19:04 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, quintela, peterx, leobras

Eric Blake <eblake@redhat.com> writes:

> On Mon, Sep 18, 2023 at 04:41:40PM +0200, Markus Armbruster wrote:
>> All we do with the value of RDMAContext member @error_state is test
>> whether it's zero.  Change to bool and rename to @errored.
>> 
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>  migration/rdma.c | 66 ++++++++++++++++++++++++------------------------
>>  1 file changed, 33 insertions(+), 33 deletions(-)
>> 
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index ad314cc10a..85f6b274bf 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -352,7 +352,7 @@ typedef struct RDMAContext {
>>       * memory registration, then do not attempt any future work
>>       * and remember the error state.
>>       */
>> -    int error_state;
>> +    int errored;
>
> Commit message says 'change to bool', but this is still int.

Accident, will fix, thanks!




* Re: [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() to Error
  2023-09-27 11:46         ` Markus Armbruster
@ 2023-09-28  6:49           ` Markus Armbruster
  2023-10-07  3:40           ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-28  6:49 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu); +Cc: qemu-devel, quintela, peterx, leobras

Let me try to map solutions.

Markus Armbruster <armbru@redhat.com> writes:

> migration/rdma.c uses errno directly or via perror() after the following
> functions:
>
> * poll()
>
>   POSIX specifies errno is set on error.  Good.

Nothing wrong, nothing to do.

> * rdma_get_cm_event(), rdma_connect(), rdma_get_cm_event()
>
>   Manual page promises "if an error occurs, errno will be set".  Good.

Nothing wrong, nothing to do.

> * ibv_open_device()
>
>   Manual page does not mention errno.  Using it seems ill-advised.
>
>   qemu_rdma_broken_ipv6_kernel() recovers from EPERM by trying the next
>   device.  Wrong if ibv_open_device() doesn't actually set errno.
>
>   What is to be done here?

1. Investigate whether ibv_open_device() sets errno.  I can't spare time
   for that.

2. Add a comment pointing out the problem, in the hope somebody
   investigates later.

3. Do nothing.

> * ibv_reg_mr()
>
>   Manual page does not mention errno.  Using it seems ill-advised.
>
>   qemu_rdma_reg_whole_ram_blocks() and qemu_rdma_register_and_get_keys()
>   recover from errno = ENOTSUP by retrying with modified @access
>   argument.  Wrong if ibv_reg_mr() doesn't actually set errno.
>
>   What is to be done here?

Likewise.

> * ibv_get_cq_event()
>
>   Manual page does not mention errno.  Using it seems ill-advised.
>
>   qemu_rdma_block_for_wrid() calls perror().  Removed in PATCH 48.  Good
>   enough.

1. Add a comment pointing out the problem, remove it in PATCH 48.

2. Nothing wrong after the series, nothing to do.

> * ibv_post_send()
>
>   Manual page has the function return "the value of errno on failure".
>   Sounds like it sets errno to the value it returns.  However, the
>   rdma-core repository defines it as
>
>     static inline int ibv_post_send(struct ibv_qp *qp, struct ibv_send_wr *wr,
>                                     struct ibv_send_wr **bad_wr)
>     {
>             return qp->context->ops.post_send(qp, wr, bad_wr);
>     }
>
>   and at least one of the methods fails without setting errno:
>
>     static int mana_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
>                               struct ibv_send_wr **bad)
>     {
>             /* This version of driver supports RAW QP only.
>              * Posting WR is done directly in the application.
>              */
>             return EOPNOTSUPP;
>     }
>
>   qemu_rdma_write_one() calls perror().  PATCH 39 (this one) replaces it
>   by error_setg(), not error_setg_errno().  Seems prudent, but should be
>   called out in the commit message.

1. Add a comment pointing out the problem, remove it in PATCH 39.

2. Pass @ret, not @errno to error_setg_errno().

3. Nothing wrong after the series, nothing to do.

Since 2. is easy, let's do it.
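
A minimal sketch of option 2 (variable names and message text are
illustrative, not the actual hunk):

    ret = ibv_post_send(rdma->qp, &send_wr, &bad_wr);
    if (ret) {
        /* ibv_post_send() returns an errno value; don't rely on the errno global */
        error_setg_errno(errp, ret, "rdma migration: post rdma write failed");
        return -1;
    }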

> * ibv_advise_mr()
>
>   Manual page has the function return "the value of errno on failure".
>   Sounds like it sets errno to the value it returns, but my findings for
>   ibv_post_send() make me doubt it.
>
>   qemu_rdma_advise_prefetch_mr() traces strerror(errno).  Could be
>   misleading.  Drop that part?

1. Change strerror(errno) to strerror(ret)

2. Drop strerror(errno)

3. Do nothing.

Since 1. is easy, let's do it.

> * ibv_dereg_mr()
>
>   Manual page has the function return "the value of errno on failure".
>   Sounds like it sets errno to the value it returns, but my findings for
>   ibv_post_send() make me doubt it.
>
>   qemu_rdma_unregister_waiting() calls perror().  Removed in PATCH 51.
>   Good enough.

1. Add a comment pointing out the problem, remove it in PATCH 51.

2. Nothing wrong after the series, nothing to do.

> * qemu_get_cm_event_timeout()
>
>   Can fail without setting errno.
>
>   qemu_rdma_connect() calls perror().  Removed in PATCH 45.  Good
>   enough.
>
> Thoughts?

Considering all of the above...  I'd like to stick a patch documenting
problematic errno use early in the series, and fix all the easy ones
later in the series, leaving just the two difficult ones in
qemu_rdma_broken_ipv6_kernel() and qemu_rdma_reg_whole_ram_blocks().

Makes sense?

> [...]
>
> [*] https://github.com/linux-rdma/rdma-core.git
>     commit 55fa316b4b18f258d8ac1ceb4aa5a7a35b094dcf




* Re: [PATCH 08/52] migration/rdma: Fix qemu_rdma_accept() to return failure on errors
  2023-09-21  9:07   ` Zhijian Li (Fujitsu)
@ 2023-09-28 10:53     ` Markus Armbruster
  0 siblings, 0 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-28 10:53 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu)
  Cc: Markus Armbruster, qemu-devel, quintela, peterx, leobras

"Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:

> On 18/09/2023 22:41, Markus Armbruster wrote:
>> qemu_rdma_accept() returns 0 in some cases even when it didn't
>> complete its job due to errors.  Impact is not obvious.  I figure the
>> caller will soon fail again with a misleading error message.
>> 
>> Fix it to return -1 on any failure.
>> 
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>
> I noticed that ret initialization is also meaningless in qemu_rdma_accept()
>
> 3354 static int qemu_rdma_accept(RDMAContext *rdma)
> 3355 {
> 3356     RDMACapabilities cap;
> 3357     struct rdma_conn_param conn_param = {
> 3358                                             .responder_resources = 2,
> 3359                                             .private_data = &cap,
> 3360                                             .private_data_len = sizeof(cap),
> 3361                                          };
> 3362     RDMAContext *rdma_return_path = NULL;
> 3363     struct rdma_cm_event *cm_event;
> 3364     struct ibv_context *verbs;
> 3365     int ret = -EINVAL;     <<<<< drop it ?
> 3366     int idx;

PATCH 27 will drop it.

> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Thanks!




* Re: [PATCH 00/52] migration/rdma: Error handling fixes
  2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
                   ` (52 preceding siblings ...)
  2023-09-19 16:49 ` [PATCH 00/52] migration/rdma: Error handling fixes Peter Xu
@ 2023-09-28 11:08 ` Markus Armbruster
  53 siblings, 0 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-09-28 11:08 UTC (permalink / raw)
  To: Li Zhijian, Fabiano Rosas; +Cc: qemu-devel, quintela, peterx, leobras

Thank you for reviewing!  Especially Li Zhijian, who went through pretty
much all the patches.  Big chunk of work, much appreciated.

Will send v2 shortly.




* Re: [PATCH 28/52] migration/rdma: Check negative error values the same way everywhere
  2023-09-25  7:29     ` Markus Armbruster
@ 2023-10-04 16:32       ` Juan Quintela
  2023-10-04 16:57         ` Peter Xu
  0 siblings, 1 reply; 174+ messages in thread
From: Juan Quintela @ 2023-10-04 16:32 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Zhijian Li (Fujitsu), qemu-devel, peterx, leobras

Markus Armbruster <armbru@redhat.com> wrote:
> "Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:
>
>> On 18/09/2023 22:41, Markus Armbruster wrote:
>>> When a function returns 0 on success, negative value on error,
>>> checking for non-zero suffices, but checking for negative is clearer.
>>> So do that.
>>> 
>>
>> This patch is not my favorite convention.
>
> Certainly a matter of taste, which means maintainers get to decide, not
> me.
>
> Failure checks can be confusing in C.  Is
>
>     if (foo(...))
>
> checking for success or for failure?  Impossible to tell.  If foo()
> returns a pointer, it almost certainly checks for success.  If it
> returns bool, likewise.  But if it returns an integer, it probably
> checks for failure.
>
> Getting a condition backwards is a common coding mistake.  Consider
> patch review of
>
>     if (condition) {
>         obviously the error case
>     }
>
> Patch review is more likely to catch a backward condition when the
> condition's sense is locally obvious.
>
> Conventions can help.  Here's the one I like:
>
> * Check for a function's failure the same way everywhere.
>
> * When a function returns something "truthy" on success, and something
>   "falsy" on failure, use
>
>     if (!fun(...))
>
>   Special cases:
>
>   - bool true on success, false on failure
>
>   - non-null pointer on success, null pointer on failure
>
> * When a function returns non-negative value on success, negative value
>   on failure, use
>
>     if (fun(...) < 0)
>
> * Avoid non-negative integer error values.
>
> * Avoid if (fun(...)), because it's locally ambiguous.
>
>> @Peter, Juan
>>
>> I'd like to hear your opinions.
>
> Yes, please.

I agree with Markus here for three reasons:

1 - He is my C - lawyer of reference O-)

2 - He has done a lot of work cleaning the error handling on this file,
    that was completely ugly.

3 - I fully agree that code is more maintainable this way.  I.e. if any
    function changes to return positive values for non-error, we get
    better coverage.

Later, Juan.




* Re: [PATCH 28/52] migration/rdma: Check negative error values the same way everywhere
  2023-10-04 16:32       ` Juan Quintela
@ 2023-10-04 16:57         ` Peter Xu
  2023-10-05  5:45           ` Markus Armbruster
  0 siblings, 1 reply; 174+ messages in thread
From: Peter Xu @ 2023-10-04 16:57 UTC (permalink / raw)
  To: Juan Quintela
  Cc: Markus Armbruster, Zhijian Li (Fujitsu), qemu-devel, leobras

Sorry Zhijian, I missed this email.

On Wed, Oct 04, 2023 at 06:32:14PM +0200, Juan Quintela wrote:
> > * Avoid non-negative integer error values.

Perhaps we need to forbid that if doing this.

I can see Zhijian's point, where "if (ret)" can also capture unexpected
positive returns, while "if (ret < 0)" is not clear on who's handling ret>0
case.  Personally I like that, too.

> 3 - I fully agree that code is more maintainable this way.  I.e. if any
>     function changes to return positive values for non-error, we get
>     better coverage.

This patch does at least try to unify error handling, though..

$ git grep "ret < 0" migration/rdma.c | wc -l
36

So we have half doing this and half doing that; it makes sense to do the same everywhere.

Let's assume no vote from my side due to pros and cons, so it's 2:1. :)

Thanks,

-- 
Peter Xu




* Re: [PATCH 00/52] migration/rdma: Error handling fixes
  2023-09-19 18:29   ` Daniel P. Berrangé
  2023-09-19 18:57     ` Fabiano Rosas
  2023-09-20 13:22     ` Markus Armbruster
@ 2023-10-04 18:00     ` Juan Quintela
  2023-10-05  7:14       ` Daniel P. Berrangé
  2 siblings, 1 reply; 174+ messages in thread
From: Juan Quintela @ 2023-10-04 18:00 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Peter Xu, Markus Armbruster, qemu-devel, leobras, Li Zhijian

Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Tue, Sep 19, 2023 at 12:49:46PM -0400, Peter Xu wrote:
>> On Mon, Sep 18, 2023 at 04:41:14PM +0200, Markus Armbruster wrote:
>> > Oh dear, where to start.  There's so much wrong, and in pretty obvious
>> > ways.  This code should never have passed review.  I'm refraining from
>> > saying more; see the commit messages instead.
>> > 
>> > Issues remaining after this series include:
>> > 
>> > * Terrible error messages
>> > 
>> > * Some error message cascades remain
>> > 
>> > * There is no written contract for QEMUFileHooks, and the
>> >   responsibility for reporting errors is unclear
>> 
>> Even being removed.. because no one is really extending that..
>> 
>> https://lore.kernel.org/all/20230509120700.78359-1-quintela@redhat.com/#t
>
> One day (in another 5-10 years) I still hope we'll get to
> the point where QEMUFile itself is obsolete :-)

If you see the patches on the list, I have moved the rate limit check
outside of QEMUFile, so right now it is only a buffer to write in the
main migration thread.

> Getting
> rid of QEMUFileHooks is a great step in that direction.

Thanks.

> Me finishing a old PoC to bring buffering to QIOChannel
> would be another big step.

I want to get rid of qemu_file_set_error() and friends first.  We should
handle errors correctly when they happen, and go from there.

> The data rate limiting would be the biggest missing piece
> to enable migration/vmstate logic to directly consume
> a QIOChannel.

As said, that is done.  I have three atomic counters:

- qemu_file_bytes
- rdma_bytes
- multifd_bytes

We do the rate limiting by adding up those three counters.  The only thing
QEMUFile knows about is increasing qemu_file_bytes when it needs to.
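
A rough sketch of that idea (struct and function names below are made up for
illustration, not the actual QEMU code):

    #include <stdint.h>
    #include "qemu/atomic.h"    /* qatomic_read() */

    /* illustrative only: the three counters mentioned above */
    typedef struct {
        uint64_t qemu_file_bytes;
        uint64_t rdma_bytes;
        uint64_t multifd_bytes;
    } MigTransferStats;

    static MigTransferStats transfer_stats;

    /* rate limiting compares this total against the configured limit */
    static uint64_t total_transferred_bytes(void)
    {
        return qatomic_read(&transfer_stats.qemu_file_bytes) +
               qatomic_read(&transfer_stats.rdma_bytes) +
               qatomic_read(&transfer_stats.multifd_bytes);
    }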

> Eliminating QEMUFile would help to bring Error **errp
> to all the vmstate codepaths.

Yes!

See the last problem on the list where they couldn't use
migrate_set_error() in vmstate.c because that breaks test-vmstate.c.

>> I've always seen rdma as being at the "odd fixes" stage, for a long time now.
>> But maybe I was wrong.
>
> In the MAINTAINERS file RDMA still gets classified as formally
> supported under the migration maintainers.  I'm not convinced
> that is an accurate description of its status.  I tend to agree
> with you that it is 'odd fixes' at the very best.

It is not accurate; we wanted to get rid of it.

> Dave Gilbert had previously speculated about whether we should
> even consider deprecating it, on the basis that the latest non-RDMA
> migration is so much better than in the past, with multifd
> and zerocopy, that RDMA might not even offer a significant
> enough performance win to justify it.

The main problem with RDMA is that it uses a really weird model from the
migration point of view:
- everything there is asynchronous (nothing else is like that)
- it uses its own everything, i.e. it abuses QEMUFile and QIOChannel to try
  to look like them, but fails.
- Its error handling is "peculiar", to put it kindly.  See this series
  from Markus that makes it look more normal.

Thanks, Juan.




* Re: [PATCH 28/52] migration/rdma: Check negative error values the same way everywhere
  2023-10-04 16:57         ` Peter Xu
@ 2023-10-05  5:45           ` Markus Armbruster
  2023-10-05 14:52             ` Peter Xu
  0 siblings, 1 reply; 174+ messages in thread
From: Markus Armbruster @ 2023-10-05  5:45 UTC (permalink / raw)
  To: Peter Xu; +Cc: Juan Quintela, Zhijian Li (Fujitsu), qemu-devel, leobras

Peter Xu <peterx@redhat.com> writes:

> Sorry Zhijian, I missed this email.
>
> On Wed, Oct 04, 2023 at 06:32:14PM +0200, Juan Quintela wrote:
>> > * Avoid non-negative integer error values.
>
> Perhaps we need to forbid that if doing this.
>
> I can see Zhijian's point, where "if (ret)" can also capture unexpected
> positive returns, while "if (ret < 0)" is not clear on who's handling ret>0
> case.  Personally I like that, too.

It's clear either way :)

The problem is calling a function whose contract specifies "return 0 on
success, negative value on failure".

If it returns positive value, the contract is broken, and all bets are
off.

If you check the return value like

    if (ret < 0) {
        ... handle error and fail ...
    }
    ... carry on ...

then an unexpected positive value will clearly be treated as success.

If you check it like

    if (ret) {
        ... handle error and fail ...
    }
    ... carry on ...

then it will clearly be treated as failure.

But we don't know what it is!  Treating it as success can be wrong,
treating it as failure can be just as wrong.

>> 3 - I fully agree that code is more maintainable this way.  I.e. if any
>>     function changes to return positive values for non-error, we get
>>     better coverage.
>
> This patch does at least try to unify error handling, though..
>
> $ git grep "ret < 0" migration/rdma.c | wc -l
> 36
>
> So we have half doing this and half doing that, makes sense to do the same.
>
> Let's assume no vote from my side due to pros and cons, so it's 2:1. :)
>
> Thanks,

Since I'm not a maintainer here, my vote is advisory.




* Re: [PATCH 00/52] migration/rdma: Error handling fixes
  2023-10-04 18:00     ` Juan Quintela
@ 2023-10-05  7:14       ` Daniel P. Berrangé
  2023-10-31 10:25         ` Juan Quintela
  0 siblings, 1 reply; 174+ messages in thread
From: Daniel P. Berrangé @ 2023-10-05  7:14 UTC (permalink / raw)
  To: Juan Quintela
  Cc: Peter Xu, Markus Armbruster, qemu-devel, leobras, Li Zhijian

On Wed, Oct 04, 2023 at 08:00:34PM +0200, Juan Quintela wrote:
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> > On Tue, Sep 19, 2023 at 12:49:46PM -0400, Peter Xu wrote:
> >> On Mon, Sep 18, 2023 at 04:41:14PM +0200, Markus Armbruster wrote:
> >> > Oh dear, where to start.  There's so much wrong, and in pretty obvious
> >> > ways.  This code should never have passed review.  I'm refraining from
> >> > saying more; see the commit messages instead.
> >> > 
> >> > Issues remaining after this series include:
> >> > 
> >> > * Terrible error messages
> >> > 
> >> > * Some error message cascades remain
> >> > 
> >> > * There is no written contract for QEMUFileHooks, and the
> >> >   responsibility for reporting errors is unclear
> >> 
> >> Even being removed.. because no one is really extending that..
> >> 
> >> https://lore.kernel.org/all/20230509120700.78359-1-quintela@redhat.com/#t
> >
> > One day (in another 5-10 years) I still hope we'll get to
> > the point where QEMUFile itself is obsolete :-)
> 
> If you see the patches on the list, I have moved the rate limit check
> outside of QEMUFile, so right now it is only a buffer to write in the
> main migration thread.

Can you point me to that patch(es) as I'm not identifying
them yet.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH 28/52] migration/rdma: Check negative error values the same way everywhere
  2023-10-05  5:45           ` Markus Armbruster
@ 2023-10-05 14:52             ` Peter Xu
  2023-10-05 16:06               ` Markus Armbruster
  0 siblings, 1 reply; 174+ messages in thread
From: Peter Xu @ 2023-10-05 14:52 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Juan Quintela, Zhijian Li (Fujitsu), qemu-devel, leobras

On Thu, Oct 05, 2023 at 07:45:00AM +0200, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > Sorry Zhijian, I missed this email.
> >
> > On Wed, Oct 04, 2023 at 06:32:14PM +0200, Juan Quintela wrote:
> >> > * Avoid non-negative integer error values.
> >
> > Perhaps we need to forbid that if doing this.
> >
> > I can see Zhijian's point, where "if (ret)" can also capture unexpected
> > positive returns, while "if (ret < 0)" is not clear on who's handling ret>0
> > case.  Personally I like that, too.
> 
> It's clear either way :)
> 
> The problem is calling a function whose contract specifies "return 0 on
> success, negative value on failure".
> 
> If it returns positive value, the contract is broken, and all bets are
> off.
> 
> If you check the return value like
> 
>     if (ret < 0) {
>         ... handle error and fail ...
>     }
>     ... carry on ...
> 
> then an unexpected positive value will clearly be treated as success.
> 
> If you check it like
> 
>     if (ret) {
>         ... handle error and fail ...
>     }
>     ... carry on ...
> 
> then it will clearly be treated as failure.
> 
> But we don't know what it is!  Treating it as success can be wrong,
> treating it as failure can be just as wrong.

Right, IMHO the major difference is when there's a bug in the retval
protocol of the API we're invoking.

With "if (ret)" we capture that protocol bug, treating it as a failure (of
that buggy API). With "if (ret<0)" we don't yet capture it, either
everything will just keep working, or something weird happens later.  Not
so predictable in this case.

Thanks,

-- 
Peter Xu




* Re: [PATCH 28/52] migration/rdma: Check negative error values the same way everywhere
  2023-10-05 14:52             ` Peter Xu
@ 2023-10-05 16:06               ` Markus Armbruster
  0 siblings, 0 replies; 174+ messages in thread
From: Markus Armbruster @ 2023-10-05 16:06 UTC (permalink / raw)
  To: Peter Xu; +Cc: Juan Quintela, Zhijian Li (Fujitsu), qemu-devel, leobras

Peter Xu <peterx@redhat.com> writes:

> On Thu, Oct 05, 2023 at 07:45:00AM +0200, Markus Armbruster wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > Sorry Zhijian, I missed this email.
>> >
>> > On Wed, Oct 04, 2023 at 06:32:14PM +0200, Juan Quintela wrote:
>> >> > * Avoid non-negative integer error values.
>> >
>> > Perhaps we need to forbid that if doing this.
>> >
>> > I can see Zhijian's point, where "if (ret)" can also capture unexpected
>> > positive returns, while "if (ret < 0)" is not clear on who's handling ret>0
>> > case.  Personally I like that, too.
>> 
>> It's clear either way :)
>> 
>> The problem is calling a function whose contract specifies "return 0 on
>> success, negative value on failure".
>> 
>> If it returns positive value, the contract is broken, and all bets are
>> off.
>> 
>> If you check the return value like
>> 
>>     if (ret < 0) {
>>         ... handle error and fail ...
>>     }
>>     ... carry on ...
>> 
>> then an unexpected positive value will clearly be treated as success.
>> 
>> If you check it like
>> 
>>     if (ret) {
>>         ... handle error and fail ...
>>     }
>>     ... carry on ...
>> 
>> then it will clearly be treated as failure.
>> 
>> But we don't know what it is!  Treating it as success can be wrong,
>> treating it as failure can be just as wrong.
>
> Right, IMHO the major difference is when there's a bug in the retval
> protocol of the API we're invoking.
>
> With "if (ret)" we capture that protocol bug, treating it as a failure (of
> that buggy API). With "if (ret<0)" we don't yet capture it, either
> everything will just keep working, or something weird happens later.  Not
> so predictable in this case.

I don't think misinterpreting a violation of the contract as failure is
safer than misinterpreting it as success.

Where we have reason to worry about contract violations, we should
assert() it's not violated.
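
In the pseudo-code above, that could look like:

    ret = fun(...);
    assert(ret <= 0);   /* contract: 0 on success, negative on failure */
    if (ret < 0) {
        ... handle error and fail ...
    }
    ... carry on ...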




* Re: [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() to Error
  2023-09-27 11:46         ` Markus Armbruster
  2023-09-28  6:49           ` Markus Armbruster
@ 2023-10-07  3:40           ` Zhijian Li (Fujitsu)
  2023-10-16 12:11             ` Jason Gunthorpe
  1 sibling, 1 reply; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-10-07  3:40 UTC (permalink / raw)
  To: Markus Armbruster, linux-rdma; +Cc: qemu-devel, quintela, peterx, leobras

+rdma-core


Is global variable *errno* reliable when the documentation only states
"returns 0 on success, or the value of errno on failure (which indicates the failure reason)."

Some read it as "assign the error code to errno and return it"; I used to think the same way,
but ibv_post_send() doesn't always follow this rule.  See ibv_post_send() -> mana_post_send().

Actually, QEMU uses errno after calling libibverbs APIs, so we hope the man page can be
clearer, like POSIX is:

RETURN VALUE
        Upon successful completion fopen(), fdopen() and freopen() return a FILE pointer.  Otherwise, NULL is returned and errno is set to indicate the error

Thanks
Zhijian


On 27/09/2023 19:46, Markus Armbruster wrote:
> migration/rdma.c uses errno directly or via perror() after the following
> functions:
> 
> * poll()
> 
>    POSIX specifies errno is set on error.  Good.
> 
> * rdma_get_cm_event(), rdma_connect(), rdma_get_cm_event()
> 
>    Manual page promises "if an error occurs, errno will be set".  Good.
> 
> * ibv_open_device()
> 
>    Manual page does not mention errno.  Using it seems ill-advised.
> 
>    qemu_rdma_broken_ipv6_kernel() recovers from EPERM by trying the next
>    device.  Wrong if ibv_open_device() doesn't actually set errno.
> 
>    What is to be done here?
> 
> * ibv_reg_mr()
> 
>    Manual page does not mention errno.  Using it seems ill-advised.
> 
>    qemu_rdma_reg_whole_ram_blocks() and qemu_rdma_register_and_get_keys()
>    recover from errno = ENOTSUP by retrying with modified @access
>    argument.  Wrong if ibv_reg_mr() doesn't actually set errno.
> 
>    What is to be done here?
> 
> * ibv_get_cq_event()
> 
>    Manual page does not mention errno.  Using it seems ill-advised.
> 
>    qemu_rdma_block_for_wrid() calls perror().  Removed in PATCH 48.  Good
>    enough.
> 
> * ibv_post_send()
> 
>    Manual page has the function return "the value of errno on failure".
>    Sounds like it sets errno to the value it returns.  However, the
>    rdma-core repository defines it as
> 
>      static inline int ibv_post_send(struct ibv_qp *qp, struct ibv_send_wr *wr,
>                                      struct ibv_send_wr **bad_wr)
>      {
>              return qp->context->ops.post_send(qp, wr, bad_wr);
>      }
> 
>    and at least one of the methods fails without setting errno:
> 
>      static int mana_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
>                                struct ibv_send_wr **bad)
>      {
>              /* This version of driver supports RAW QP only.
>               * Posting WR is done directly in the application.
>               */
>              return EOPNOTSUPP;
>      }
> 
>    qemu_rdma_write_one() calls perror().  PATCH 39 (this one) replaces it
>    by error_setg(), not error_setg_errno().  Seems prudent, but should be
>    called out in the commit message.
> 
> * ibv_advise_mr()
> 
>    Manual page has the function return "the value of errno on failure".
>    Sounds like it sets errno to the value it returns, but my findings for
>    ibv_post_send() make me doubt it.
> 
>    qemu_rdma_advise_prefetch_mr() traces strerror(errno).  Could be
>    misleading.  Drop that part?
> 
> * ibv_dereg_mr()
> 
>    Manual page has the function return "the value of errno on failure".
>    Sounds like it sets errno to the value it returns, but my findings for
>    ibv_post_send() make me doubt it.
> 
>    qemu_rdma_unregister_waiting() calls perror().  Removed in PATCH 51.
>    Good enough.
> 
> * qemu_get_cm_event_timeout()
> 
>    Can fail without setting errno.
> 
>    qemu_rdma_connect() calls perror().  Removed in PATCH 45.  Good
>    enough.
> 
> Thoughts?
> 
> 
> [...]
> 
> [*] https://github.com/linux-rdma/rdma-core.git
>      commit 55fa316b4b18f258d8ac1ceb4aa5a7a35b094dcf
> 


* Re: [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() to Error
  2023-10-07  3:40           ` Zhijian Li (Fujitsu)
@ 2023-10-16 12:11             ` Jason Gunthorpe
  2023-10-17  5:22               ` Zhijian Li (Fujitsu)
  0 siblings, 1 reply; 174+ messages in thread
From: Jason Gunthorpe @ 2023-10-16 12:11 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu)
  Cc: Markus Armbruster, linux-rdma, qemu-devel, quintela, peterx, leobras

On Sat, Oct 07, 2023 at 03:40:50AM +0000, Zhijian Li (Fujitsu) wrote:
> +rdma-core
> 
> 
> Is global variable *errno* reliable when the documentation only states
> "returns 0 on success, or the value of errno on failure (which indicates the failure reason)."

I think the intention of this language was that an errno constant is
returned, the caller should not assume it is stored in errno.

errno is difficult, many things overwrite it, you can lose its value
during error handling.

Jason



* Re: [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() to Error
  2023-10-16 12:11             ` Jason Gunthorpe
@ 2023-10-17  5:22               ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 174+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-10-17  5:22 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Markus Armbruster, linux-rdma, qemu-devel, quintela, peterx, leobras



On 16/10/2023 20:11, Jason Gunthorpe wrote:
> On Sat, Oct 07, 2023 at 03:40:50AM +0000, Zhijian Li (Fujitsu) wrote:
>> +rdma-core
>>
>>
>> Is global variable *errno* reliable when the documentation only states
>> "returns 0 on success, or the value of errno on failure (which indicates the failure reason)."
> 
> I think the intention of this language was that an errno constant is
> returned, the caller should not assume it is stored in errno.
> 
Understood.
If that's the case, I think some pieces of code in QEMU misunderstood this before.


Thanks
Zhijian

> errno is difficult, many things overwrite it, you can lose its value
> during error handling.
> 
> Jason


* Re: [PATCH 00/52] migration/rdma: Error handling fixes
  2023-10-05  7:14       ` Daniel P. Berrangé
@ 2023-10-31 10:25         ` Juan Quintela
  0 siblings, 0 replies; 174+ messages in thread
From: Juan Quintela @ 2023-10-31 10:25 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Peter Xu, Markus Armbruster, qemu-devel, leobras, Li Zhijian

Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Wed, Oct 04, 2023 at 08:00:34PM +0200, Juan Quintela wrote:
>> Daniel P. Berrangé <berrange@redhat.com> wrote:
>> > On Tue, Sep 19, 2023 at 12:49:46PM -0400, Peter Xu wrote:
>> >> On Mon, Sep 18, 2023 at 04:41:14PM +0200, Markus Armbruster wrote:
>> >> > Oh dear, where to start.  There's so much wrong, and in pretty obvious
>> >> > ways.  This code should never have passed review.  I'm refraining from
>> >> > saying more; see the commit messages instead.
>> >> > 
>> >> > Issues remaining after this series include:
>> >> > 
>> >> > * Terrible error messages
>> >> > 
>> >> > * Some error message cascades remain
>> >> > 
>> >> > * There is no written contract for QEMUFileHooks, and the
>> >> >   responsibility for reporting errors is unclear
>> >> 
>> >> Even being removed.. because no one is really extending that..
>> >> 
>> >> https://lore.kernel.org/all/20230509120700.78359-1-quintela@redhat.com/#t
>> >
>> > One day (in another 5-10 years) I still hope we'll get to
>> > the point where QEMUFile itself is obsolete :-)
>> 
>> If you see the patches on the list, I have moved the rate limit check
>> outside of QEMUFile, so right now it is only a buffer to write in the
>> main migration thread.
>
> Can you point me to that patch(es) as I'm not identifying
> them yet.

Yet another set of counters.

They are in today's PULL request.

Later, Juan.




end of thread

Thread overview: 174+ messages
2023-09-18 14:41 [PATCH 00/52] migration/rdma: Error handling fixes Markus Armbruster
2023-09-18 14:41 ` [PATCH 01/52] migration/rdma: Clean up qemu_rdma_poll()'s return type Markus Armbruster
2023-09-18 16:50   ` Fabiano Rosas
2023-09-21  8:52   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 02/52] migration/rdma: Clean up qemu_rdma_data_init()'s " Markus Armbruster
2023-09-18 16:51   ` Fabiano Rosas
2023-09-21  8:52   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 03/52] migration/rdma: Clean up rdma_delete_block()'s " Markus Armbruster
2023-09-18 16:53   ` Fabiano Rosas
2023-09-21  8:53   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 04/52] migration/rdma: Drop fragile wr_id formatting Markus Armbruster
2023-09-18 17:01   ` Fabiano Rosas
2023-09-21  8:54   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 05/52] migration/rdma: Consistently use uint64_t for work request IDs Markus Armbruster
2023-09-18 17:03   ` Fabiano Rosas
2023-09-19  5:39   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 06/52] migration/rdma: Clean up two more harmless signed vs. unsigned issues Markus Armbruster
2023-09-18 17:10   ` Fabiano Rosas
2023-09-20 13:11     ` Markus Armbruster
2023-09-18 14:41 ` [PATCH 07/52] migration/rdma: Give qio_channel_rdma_source_funcs internal linkage Markus Armbruster
2023-09-18 17:11   ` Fabiano Rosas
2023-09-21  9:00   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 08/52] migration/rdma: Fix qemu_rdma_accept() to return failure on errors Markus Armbruster
2023-09-18 17:15   ` Fabiano Rosas
2023-09-21  9:07   ` Zhijian Li (Fujitsu)
2023-09-28 10:53     ` Markus Armbruster
2023-09-18 14:41 ` [PATCH 09/52] migration/rdma: Put @errp parameter last Markus Armbruster
2023-09-18 17:17   ` Fabiano Rosas
2023-09-21  9:08   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 10/52] migration/rdma: Eliminate error_propagate() Markus Armbruster
2023-09-18 17:20   ` Fabiano Rosas
2023-09-21  9:31   ` Zhijian Li (Fujitsu)
2023-09-27 16:20   ` Eric Blake
2023-09-27 19:02     ` Markus Armbruster
2023-09-18 14:41 ` [PATCH 11/52] migration/rdma: Drop rdma_add_block() error handling Markus Armbruster
2023-09-18 17:32   ` Fabiano Rosas
2023-09-21  9:39   ` Zhijian Li (Fujitsu)
2023-09-21 11:15     ` Markus Armbruster
2023-09-22  7:49       ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 12/52] migration/rdma: Drop qemu_rdma_search_ram_block() " Markus Armbruster
2023-09-18 17:35   ` Fabiano Rosas
2023-09-20 13:11     ` Markus Armbruster
2023-09-22  7:50   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 13/52] migration/rdma: Make qemu_rdma_buffer_mergable() return bool Markus Armbruster
2023-09-18 17:36   ` Fabiano Rosas
2023-09-22  7:51   ` Zhijian Li (Fujitsu)
2023-09-27 16:26   ` Eric Blake
2023-09-27 19:04     ` Markus Armbruster
2023-09-18 14:41 ` [PATCH 14/52] migration/rdma: Use bool for two RDMAContext flags Markus Armbruster
2023-09-18 17:37   ` Fabiano Rosas
2023-09-22  7:54   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 15/52] migration/rdma: Ditch useless numeric error codes in error messages Markus Armbruster
2023-09-18 18:47   ` Fabiano Rosas
2023-09-22  8:44   ` Zhijian Li (Fujitsu)
2023-09-22  9:43     ` Markus Armbruster
2023-09-18 14:41 ` [PATCH 16/52] migration/rdma: Fix io_writev(), io_readv() methods to obey contract Markus Armbruster
2023-09-18 18:57   ` Fabiano Rosas
2023-09-22  8:59   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 17/52] migration/rdma: Replace dangerous macro CHECK_ERROR_STATE() Markus Armbruster
2023-09-18 18:57   ` Fabiano Rosas
2023-09-22  9:01   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 18/52] migration/rdma: Fix qemu_rdma_broken_ipv6_kernel() to set error Markus Armbruster
2023-09-18 19:00   ` Fabiano Rosas
2023-09-22  9:10   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 19/52] migration/rdma: Fix qemu_get_cm_event_timeout() to always " Markus Armbruster
2023-09-19 16:02   ` Peter Xu
2023-09-22  9:12   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 20/52] migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port Markus Armbruster
2023-09-19 16:02   ` Peter Xu
2023-09-20 13:13     ` Markus Armbruster
2023-09-18 14:41 ` [PATCH 21/52] migration/rdma: Fix QEMUFileHooks method return values Markus Armbruster
2023-09-25  4:08   ` Zhijian Li (Fujitsu)
2023-09-25  6:36     ` Markus Armbruster
2023-09-25  7:03       ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 22/52] migration/rdma: Fix rdma_getaddrinfo() error checking Markus Armbruster
2023-09-25  5:21   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 23/52] migration/rdma: Clean up qemu_rdma_wait_comp_channel()'s error value Markus Armbruster
2023-09-25  5:43   ` Zhijian Li (Fujitsu)
2023-09-25  6:55     ` Markus Armbruster
2023-09-18 14:41 ` [PATCH 24/52] migration/rdma: Return -1 instead of negative errno code Markus Armbruster
2023-09-26 10:15   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 25/52] migration/rdma: Dumb down remaining int error values to -1 Markus Armbruster
2023-09-26 10:16   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 26/52] migration/rdma: Replace int error_state by bool errored Markus Armbruster
2023-09-25  6:17   ` Zhijian Li (Fujitsu)
2023-09-25  7:09     ` Markus Armbruster
2023-09-26 10:18       ` Zhijian Li (Fujitsu)
2023-09-27 17:38   ` Eric Blake
2023-09-27 19:04     ` Markus Armbruster
2023-09-18 14:41 ` [PATCH 27/52] migration/rdma: Drop superfluous assignments to @ret Markus Armbruster
2023-09-25  6:20   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 28/52] migration/rdma: Check negative error values the same way everywhere Markus Armbruster
2023-09-25  6:26   ` Zhijian Li (Fujitsu)
2023-09-25  7:29     ` Markus Armbruster
2023-10-04 16:32       ` Juan Quintela
2023-10-04 16:57         ` Peter Xu
2023-10-05  5:45           ` Markus Armbruster
2023-10-05 14:52             ` Peter Xu
2023-10-05 16:06               ` Markus Armbruster
2023-09-18 14:41 ` [PATCH 29/52] migration/rdma: Plug a memory leak and improve a message Markus Armbruster
2023-09-25  6:31   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 30/52] migration/rdma: Delete inappropriate error_report() in macro ERROR() Markus Armbruster
2023-09-25  6:35   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 31/52] migration/rdma: Retire " Markus Armbruster
2023-09-25  7:31   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 32/52] migration/rdma: Fix error handling around rdma_getaddrinfo() Markus Armbruster
2023-09-25  7:32   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 33/52] migration/rdma: Drop "@errp is clear" guards around error_setg() Markus Armbruster
2023-09-25  7:32   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 34/52] migration/rdma: Convert qemu_rdma_exchange_recv() to Error Markus Armbruster
2023-09-26  1:37   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 35/52] migration/rdma: Convert qemu_rdma_exchange_send() " Markus Armbruster
2023-09-26  1:42   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 36/52] migration/rdma: Convert qemu_rdma_exchange_get_response() " Markus Armbruster
2023-09-26  1:45   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 37/52] migration/rdma: Convert qemu_rdma_reg_whole_ram_blocks() " Markus Armbruster
2023-09-26  1:51   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 38/52] migration/rdma: Convert qemu_rdma_write_flush() " Markus Armbruster
2023-09-26  1:56   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() " Markus Armbruster
2023-09-26  5:24   ` Zhijian Li (Fujitsu)
2023-09-26  5:50   ` Zhijian Li (Fujitsu)
2023-09-26  5:55     ` Zhijian Li (Fujitsu)
2023-09-26  9:26       ` Markus Armbruster
2023-09-27 11:46         ` Markus Armbruster
2023-09-28  6:49           ` Markus Armbruster
2023-10-07  3:40           ` Zhijian Li (Fujitsu)
2023-10-16 12:11             ` Jason Gunthorpe
2023-10-17  5:22               ` Zhijian Li (Fujitsu)
2023-09-26  6:40     ` Markus Armbruster
2023-09-18 14:41 ` [PATCH 40/52] migration/rdma: Convert qemu_rdma_write() " Markus Armbruster
2023-09-26  5:25   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 41/52] migration/rdma: Convert qemu_rdma_post_send_control() " Markus Armbruster
2023-09-26  5:29   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 42/52] migration/rdma: Convert qemu_rdma_post_recv_control() " Markus Armbruster
2023-09-26  5:31   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 43/52] migration/rdma: Convert qemu_rdma_alloc_pd_cq() " Markus Armbruster
2023-09-26  5:43   ` Zhijian Li (Fujitsu)
2023-09-26  6:41     ` Markus Armbruster
2023-09-26  6:55       ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 44/52] migration/rdma: Silence qemu_rdma_resolve_host() Markus Armbruster
2023-09-26  5:44   ` Zhijian Li (Fujitsu)
2023-09-18 14:41 ` [PATCH 45/52] migration/rdma: Silence qemu_rdma_connect() Markus Armbruster
2023-09-26  6:00   ` Zhijian Li (Fujitsu)
2023-09-18 14:42 ` [PATCH 46/52] migration/rdma: Silence qemu_rdma_reg_control() Markus Armbruster
2023-09-26  6:00   ` Zhijian Li (Fujitsu)
2023-09-18 14:42 ` [PATCH 47/52] migration/rdma: Don't report received completion events as error Markus Armbruster
2023-09-26  6:06   ` Zhijian Li (Fujitsu)
2023-09-18 14:42 ` [PATCH 48/52] migration/rdma: Silence qemu_rdma_block_for_wrid() Markus Armbruster
2023-09-26  6:17   ` Zhijian Li (Fujitsu)
2023-09-18 14:42 ` [PATCH 49/52] migration/rdma: Silence qemu_rdma_register_and_get_keys() Markus Armbruster
2023-09-26  6:21   ` Zhijian Li (Fujitsu)
2023-09-18 14:42 ` [PATCH 50/52] migration/rdma: Silence qemu_rdma_cleanup() Markus Armbruster
2023-09-26 10:12   ` Zhijian Li (Fujitsu)
2023-09-26 11:52     ` Markus Armbruster
2023-09-27  1:41       ` Zhijian Li (Fujitsu)
2023-09-27  5:30         ` Markus Armbruster
2023-09-18 14:42 ` [PATCH 51/52] migration/rdma: Use error_report() & friends instead of stderr Markus Armbruster
2023-09-26  6:35   ` Zhijian Li (Fujitsu)
2023-09-27 12:16     ` Markus Armbruster
2023-09-18 14:42 ` [PATCH 52/52] migration/rdma: Fix how we show device details on open Markus Armbruster
2023-09-26  6:49   ` Zhijian Li (Fujitsu)
2023-09-26  9:19     ` Markus Armbruster
2023-09-19 16:49 ` [PATCH 00/52] migration/rdma: Error handling fixes Peter Xu
2023-09-19 18:29   ` Daniel P. Berrangé
2023-09-19 18:57     ` Fabiano Rosas
2023-09-20 13:22     ` Markus Armbruster
2023-10-04 18:00     ` Juan Quintela
2023-10-05  7:14       ` Daniel P. Berrangé
2023-10-31 10:25         ` Juan Quintela
2023-09-21  8:27   ` Zhijian Li (Fujitsu)
2023-09-22 15:21     ` Peter Xu
2023-09-25  8:06       ` Zhijian Li (Fujitsu)
2023-09-28 11:08 ` Markus Armbruster
