* [PATCH v2 00/53] migration/rdma: Error handling fixes
@ 2023-09-28 13:19 Markus Armbruster
  2023-09-28 13:19 ` [PATCH v2 01/53] migration/rdma: Clean up qemu_rdma_poll()'s return type Markus Armbruster
                   ` (53 more replies)
  0 siblings, 54 replies; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

Oh dear, where to start.  There's so much wrong, and in pretty obvious
ways.  This code should never have passed review.  I'm refraining from
saying more; see the commit messages instead.

Issues remaining after this series include:

* Terrible error messages

* Some error message cascades remain

* There is no written contract for QEMUFileHooks, and the
  responsibility for reporting errors is unclear

* There seem to be no tests whatsoever

PATCH 29 is arguably a matter of taste.  I made my case for it during
review of v1.  If maintainers don't want it, I'll drop it.

Related: [PATCH 1/7] migration/rdma: Fix save_page method to fail on
polling error

v2:
* PATCH 06: New
* PATCH 07: Tweak local variables in _readv() and _writev() for
  consistency [Fabiano]
* PATCH 13: Drop a comment that has become useless [Fabiano]
* PATCH 14: Fix spelling of qemu_rdma_buffer_mergeable() [Eric]
* PATCH 16: New; adding temporary FIXMEs
* PATCH 17: Fix typo in commit message [Li Zhijian]
* PATCH 26: Squash in old PATCH 23 [Li Zhijian]
* PATCH 27: Fix type of RDMAContext member @errored [Eric]
* PATCH 40: Resolve a temporary FIXME
* PATCH 44: Drop advice on qemu_rdma_alloc_pd_cq() failure in
  qemu_rdma_source_init() [Li Zhijian]
* PATCH 46+49: Resolve a temporary FIXME
* PATCH 51: Downgrade to warning instead of silence [Li Zhijian]
* PATCH 52: Resolve the last temporary FIXME
* PATCH 53: Convert to tracing instead [Li Zhijian]

Markus Armbruster (53):
  migration/rdma: Clean up qemu_rdma_poll()'s return type
  migration/rdma: Clean up qemu_rdma_data_init()'s return type
  migration/rdma: Clean up rdma_delete_block()'s return type
  migration/rdma: Drop fragile wr_id formatting
  migration/rdma: Consistently use uint64_t for work request IDs
  migration/rdma: Fix unwanted integer truncation
  migration/rdma: Clean up two more harmless signed vs. unsigned issues
  migration/rdma: Give qio_channel_rdma_source_funcs internal linkage
  migration/rdma: Fix qemu_rdma_accept() to return failure on errors
  migration/rdma: Put @errp parameter last
  migration/rdma: Eliminate error_propagate()
  migration/rdma: Drop rdma_add_block() error handling
  migration/rdma: Drop qemu_rdma_search_ram_block() error handling
  migration/rdma: Make qemu_rdma_buffer_mergeable() return bool
  migration/rdma: Use bool for two RDMAContext flags
  migration/rdma: Fix or document problematic uses of errno
  migration/rdma: Ditch useless numeric error codes in error messages
  migration/rdma: Fix io_writev(), io_readv() methods to obey contract
  migration/rdma: Replace dangerous macro CHECK_ERROR_STATE()
  migration/rdma: Fix qemu_rdma_broken_ipv6_kernel() to set error
  migration/rdma: Fix qemu_get_cm_event_timeout() to always set error
  migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port
  migration/rdma: Fix QEMUFileHooks method return values
  migration/rdma: Fix rdma_getaddrinfo() error checking
  migration/rdma: Return -1 instead of negative errno code
  migration/rdma: Dumb down remaining int error values to -1
  migration/rdma: Replace int error_state by bool errored
  migration/rdma: Drop superfluous assignments to @ret
  migration/rdma: Check negative error values the same way everywhere
  migration/rdma: Plug a memory leak and improve a message
  migration/rdma: Delete inappropriate error_report() in macro ERROR()
  migration/rdma: Retire macro ERROR()
  migration/rdma: Fix error handling around rdma_getaddrinfo()
  migration/rdma: Drop "@errp is clear" guards around error_setg()
  migration/rdma: Convert qemu_rdma_exchange_recv() to Error
  migration/rdma: Convert qemu_rdma_exchange_send() to Error
  migration/rdma: Convert qemu_rdma_exchange_get_response() to Error
  migration/rdma: Convert qemu_rdma_reg_whole_ram_blocks() to Error
  migration/rdma: Convert qemu_rdma_write_flush() to Error
  migration/rdma: Convert qemu_rdma_write_one() to Error
  migration/rdma: Convert qemu_rdma_write() to Error
  migration/rdma: Convert qemu_rdma_post_send_control() to Error
  migration/rdma: Convert qemu_rdma_post_recv_control() to Error
  migration/rdma: Convert qemu_rdma_alloc_pd_cq() to Error
  migration/rdma: Silence qemu_rdma_resolve_host()
  migration/rdma: Silence qemu_rdma_connect()
  migration/rdma: Silence qemu_rdma_reg_control()
  migration/rdma: Don't report received completion events as error
  migration/rdma: Silence qemu_rdma_block_for_wrid()
  migration/rdma: Silence qemu_rdma_register_and_get_keys()
  migration/rdma: Downgrade qemu_rdma_cleanup() errors to warnings
  migration/rdma: Use error_report() & friends instead of stderr
  migration/rdma: Replace flawed device detail dump by tracing

 migration/rdma.c       | 1026 ++++++++++++++++++++--------------------
 migration/trace-events |   10 +-
 2 files changed, 514 insertions(+), 522 deletions(-)

-- 
2.41.0




* [PATCH v2 01/53] migration/rdma: Clean up qemu_rdma_poll()'s return type
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 14:26   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 02/53] migration/rdma: Clean up qemu_rdma_data_init()'s " Markus Armbruster
                   ` (52 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

qemu_rdma_poll()'s return type is uint64_t, even though it returns 0,
-1, or @ret, which is int.  Its callers assign the return value to int
variables, then check whether it's negative.  Unclean.

Return int instead.
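
To illustrate the mismatch, here is a minimal sketch.  The names
poll_old() and poll_new() are hypothetical stand-ins, not the real
qemu_rdma_poll(); they only model the return type before and after
this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the old prototype: unsigned return type,
 * yet the body returns -1 on failure. */
static uint64_t poll_old(int fail)
{
    return fail ? -1 : 0;       /* -1 is converted to UINT64_MAX */
}

/* Hypothetical model of the new prototype: the return type matches
 * the values actually returned. */
static int poll_new(int fail)
{
    return fail ? -1 : 0;
}

/* Old callers store the result in an int and test for negative.
 * The narrowing conversion back to int is implementation-defined;
 * it only "works" because common compilers wrap around. */
static int old_caller_sees_failure(int fail)
{
    int ret = (int)poll_old(fail);

    return ret < 0;
}
```

With the int return type, the failure value reaches the caller's
"ret < 0" check without a round trip through an unsigned type.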

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index a2a3db35b1..6ceddd044c 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1469,8 +1469,8 @@ static uint64_t qemu_rdma_make_wrid(uint64_t wr_id, uint64_t index,
  * (of any kind) has completed.
  * Return the work request ID that completed.
  */
-static uint64_t qemu_rdma_poll(RDMAContext *rdma, struct ibv_cq *cq,
-                               uint64_t *wr_id_out, uint32_t *byte_len)
+static int qemu_rdma_poll(RDMAContext *rdma, struct ibv_cq *cq,
+                          uint64_t *wr_id_out, uint32_t *byte_len)
 {
     int ret;
     struct ibv_wc wc;
-- 
2.41.0




* [PATCH v2 02/53] migration/rdma: Clean up qemu_rdma_data_init()'s return type
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
  2023-09-28 13:19 ` [PATCH v2 01/53] migration/rdma: Clean up qemu_rdma_poll()'s return type Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 14:35   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 03/53] migration/rdma: Clean up rdma_delete_block()'s " Markus Armbruster
                   ` (51 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

qemu_rdma_data_init()'s return type is void *.  It actually returns
RDMAContext *, and all its callers assign the value to an
RDMAContext *.  Unclean.

Return RDMAContext * instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 6ceddd044c..934771496c 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2739,7 +2739,7 @@ static void qemu_rdma_return_path_dest_init(RDMAContext *rdma_return_path,
     rdma_return_path->is_return_path = true;
 }
 
-static void *qemu_rdma_data_init(const char *host_port, Error **errp)
+static RDMAContext *qemu_rdma_data_init(const char *host_port, Error **errp)
 {
     RDMAContext *rdma = NULL;
     InetSocketAddress *addr;
-- 
2.41.0




* [PATCH v2 03/53] migration/rdma: Clean up rdma_delete_block()'s return type
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
  2023-09-28 13:19 ` [PATCH v2 01/53] migration/rdma: Clean up qemu_rdma_poll()'s return type Markus Armbruster
  2023-09-28 13:19 ` [PATCH v2 02/53] migration/rdma: Clean up qemu_rdma_data_init()'s " Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 14:36   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 04/53] migration/rdma: Drop fragile wr_id formatting Markus Armbruster
                   ` (50 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

rdma_delete_block() always returns 0, which its only caller ignores.
Return void instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 934771496c..d448c3e538 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -668,7 +668,7 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
  * Note: If used outside of cleanup, the caller must ensure that the destination
  * block structures are also updated
  */
-static int rdma_delete_block(RDMAContext *rdma, RDMALocalBlock *block)
+static void rdma_delete_block(RDMAContext *rdma, RDMALocalBlock *block)
 {
     RDMALocalBlocks *local = &rdma->local_ram_blocks;
     RDMALocalBlock *old = local->block;
@@ -754,8 +754,6 @@ static int rdma_delete_block(RDMAContext *rdma, RDMALocalBlock *block)
                                 &local->block[x]);
         }
     }
-
-    return 0;
 }
 
 /*
-- 
2.41.0




* [PATCH v2 04/53] migration/rdma: Drop fragile wr_id formatting
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (2 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 03/53] migration/rdma: Clean up rdma_delete_block()'s " Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 14:38   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 05/53] migration/rdma: Consistently use uint64_t for work request IDs Markus Armbruster
                   ` (49 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

wrid_desc[] uses 4001 pointers to map four integer values to strings.

print_wrid() accesses wrid_desc[] out of bounds when passed a negative
argument.  It returns null for values 2..1999 and 2001..3999.

qemu_rdma_poll() and qemu_rdma_block_for_wrid() print wrid_desc[wr_id]
and pass print_wrid(wr_id) to tracepoints.  This could conceivably
crash when trying to format a null string.  I believe access out of
bounds is not possible.

Not worth cleaning up.  Dumb down to show just numeric wr_id.
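
The sparse table can be reproduced in isolation with the sketch
below.  The enumerator names are shortened, and the values 0, 1 and
2000 are inferred from the ranges quoted above (only 4000 is visible
in the diff context), so treat them as assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values; see the note above. */
enum {
    WRID_NONE = 0,
    WRID_RDMA_WRITE = 1,
    WRID_SEND_CONTROL = 2000,
    WRID_RECV_CONTROL = 4000,
};

/* Designated initializers size the array by the largest index, so
 * four strings cost 4001 pointers; every other slot is null. */
static const char *wrid_desc[] = {
    [WRID_NONE] = "NONE",
    [WRID_RDMA_WRITE] = "WRITE RDMA",
    [WRID_SEND_CONTROL] = "CONTROL SEND",
    [WRID_RECV_CONTROL] = "CONTROL RECV",
};

/* Number of slots in the table, including the null gaps. */
static size_t wrid_desc_entries(void)
{
    return sizeof(wrid_desc) / sizeof(wrid_desc[0]);
}
```

Passing one of the null slots to a "%s" format is undefined
behavior, which is the potential crash described above.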

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c       | 32 +++++++-------------------------
 migration/trace-events |  8 ++++----
 2 files changed, 11 insertions(+), 29 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index d448c3e538..fc9eab0ff7 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -133,13 +133,6 @@ enum {
     RDMA_WRID_RECV_CONTROL = 4000,
 };
 
-static const char *wrid_desc[] = {
-    [RDMA_WRID_NONE] = "NONE",
-    [RDMA_WRID_RDMA_WRITE] = "WRITE RDMA",
-    [RDMA_WRID_SEND_CONTROL] = "CONTROL SEND",
-    [RDMA_WRID_RECV_CONTROL] = "CONTROL RECV",
-};
-
 /*
  * Work request IDs for IB SEND messages only (not RDMA writes).
  * This is used by the migration protocol to transmit
@@ -535,7 +528,6 @@ static void network_to_result(RDMARegisterResult *result)
     result->host_addr = ntohll(result->host_addr);
 };
 
-const char *print_wrid(int wrid);
 static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                    uint8_t *data, RDMAControlHeader *resp,
                                    int *resp_idx,
@@ -1362,14 +1354,6 @@ static int qemu_rdma_reg_control(RDMAContext *rdma, int idx)
     return -1;
 }
 
-const char *print_wrid(int wrid)
-{
-    if (wrid >= RDMA_WRID_RECV_CONTROL) {
-        return wrid_desc[RDMA_WRID_RECV_CONTROL];
-    }
-    return wrid_desc[wrid];
-}
-
 /*
  * Perform a non-optimized memory unregistration after every transfer
  * for demonstration purposes, only if pin-all is not requested.
@@ -1491,15 +1475,15 @@ static int qemu_rdma_poll(RDMAContext *rdma, struct ibv_cq *cq,
     if (wc.status != IBV_WC_SUCCESS) {
         fprintf(stderr, "ibv_poll_cq wc.status=%d %s!\n",
                         wc.status, ibv_wc_status_str(wc.status));
-        fprintf(stderr, "ibv_poll_cq wrid=%s!\n", wrid_desc[wr_id]);
+        fprintf(stderr, "ibv_poll_cq wrid=%" PRIu64 "!\n", wr_id);
 
         return -1;
     }
 
     if (rdma->control_ready_expected &&
         (wr_id >= RDMA_WRID_RECV_CONTROL)) {
-        trace_qemu_rdma_poll_recv(wrid_desc[RDMA_WRID_RECV_CONTROL],
-                  wr_id - RDMA_WRID_RECV_CONTROL, wr_id, rdma->nb_sent);
+        trace_qemu_rdma_poll_recv(wr_id - RDMA_WRID_RECV_CONTROL, wr_id,
+                                  rdma->nb_sent);
         rdma->control_ready_expected = 0;
     }
 
@@ -1510,7 +1494,7 @@ static int qemu_rdma_poll(RDMAContext *rdma, struct ibv_cq *cq,
             (wc.wr_id & RDMA_WRID_BLOCK_MASK) >> RDMA_WRID_BLOCK_SHIFT;
         RDMALocalBlock *block = &(rdma->local_ram_blocks.block[index]);
 
-        trace_qemu_rdma_poll_write(print_wrid(wr_id), wr_id, rdma->nb_sent,
+        trace_qemu_rdma_poll_write(wr_id, rdma->nb_sent,
                                    index, chunk, block->local_host_addr,
                                    (void *)(uintptr_t)block->remote_host_addr);
 
@@ -1520,7 +1504,7 @@ static int qemu_rdma_poll(RDMAContext *rdma, struct ibv_cq *cq,
             rdma->nb_sent--;
         }
     } else {
-        trace_qemu_rdma_poll_other(print_wrid(wr_id), wr_id, rdma->nb_sent);
+        trace_qemu_rdma_poll_other(wr_id, rdma->nb_sent);
     }
 
     *wr_id_out = wc.wr_id;
@@ -1665,8 +1649,7 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma, int wrid_requested,
             break;
         }
         if (wr_id != wrid_requested) {
-            trace_qemu_rdma_block_for_wrid_miss(print_wrid(wrid_requested),
-                       wrid_requested, print_wrid(wr_id), wr_id);
+            trace_qemu_rdma_block_for_wrid_miss(wrid_requested, wr_id);
         }
     }
 
@@ -1705,8 +1688,7 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma, int wrid_requested,
                 break;
             }
             if (wr_id != wrid_requested) {
-                trace_qemu_rdma_block_for_wrid_miss(print_wrid(wrid_requested),
-                                   wrid_requested, print_wrid(wr_id), wr_id);
+                trace_qemu_rdma_block_for_wrid_miss(wrid_requested, wr_id);
             }
         }
 
diff --git a/migration/trace-events b/migration/trace-events
index 4666f19325..b78808f28b 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -207,7 +207,7 @@ qemu_rdma_accept_incoming_migration(void) ""
 qemu_rdma_accept_incoming_migration_accepted(void) ""
 qemu_rdma_accept_pin_state(bool pin) "%d"
 qemu_rdma_accept_pin_verbsc(void *verbs) "Verbs context after listen: %p"
-qemu_rdma_block_for_wrid_miss(const char *wcompstr, int wcomp, const char *gcompstr, uint64_t req) "A Wanted wrid %s (%d) but got %s (%" PRIu64 ")"
+qemu_rdma_block_for_wrid_miss(int wcomp, uint64_t req) "A Wanted wrid %d but got %" PRIu64
 qemu_rdma_cleanup_disconnect(void) ""
 qemu_rdma_close(void) ""
 qemu_rdma_connect_pin_all_requested(void) ""
@@ -221,9 +221,9 @@ qemu_rdma_exchange_send_waiting(const char *desc) "Waiting for response %s"
 qemu_rdma_exchange_send_received(const char *desc) "Response %s received."
 qemu_rdma_fill(size_t control_len, size_t size) "RDMA %zd of %zd bytes already in buffer"
 qemu_rdma_init_ram_blocks(int blocks) "Allocated %d local ram block structures"
-qemu_rdma_poll_recv(const char *compstr, int64_t comp, int64_t id, int sent) "completion %s #%" PRId64 " received (%" PRId64 ") left %d"
-qemu_rdma_poll_write(const char *compstr, int64_t comp, int left, uint64_t block, uint64_t chunk, void *local, void *remote) "completions %s (%" PRId64 ") left %d, block %" PRIu64 ", chunk: %" PRIu64 " %p %p"
-qemu_rdma_poll_other(const char *compstr, int64_t comp, int left) "other completion %s (%" PRId64 ") received left %d"
+qemu_rdma_poll_recv(int64_t comp, int64_t id, int sent) "completion %" PRId64 " received (%" PRId64 ") left %d"
+qemu_rdma_poll_write(int64_t comp, int left, uint64_t block, uint64_t chunk, void *local, void *remote) "completions %" PRId64 " left %d, block %" PRIu64 ", chunk: %" PRIu64 " %p %p"
+qemu_rdma_poll_other(int64_t comp, int left) "other completion %" PRId64 " received left %d"
 qemu_rdma_post_send_control(const char *desc) "CONTROL: sending %s.."
 qemu_rdma_register_and_get_keys(uint64_t len, void *start) "Registering %" PRIu64 " bytes @ %p"
 qemu_rdma_register_odp_mr(const char *name) "Try to register On-Demand Paging memory region: %s"
-- 
2.41.0




* [PATCH v2 05/53] migration/rdma: Consistently use uint64_t for work request IDs
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (3 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 04/53] migration/rdma: Drop fragile wr_id formatting Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 14:39   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 06/53] migration/rdma: Fix unwanted integer truncation Markus Armbruster
                   ` (48 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

We use int instead of uint64_t in a few places.  Change them to
uint64_t.

This cleans up a comparison of signed qemu_rdma_block_for_wrid()
parameter @wrid_requested with unsigned @wr_id.  Harmless, because the
actual arguments are non-negative enumeration constants.
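
A small sketch of what such a mixed comparison does.  matches_old()
is a hypothetical helper, not code from rdma.c; the cast spells out
the conversion the compiler otherwise applies implicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of comparing a signed request ID with an
 * unsigned work request ID.  The int operand is converted to
 * uint64_t before the comparison, so a negative value would become
 * a huge positive one. */
static int matches_old(int wrid_requested, uint64_t wr_id)
{
    return wr_id == (uint64_t)wrid_requested;
}
```

For non-negative arguments the conversion preserves the value, which
is why the old code was harmless; a negative argument would compare
equal to an ID near UINT64_MAX.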

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c       | 7 ++++---
 migration/trace-events | 8 ++++----
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index fc9eab0ff7..4289346617 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1599,13 +1599,13 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
     return rdma->error_state;
 }
 
-static struct ibv_comp_channel *to_channel(RDMAContext *rdma, int wrid)
+static struct ibv_comp_channel *to_channel(RDMAContext *rdma, uint64_t wrid)
 {
     return wrid < RDMA_WRID_RECV_CONTROL ? rdma->send_comp_channel :
            rdma->recv_comp_channel;
 }
 
-static struct ibv_cq *to_cq(RDMAContext *rdma, int wrid)
+static struct ibv_cq *to_cq(RDMAContext *rdma, uint64_t wrid)
 {
     return wrid < RDMA_WRID_RECV_CONTROL ? rdma->send_cq : rdma->recv_cq;
 }
@@ -1623,7 +1623,8 @@ static struct ibv_cq *to_cq(RDMAContext *rdma, int wrid)
  * completions only need to be recorded, but do not actually
  * need further processing.
  */
-static int qemu_rdma_block_for_wrid(RDMAContext *rdma, int wrid_requested,
+static int qemu_rdma_block_for_wrid(RDMAContext *rdma,
+                                    uint64_t wrid_requested,
                                     uint32_t *byte_len)
 {
     int num_cq_events = 0, ret = 0;
diff --git a/migration/trace-events b/migration/trace-events
index b78808f28b..d733107ec6 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -207,7 +207,7 @@ qemu_rdma_accept_incoming_migration(void) ""
 qemu_rdma_accept_incoming_migration_accepted(void) ""
 qemu_rdma_accept_pin_state(bool pin) "%d"
 qemu_rdma_accept_pin_verbsc(void *verbs) "Verbs context after listen: %p"
-qemu_rdma_block_for_wrid_miss(int wcomp, uint64_t req) "A Wanted wrid %d but got %" PRIu64
+qemu_rdma_block_for_wrid_miss(uint64_t wcomp, uint64_t req) "A Wanted wrid %" PRIu64 " but got %" PRIu64
 qemu_rdma_cleanup_disconnect(void) ""
 qemu_rdma_close(void) ""
 qemu_rdma_connect_pin_all_requested(void) ""
@@ -221,9 +221,9 @@ qemu_rdma_exchange_send_waiting(const char *desc) "Waiting for response %s"
 qemu_rdma_exchange_send_received(const char *desc) "Response %s received."
 qemu_rdma_fill(size_t control_len, size_t size) "RDMA %zd of %zd bytes already in buffer"
 qemu_rdma_init_ram_blocks(int blocks) "Allocated %d local ram block structures"
-qemu_rdma_poll_recv(int64_t comp, int64_t id, int sent) "completion %" PRId64 " received (%" PRId64 ") left %d"
-qemu_rdma_poll_write(int64_t comp, int left, uint64_t block, uint64_t chunk, void *local, void *remote) "completions %" PRId64 " left %d, block %" PRIu64 ", chunk: %" PRIu64 " %p %p"
-qemu_rdma_poll_other(int64_t comp, int left) "other completion %" PRId64 " received left %d"
+qemu_rdma_poll_recv(uint64_t comp, int64_t id, int sent) "completion %" PRIu64 " received (%" PRId64 ") left %d"
+qemu_rdma_poll_write(uint64_t comp, int left, uint64_t block, uint64_t chunk, void *local, void *remote) "completions %" PRIu64 " left %d, block %" PRIu64 ", chunk: %" PRIu64 " %p %p"
+qemu_rdma_poll_other(uint64_t comp, int left) "other completion %" PRIu64 " received left %d"
 qemu_rdma_post_send_control(const char *desc) "CONTROL: sending %s.."
 qemu_rdma_register_and_get_keys(uint64_t len, void *start) "Registering %" PRIu64 " bytes @ %p"
 qemu_rdma_register_odp_mr(const char *name) "Try to register On-Demand Paging memory region: %s"
-- 
2.41.0




* [PATCH v2 06/53] migration/rdma: Fix unwanted integer truncation
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (4 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 05/53] migration/rdma: Consistently use uint64_t for work request IDs Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-09-28 14:20   ` Fabiano Rosas
                     ` (2 more replies)
  2023-09-28 13:19 ` [PATCH v2 07/53] migration/rdma: Clean up two more harmless signed vs. unsigned issues Markus Armbruster
                   ` (47 subsequent siblings)
  53 siblings, 3 replies; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

qio_channel_rdma_readv() assigns the size_t value of qemu_rdma_fill()
to an int variable before it adds it to @done / subtracts it from
@want, both size_t.  The value is truncated when qemu_rdma_fill()
copies more than INT_MAX bytes.  Seems vanishingly unlikely, but needs
fixing all the same.
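
The effect can be sketched with two hypothetical accounting helpers
(not the real qio_channel_rdma_readv()).  The sketch assumes a
platform where size_t is 64 bits and int is 32 bits, and a compiler
that wraps on out-of-range conversion to int, which the C standard
leaves implementation-defined.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Old pattern: the byte count takes a detour through an int. */
static size_t account_via_int(size_t copied)
{
    size_t done = 0;
    int ret = (int)copied;      /* out of range when copied > INT_MAX */

    done += ret;                /* a negative ret becomes a huge size_t */
    return done;
}

/* New pattern: the byte count stays in a size_t throughout. */
static size_t account_via_size_t(size_t copied)
{
    size_t done = 0;
    size_t len = copied;

    done += len;
    return done;
}
```

For counts up to INT_MAX both helpers agree; above that, only the
size_t version keeps @done equal to the number of bytes copied.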

Fixes: 6ddd2d76ca6f (migration: convert RDMA to use QIOChannel interface)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 4289346617..5f423f66f0 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2852,7 +2852,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
     RDMAControlHeader head;
     int ret = 0;
     ssize_t i;
-    size_t done = 0;
+    size_t done = 0, len;
 
     RCU_READ_LOCK_GUARD();
     rdma = qatomic_rcu_read(&rioc->rdmain);
@@ -2873,9 +2873,9 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
          * were given and dish out the bytes until we run
          * out of bytes.
          */
-        ret = qemu_rdma_fill(rdma, data, want, 0);
-        done += ret;
-        want -= ret;
+        len = qemu_rdma_fill(rdma, data, want, 0);
+        done += len;
+        want -= len;
         /* Got what we needed, so go to next iovec */
         if (want == 0) {
             continue;
@@ -2902,9 +2902,9 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
         /*
          * SEND was received with new bytes, now try again.
          */
-        ret = qemu_rdma_fill(rdma, data, want, 0);
-        done += ret;
-        want -= ret;
+        len = qemu_rdma_fill(rdma, data, want, 0);
+        done += len;
+        want -= len;
 
         /* Still didn't get enough, so lets just return */
         if (want) {
-- 
2.41.0




* [PATCH v2 07/53] migration/rdma: Clean up two more harmless signed vs. unsigned issues
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (5 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 06/53] migration/rdma: Fix unwanted integer truncation Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 14:44   ` Juan Quintela
  2023-10-07  2:38   ` Zhijian Li (Fujitsu)
  2023-09-28 13:19 ` [PATCH v2 08/53] migration/rdma: Give qio_channel_rdma_source_funcs internal linkage Markus Armbruster
                   ` (46 subsequent siblings)
  53 siblings, 2 replies; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

qemu_rdma_exchange_get_response() compares int parameter @expecting
with uint32_t head->type.  Actual arguments are non-negative
enumeration constants, RDMAControlHeader's uint32_t member @type, or
qemu_rdma_exchange_recv()'s int parameter @expecting.  Actual
arguments for the latter are non-negative enumeration constants.
Change both parameters to uint32_t.

In qio_channel_rdma_readv(), loop control variable @i is ssize_t, and
counts from 0 up to @niov, which is size_t.  Change @i to size_t.

While there, make qio_channel_rdma_readv() and
qio_channel_rdma_writev() more consistent: change the former's @done
to ssize_t, and delete the latter's useless initialization of @len.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
---
 migration/rdma.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 5f423f66f0..faca0ed998 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1801,7 +1801,7 @@ static int qemu_rdma_post_recv_control(RDMAContext *rdma, int idx)
  * Block and wait for a RECV control channel message to arrive.
  */
 static int qemu_rdma_exchange_get_response(RDMAContext *rdma,
-                RDMAControlHeader *head, int expecting, int idx)
+                RDMAControlHeader *head, uint32_t expecting, int idx)
 {
     uint32_t byte_len;
     int ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RECV_CONTROL + idx,
@@ -1959,7 +1959,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
  * control-channel message.
  */
 static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
-                                int expecting)
+                                   uint32_t expecting)
 {
     RDMAControlHeader ready = {
                                 .len = 0,
@@ -2765,8 +2765,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
     RDMAContext *rdma;
     int ret;
     ssize_t done = 0;
-    size_t i;
-    size_t len = 0;
+    size_t i, len;
 
     RCU_READ_LOCK_GUARD();
     rdma = qatomic_rcu_read(&rioc->rdmaout);
@@ -2851,8 +2850,8 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
     RDMAContext *rdma;
     RDMAControlHeader head;
     int ret = 0;
-    ssize_t i;
-    size_t done = 0, len;
+    ssize_t done = 0;
+    size_t i, len;
 
     RCU_READ_LOCK_GUARD();
     rdma = qatomic_rcu_read(&rioc->rdmain);
-- 
2.41.0




* [PATCH v2 08/53] migration/rdma: Give qio_channel_rdma_source_funcs internal linkage
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (6 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 07/53] migration/rdma: Clean up two more harmless signed vs. unsigned issues Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 14:50   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 09/53] migration/rdma: Fix qemu_rdma_accept() to return failure on errors Markus Armbruster
                   ` (45 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index faca0ed998..0e991175f9 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3056,7 +3056,7 @@ qio_channel_rdma_source_finalize(GSource *source)
     object_unref(OBJECT(ssource->rioc));
 }
 
-GSourceFuncs qio_channel_rdma_source_funcs = {
+static GSourceFuncs qio_channel_rdma_source_funcs = {
     qio_channel_rdma_source_prepare,
     qio_channel_rdma_source_check,
     qio_channel_rdma_source_dispatch,
-- 
2.41.0




* [PATCH v2 09/53] migration/rdma: Fix qemu_rdma_accept() to return failure on errors
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (7 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 08/53] migration/rdma: Give qio_channel_rdma_source_funcs internal linkage Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 14:51   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 10/53] migration/rdma: Put @errp parameter last Markus Armbruster
                   ` (44 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

qemu_rdma_accept() returns 0 in some cases even when it didn't
complete its job due to errors.  Impact is not obvious.  I figure the
caller will soon fail again with a misleading error message.

Fix it to return -1 on any failure.
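
The bug pattern, reduced to a hypothetical sketch: accept_old() and
accept_fixed() are illustrative names, and the cleanup at the label
is left out.  The only point is that @ret must be set before jumping
to a shared error label that returns it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the old behavior: @ret is still 0 from an
 * earlier successful step when the error path is taken. */
static int accept_old(bool event_ok)
{
    int ret = 0;

    if (!event_ok) {
        goto err;               /* failure, but ret is still 0 */
    }
    return 0;

err:
    /* cleanup would go here */
    return ret;
}

/* Hypothetical model of the fix: set @ret before every jump. */
static int accept_fixed(bool event_ok)
{
    int ret = 0;

    if (!event_ok) {
        ret = -1;
        goto err;
    }
    return 0;

err:
    /* cleanup would go here */
    return ret;
}
```

In the old variant the caller cannot tell the failed call from a
successful one, which matches the misleading follow-up errors
suspected above.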

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 0e991175f9..94b828b45d 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3352,6 +3352,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
 
     if (cm_event->event != RDMA_CM_EVENT_CONNECT_REQUEST) {
         rdma_ack_cm_event(cm_event);
+        ret = -1;
         goto err_rdma_dest_wait;
     }
 
@@ -3364,6 +3365,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
         rdma_return_path = qemu_rdma_data_init(rdma->host_port, NULL);
         if (rdma_return_path == NULL) {
             rdma_ack_cm_event(cm_event);
+            ret = -1;
             goto err_rdma_dest_wait;
         }
 
@@ -3375,10 +3377,11 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     network_to_caps(&cap);
 
     if (cap.version < 1 || cap.version > RDMA_CONTROL_VERSION_CURRENT) {
-            error_report("Unknown source RDMA version: %d, bailing...",
-                            cap.version);
-            rdma_ack_cm_event(cm_event);
-            goto err_rdma_dest_wait;
+        error_report("Unknown source RDMA version: %d, bailing...",
+                     cap.version);
+        rdma_ack_cm_event(cm_event);
+        ret = -1;
+        goto err_rdma_dest_wait;
     }
 
     /*
@@ -3408,9 +3411,10 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     if (!rdma->verbs) {
         rdma->verbs = verbs;
     } else if (rdma->verbs != verbs) {
-            error_report("ibv context not matching %p, %p!", rdma->verbs,
-                         verbs);
-            goto err_rdma_dest_wait;
+        error_report("ibv context not matching %p, %p!", rdma->verbs,
+                     verbs);
+        ret = -1;
+        goto err_rdma_dest_wait;
     }
 
     qemu_rdma_dump_id("dest_init", verbs);
@@ -3467,6 +3471,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     if (cm_event->event != RDMA_CM_EVENT_ESTABLISHED) {
         error_report("rdma_accept not event established");
         rdma_ack_cm_event(cm_event);
+        ret = -1;
         goto err_rdma_dest_wait;
     }
 
-- 
2.41.0




* [PATCH v2 10/53] migration/rdma: Put @errp parameter last
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (8 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 09/53] migration/rdma: Fix qemu_rdma_accept() to return failure on errors Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 14:54   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 11/53] migration/rdma: Eliminate error_propagate() Markus Armbruster
                   ` (43 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

include/qapi/error.h demands:

 * - Functions that use Error to report errors have an Error **errp
 *   parameter.  It should be the last parameter, except for functions
 *   taking variable arguments.

qemu_rdma_connect() does not conform.  Clean it up.
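
A rough sketch of the convention, using a toy stand-in for Error (the
real type and its helpers live in include/qapi/error.h; toy_connect(),
toy_error_set() and connect_failed are hypothetical names used only
here):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for QEMU's Error object. */
typedef struct Error {
    const char *msg;
} Error;

static Error connect_failed = { "connect failed" };

/* Store @err in *@errp unless the caller passed NULL to ignore errors. */
static void toy_error_set(Error **errp, Error *err)
{
    if (errp) {
        assert(*errp == NULL);  /* must not overwrite an earlier error */
        *errp = err;
    }
}

/* Ordinary inputs first, Error **errp last. */
static int toy_connect(int fd, bool return_path, Error **errp)
{
    (void)return_path;
    if (fd < 0) {
        toy_error_set(errp, &connect_failed);
        return -1;
    }
    return 0;
}
```

With errp always in the last position, call sites read the same way
across the code base, and a misplaced argument is easier to spot.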

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 94b828b45d..176fe1a5f1 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2532,7 +2532,8 @@ static int qemu_get_cm_event_timeout(RDMAContext *rdma,
     }
 }
 
-static int qemu_rdma_connect(RDMAContext *rdma, Error **errp, bool return_path)
+static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
+                             Error **errp)
 {
     RDMACapabilities cap = {
                                 .version = RDMA_CONTROL_VERSION_CURRENT,
@@ -4175,7 +4176,7 @@ void rdma_start_outgoing_migration(void *opaque,
     }
 
     trace_rdma_start_outgoing_migration_after_rdma_source_init();
-    ret = qemu_rdma_connect(rdma, errp, false);
+    ret = qemu_rdma_connect(rdma, false, errp);
 
     if (ret) {
         goto err;
@@ -4196,7 +4197,7 @@ void rdma_start_outgoing_migration(void *opaque,
             goto return_path_err;
         }
 
-        ret = qemu_rdma_connect(rdma_return_path, errp, true);
+        ret = qemu_rdma_connect(rdma_return_path, true, errp);
 
         if (ret) {
             goto return_path_err;
-- 
2.41.0




* [PATCH v2 11/53] migration/rdma: Eliminate error_propagate()
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (9 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 10/53] migration/rdma: Put @errp parameter last Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 14:58   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 12/53] migration/rdma: Drop rdma_add_block() error handling Markus Armbruster
                   ` (42 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

When all we do with an Error received into a local variable is
propagate it somewhere else, we can just as well receive it there
right away.
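
A before/after sketch of the idea, again with a toy Error type rather
than QEMU's (toy_resolve(), init_via_local() and init_direct() are
hypothetical names, and the assignment in init_via_local() merely
stands in for error_propagate()):

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for QEMU's Error object. */
typedef struct Error {
    const char *msg;
} Error;

static Error resolve_failed = { "cannot resolve host" };

/* Hypothetical callee that reports failure through @errp. */
static int toy_resolve(int ok, Error **errp)
{
    if (!ok) {
        if (errp) {
            assert(*errp == NULL);
            *errp = &resolve_failed;
        }
        return -1;
    }
    return 0;
}

/* Before: receive into a local, then hand it on unchanged. */
static int init_via_local(int ok, Error **errp)
{
    Error *local_err = NULL;

    if (toy_resolve(ok, &local_err) < 0) {
        if (errp) {
            *errp = local_err;  /* toy version of error_propagate() */
        }
        return -1;
    }
    return 0;
}

/* After: let the callee store straight into the caller's errp. */
static int init_direct(int ok, Error **errp)
{
    return toy_resolve(ok, errp) < 0 ? -1 : 0;
}
```

Both variants leave the caller with the same error; the second one
simply has less code in which to get it wrong.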

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 176fe1a5f1..380d550229 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2445,7 +2445,6 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
 static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
 {
     int ret, idx;
-    Error *local_err = NULL, **temp = &local_err;
 
     /*
      * Will be validated against destination's actual capabilities
@@ -2453,14 +2452,14 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
      */
     rdma->pin_all = pin_all;
 
-    ret = qemu_rdma_resolve_host(rdma, temp);
+    ret = qemu_rdma_resolve_host(rdma, errp);
     if (ret) {
         goto err_rdma_source_init;
     }
 
     ret = qemu_rdma_alloc_pd_cq(rdma);
     if (ret) {
-        ERROR(temp, "rdma migration: error allocating pd and cq! Your mlock()"
+        ERROR(errp, "rdma migration: error allocating pd and cq! Your mlock()"
                     " limits may be too low. Please check $ ulimit -a # and "
                     "search for 'ulimit -l' in the output");
         goto err_rdma_source_init;
@@ -2468,13 +2467,13 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
 
     ret = qemu_rdma_alloc_qp(rdma);
     if (ret) {
-        ERROR(temp, "rdma migration: error allocating qp!");
+        ERROR(errp, "rdma migration: error allocating qp!");
         goto err_rdma_source_init;
     }
 
     ret = qemu_rdma_init_ram_blocks(rdma);
     if (ret) {
-        ERROR(temp, "rdma migration: error initializing ram blocks!");
+        ERROR(errp, "rdma migration: error initializing ram blocks!");
         goto err_rdma_source_init;
     }
 
@@ -2489,7 +2488,7 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
     for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
         ret = qemu_rdma_reg_control(rdma, idx);
         if (ret) {
-            ERROR(temp, "rdma migration: error registering %d control!",
+            ERROR(errp, "rdma migration: error registering %d control!",
                                                             idx);
             goto err_rdma_source_init;
         }
@@ -2498,7 +2497,6 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
     return 0;
 
 err_rdma_source_init:
-    error_propagate(errp, local_err);
     qemu_rdma_cleanup(rdma);
     return -1;
 }
@@ -4103,7 +4101,6 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
 {
     int ret;
     RDMAContext *rdma;
-    Error *local_err = NULL;
 
     trace_rdma_start_incoming_migration();
 
@@ -4113,13 +4110,12 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
         return;
     }
 
-    rdma = qemu_rdma_data_init(host_port, &local_err);
+    rdma = qemu_rdma_data_init(host_port, errp);
     if (rdma == NULL) {
         goto err;
     }
 
-    ret = qemu_rdma_dest_init(rdma, &local_err);
-
+    ret = qemu_rdma_dest_init(rdma, errp);
     if (ret) {
         goto err;
     }
@@ -4142,7 +4138,6 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
 cleanup_rdma:
     qemu_rdma_cleanup(rdma);
 err:
-    error_propagate(errp, local_err);
     if (rdma) {
         g_free(rdma->host);
         g_free(rdma->host_port);
-- 
2.41.0




* [PATCH v2 12/53] migration/rdma: Drop rdma_add_block() error handling
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (10 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 11/53] migration/rdma: Eliminate error_propagate() Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 14:58   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 13/53] migration/rdma: Drop qemu_rdma_search_ram_block() " Markus Armbruster
                   ` (41 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

rdma_add_block() can't fail.  Return void, and drop the unreachable
error handling.
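
The shape of the change, sketched with made-up toy types (ToyBlocks,
toy_add_block(), toy_foreach_block() and friends are assumptions for
illustration, not the rdma.c code): the helper that cannot fail
returns void, the iterator callback keeps its int signature and always
returns 0, and the caller asserts instead of handling an error that
cannot happen.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_BLOCKS 8

typedef struct ToyBlocks {
    int nb_blocks;
    int ids[TOY_MAX_BLOCKS];
} ToyBlocks;

/* Cannot fail, so it returns void instead of a constant 0. */
static void toy_add_block(ToyBlocks *b, int id)
{
    assert(b->nb_blocks < TOY_MAX_BLOCKS);  /* toy capacity limit */
    b->ids[b->nb_blocks++] = id;
}

/* The iterator dictates the int return type; this callback never fails. */
static int toy_init_one_block(int id, void *opaque)
{
    toy_add_block(opaque, id);
    return 0;
}

/* Hypothetical iterator: stops at the first non-zero callback result. */
static int toy_foreach_block(int (*fn)(int, void *), void *opaque)
{
    for (int id = 0; id < 3; id++) {
        int ret = fn(id, opaque);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

static void toy_init_blocks(ToyBlocks *b)
{
    int ret;

    b->nb_blocks = 0;
    ret = toy_foreach_block(toy_init_one_block, b);
    assert(!ret);   /* the callback cannot fail */
}
```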

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 30 +++++++++---------------------
 1 file changed, 9 insertions(+), 21 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 380d550229..f2f811ace2 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -559,9 +559,9 @@ static inline uint8_t *ram_chunk_end(const RDMALocalBlock *rdma_ram_block,
     return result;
 }
 
-static int rdma_add_block(RDMAContext *rdma, const char *block_name,
-                         void *host_addr,
-                         ram_addr_t block_offset, uint64_t length)
+static void rdma_add_block(RDMAContext *rdma, const char *block_name,
+                           void *host_addr,
+                           ram_addr_t block_offset, uint64_t length)
 {
     RDMALocalBlocks *local = &rdma->local_ram_blocks;
     RDMALocalBlock *block;
@@ -615,8 +615,6 @@ static int rdma_add_block(RDMAContext *rdma, const char *block_name,
                          block->nb_chunks);
 
     local->nb_blocks++;
-
-    return 0;
 }
 
 /*
@@ -630,7 +628,8 @@ static int qemu_rdma_init_one_block(RAMBlock *rb, void *opaque)
     void *host_addr = qemu_ram_get_host_addr(rb);
     ram_addr_t block_offset = qemu_ram_get_offset(rb);
     ram_addr_t length = qemu_ram_get_used_length(rb);
-    return rdma_add_block(opaque, block_name, host_addr, block_offset, length);
+    rdma_add_block(opaque, block_name, host_addr, block_offset, length);
+    return 0;
 }
 
 /*
@@ -638,7 +637,7 @@ static int qemu_rdma_init_one_block(RAMBlock *rb, void *opaque)
  * identify chunk boundaries inside each RAMBlock and also be referenced
  * during dynamic page registration.
  */
-static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
+static void qemu_rdma_init_ram_blocks(RDMAContext *rdma)
 {
     RDMALocalBlocks *local = &rdma->local_ram_blocks;
     int ret;
@@ -646,14 +645,11 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
     assert(rdma->blockmap == NULL);
     memset(local, 0, sizeof *local);
     ret = foreach_not_ignored_block(qemu_rdma_init_one_block, rdma);
-    if (ret) {
-        return ret;
-    }
+    assert(!ret);
     trace_qemu_rdma_init_ram_blocks(local->nb_blocks);
     rdma->dest_blocks = g_new0(RDMADestBlock,
                                rdma->local_ram_blocks.nb_blocks);
     local->init = true;
-    return 0;
 }
 
 /*
@@ -2471,11 +2467,7 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
         goto err_rdma_source_init;
     }
 
-    ret = qemu_rdma_init_ram_blocks(rdma);
-    if (ret) {
-        ERROR(errp, "rdma migration: error initializing ram blocks!");
-        goto err_rdma_source_init;
-    }
+    qemu_rdma_init_ram_blocks(rdma);
 
     /* Build the hash that maps from offset to RAMBlock */
     rdma->blockmap = g_hash_table_new(g_direct_hash, g_direct_equal);
@@ -3430,11 +3422,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
         goto err_rdma_dest_wait;
     }
 
-    ret = qemu_rdma_init_ram_blocks(rdma);
-    if (ret) {
-        error_report("rdma migration: error initializing ram blocks!");
-        goto err_rdma_dest_wait;
-    }
+    qemu_rdma_init_ram_blocks(rdma);
 
     for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
         ret = qemu_rdma_reg_control(rdma, idx);
-- 
2.41.0




* [PATCH v2 13/53] migration/rdma: Drop qemu_rdma_search_ram_block() error handling
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (11 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 12/53] migration/rdma: Drop rdma_add_block() error handling Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 15:00   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 14/53] migration/rdma: Make qemu_rdma_buffer_mergeable() return bool Markus Armbruster
                   ` (40 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

qemu_rdma_search_ram_block() can't fail.  Return void, and drop the
unreachable error handling.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 24 ++++++++----------------
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index f2f811ace2..7bea4d3947 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1231,15 +1231,13 @@ static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
  *
  * Once the block is found, also identify which 'chunk' within that
  * block that the page belongs to.
- *
- * This search cannot fail or the migration will fail.
  */
-static int qemu_rdma_search_ram_block(RDMAContext *rdma,
-                                      uintptr_t block_offset,
-                                      uint64_t offset,
-                                      uint64_t length,
-                                      uint64_t *block_index,
-                                      uint64_t *chunk_index)
+static void qemu_rdma_search_ram_block(RDMAContext *rdma,
+                                       uintptr_t block_offset,
+                                       uint64_t offset,
+                                       uint64_t length,
+                                       uint64_t *block_index,
+                                       uint64_t *chunk_index)
 {
     uint64_t current_addr = block_offset + offset;
     RDMALocalBlock *block = g_hash_table_lookup(rdma->blockmap,
@@ -1251,8 +1249,6 @@ static int qemu_rdma_search_ram_block(RDMAContext *rdma,
     *block_index = block->index;
     *chunk_index = ram_chunk_index(block->local_host_addr,
                 block->local_host_addr + (current_addr - block->offset));
-
-    return 0;
 }
 
 /*
@@ -2321,12 +2317,8 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
         rdma->current_length = 0;
         rdma->current_addr = current_addr;
 
-        ret = qemu_rdma_search_ram_block(rdma, block_offset,
-                                         offset, len, &index, &chunk);
-        if (ret) {
-            error_report("ram block search failed");
-            return ret;
-        }
+        qemu_rdma_search_ram_block(rdma, block_offset,
+                                   offset, len, &index, &chunk);
         rdma->current_index = index;
         rdma->current_chunk = chunk;
     }
-- 
2.41.0




* [PATCH v2 14/53] migration/rdma: Make qemu_rdma_buffer_mergeable() return bool
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (12 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 13/53] migration/rdma: Drop qemu_rdma_search_ram_block() " Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 15:01   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 15/53] migration/rdma: Use bool for two RDMAContext flags Markus Armbruster
                   ` (39 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

qemu_rdma_buffer_mergeable() is semantically a predicate.  It returns
int 0 or 1.  Return bool instead, and fix the function name's
spelling.
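
A much reduced sketch of such a predicate, assuming a toy ToyWrite
struct in place of RDMAContext (the field names and the simplified set
of checks are illustrative only):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the pending-write state kept in RDMAContext. */
typedef struct ToyWrite {
    uint64_t current_addr;      /* start of the pending write */
    uint64_t current_length;    /* bytes already queued */
    uint64_t chunk_end;         /* first address past the current chunk */
} ToyWrite;

/* A predicate: answers yes or no, so it returns bool. */
static bool toy_buffer_mergeable(const ToyWrite *w,
                                 uint64_t offset, uint64_t len)
{
    assert(w != NULL);

    if (w->current_length == 0) {
        return false;   /* nothing pending to merge into */
    }

    if (offset != w->current_addr + w->current_length) {
        return false;   /* only merge sequentially */
    }

    if (offset + len > w->chunk_end) {
        return false;   /* would cross the chunk boundary */
    }

    return true;
}
```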

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 7bea4d3947..73dd34d8f3 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2244,7 +2244,7 @@ static int qemu_rdma_write_flush(QEMUFile *f, RDMAContext *rdma)
     return 0;
 }
 
-static inline int qemu_rdma_buffer_mergable(RDMAContext *rdma,
+static inline bool qemu_rdma_buffer_mergeable(RDMAContext *rdma,
                     uint64_t offset, uint64_t len)
 {
     RDMALocalBlock *block;
@@ -2252,11 +2252,11 @@ static inline int qemu_rdma_buffer_mergable(RDMAContext *rdma,
     uint8_t *chunk_end;
 
     if (rdma->current_index < 0) {
-        return 0;
+        return false;
     }
 
     if (rdma->current_chunk < 0) {
-        return 0;
+        return false;
     }
 
     block = &(rdma->local_ram_blocks.block[rdma->current_index]);
@@ -2264,29 +2264,29 @@ static inline int qemu_rdma_buffer_mergable(RDMAContext *rdma,
     chunk_end = ram_chunk_end(block, rdma->current_chunk);
 
     if (rdma->current_length == 0) {
-        return 0;
+        return false;
     }
 
     /*
      * Only merge into chunk sequentially.
      */
     if (offset != (rdma->current_addr + rdma->current_length)) {
-        return 0;
+        return false;
     }
 
     if (offset < block->offset) {
-        return 0;
+        return false;
     }
 
     if ((offset + len) > (block->offset + block->length)) {
-        return 0;
+        return false;
     }
 
     if ((host_addr + len) > chunk_end) {
-        return 0;
+        return false;
     }
 
-    return 1;
+    return true;
 }
 
 /*
@@ -2309,7 +2309,7 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
     int ret;
 
     /* If we cannot merge it, we flush the current buffer first. */
-    if (!qemu_rdma_buffer_mergable(rdma, current_addr, len)) {
+    if (!qemu_rdma_buffer_mergeable(rdma, current_addr, len)) {
         ret = qemu_rdma_write_flush(f, rdma);
         if (ret) {
             return ret;
-- 
2.41.0




* [PATCH v2 15/53] migration/rdma: Use bool for two RDMAContext flags
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (13 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 14/53] migration/rdma: Make qemu_rdma_buffer_mergeable() return bool Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 15:56   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 16/53] migration/rdma: Fix or document problematic uses of errno Markus Armbruster
                   ` (38 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

@error_reported and @received_error are flags.  The latter is even
assigned bool true.  Change them from int to bool.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 73dd34d8f3..28097ce604 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -91,7 +91,7 @@ static uint32_t known_capabilities = RDMA_CAPABILITY_PIN_ALL;
             if (!rdma->error_reported) { \
                 error_report("RDMA is in an error state waiting migration" \
                                 " to abort!"); \
-                rdma->error_reported = 1; \
+                rdma->error_reported = true; \
             } \
             return rdma->error_state; \
         } \
@@ -365,8 +365,8 @@ typedef struct RDMAContext {
      * and remember the error state.
      */
     int error_state;
-    int error_reported;
-    int received_error;
+    bool error_reported;
+    bool received_error;
 
     /*
      * Description of ram blocks used throughout the code.
-- 
2.41.0




* [PATCH v2 16/53] migration/rdma: Fix or document problematic uses of errno
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (14 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 15/53] migration/rdma: Use bool for two RDMAContext flags Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-09-29 15:09   ` Fabiano Rosas
                     ` (2 more replies)
  2023-09-28 13:19 ` [PATCH v2 17/53] migration/rdma: Ditch useless numeric error codes in error messages Markus Armbruster
                   ` (37 subsequent siblings)
  53 siblings, 3 replies; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

We use errno after calling Libibverbs functions that are not
documented to set errno (manual page does not mention errno), or where
the documentation is unclear ("returns [...] the value of errno on
failure").  While this could be read as "sets errno and returns it",
a glance at the source code[*] kills that hope:

    static inline int ibv_post_send(struct ibv_qp *qp, struct ibv_send_wr *wr,
                                    struct ibv_send_wr **bad_wr)
    {
            return qp->context->ops.post_send(qp, wr, bad_wr);
    }

The callback can be

    static int mana_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
                              struct ibv_send_wr **bad)
    {
            /* This version of driver supports RAW QP only.
             * Posting WR is done directly in the application.
             */
            return EOPNOTSUPP;
    }

Neither of them touches errno.

One of these errno uses is easy to fix, so do that now.  Several more
will go away later in the series; add temporary FIXME comments.
Three will remain; add TODO comments.  TODO, not FIXME, because the
bug might be in Libibverbs documentation.

[*] https://github.com/linux-rdma/rdma-core.git
    commit 55fa316b4b18f258d8ac1ceb4aa5a7a35b094dcf
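
To make the hazard concrete, here is a small sketch with a
hypothetical callee modelled on mana_post_send() above.  The names
toy_post_send() and toy_describe_post_send() are invented for the
sketch; the only assumption is that the callee returns 0 or a positive
errno-style code and never assigns errno:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Reports failure via its return value and leaves errno alone. */
static int toy_post_send(int supported)
{
    return supported ? 0 : EOPNOTSUPP;
}

/* Describe the outcome from the return value, never from errno. */
static const char *toy_describe_post_send(int supported)
{
    int ret = toy_post_send(supported);

    assert(ret >= 0);   /* 0 or a positive errno-style code */

    /*
     * strerror(errno) here would describe whatever unrelated call
     * last set errno, not this failure.
     */
    return ret ? strerror(ret) : "ok";
}
```

This mirrors the one easy fix in this patch, where strerror(errno) is
replaced by strerror(ret).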

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 45 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 39 insertions(+), 6 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 28097ce604..bba8c99fa9 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -853,6 +853,12 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
 
         for (x = 0; x < num_devices; x++) {
             verbs = ibv_open_device(dev_list[x]);
+            /*
+             * ibv_open_device() is not documented to set errno.  If
+             * it does, it's somebody else's doc bug.  If it doesn't,
+             * the use of errno below is wrong.
+             * TODO Find out whether ibv_open_device() sets errno.
+             */
             if (!verbs) {
                 if (errno == EPERM) {
                     continue;
@@ -1162,11 +1168,7 @@ static void qemu_rdma_advise_prefetch_mr(struct ibv_pd *pd, uint64_t addr,
     ret = ibv_advise_mr(pd, advice,
                         IBV_ADVISE_MR_FLAG_FLUSH, &sg_list, 1);
     /* ignore the error */
-    if (ret) {
-        trace_qemu_rdma_advise_mr(name, len, addr, strerror(errno));
-    } else {
-        trace_qemu_rdma_advise_mr(name, len, addr, "successed");
-    }
+    trace_qemu_rdma_advise_mr(name, len, addr, strerror(ret));
 #endif
 }
 
@@ -1183,7 +1185,12 @@ static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
                     local->block[i].local_host_addr,
                     local->block[i].length, access
                     );
-
+        /*
+         * ibv_reg_mr() is not documented to set errno.  If it does,
+         * it's somebody else's doc bug.  If it doesn't, the use of
+         * errno below is wrong.
+         * TODO Find out whether ibv_reg_mr() sets errno.
+         */
         if (!local->block[i].mr &&
             errno == ENOTSUP && rdma_support_odp(rdma->verbs)) {
                 access |= IBV_ACCESS_ON_DEMAND;
@@ -1291,6 +1298,12 @@ static int qemu_rdma_register_and_get_keys(RDMAContext *rdma,
         trace_qemu_rdma_register_and_get_keys(len, chunk_start);
 
         block->pmr[chunk] = ibv_reg_mr(rdma->pd, chunk_start, len, access);
+        /*
+         * ibv_reg_mr() is not documented to set errno.  If it does,
+         * it's somebody else's doc bug.  If it doesn't, the use of
+         * errno below is wrong.
+         * TODO Find out whether ibv_reg_mr() sets errno.
+         */
         if (!block->pmr[chunk] &&
             errno == ENOTSUP && rdma_support_odp(rdma->verbs)) {
             access |= IBV_ACCESS_ON_DEMAND;
@@ -1408,6 +1421,11 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
         block->remote_keys[chunk] = 0;
 
         if (ret != 0) {
+            /*
+             * FIXME perror() is problematic, because ibv_dereg_mr() is
+             * not documented to set errno.  Will go away later in
+             * this series.
+             */
             perror("unregistration chunk failed");
             return -ret;
         }
@@ -1658,6 +1676,11 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma,
 
         ret = ibv_get_cq_event(ch, &cq, &cq_ctx);
         if (ret) {
+            /*
+             * FIXME perror() is problematic, because ibv_get_cq_event() is
+             * not documented to set errno.  Will go away later in
+             * this series.
+             */
             perror("ibv_get_cq_event");
             goto err_block_for_wrid;
         }
@@ -2199,6 +2222,11 @@ retry:
         goto retry;
 
     } else if (ret > 0) {
+        /*
+         * FIXME perror() is problematic, because whether
+         * ibv_post_send() sets errno is unclear.  Will go away later
+         * in this series.
+         */
         perror("rdma migration: post rdma write failed");
         return -ret;
     }
@@ -2559,6 +2587,11 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
         ret = rdma_get_cm_event(rdma->channel, &cm_event);
     }
     if (ret) {
+        /*
+         * FIXME perror() is wrong, because
+         * qemu_get_cm_event_timeout() can fail without setting errno.
+         * Will go away later in this series.
+         */
         perror("rdma_get_cm_event after rdma_connect");
         ERROR(errp, "connecting to destination!");
         goto err_rdma_source_connect;
-- 
2.41.0




* [PATCH v2 17/53] migration/rdma: Ditch useless numeric error codes in error messages
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (15 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 16/53] migration/rdma: Fix or document problematic uses of errno Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 15:06   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 18/53] migration/rdma: Fix io_writev(), io_readv() methods to obey contract Markus Armbruster
                   ` (36 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

Several error messages include numeric error codes returned by failed
functions:

* ibv_poll_cq() returns an unspecified negative value.  Useless.

* rdma_accept() and rdma_get_cm_event() return -1.  Useless.

* qemu_rdma_poll() returns either -1 or an unspecified negative
  value.  Useless.

* qemu_rdma_block_for_wrid(), qemu_rdma_write_flush(),
  qemu_rdma_exchange_send(), qemu_rdma_exchange_recv(),
  qemu_rdma_write() return a negative value that may or may not be an
  errno value.  While reporting human-readable errno
  information (which a number is not) can be useful, reporting an
  error code that may or may not be an errno value is useless.

Drop these error codes from the error messages.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index bba8c99fa9..0d2d119e6a 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1476,7 +1476,7 @@ static int qemu_rdma_poll(RDMAContext *rdma, struct ibv_cq *cq,
     }
 
     if (ret < 0) {
-        error_report("ibv_poll_cq return %d", ret);
+        error_report("ibv_poll_cq failed");
         return ret;
     }
 
@@ -2215,7 +2215,7 @@ retry:
         ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RDMA_WRITE, NULL);
         if (ret < 0) {
             error_report("rdma migration: failed to make "
-                         "room in full send queue! %d", ret);
+                         "room in full send queue!");
             return ret;
         }
 
@@ -2800,7 +2800,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
     ret = qemu_rdma_write_flush(f, rdma);
     if (ret < 0) {
         rdma->error_state = ret;
-        error_setg(errp, "qemu_rdma_write_flush returned %d", ret);
+        error_setg(errp, "qemu_rdma_write_flush failed");
         return -1;
     }
 
@@ -2820,7 +2820,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
 
             if (ret < 0) {
                 rdma->error_state = ret;
-                error_setg(errp, "qemu_rdma_exchange_send returned %d", ret);
+                error_setg(errp, "qemu_rdma_exchange_send failed");
                 return -1;
             }
 
@@ -2910,7 +2910,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
 
         if (ret < 0) {
             rdma->error_state = ret;
-            error_setg(errp, "qemu_rdma_exchange_recv returned %d", ret);
+            error_setg(errp, "qemu_rdma_exchange_recv failed");
             return -1;
         }
 
@@ -3253,7 +3253,7 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
      */
     ret = qemu_rdma_write(f, rdma, block_offset, offset, size);
     if (ret < 0) {
-        error_report("rdma migration: write error! %d", ret);
+        error_report("rdma migration: write error");
         goto err;
     }
 
@@ -3280,7 +3280,7 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
         uint64_t wr_id, wr_id_in;
         int ret = qemu_rdma_poll(rdma, rdma->recv_cq, &wr_id_in, NULL);
         if (ret < 0) {
-            error_report("rdma migration: polling error! %d", ret);
+            error_report("rdma migration: polling error");
             goto err;
         }
 
@@ -3295,7 +3295,7 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
         uint64_t wr_id, wr_id_in;
         int ret = qemu_rdma_poll(rdma, rdma->send_cq, &wr_id_in, NULL);
         if (ret < 0) {
-            error_report("rdma migration: polling error! %d", ret);
+            error_report("rdma migration: polling error");
             goto err;
         }
 
@@ -3470,13 +3470,13 @@ static int qemu_rdma_accept(RDMAContext *rdma)
 
     ret = rdma_accept(rdma->cm_id, &conn_param);
     if (ret) {
-        error_report("rdma_accept returns %d", ret);
+        error_report("rdma_accept failed");
         goto err_rdma_dest_wait;
     }
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
     if (ret) {
-        error_report("rdma_accept get_cm_event failed %d", ret);
+        error_report("rdma_accept get_cm_event failed");
         goto err_rdma_dest_wait;
     }
 
-- 
2.41.0




* [PATCH v2 18/53] migration/rdma: Fix io_writev(), io_readv() methods to obey contract
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (16 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 17/53] migration/rdma: Ditch useless numeric error codes in error messages Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 15:09   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 19/53] migration/rdma: Replace dangerous macro CHECK_ERROR_STATE() Markus Armbruster
                   ` (35 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

QIOChannelClass methods qio_channel_rdma_readv() and
qio_channel_rdma_writev() violate their method contract when
rdma->error_state is non-zero:

1. They return whatever is in rdma->error_state then.  Only -1 will be
   fine.  -2 will be misinterpreted as "would block".  Anything less
   than -2 isn't defined in the contract.  A positive value would be
   misinterpreted as success, but I believe that's not actually
   possible.

2. They neglect to set an error then.  If something up the call stack
   dereferences the error when failure is returned, it will crash.  If
   it ignores the return value and checks the error instead, it will
   miss the error.

Crap like this happens when return statements hide in macros,
especially when their uses are far away from the definition.

I elected not to investigate how callers are impacted.

Expand the two bad macro uses, so we can set an error and return -1.
The next commit will then get rid of the macro altogether.
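
To illustrate the two contract violations, here is a minimal sketch.
It is not the QEMU code: DemoContext and the const char ** error
out-parameter are hypothetical stand-ins for RDMAContext and Error **,
and the function names are made up for this example.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for RDMAContext; only the field we need. */
typedef struct {
    int error_state;        /* 0, or a negative value once broken */
} DemoContext;

/* Before: returns the raw state and leaves *errp untouched. */
static ssize_t demo_writev_before(DemoContext *ctx, const char **errp)
{
    (void)errp;
    if (ctx->error_state) {
        return ctx->error_state;    /* -2 would read as "would block" */
    }
    return 0;
}

/* After: failure is always -1, and an error is always set. */
static ssize_t demo_writev_after(DemoContext *ctx, const char **errp)
{
    if (ctx->error_state) {
        if (errp) {
            *errp = "RDMA is in an error state";
        }
        return -1;
    }
    return 0;
}
```

With error_state set to -2, the "before" variant hands -2 to the
caller with no error set, while the "after" variant returns -1 and
sets one.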

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 0d2d119e6a..fb89b89e80 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2791,7 +2791,11 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
         return -1;
     }
 
-    CHECK_ERROR_STATE();
+    if (rdma->error_state) {
+        error_setg(errp,
+                   "RDMA is in an error state waiting migration to abort!");
+        return -1;
+    }
 
     /*
      * Push out any writes that
@@ -2877,7 +2881,11 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
         return -1;
     }
 
-    CHECK_ERROR_STATE();
+    if (rdma->error_state) {
+        error_setg(errp,
+                   "RDMA is in an error state waiting migration to abort!");
+        return -1;
+    }
 
     for (i = 0; i < niov; i++) {
         size_t want = iov[i].iov_len;
-- 
2.41.0




* [PATCH v2 19/53] migration/rdma: Replace dangerous macro CHECK_ERROR_STATE()
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (17 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 18/53] migration/rdma: Fix io_writev(), io_readv() methods to obey contract Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 15:10   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 20/53] migration/rdma: Fix qemu_rdma_broken_ipv6_kernel() to set error Markus Armbruster
                   ` (34 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

Hiding return statements in macros is a bad idea.  Use a function
instead, and open code the return part.
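
A minimal sketch of the difference, using hypothetical demo_ names
rather than the real RDMAContext: the macro form silently depends on a
local variable with a fixed name and can return from the enclosing
function, while the function form leaves the return visible at the
call site.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for RDMAContext. */
typedef struct {
    int error_state;
    bool error_reported;
} DemoContext;

/* Macro form: needs a variable named ctx in scope, hides a return. */
#define DEMO_CHECK_ERROR_STATE() \
    do { \
        if (ctx->error_state) { \
            return ctx->error_state; \
        } \
    } while (0)

static int demo_with_macro(DemoContext *ctx)
{
    DEMO_CHECK_ERROR_STATE();   /* may leave demo_with_macro() here */
    return 0;
}

/* Function form: reports once, then just hands back the state. */
static int demo_check_error_state(DemoContext *ctx)
{
    if (ctx->error_state && !ctx->error_reported) {
        fprintf(stderr, "context is in an error state\n");
        ctx->error_reported = true;
    }
    return ctx->error_state;
}

static int demo_with_function(DemoContext *ctx)
{
    int ret = demo_check_error_state(ctx);

    if (ret) {
        return ret;             /* the return is open coded */
    }
    return 0;
}
```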

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 43 +++++++++++++++++++++++++++----------------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index fb89b89e80..912cea6ad9 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -85,18 +85,6 @@
  */
 static uint32_t known_capabilities = RDMA_CAPABILITY_PIN_ALL;
 
-#define CHECK_ERROR_STATE() \
-    do { \
-        if (rdma->error_state) { \
-            if (!rdma->error_reported) { \
-                error_report("RDMA is in an error state waiting migration" \
-                                " to abort!"); \
-                rdma->error_reported = true; \
-            } \
-            return rdma->error_state; \
-        } \
-    } while (0)
-
 /*
  * A work request ID is 64-bits and we split up these bits
  * into 3 parts:
@@ -451,6 +439,16 @@ typedef struct QEMU_PACKED {
     uint64_t chunks;            /* how many sequential chunks to register */
 } RDMARegister;
 
+static int check_error_state(RDMAContext *rdma)
+{
+    if (rdma->error_state && !rdma->error_reported) {
+        error_report("RDMA is in an error state waiting migration"
+                     " to abort!");
+        rdma->error_reported = true;
+    }
+    return rdma->error_state;
+}
+
 static void register_to_network(RDMAContext *rdma, RDMARegister *reg)
 {
     RDMALocalBlock *local_block;
@@ -3250,7 +3248,10 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
         return -EIO;
     }
 
-    CHECK_ERROR_STATE();
+    ret = check_error_state(rdma);
+    if (ret) {
+        return ret;
+    }
 
     qemu_fflush(f);
 
@@ -3566,7 +3567,10 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
         return -EIO;
     }
 
-    CHECK_ERROR_STATE();
+    ret = check_error_state(rdma);
+    if (ret) {
+        return ret;
+    }
 
     local = &rdma->local_ram_blocks;
     do {
@@ -3870,6 +3874,7 @@ static int qemu_rdma_registration_start(QEMUFile *f,
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
     RDMAContext *rdma;
+    int ret;
 
     if (migration_in_postcopy()) {
         return 0;
@@ -3881,7 +3886,10 @@ static int qemu_rdma_registration_start(QEMUFile *f,
         return -EIO;
     }
 
-    CHECK_ERROR_STATE();
+    ret = check_error_state(rdma);
+    if (ret) {
+        return ret;
+    }
 
     trace_qemu_rdma_registration_start(flags);
     qemu_put_be64(f, RAM_SAVE_FLAG_HOOK);
@@ -3912,7 +3920,10 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
         return -EIO;
     }
 
-    CHECK_ERROR_STATE();
+    ret = check_error_state(rdma);
+    if (ret) {
+        return ret;
+    }
 
     qemu_fflush(f);
     ret = qemu_rdma_drain_cq(f, rdma);
-- 
2.41.0




* [PATCH v2 20/53] migration/rdma: Fix qemu_rdma_broken_ipv6_kernel() to set error
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (18 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 19/53] migration/rdma: Replace dangerous macro CHECK_ERROR_STATE() Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 15:10   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 21/53] migration/rdma: Fix qemu_get_cm_event_timeout() to always " Markus Armbruster
                   ` (33 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

qemu_rdma_resolve_host() and qemu_rdma_dest_init() try addresses until
they find one that works.  If none works, they return the first Error
set by qemu_rdma_broken_ipv6_kernel(), or else return a generic one.

qemu_rdma_broken_ipv6_kernel() neglects to set an Error when
ibv_open_device() fails.  If a later address fails differently, we use
that Error instead, or else the generic one.  Harmless enough, but
needs fixing all the same.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/migration/rdma.c b/migration/rdma.c
index 912cea6ad9..18905082a8 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -861,6 +861,8 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
                 if (errno == EPERM) {
                     continue;
                 } else {
+                    error_setg_errno(errp, errno,
+                                     "could not open RDMA device context");
                     return -EINVAL;
                 }
             }
-- 
2.41.0




* [PATCH v2 21/53] migration/rdma: Fix qemu_get_cm_event_timeout() to always set error
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (19 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 20/53] migration/rdma: Fix qemu_rdma_broken_ipv6_kernel() to set error Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 15:25   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 22/53] migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port Markus Armbruster
                   ` (32 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

qemu_get_cm_event_timeout() neglects to set an error when it fails
because rdma_get_cm_event() fails.  Harmless, as its caller
qemu_rdma_connect() substitutes a generic error then.  Fix it anyway.

qemu_rdma_connect() also sets the generic error when its own call of
rdma_get_cm_event() fails.  Make the error handling more obvious: set
a specific error right after rdma_get_cm_event() fails.  Delete the
generic error.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 18905082a8..1a0ad44411 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2535,7 +2535,11 @@ static int qemu_get_cm_event_timeout(RDMAContext *rdma,
         ERROR(errp, "failed to poll cm event, errno=%i", errno);
         return -1;
     } else if (poll_fd.revents & POLLIN) {
-        return rdma_get_cm_event(rdma->channel, cm_event);
+        if (rdma_get_cm_event(rdma->channel, cm_event) < 0) {
+            ERROR(errp, "failed to get cm event");
+            return -1;
+        }
+        return 0;
     } else {
         ERROR(errp, "no POLLIN event, revent=%x", poll_fd.revents);
         return -1;
@@ -2585,6 +2589,9 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
         ret = qemu_get_cm_event_timeout(rdma, &cm_event, 5000, errp);
     } else {
         ret = rdma_get_cm_event(rdma->channel, &cm_event);
+        if (ret < 0) {
+            ERROR(errp, "failed to get cm event");
+        }
     }
     if (ret) {
         /*
@@ -2593,7 +2600,6 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
          * Will go away later in this series.
          */
         perror("rdma_get_cm_event after rdma_connect");
-        ERROR(errp, "connecting to destination!");
         goto err_rdma_source_connect;
     }
 
-- 
2.41.0




* [PATCH v2 22/53] migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (20 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 21/53] migration/rdma: Fix qemu_get_cm_event_timeout() to always " Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-09-29 15:10   ` Fabiano Rosas
                     ` (2 more replies)
  2023-09-28 13:19 ` [PATCH v2 23/53] migration/rdma: Fix QEMUFileHooks method return values Markus Armbruster
                   ` (31 subsequent siblings)
  53 siblings, 3 replies; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

qemu_rdma_data_init() neglects to set an Error when it fails because
@host_port is null.  Fortunately, no caller passes null, so this is
merely a latent bug.  Drop the flawed code handling null argument.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 1a0ad44411..1ae2f87906 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2747,25 +2747,22 @@ static RDMAContext *qemu_rdma_data_init(const char *host_port, Error **errp)
     RDMAContext *rdma = NULL;
     InetSocketAddress *addr;
 
-    if (host_port) {
-        rdma = g_new0(RDMAContext, 1);
-        rdma->current_index = -1;
-        rdma->current_chunk = -1;
+    rdma = g_new0(RDMAContext, 1);
+    rdma->current_index = -1;
+    rdma->current_chunk = -1;
 
-        addr = g_new(InetSocketAddress, 1);
-        if (!inet_parse(addr, host_port, NULL)) {
-            rdma->port = atoi(addr->port);
-            rdma->host = g_strdup(addr->host);
-            rdma->host_port = g_strdup(host_port);
-        } else {
-            ERROR(errp, "bad RDMA migration address '%s'", host_port);
-            g_free(rdma);
-            rdma = NULL;
-        }
-
-        qapi_free_InetSocketAddress(addr);
+    addr = g_new(InetSocketAddress, 1);
+    if (!inet_parse(addr, host_port, NULL)) {
+        rdma->port = atoi(addr->port);
+        rdma->host = g_strdup(addr->host);
+        rdma->host_port = g_strdup(host_port);
+    } else {
+        ERROR(errp, "bad RDMA migration address '%s'", host_port);
+        g_free(rdma);
+        rdma = NULL;
     }
 
+    qapi_free_InetSocketAddress(addr);
     return rdma;
 }
 
-- 
2.41.0




* [PATCH v2 23/53] migration/rdma: Fix QEMUFileHooks method return values
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (21 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 22/53] migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 15:28   ` Juan Quintela
  2023-10-04 16:22   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 24/53] migration/rdma: Fix rdma_getaddrinfo() error checking Markus Armbruster
                   ` (30 subsequent siblings)
  53 siblings, 2 replies; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

The QEMUFileHooks methods don't come with a written contract.  Digging
through the code calling them, we find:

* save_page():

  Negative values RAM_SAVE_CONTROL_DELAYED and
  RAM_SAVE_CONTROL_NOT_SUPP are special.  Any other negative value is
  an unspecified error.

  qemu_rdma_save_page() returns -EIO or rdma->error_state on error.  I
  believe the latter is always negative.  Nothing stops either of them
  from clashing with the special values, though.  Feels unlikely, but
  fix it anyway to return only the special values and -1.

* before_ram_iterate(), after_ram_iterate():

  Negative value means error.  qemu_rdma_registration_start() and
  qemu_rdma_registration_stop() comply as far as I can tell.  Make
  them comply *obviously*, by returning -1 on error.

* hook_ram_load:

  Negative value means error.  rdma_load_hook() already returns -1 on
  error.  Leave it alone.
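
The save_page() clash can be sketched as follows.  Everything here is
illustrative: the DEMO_ constants are placeholder values standing in
for RAM_SAVE_CONTROL_NOT_SUPP and RAM_SAVE_CONTROL_DELAYED, and
demo_classify() is a hypothetical caller, not the actual migration
code.

```c
/* Placeholder values; assume the special values are some negatives. */
#define DEMO_CONTROL_NOT_SUPP (-1000)
#define DEMO_CONTROL_DELAYED  (-2000)

enum demo_outcome { DEMO_OK, DEMO_NOT_SUPP, DEMO_DELAYED, DEMO_FAILED };

/* A caller that tells the special values apart from plain errors. */
static enum demo_outcome demo_classify(int ret)
{
    if (ret == DEMO_CONTROL_NOT_SUPP) {
        return DEMO_NOT_SUPP;
    }
    if (ret == DEMO_CONTROL_DELAYED) {
        return DEMO_DELAYED;
    }
    return ret < 0 ? DEMO_FAILED : DEMO_OK;
}

/* A hook that maps any internal error to -1 cannot clash. */
static int demo_hook_result(int internal_error)
{
    return internal_error ? -1 : DEMO_CONTROL_DELAYED;
}
```

If a hook returned a raw internal error that happened to equal one of
the special values, demo_classify() would take it for "not supported"
or "delayed" rather than a failure.  Mapping every error to -1 rules
that out.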

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 79 +++++++++++++++++++++++-------------------------
 1 file changed, 37 insertions(+), 42 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 1ae2f87906..a58c2734e3 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3250,12 +3250,11 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
     rdma = qatomic_rcu_read(&rioc->rdmaout);
 
     if (!rdma) {
-        return -EIO;
+        return -1;
     }
 
-    ret = check_error_state(rdma);
-    if (ret) {
-        return ret;
+    if (check_error_state(rdma)) {
+        return -1;
     }
 
     qemu_fflush(f);
@@ -3321,9 +3320,10 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
     }
 
     return RAM_SAVE_CONTROL_DELAYED;
+
 err:
     rdma->error_state = ret;
-    return ret;
+    return -1;
 }
 
 static void rdma_accept_incoming_migration(void *opaque);
@@ -3569,12 +3569,11 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
     rdma = qatomic_rcu_read(&rioc->rdmain);
 
     if (!rdma) {
-        return -EIO;
+        return -1;
     }
 
-    ret = check_error_state(rdma);
-    if (ret) {
-        return ret;
+    if (check_error_state(rdma)) {
+        return -1;
     }
 
     local = &rdma->local_ram_blocks;
@@ -3607,7 +3606,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                              (unsigned int)comp->block_idx,
                              rdma->local_ram_blocks.nb_blocks);
                 ret = -EIO;
-                goto out;
+                goto err;
             }
             block = &(rdma->local_ram_blocks.block[comp->block_idx]);
 
@@ -3619,7 +3618,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
 
         case RDMA_CONTROL_REGISTER_FINISHED:
             trace_qemu_rdma_registration_handle_finished();
-            goto out;
+            return 0;
 
         case RDMA_CONTROL_RAM_BLOCKS_REQUEST:
             trace_qemu_rdma_registration_handle_ram_blocks();
@@ -3640,7 +3639,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                 if (ret) {
                     error_report("rdma migration: error dest "
                                     "registering ram blocks");
-                    goto out;
+                    goto err;
                 }
             }
 
@@ -3679,7 +3678,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
 
             if (ret < 0) {
                 error_report("rdma migration: error sending remote info");
-                goto out;
+                goto err;
             }
 
             break;
@@ -3706,7 +3705,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                                  (unsigned int)reg->current_index,
                                  rdma->local_ram_blocks.nb_blocks);
                     ret = -ENOENT;
-                    goto out;
+                    goto err;
                 }
                 block = &(rdma->local_ram_blocks.block[reg->current_index]);
                 if (block->is_ram_block) {
@@ -3716,7 +3715,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                             block->block_name, block->offset,
                             reg->key.current_addr);
                         ret = -ERANGE;
-                        goto out;
+                        goto err;
                     }
                     host_addr = (block->local_host_addr +
                                 (reg->key.current_addr - block->offset));
@@ -3732,7 +3731,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                             " chunk: %" PRIx64,
                             block->block_name, reg->key.chunk);
                         ret = -ERANGE;
-                        goto out;
+                        goto err;
                     }
                 }
                 chunk_start = ram_chunk_start(block, chunk);
@@ -3744,7 +3743,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                             chunk, chunk_start, chunk_end)) {
                     error_report("cannot get rkey");
                     ret = -EINVAL;
-                    goto out;
+                    goto err;
                 }
                 reg_result->rkey = tmp_rkey;
 
@@ -3761,7 +3760,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
 
             if (ret < 0) {
                 error_report("Failed to send control buffer");
-                goto out;
+                goto err;
             }
             break;
         case RDMA_CONTROL_UNREGISTER_REQUEST:
@@ -3784,7 +3783,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                 if (ret != 0) {
                     perror("rdma unregistration chunk failed");
                     ret = -ret;
-                    goto out;
+                    goto err;
                 }
 
                 rdma->total_registrations--;
@@ -3797,24 +3796,23 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
 
             if (ret < 0) {
                 error_report("Failed to send control buffer");
-                goto out;
+                goto err;
             }
             break;
         case RDMA_CONTROL_REGISTER_RESULT:
             error_report("Invalid RESULT message at dest.");
             ret = -EIO;
-            goto out;
+            goto err;
         default:
             error_report("Unknown control message %s", control_desc(head.type));
             ret = -EIO;
-            goto out;
+            goto err;
         }
     } while (1);
-out:
-    if (ret < 0) {
-        rdma->error_state = ret;
-    }
-    return ret;
+
+err:
+    rdma->error_state = ret;
+    return -1;
 }
 
 /* Destination:
@@ -3836,7 +3834,7 @@ rdma_block_notification_handle(QEMUFile *f, const char *name)
     rdma = qatomic_rcu_read(&rioc->rdmain);
 
     if (!rdma) {
-        return -EIO;
+        return -1;
     }
 
     /* Find the matching RAMBlock in our local list */
@@ -3849,7 +3847,7 @@ rdma_block_notification_handle(QEMUFile *f, const char *name)
 
     if (found == -1) {
         error_report("RAMBlock '%s' not found on destination", name);
-        return -ENOENT;
+        return -1;
     }
 
     rdma->local_ram_blocks.block[curr].src_index = rdma->next_src_index;
@@ -3879,7 +3877,6 @@ static int qemu_rdma_registration_start(QEMUFile *f,
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
     RDMAContext *rdma;
-    int ret;
 
     if (migration_in_postcopy()) {
         return 0;
@@ -3888,12 +3885,11 @@ static int qemu_rdma_registration_start(QEMUFile *f,
     RCU_READ_LOCK_GUARD();
     rdma = qatomic_rcu_read(&rioc->rdmaout);
     if (!rdma) {
-        return -EIO;
+        return -1;
     }
 
-    ret = check_error_state(rdma);
-    if (ret) {
-        return ret;
+    if (check_error_state(rdma)) {
+        return -1;
     }
 
     trace_qemu_rdma_registration_start(flags);
@@ -3922,12 +3918,11 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
     RCU_READ_LOCK_GUARD();
     rdma = qatomic_rcu_read(&rioc->rdmaout);
     if (!rdma) {
-        return -EIO;
+        return -1;
     }
 
-    ret = check_error_state(rdma);
-    if (ret) {
-        return ret;
+    if (check_error_state(rdma)) {
+        return -1;
     }
 
     qemu_fflush(f);
@@ -3958,7 +3953,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
                     qemu_rdma_reg_whole_ram_blocks : NULL);
         if (ret < 0) {
             fprintf(stderr, "receiving remote info!");
-            return ret;
+            return -1;
         }
 
         nb_dest_blocks = resp.len / sizeof(RDMADestBlock);
@@ -3981,7 +3976,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
                     "not identical on both the source and destination.",
                     local->nb_blocks, nb_dest_blocks);
             rdma->error_state = -EINVAL;
-            return -EINVAL;
+            return -1;
         }
 
         qemu_rdma_move_header(rdma, reg_result_idx, &resp);
@@ -3997,7 +3992,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
                         local->block[i].length,
                         rdma->dest_blocks[i].length);
                 rdma->error_state = -EINVAL;
-                return -EINVAL;
+                return -1;
             }
             local->block[i].remote_host_addr =
                     rdma->dest_blocks[i].remote_host_addr;
@@ -4017,7 +4012,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
     return 0;
 err:
     rdma->error_state = ret;
-    return ret;
+    return -1;
 }
 
 static const QEMUFileHooks rdma_read_hooks = {
-- 
2.41.0




* [PATCH v2 24/53] migration/rdma: Fix rdma_getaddrinfo() error checking
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (22 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 23/53] migration/rdma: Fix QEMUFileHooks method return values Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 15:30   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 25/53] migration/rdma: Return -1 instead of negative errno code Markus Armbruster
                   ` (29 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

rdma_getaddrinfo() returns 0 on success.  On error, it returns one of
the EAI_ error codes like getaddrinfo() does, or -1 with errno set.
This is broken by design: POSIX implicitly specifies the EAI_ error
codes to be non-zero, no more.  They could clash with -1.  Nothing we
can do about this design flaw.

Both callers of rdma_getaddrinfo() only recognize negative values as
error.  Works only because systems elect to make the EAI_ error codes
negative.

Best not to rely on that: change the callers to treat any non-zero
value as failure.  Also change them to return -1 instead of the value
received from rdma_getaddrinfo() on failure, to avoid positive error
values.
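
A small sketch of why the check matters.  The two helpers are
hypothetical and take the lookup's return value as a plain int, so the
example does not depend on librdmacm.

```c
/* Old check: only negative values count as failure. */
static int demo_resolve_negative_only(int lookup_ret)
{
    if (lookup_ret < 0) {
        return -1;
    }
    return 0;       /* a positive error code slips through as success */
}

/* New check: any non-zero value is a failure, reported as -1. */
static int demo_resolve_nonzero(int lookup_ret)
{
    if (lookup_ret) {
        return -1;
    }
    return 0;
}
```

On a system where the EAI_ codes were positive, the first helper would
treat a failed lookup as success.  The second does not care about the
sign.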

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index a58c2734e3..37399d31d2 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -941,14 +941,14 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
 
     if (rdma->host == NULL || !strcmp(rdma->host, "")) {
         ERROR(errp, "RDMA hostname has not been set");
-        return -EINVAL;
+        return -1;
     }
 
     /* create CM channel */
     rdma->channel = rdma_create_event_channel();
     if (!rdma->channel) {
         ERROR(errp, "could not create CM channel");
-        return -EINVAL;
+        return -1;
     }
 
     /* create CM id */
@@ -962,7 +962,7 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
     port_str[15] = '\0';
 
     ret = rdma_getaddrinfo(rdma->host, port_str, NULL, &res);
-    if (ret < 0) {
+    if (ret) {
         ERROR(errp, "could not rdma_getaddrinfo address %s", rdma->host);
         goto err_resolve_get_addr;
     }
@@ -1004,7 +1004,6 @@ route:
                 rdma_event_str(cm_event->event));
         error_report("rdma_resolve_addr");
         rdma_ack_cm_event(cm_event);
-        ret = -EINVAL;
         goto err_resolve_get_addr;
     }
     rdma_ack_cm_event(cm_event);
@@ -1025,7 +1024,6 @@ route:
         ERROR(errp, "result not equal to event_route_resolved: %s",
                         rdma_event_str(cm_event->event));
         rdma_ack_cm_event(cm_event);
-        ret = -EINVAL;
         goto err_resolve_get_addr;
     }
     rdma_ack_cm_event(cm_event);
@@ -1040,7 +1038,7 @@ err_resolve_get_addr:
 err_resolve_create_id:
     rdma_destroy_event_channel(rdma->channel);
     rdma->channel = NULL;
-    return ret;
+    return -1;
 }
 
 /*
@@ -2675,7 +2673,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
     port_str[15] = '\0';
 
     ret = rdma_getaddrinfo(rdma->host, port_str, NULL, &res);
-    if (ret < 0) {
+    if (ret) {
         ERROR(errp, "could not rdma_getaddrinfo address %s", rdma->host);
         goto err_dest_init_bind_addr;
     }
@@ -2719,7 +2717,7 @@ err_dest_init_create_listen_id:
     rdma_destroy_event_channel(rdma->channel);
     rdma->channel = NULL;
     rdma->error_state = ret;
-    return ret;
+    return -1;
 
 }
 
-- 
2.41.0




* [PATCH v2 25/53] migration/rdma: Return -1 instead of negative errno code
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (23 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 24/53] migration/rdma: Fix rdma_getaddrinfo() error checking Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 16:19   ` Juan Quintela
  2023-10-04 16:23   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 26/53] migration/rdma: Dumb down remaining int error values to -1 Markus Armbruster
                   ` (28 subsequent siblings)
  53 siblings, 2 replies; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

Several functions return negative errno codes on failure.  Callers
check for specific codes exactly never.  For some of the functions,
callers couldn't check even if they wanted to, because the functions
also return negative values that aren't errno codes, leaving readers
confused on what the function actually returns.

Clean up and simplify: return -1 instead of negative errno code.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 44 ++++++++++++++++++++++----------------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 37399d31d2..4817f1ea10 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -863,14 +863,14 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
                 } else {
                     error_setg_errno(errp, errno,
                                      "could not open RDMA device context");
-                    return -EINVAL;
+                    return -1;
                 }
             }
 
             if (ibv_query_port(verbs, 1, &port_attr)) {
                 ibv_close_device(verbs);
                 ERROR(errp, "Could not query initial IB port");
-                return -EINVAL;
+                return -1;
             }
 
             if (port_attr.link_layer == IBV_LINK_LAYER_INFINIBAND) {
@@ -895,7 +895,7 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
                 ERROR(errp, "You only have RoCE / iWARP devices in your systems"
                             " and your management software has specified '[::]'"
                             ", but IPv6 over RoCE / iWARP is not supported in Linux.");
-                return -ENONET;
+                return -1;
             }
         }
 
@@ -911,13 +911,13 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
     /* IB ports start with 1, not 0 */
     if (ibv_query_port(verbs, 1, &port_attr)) {
         ERROR(errp, "Could not query initial IB port");
-        return -EINVAL;
+        return -1;
     }
 
     if (port_attr.link_layer == IBV_LINK_LAYER_ETHERNET) {
         ERROR(errp, "Linux kernel's RoCE / iWARP does not support IPv6 "
                     "(but patches on linux-rdma in progress)");
-        return -ENONET;
+        return -1;
     }
 
 #endif
@@ -1425,7 +1425,7 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
              * this series.
              */
             perror("unregistration chunk failed");
-            return -ret;
+            return -1;
         }
         rdma->total_registrations--;
 
@@ -1570,7 +1570,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
                     if (ret) {
                         error_report("failed to get cm event while wait "
                                      "completion channel");
-                        return -EPIPE;
+                        return -1;
                     }
 
                     error_report("receive cm event while wait comp channel,"
@@ -1578,7 +1578,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
                     if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
                         cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
                         rdma_ack_cm_event(cm_event);
-                        return -EPIPE;
+                        return -1;
                     }
                     rdma_ack_cm_event(cm_event);
                 }
@@ -1591,18 +1591,18 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
                       * I don't trust errno from qemu_poll_ns
                      */
                 error_report("%s: poll failed", __func__);
-                return -EPIPE;
+                return -1;
             }
 
             if (migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) {
                 /* Bail out and let the cancellation happen */
-                return -EPIPE;
+                return -1;
             }
         }
     }
 
     if (rdma->received_error) {
-        return -EPIPE;
+        return -1;
     }
     return rdma->error_state;
 }
@@ -1772,7 +1772,7 @@ static int qemu_rdma_post_send_control(RDMAContext *rdma, uint8_t *buf,
 
     if (ret > 0) {
         error_report("Failed to use post IB SEND for control");
-        return -ret;
+        return -1;
     }
 
     ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_SEND_CONTROL, NULL);
@@ -1841,15 +1841,15 @@ static int qemu_rdma_exchange_get_response(RDMAContext *rdma,
         if (head->type == RDMA_CONTROL_ERROR) {
             rdma->received_error = true;
         }
-        return -EIO;
+        return -1;
     }
     if (head->len > RDMA_CONTROL_MAX_BUFFER - sizeof(*head)) {
         error_report("too long length: %d", head->len);
-        return -EINVAL;
+        return -1;
     }
     if (sizeof(*head) + head->len != byte_len) {
         error_report("Malformed length: %d byte_len %d", head->len, byte_len);
-        return -EINVAL;
+        return -1;
     }
 
     return 0;
@@ -2113,7 +2113,7 @@ retry:
                                 (uint8_t *) &comp, NULL, NULL, NULL);
 
                 if (ret < 0) {
-                    return -EIO;
+                    return -1;
                 }
 
                 stat64_add(&mig_stats.zero_pages,
@@ -2148,7 +2148,7 @@ retry:
                                                 &sge.lkey, NULL, chunk,
                                                 chunk_start, chunk_end)) {
                 error_report("cannot get lkey");
-                return -EINVAL;
+                return -1;
             }
 
             reg_result = (RDMARegisterResult *)
@@ -2167,7 +2167,7 @@ retry:
                                                 &sge.lkey, NULL, chunk,
                                                 chunk_start, chunk_end)) {
                 error_report("cannot get lkey!");
-                return -EINVAL;
+                return -1;
             }
         }
 
@@ -2179,7 +2179,7 @@ retry:
                                                      &sge.lkey, NULL, chunk,
                                                      chunk_start, chunk_end)) {
             error_report("cannot get lkey!");
-            return -EINVAL;
+            return -1;
         }
     }
 
@@ -2226,7 +2226,7 @@ retry:
          * in this series.
          */
         perror("rdma migration: post rdma write failed");
-        return -ret;
+        return -1;
     }
 
     set_bit(chunk, block->transit_bitmap);
@@ -2950,14 +2950,14 @@ static int qemu_rdma_drain_cq(QEMUFile *f, RDMAContext *rdma)
     int ret;
 
     if (qemu_rdma_write_flush(f, rdma) < 0) {
-        return -EIO;
+        return -1;
     }
 
     while (rdma->nb_sent) {
         ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RDMA_WRITE, NULL);
         if (ret < 0) {
             error_report("rdma migration: complete polling error!");
-            return -EIO;
+            return -1;
         }
     }
 
-- 
2.41.0



* [PATCH v2 26/53] migration/rdma: Dumb down remaining int error values to -1
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (24 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 25/53] migration/rdma: Return -1 instead of negative errno code Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 16:25   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 27/53] migration/rdma: Replace int error_state by bool errored Markus Armbruster
                   ` (27 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

This is just to make the error value more obvious.  Callers don't
mind.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
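The one non-obvious construct in this patch is -!!rdma->error_state.  A
small sketch of how that expression behaves, assuming a hypothetical
struct ctx that stands in for RDMAContext and carries only the member
used here:

```c
/* Hypothetical stand-in for RDMAContext; only the member used here. */
struct ctx {
    int error_state;
};

/*
 * !!x is 1 for any nonzero x and 0 for zero, so -!!x is -1 or 0,
 * whatever error value happened to be stored.
 */
static int ctx_status(const struct ctx *c)
{
    return -!!c->error_state;
}
```

Any stored value, whether a negative errno code, a positive number, or
-1, comes out as -1; only zero comes out as 0.
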
 migration/rdma.c | 45 +++++++++++++++++++++++----------------------
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 4817f1ea10..fe101236c4 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1434,7 +1434,7 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
         ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
                                 &resp, NULL, NULL);
         if (ret < 0) {
-            return ret;
+            return -1;
         }
 
         trace_qemu_rdma_unregister_waiting_complete(chunk);
@@ -1475,7 +1475,7 @@ static int qemu_rdma_poll(RDMAContext *rdma, struct ibv_cq *cq,
 
     if (ret < 0) {
         error_report("ibv_poll_cq failed");
-        return ret;
+        return -1;
     }
 
     wr_id = wc.wr_id & RDMA_WRID_TYPE_MASK;
@@ -1604,7 +1604,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
     if (rdma->received_error) {
         return -1;
     }
-    return rdma->error_state;
+    return -!!rdma->error_state;
 }
 
 static struct ibv_comp_channel *to_channel(RDMAContext *rdma, uint64_t wrid)
@@ -1649,7 +1649,7 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma,
     while (wr_id != wrid_requested) {
         ret = qemu_rdma_poll(rdma, poll_cq, &wr_id_in, byte_len);
         if (ret < 0) {
-            return ret;
+            return -1;
         }
 
         wr_id = wr_id_in & RDMA_WRID_TYPE_MASK;
@@ -1723,7 +1723,7 @@ err_block_for_wrid:
     }
 
     rdma->error_state = ret;
-    return ret;
+    return -1;
 }
 
 /*
@@ -1778,9 +1778,10 @@ static int qemu_rdma_post_send_control(RDMAContext *rdma, uint8_t *buf,
     ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_SEND_CONTROL, NULL);
     if (ret < 0) {
         error_report("rdma migration: send polling control error");
+        return -1;
     }
 
-    return ret;
+    return 0;
 }
 
 /*
@@ -1822,7 +1823,7 @@ static int qemu_rdma_exchange_get_response(RDMAContext *rdma,
 
     if (ret < 0) {
         error_report("rdma migration: recv polling control error!");
-        return ret;
+        return -1;
     }
 
     network_to_control((void *) rdma->wr_data[idx].control);
@@ -1900,7 +1901,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
         ret = qemu_rdma_exchange_get_response(rdma,
                                     &resp, RDMA_CONTROL_READY, RDMA_WRID_READY);
         if (ret < 0) {
-            return ret;
+            return -1;
         }
     }
 
@@ -1912,7 +1913,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
         if (ret) {
             error_report("rdma migration: error posting"
                     " extra control recv for anticipated result!");
-            return ret;
+            return -1;
         }
     }
 
@@ -1922,7 +1923,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
     if (ret) {
         error_report("rdma migration: error posting first control recv!");
-        return ret;
+        return -1;
     }
 
     /*
@@ -1932,7 +1933,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
 
     if (ret < 0) {
         error_report("Failed to send control buffer!");
-        return ret;
+        return -1;
     }
 
     /*
@@ -1943,7 +1944,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
             trace_qemu_rdma_exchange_send_issue_callback();
             ret = callback(rdma);
             if (ret < 0) {
-                return ret;
+                return -1;
             }
         }
 
@@ -1952,7 +1953,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                               resp->type, RDMA_WRID_DATA);
 
         if (ret < 0) {
-            return ret;
+            return -1;
         }
 
         qemu_rdma_move_header(rdma, RDMA_WRID_DATA, resp);
@@ -1988,7 +1989,7 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
 
     if (ret < 0) {
         error_report("Failed to send control buffer!");
-        return ret;
+        return -1;
     }
 
     /*
@@ -1998,7 +1999,7 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
                                           expecting, RDMA_WRID_READY);
 
     if (ret < 0) {
-        return ret;
+        return -1;
     }
 
     qemu_rdma_move_header(rdma, RDMA_WRID_READY, head);
@@ -2009,7 +2010,7 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
     if (ret) {
         error_report("rdma migration: error posting second control recv!");
-        return ret;
+        return -1;
     }
 
     return 0;
@@ -2082,7 +2083,7 @@ retry:
                     "block %d chunk %" PRIu64
                     " current %" PRIu64 " len %" PRIu64 " %d",
                     current_index, chunk, sge.addr, length, rdma->nb_sent);
-            return ret;
+            return -1;
         }
     }
 
@@ -2140,7 +2141,7 @@ retry:
             ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
                                     &resp, &reg_result_idx, NULL);
             if (ret < 0) {
-                return ret;
+                return -1;
             }
 
             /* try to overlap this single registration with the one we sent. */
@@ -2214,7 +2215,7 @@ retry:
         if (ret < 0) {
             error_report("rdma migration: failed to make "
                          "room in full send queue!");
-            return ret;
+            return -1;
         }
 
         goto retry;
@@ -2256,7 +2257,7 @@ static int qemu_rdma_write_flush(QEMUFile *f, RDMAContext *rdma)
             rdma->current_index, rdma->current_addr, rdma->current_length);
 
     if (ret < 0) {
-        return ret;
+        return -1;
     }
 
     if (ret == 0) {
@@ -2338,7 +2339,7 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
     if (!qemu_rdma_buffer_mergeable(rdma, current_addr, len)) {
         ret = qemu_rdma_write_flush(f, rdma);
         if (ret) {
-            return ret;
+            return -1;
         }
         rdma->current_length = 0;
         rdma->current_addr = current_addr;
@@ -3516,7 +3517,7 @@ err_rdma_dest_wait:
     rdma->error_state = ret;
     qemu_rdma_cleanup(rdma);
     g_free(rdma_return_path);
-    return ret;
+    return -1;
 }
 
 static int dest_ram_sort_func(const void *a, const void *b)
-- 
2.41.0



* [PATCH v2 27/53] migration/rdma: Replace int error_state by bool errored
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (25 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 26/53] migration/rdma: Dumb down remaining int error values to -1 Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 16:25   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 28/53] migration/rdma: Drop superfluous assignments to @ret Markus Armbruster
                   ` (26 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

All we do with the value of RDMAContext member @error_state is test
whether it's zero.  Change to bool and rename to @errored.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
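A sketch of the two patterns this patch relies on, using a hypothetical
struct ctx in place of RDMAContext: a report-once check modeled on
rdma_errored(), and the expression -errored, which works because a bool
promotes to int 0 or 1 before negation.  The message text is made up
for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for RDMAContext; only the members used here. */
struct ctx {
    bool errored;
    bool error_reported;
};

/* Report the error state at most once, then just return the flag. */
static bool ctx_errored(struct ctx *c)
{
    if (c->errored && !c->error_reported) {
        fprintf(stderr, "context is in an error state\n");
        c->error_reported = true;
    }
    return c->errored;
}

/* bool promotes to int 0 or 1, so the result is 0 or -1. */
static int ctx_status(const struct ctx *c)
{
    return -c->errored;
}
```

With the flag being a bool, the earlier -!! normalization is no longer
needed to get 0 or -1.
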
 migration/rdma.c | 66 ++++++++++++++++++++++++------------------------
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index fe101236c4..d92be4869a 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -352,7 +352,7 @@ typedef struct RDMAContext {
      * memory registration, then do not attempt any future work
      * and remember the error state.
      */
-    int error_state;
+    bool errored;
     bool error_reported;
     bool received_error;
 
@@ -439,14 +439,14 @@ typedef struct QEMU_PACKED {
     uint64_t chunks;            /* how many sequential chunks to register */
 } RDMARegister;
 
-static int check_error_state(RDMAContext *rdma)
+static bool rdma_errored(RDMAContext *rdma)
 {
-    if (rdma->error_state && !rdma->error_reported) {
+    if (rdma->errored && !rdma->error_reported) {
         error_report("RDMA is in an error state waiting migration"
                      " to abort!");
         rdma->error_reported = true;
     }
-    return rdma->error_state;
+    return rdma->errored;
 }
 
 static void register_to_network(RDMAContext *rdma, RDMARegister *reg)
@@ -1547,7 +1547,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
          * But we need to be able to handle 'cancel' or an error
          * without hanging forever.
          */
-        while (!rdma->error_state  && !rdma->received_error) {
+        while (!rdma->errored && !rdma->received_error) {
             GPollFD pfds[2];
             pfds[0].fd = comp_channel->fd;
             pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
@@ -1604,7 +1604,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
     if (rdma->received_error) {
         return -1;
     }
-    return -!!rdma->error_state;
+    return -rdma->errored;
 }
 
 static struct ibv_comp_channel *to_channel(RDMAContext *rdma, uint64_t wrid)
@@ -1722,7 +1722,7 @@ err_block_for_wrid:
         ibv_ack_cq_events(cq, num_cq_events);
     }
 
-    rdma->error_state = ret;
+    rdma->errored = true;
     return -1;
 }
 
@@ -2366,7 +2366,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
     int idx;
 
     if (rdma->cm_id && rdma->connected) {
-        if ((rdma->error_state ||
+        if ((rdma->errored ||
              migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) &&
             !rdma->received_error) {
             RDMAControlHeader head = { .len = 0,
@@ -2652,14 +2652,14 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
 
     if (!rdma->host || !rdma->host[0]) {
         ERROR(errp, "RDMA host is not set!");
-        rdma->error_state = -EINVAL;
+        rdma->errored = true;
         return -1;
     }
     /* create CM channel */
     rdma->channel = rdma_create_event_channel();
     if (!rdma->channel) {
         ERROR(errp, "could not create rdma event channel");
-        rdma->error_state = -EINVAL;
+        rdma->errored = true;
         return -1;
     }
 
@@ -2717,7 +2717,7 @@ err_dest_init_bind_addr:
 err_dest_init_create_listen_id:
     rdma_destroy_event_channel(rdma->channel);
     rdma->channel = NULL;
-    rdma->error_state = ret;
+    rdma->errored = true;
     return -1;
 
 }
@@ -2793,7 +2793,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
         return -1;
     }
 
-    if (rdma->error_state) {
+    if (rdma->errored) {
         error_setg(errp,
                    "RDMA is in an error state waiting migration to abort!");
         return -1;
@@ -2805,7 +2805,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
      */
     ret = qemu_rdma_write_flush(f, rdma);
     if (ret < 0) {
-        rdma->error_state = ret;
+        rdma->errored = true;
         error_setg(errp, "qemu_rdma_write_flush failed");
         return -1;
     }
@@ -2825,7 +2825,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
             ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);
 
             if (ret < 0) {
-                rdma->error_state = ret;
+                rdma->errored = true;
                 error_setg(errp, "qemu_rdma_exchange_send failed");
                 return -1;
             }
@@ -2883,7 +2883,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
         return -1;
     }
 
-    if (rdma->error_state) {
+    if (rdma->errored) {
         error_setg(errp,
                    "RDMA is in an error state waiting migration to abort!");
         return -1;
@@ -2919,7 +2919,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
         ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_QEMU_FILE);
 
         if (ret < 0) {
-            rdma->error_state = ret;
+            rdma->errored = true;
             error_setg(errp, "qemu_rdma_exchange_recv failed");
             return -1;
         }
@@ -3193,21 +3193,21 @@ qio_channel_rdma_shutdown(QIOChannel *ioc,
     switch (how) {
     case QIO_CHANNEL_SHUTDOWN_READ:
         if (rdmain) {
-            rdmain->error_state = -1;
+            rdmain->errored = true;
         }
         break;
     case QIO_CHANNEL_SHUTDOWN_WRITE:
         if (rdmaout) {
-            rdmaout->error_state = -1;
+            rdmaout->errored = true;
         }
         break;
     case QIO_CHANNEL_SHUTDOWN_BOTH:
     default:
         if (rdmain) {
-            rdmain->error_state = -1;
+            rdmain->errored = true;
         }
         if (rdmaout) {
-            rdmaout->error_state = -1;
+            rdmaout->errored = true;
         }
         break;
     }
@@ -3252,7 +3252,7 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
         return -1;
     }
 
-    if (check_error_state(rdma)) {
+    if (rdma_errored(rdma)) {
         return -1;
     }
 
@@ -3321,7 +3321,7 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
     return RAM_SAVE_CONTROL_DELAYED;
 
 err:
-    rdma->error_state = ret;
+    rdma->errored = true;
     return -1;
 }
 
@@ -3342,13 +3342,13 @@ static void rdma_cm_poll_handler(void *opaque)
 
     if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
         cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
-        if (!rdma->error_state &&
+        if (!rdma->errored &&
             migration_incoming_get_current()->state !=
               MIGRATION_STATUS_COMPLETED) {
             error_report("receive cm event, cm event is %d", cm_event->event);
-            rdma->error_state = -EPIPE;
+            rdma->errored = true;
             if (rdma->return_path) {
-                rdma->return_path->error_state = -EPIPE;
+                rdma->return_path->errored = true;
             }
         }
         rdma_ack_cm_event(cm_event);
@@ -3514,7 +3514,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     return 0;
 
 err_rdma_dest_wait:
-    rdma->error_state = ret;
+    rdma->errored = true;
     qemu_rdma_cleanup(rdma);
     g_free(rdma_return_path);
     return -1;
@@ -3571,7 +3571,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
         return -1;
     }
 
-    if (check_error_state(rdma)) {
+    if (rdma_errored(rdma)) {
         return -1;
     }
 
@@ -3810,7 +3810,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
     } while (1);
 
 err:
-    rdma->error_state = ret;
+    rdma->errored = true;
     return -1;
 }
 
@@ -3887,7 +3887,7 @@ static int qemu_rdma_registration_start(QEMUFile *f,
         return -1;
     }
 
-    if (check_error_state(rdma)) {
+    if (rdma_errored(rdma)) {
         return -1;
     }
 
@@ -3920,7 +3920,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
         return -1;
     }
 
-    if (check_error_state(rdma)) {
+    if (rdma_errored(rdma)) {
         return -1;
     }
 
@@ -3974,7 +3974,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
                     "Your QEMU command line parameters are probably "
                     "not identical on both the source and destination.",
                     local->nb_blocks, nb_dest_blocks);
-            rdma->error_state = -EINVAL;
+            rdma->errored = true;
             return -1;
         }
 
@@ -3990,7 +3990,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
                         "vs %" PRIu64, local->block[i].block_name, i,
                         local->block[i].length,
                         rdma->dest_blocks[i].length);
-                rdma->error_state = -EINVAL;
+                rdma->errored = true;
                 return -1;
             }
             local->block[i].remote_host_addr =
@@ -4010,7 +4010,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
 
     return 0;
 err:
-    rdma->error_state = ret;
+    rdma->errored = true;
     return -1;
 }
 
-- 
2.41.0



* [PATCH v2 28/53] migration/rdma: Drop superfluous assignments to @ret
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (26 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 27/53] migration/rdma: Replace int error_state by bool errored Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 16:27   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 29/53] migration/rdma: Check negative error values the same way everywhere Markus Armbruster
                   ` (25 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 35 ++++++++++-------------------------
 1 file changed, 10 insertions(+), 25 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index d92be4869a..2af9395696 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1530,7 +1530,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
                                        struct ibv_comp_channel *comp_channel)
 {
     struct rdma_cm_event *cm_event;
-    int ret = -1;
+    int ret;
 
     /*
      * Coroutine doesn't start until migration_fd_process_incoming()
@@ -1635,7 +1635,7 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma,
                                     uint64_t wrid_requested,
                                     uint32_t *byte_len)
 {
-    int num_cq_events = 0, ret = 0;
+    int num_cq_events = 0, ret;
     struct ibv_cq *cq;
     void *cq_ctx;
     uint64_t wr_id = RDMA_WRID_NONE, wr_id_in;
@@ -1685,8 +1685,7 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma,
 
         num_cq_events++;
 
-        ret = -ibv_req_notify_cq(cq, 0);
-        if (ret) {
+        if (ibv_req_notify_cq(cq, 0)) {
             goto err_block_for_wrid;
         }
 
@@ -1733,7 +1732,7 @@ err_block_for_wrid:
 static int qemu_rdma_post_send_control(RDMAContext *rdma, uint8_t *buf,
                                        RDMAControlHeader *head)
 {
-    int ret = 0;
+    int ret;
     RDMAWorkRequestData *wr = &rdma->wr_data[RDMA_WRID_CONTROL];
     struct ibv_send_wr *bad_wr;
     struct ibv_sge sge = {
@@ -1890,7 +1889,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                    int *resp_idx,
                                    int (*callback)(RDMAContext *rdma))
 {
-    int ret = 0;
+    int ret;
 
     /*
      * Wait until the dest is ready before attempting to deliver the message
@@ -2871,7 +2870,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
     RDMAContext *rdma;
     RDMAControlHeader head;
-    int ret = 0;
+    int ret;
     ssize_t done = 0;
     size_t i, len;
 
@@ -3371,7 +3370,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     RDMAContext *rdma_return_path = NULL;
     struct rdma_cm_event *cm_event;
     struct ibv_context *verbs;
-    int ret = -EINVAL;
+    int ret;
     int idx;
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
@@ -3381,7 +3380,6 @@ static int qemu_rdma_accept(RDMAContext *rdma)
 
     if (cm_event->event != RDMA_CM_EVENT_CONNECT_REQUEST) {
         rdma_ack_cm_event(cm_event);
-        ret = -1;
         goto err_rdma_dest_wait;
     }
 
@@ -3394,7 +3392,6 @@ static int qemu_rdma_accept(RDMAContext *rdma)
         rdma_return_path = qemu_rdma_data_init(rdma->host_port, NULL);
         if (rdma_return_path == NULL) {
             rdma_ack_cm_event(cm_event);
-            ret = -1;
             goto err_rdma_dest_wait;
         }
 
@@ -3409,7 +3406,6 @@ static int qemu_rdma_accept(RDMAContext *rdma)
         error_report("Unknown source RDMA version: %d, bailing...",
                      cap.version);
         rdma_ack_cm_event(cm_event);
-        ret = -1;
         goto err_rdma_dest_wait;
     }
 
@@ -3442,7 +3438,6 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     } else if (rdma->verbs != verbs) {
         error_report("ibv context not matching %p, %p!", rdma->verbs,
                      verbs);
-        ret = -1;
         goto err_rdma_dest_wait;
     }
 
@@ -3496,7 +3491,6 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     if (cm_event->event != RDMA_CM_EVENT_ESTABLISHED) {
         error_report("rdma_accept not event established");
         rdma_ack_cm_event(cm_event);
-        ret = -1;
         goto err_rdma_dest_wait;
     }
 
@@ -3559,7 +3553,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
     static RDMARegisterResult results[RDMA_CONTROL_MAX_COMMANDS_PER_MESSAGE];
     RDMALocalBlock *block;
     void *host_addr;
-    int ret = 0;
+    int ret;
     int idx = 0;
     int count = 0;
     int i = 0;
@@ -3588,7 +3582,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
         if (head.repeat > RDMA_CONTROL_MAX_COMMANDS_PER_MESSAGE) {
             error_report("rdma: Too many requests in this message (%d)."
                             "Bailing.", head.repeat);
-            ret = -EIO;
             break;
         }
 
@@ -3604,7 +3597,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                 error_report("rdma: 'compress' bad block index %u (vs %d)",
                              (unsigned int)comp->block_idx,
                              rdma->local_ram_blocks.nb_blocks);
-                ret = -EIO;
                 goto err;
             }
             block = &(rdma->local_ram_blocks.block[comp->block_idx]);
@@ -3703,7 +3695,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                     error_report("rdma: 'register' bad block index %u (vs %d)",
                                  (unsigned int)reg->current_index,
                                  rdma->local_ram_blocks.nb_blocks);
-                    ret = -ENOENT;
                     goto err;
                 }
                 block = &(rdma->local_ram_blocks.block[reg->current_index]);
@@ -3713,7 +3704,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                             " offset: %" PRIx64 " current_addr: %" PRIx64,
                             block->block_name, block->offset,
                             reg->key.current_addr);
-                        ret = -ERANGE;
                         goto err;
                     }
                     host_addr = (block->local_host_addr +
@@ -3729,7 +3719,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                         error_report("rdma: bad chunk for block %s"
                             " chunk: %" PRIx64,
                             block->block_name, reg->key.chunk);
-                        ret = -ERANGE;
                         goto err;
                     }
                 }
@@ -3741,7 +3730,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                             (uintptr_t)host_addr, NULL, &tmp_rkey,
                             chunk, chunk_start, chunk_end)) {
                     error_report("cannot get rkey");
-                    ret = -EINVAL;
                     goto err;
                 }
                 reg_result->rkey = tmp_rkey;
@@ -3781,7 +3769,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
 
                 if (ret != 0) {
                     perror("rdma unregistration chunk failed");
-                    ret = -ret;
                     goto err;
                 }
 
@@ -3800,11 +3787,9 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
             break;
         case RDMA_CONTROL_REGISTER_RESULT:
             error_report("Invalid RESULT message at dest.");
-            ret = -EIO;
             goto err;
         default:
             error_report("Unknown control message %s", control_desc(head.type));
-            ret = -EIO;
             goto err;
         }
     } while (1);
@@ -3908,7 +3893,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
     RDMAContext *rdma;
     RDMAControlHeader head = { .len = 0, .repeat = 1 };
-    int ret = 0;
+    int ret;
 
     if (migration_in_postcopy()) {
         return 0;
@@ -4182,7 +4167,7 @@ void rdma_start_outgoing_migration(void *opaque,
     MigrationState *s = opaque;
     RDMAContext *rdma_return_path = NULL;
     RDMAContext *rdma;
-    int ret = 0;
+    int ret;
 
     /* Avoid ram_block_discard_disable(), cannot change during migration. */
     if (ram_block_discard_is_required()) {
-- 
2.41.0



* [PATCH v2 29/53] migration/rdma: Check negative error values the same way everywhere
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (27 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 28/53] migration/rdma: Drop superfluous assignments to @ret Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-09-29 15:28   ` Fabiano Rosas
  2023-10-04 16:33   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 30/53] migration/rdma: Plug a memory leak and improve a message Markus Armbruster
                   ` (24 subsequent siblings)
  53 siblings, 2 replies; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

When a function returns 0 on success and a negative value on error,
checking for non-zero suffices, but checking for negative is clearer.
So do that.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 82 ++++++++++++++++++++++++------------------------
 1 file changed, 41 insertions(+), 41 deletions(-)
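
Not part of the patch: a minimal sketch of the two checking styles the
commit message compares.  set_depth(), failed_nonzero() and
failed_negative() are hypothetical names made up for illustration; none
of them exists in migration/rdma.c.  The sketch assumes the "0 on
success, negative on error" contract stated above.  The two checks
differ only for positive values, which such a function never returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not from migration/rdma.c): returns 0 on
 * success, a negative errno code on failure.
 */
static int set_depth(int *depth, int value)
{
    if (value < 0) {
        return -EINVAL;
    }
    *depth = value;
    return 0;
}

/* Old style: any non-zero value counts as failure. */
static bool failed_nonzero(int ret)
{
    return ret != 0;
}

/* New style: only negative values count as failure. */
static bool failed_negative(int ret)
{
    return ret < 0;
}

/* Both styles agree for every value set_depth() can return. */
static void check_styles_agree(void)
{
    int depth = 0;

    assert(!failed_nonzero(set_depth(&depth, 4)));
    assert(!failed_negative(set_depth(&depth, 4)));
    assert(failed_nonzero(set_depth(&depth, -1)));
    assert(failed_negative(set_depth(&depth, -1)));
}
```

Calling check_styles_agree() from a test harness exercises both paths.
The "< 0" form spells out the contract at the call site, which is the
point of the patch.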

diff --git a/migration/rdma.c b/migration/rdma.c
index 2af9395696..c57692e5a3 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -953,7 +953,7 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
 
     /* create CM id */
     ret = rdma_create_id(rdma->channel, &rdma->cm_id, NULL, RDMA_PS_TCP);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "could not create channel id");
         goto err_resolve_create_id;
     }
@@ -974,10 +974,10 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
 
         ret = rdma_resolve_addr(rdma->cm_id, NULL, e->ai_dst_addr,
                 RDMA_RESOLVE_TIMEOUT_MS);
-        if (!ret) {
+        if (ret >= 0) {
             if (e->ai_family == AF_INET6) {
                 ret = qemu_rdma_broken_ipv6_kernel(rdma->cm_id->verbs, errp);
-                if (ret) {
+                if (ret < 0) {
                     continue;
                 }
             }
@@ -994,7 +994,7 @@ route:
     qemu_rdma_dump_gid("source_resolve_addr", rdma->cm_id);
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "could not perform event_addr_resolved");
         goto err_resolve_get_addr;
     }
@@ -1010,13 +1010,13 @@ route:
 
     /* resolve route */
     ret = rdma_resolve_route(rdma->cm_id, RDMA_RESOLVE_TIMEOUT_MS);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "could not resolve rdma route");
         goto err_resolve_get_addr;
     }
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "could not perform event_route_resolved");
         goto err_resolve_get_addr;
     }
@@ -1124,7 +1124,7 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma)
     attr.qp_type = IBV_QPT_RC;
 
     ret = rdma_create_qp(rdma->cm_id, rdma->pd, &attr);
-    if (ret) {
+    if (ret < 0) {
         return -1;
     }
 
@@ -1567,7 +1567,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
 
                 if (pfds[1].revents) {
                     ret = rdma_get_cm_event(rdma->channel, &cm_event);
-                    if (ret) {
+                    if (ret < 0) {
                         error_report("failed to get cm event while wait "
                                      "completion channel");
                         return -1;
@@ -1668,12 +1668,12 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma,
 
     while (1) {
         ret = qemu_rdma_wait_comp_channel(rdma, ch);
-        if (ret) {
+        if (ret < 0) {
             goto err_block_for_wrid;
         }
 
         ret = ibv_get_cq_event(ch, &cq, &cq_ctx);
-        if (ret) {
+        if (ret < 0) {
             /*
              * FIXME perror() is problematic, because ibv_reg_mr() is
              * not documented to set errno.  Will go away later in
@@ -1909,7 +1909,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
      */
     if (resp) {
         ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_DATA);
-        if (ret) {
+        if (ret < 0) {
             error_report("rdma migration: error posting"
                     " extra control recv for anticipated result!");
             return -1;
@@ -1920,7 +1920,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
      * Post a WR to replace the one we just consumed for the READY message.
      */
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
-    if (ret) {
+    if (ret < 0) {
         error_report("rdma migration: error posting first control recv!");
         return -1;
     }
@@ -2007,7 +2007,7 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
      * Post a new RECV work request to replace the one we just consumed.
      */
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
-    if (ret) {
+    if (ret < 0) {
         error_report("rdma migration: error posting second control recv!");
         return -1;
     }
@@ -2337,7 +2337,7 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
     /* If we cannot merge it, we flush the current buffer first. */
     if (!qemu_rdma_buffer_mergeable(rdma, current_addr, len)) {
         ret = qemu_rdma_write_flush(f, rdma);
-        if (ret) {
+        if (ret < 0) {
             return -1;
         }
         rdma->current_length = 0;
@@ -2467,12 +2467,12 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
     rdma->pin_all = pin_all;
 
     ret = qemu_rdma_resolve_host(rdma, errp);
-    if (ret) {
+    if (ret < 0) {
         goto err_rdma_source_init;
     }
 
     ret = qemu_rdma_alloc_pd_cq(rdma);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "rdma migration: error allocating pd and cq! Your mlock()"
                     " limits may be too low. Please check $ ulimit -a # and "
                     "search for 'ulimit -l' in the output");
@@ -2480,7 +2480,7 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
     }
 
     ret = qemu_rdma_alloc_qp(rdma);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "rdma migration: error allocating qp!");
         goto err_rdma_source_init;
     }
@@ -2497,7 +2497,7 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
 
     for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
         ret = qemu_rdma_reg_control(rdma, idx);
-        if (ret) {
+        if (ret < 0) {
             ERROR(errp, "rdma migration: error registering %d control!",
                                                             idx);
             goto err_rdma_source_init;
@@ -2571,13 +2571,13 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
     caps_to_network(&cap);
 
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "posting second control recv");
         goto err_rdma_source_connect;
     }
 
     ret = rdma_connect(rdma->cm_id, &conn_param);
-    if (ret) {
+    if (ret < 0) {
         perror("rdma_connect");
         ERROR(errp, "connecting to destination!");
         goto err_rdma_source_connect;
@@ -2591,7 +2591,7 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
             ERROR(errp, "failed to get cm event");
         }
     }
-    if (ret) {
+    if (ret < 0) {
         /*
          * FIXME perror() is wrong, because
          * qemu_get_cm_event_timeout() can fail without setting errno.
@@ -2664,7 +2664,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
 
     /* create CM id */
     ret = rdma_create_id(rdma->channel, &listen_id, NULL, RDMA_PS_TCP);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "could not create cm_id!");
         goto err_dest_init_create_listen_id;
     }
@@ -2680,7 +2680,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
 
     ret = rdma_set_option(listen_id, RDMA_OPTION_ID, RDMA_OPTION_ID_REUSEADDR,
                           &reuse, sizeof reuse);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "Error: could not set REUSEADDR option");
         goto err_dest_init_bind_addr;
     }
@@ -2689,12 +2689,12 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
             &((struct sockaddr_in *) e->ai_dst_addr)->sin_addr, ip, sizeof ip);
         trace_qemu_rdma_dest_init_trying(rdma->host, ip);
         ret = rdma_bind_addr(listen_id, e->ai_dst_addr);
-        if (ret) {
+        if (ret < 0) {
             continue;
         }
         if (e->ai_family == AF_INET6) {
             ret = qemu_rdma_broken_ipv6_kernel(listen_id->verbs, errp);
-            if (ret) {
+            if (ret < 0) {
                 continue;
             }
         }
@@ -3334,7 +3334,7 @@ static void rdma_cm_poll_handler(void *opaque)
     MigrationIncomingState *mis = migration_incoming_get_current();
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
-    if (ret) {
+    if (ret < 0) {
         error_report("get_cm_event failed %d", errno);
         return;
     }
@@ -3374,7 +3374,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     int idx;
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
-    if (ret) {
+    if (ret < 0) {
         goto err_rdma_dest_wait;
     }
 
@@ -3444,13 +3444,13 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     qemu_rdma_dump_id("dest_init", verbs);
 
     ret = qemu_rdma_alloc_pd_cq(rdma);
-    if (ret) {
+    if (ret < 0) {
         error_report("rdma migration: error allocating pd and cq!");
         goto err_rdma_dest_wait;
     }
 
     ret = qemu_rdma_alloc_qp(rdma);
-    if (ret) {
+    if (ret < 0) {
         error_report("rdma migration: error allocating qp!");
         goto err_rdma_dest_wait;
     }
@@ -3459,7 +3459,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
 
     for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
         ret = qemu_rdma_reg_control(rdma, idx);
-        if (ret) {
+        if (ret < 0) {
             error_report("rdma: error registering %d control", idx);
             goto err_rdma_dest_wait;
         }
@@ -3477,13 +3477,13 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     }
 
     ret = rdma_accept(rdma->cm_id, &conn_param);
-    if (ret) {
+    if (ret < 0) {
         error_report("rdma_accept failed");
         goto err_rdma_dest_wait;
     }
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
-    if (ret) {
+    if (ret < 0) {
         error_report("rdma_accept get_cm_event failed");
         goto err_rdma_dest_wait;
     }
@@ -3498,7 +3498,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     rdma->connected = true;
 
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
-    if (ret) {
+    if (ret < 0) {
         error_report("rdma migration: error posting second control recv");
         goto err_rdma_dest_wait;
     }
@@ -3627,7 +3627,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
 
             if (rdma->pin_all) {
                 ret = qemu_rdma_reg_whole_ram_blocks(rdma);
-                if (ret) {
+                if (ret < 0) {
                     error_report("rdma migration: error dest "
                                     "registering ram blocks");
                     goto err;
@@ -4088,7 +4088,7 @@ static void rdma_accept_incoming_migration(void *opaque)
     trace_qemu_rdma_accept_incoming_migration();
     ret = qemu_rdma_accept(rdma);
 
-    if (ret) {
+    if (ret < 0) {
         fprintf(stderr, "RDMA ERROR: Migration initialization failed\n");
         return;
     }
@@ -4132,7 +4132,7 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
     }
 
     ret = qemu_rdma_dest_init(rdma, errp);
-    if (ret) {
+    if (ret < 0) {
         goto err;
     }
 
@@ -4140,7 +4140,7 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
 
     ret = rdma_listen(rdma->listen_id, 5);
 
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "listening on socket!");
         goto cleanup_rdma;
     }
@@ -4182,14 +4182,14 @@ void rdma_start_outgoing_migration(void *opaque,
 
     ret = qemu_rdma_source_init(rdma, migrate_rdma_pin_all(), errp);
 
-    if (ret) {
+    if (ret < 0) {
         goto err;
     }
 
     trace_rdma_start_outgoing_migration_after_rdma_source_init();
     ret = qemu_rdma_connect(rdma, false, errp);
 
-    if (ret) {
+    if (ret < 0) {
         goto err;
     }
 
@@ -4204,13 +4204,13 @@ void rdma_start_outgoing_migration(void *opaque,
         ret = qemu_rdma_source_init(rdma_return_path,
                                     migrate_rdma_pin_all(), errp);
 
-        if (ret) {
+        if (ret < 0) {
             goto return_path_err;
         }
 
         ret = qemu_rdma_connect(rdma_return_path, true, errp);
 
-        if (ret) {
+        if (ret < 0) {
             goto return_path_err;
         }
 
-- 
2.41.0




* [PATCH v2 30/53] migration/rdma: Plug a memory leak and improve a message
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (28 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 29/53] migration/rdma: Check negative error values the same way everywhere Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 16:27   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 31/53] migration/rdma: Delete inappropriate error_report() in macro ERROR() Markus Armbruster
                   ` (23 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

When migration capability @rdma-pin-all is true, but the server cannot
honor it, qemu_rdma_connect() calls macro ERROR(), then returns
success.

ERROR() sets an error.  Since qemu_rdma_connect() returns success, its
caller rdma_start_outgoing_migration() duly assumes @errp is still
clear.  The Error object leaks.

ERROR() additionally reports the situation to the user as an error:

    RDMA ERROR: Server cannot support pinning all memory. Will register memory dynamically.

Is this an error or not?  It actually isn't; we disable @rdma-pin-all
and carry on.  "Correcting" the user's configuration decisions that
way feels problematic, but that's a topic for another day.

Replace ERROR() by warn_report().  This plugs the memory leak, and
emits a clearer message to the user.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
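
Not part of the patch: a rough sketch of the leak described above,
using toy stand-ins rather than QEMU's real Error API.  The Error
struct, toy_error_set(), toy_error_free(), connect_before(),
connect_after() and leaked_after_call() are all assumptions made up for
illustration.  The caller trusts the return value, so an Error set on a
"success" path is never freed.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for QEMU's Error; only what the sketch needs. */
typedef struct Error {
    const char *msg;
} Error;

static int live_errors;     /* Error objects allocated and not yet freed */

/* Toy stand-in for "set an error unless one is set already". */
static void toy_error_set(Error **errp, const char *msg)
{
    if (errp && !*errp) {
        Error *err = malloc(sizeof(*err));

        if (!err) {
            abort();
        }
        err->msg = msg;
        *errp = err;
        live_errors++;
    }
}

static void toy_error_free(Error *err)
{
    if (err) {
        free(err);
        live_errors--;
    }
}

/* Before: sets an error, yet returns success. */
static int connect_before(bool server_can_pin, bool *pin_all, Error **errp)
{
    if (*pin_all && !server_can_pin) {
        toy_error_set(errp, "Server cannot support pinning all memory");
        *pin_all = false;
    }
    return 0;
}

/* After: warns and carries on; @errp stays untouched. */
static int connect_after(bool server_can_pin, bool *pin_all, Error **errp)
{
    (void)errp;
    if (*pin_all && !server_can_pin) {
        fprintf(stderr, "warning: server cannot pin all memory\n");
        *pin_all = false;
    }
    return 0;
}

/*
 * Caller that only looks at the return value.  Returns how many Error
 * objects were left behind by the call.
 */
static int leaked_after_call(int (*connect_fn)(bool, bool *, Error **))
{
    Error *err = NULL;
    bool pin_all = true;
    int before = live_errors;
    int leaked;

    if (connect_fn(false, &pin_all, &err) < 0) {
        toy_error_free(err);    /* the failure path would clean up */
        err = NULL;
    }
    /* On success a real caller assumes err is still NULL. */
    leaked = live_errors - before;

    toy_error_free(err);        /* keep the sketch itself leak-free */
    return leaked;
}
```

With these toys, leaked_after_call(connect_before) counts one stray
object and leaked_after_call(connect_after) counts none.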

diff --git a/migration/rdma.c b/migration/rdma.c
index c57692e5a3..54f4a917be 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2617,8 +2617,8 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
      * and disable them otherwise.
      */
     if (rdma->pin_all && !(cap.flags & RDMA_CAPABILITY_PIN_ALL)) {
-        ERROR(errp, "Server cannot support pinning all memory. "
-                        "Will register memory dynamically.");
+        warn_report("RDMA: Server cannot support pinning all memory. "
+                    "Will register memory dynamically.");
         rdma->pin_all = false;
     }
 
-- 
2.41.0




* [PATCH v2 31/53] migration/rdma: Delete inappropriate error_report() in macro ERROR()
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (29 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 30/53] migration/rdma: Plug a memory leak and improve a message Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 16:50   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 32/53] migration/rdma: Retire " Markus Armbruster
                   ` (22 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

Macro ERROR() violates this principle.  Delete the error_report()
there.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Tested-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 4 ----
 1 file changed, 4 deletions(-)
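
Not part of the patch: a small sketch of the double-report problem the
commit message describes.  Everything here is a toy assumption (the
Error struct, toy_report(), query_port_noisy(), query_port_quiet() and
the two caller helpers); it only counts how many times a message
reaches the user.

```c
#include <stdio.h>

/* Toy stand-in for QEMU's Error. */
typedef struct Error {
    const char *msg;
} Error;

static Error port_error = { "could not query port" };
static int reports;         /* messages shown to the user so far */

/* Toy stand-in for reporting to the user. */
static void toy_report(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
    reports++;
}

/* Violates the principle: reports and also returns the error via @errp. */
static int query_port_noisy(Error **errp)
{
    toy_report(port_error.msg);
    if (errp && !*errp) {
        *errp = &port_error;
    }
    return -1;
}

/* Follows the principle: only returns the error via @errp. */
static int query_port_quiet(Error **errp)
{
    if (errp && !*errp) {
        *errp = &port_error;
    }
    return -1;
}

/* A caller that reports the failure itself. */
static int reports_when_caller_reports(int (*fn)(Error **))
{
    Error *err = NULL;

    reports = 0;
    if (fn(&err) < 0) {
        toy_report(err->msg);
    }
    return reports;
}

/* A caller that recovers, e.g. by falling back to something else. */
static int reports_when_caller_recovers(int (*fn)(Error **))
{
    Error *err = NULL;

    reports = 0;
    (void)fn(&err);         /* failure is handled; nothing to report */
    return reports;
}
```

The noisy variant yields two reports when the caller reports and one
bogus report when the caller recovers; the quiet variant yields one and
zero respectively.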

diff --git a/migration/rdma.c b/migration/rdma.c
index 54f4a917be..128489e0ce 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -40,12 +40,8 @@
 #include "options.h"
 #include <poll.h>
 
-/*
- * Print and error on both the Monitor and the Log file.
- */
 #define ERROR(errp, fmt, ...) \
     do { \
-        fprintf(stderr, "RDMA ERROR: " fmt "\n", ## __VA_ARGS__); \
         if (errp && (*(errp) == NULL)) { \
             error_setg(errp, "RDMA ERROR: " fmt, ## __VA_ARGS__); \
         } \
-- 
2.41.0




* [PATCH v2 32/53] migration/rdma: Retire macro ERROR()
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (30 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 31/53] migration/rdma: Delete inappropriate error_report() in macro ERROR() Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 16:50   ` Juan Quintela
  2023-09-28 13:19 ` [PATCH v2 33/53] migration/rdma: Fix error handling around rdma_getaddrinfo() Markus Armbruster
                   ` (21 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

ERROR() has become "error_setg() unless an error has been set
already".  Hiding the conditional in the macro gets in the way of
further work.  Replace the macro uses by their expansion, and delete
the macro.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 168 +++++++++++++++++++++++++++++++++--------------
 1 file changed, 120 insertions(+), 48 deletions(-)
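
Not part of the patch: a sketch showing that the macro and its open
coded expansion behave the same, including "first error wins".
toy_error_setg() is a made-up stand-in, not QEMU's error_setg(), and
set_with_macro() / set_with_expansion() are hypothetical helpers.  The
macro body is copied from the diff with only the callee renamed; it
relies on the GNU "## __VA_ARGS__" extension, as the original does.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for QEMU's Error. */
typedef struct Error {
    char msg[128];
} Error;

/* Toy stand-in for error_setg(): formats a message into a new Error. */
static void toy_error_setg(Error **errp, const char *fmt, ...)
{
    va_list ap;
    Error *err;

    if (!errp) {
        return;
    }
    err = calloc(1, sizeof(*err));
    if (!err) {
        abort();
    }
    va_start(ap, fmt);
    vsnprintf(err->msg, sizeof(err->msg), fmt, ap);
    va_end(ap);
    *errp = err;
}

/* The macro as it looks before this patch, calling the toy setter. */
#define ERROR(errp, fmt, ...) \
    do { \
        if (errp && (*(errp) == NULL)) { \
            toy_error_setg(errp, "RDMA ERROR: " fmt, ## __VA_ARGS__); \
        } \
    } while (0)

/* Call site using the macro. */
static void set_with_macro(Error **errp, const char *host)
{
    ERROR(errp, "could not resolve address %s", host);
}

/* Same call site with the macro expanded by hand, as the patch does. */
static void set_with_expansion(Error **errp, const char *host)
{
    if (errp && !*errp) {
        toy_error_setg(errp, "RDMA ERROR: could not resolve address %s",
                       host);
    }
}
```

Both helpers produce the same message, and neither overwrites an error
that is already set.  The caller owns the resulting object and releases
it with free() in this toy version.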

diff --git a/migration/rdma.c b/migration/rdma.c
index 128489e0ce..cbb6822dda 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -40,13 +40,6 @@
 #include "options.h"
 #include <poll.h>
 
-#define ERROR(errp, fmt, ...) \
-    do { \
-        if (errp && (*(errp) == NULL)) { \
-            error_setg(errp, "RDMA ERROR: " fmt, ## __VA_ARGS__); \
-        } \
-    } while (0)
-
 #define RDMA_RESOLVE_TIMEOUT_MS 10000
 
 /* Do not merge data if larger than this. */
@@ -865,7 +858,10 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
 
             if (ibv_query_port(verbs, 1, &port_attr)) {
                 ibv_close_device(verbs);
-                ERROR(errp, "Could not query initial IB port");
+                if (errp && !*errp) {
+                    error_setg(errp,
+                               "RDMA ERROR: Could not query initial IB port");
+                }
                 return -1;
             }
 
@@ -888,9 +884,12 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
                                 " migrate over the IB fabric until the kernel "
                                 " fixes the bug.\n");
             } else {
-                ERROR(errp, "You only have RoCE / iWARP devices in your systems"
-                            " and your management software has specified '[::]'"
-                            ", but IPv6 over RoCE / iWARP is not supported in Linux.");
+                if (errp && !*errp) {
+                    error_setg(errp, "RDMA ERROR: "
+                               "You only have RoCE / iWARP devices in your systems"
+                               " and your management software has specified '[::]'"
+                               ", but IPv6 over RoCE / iWARP is not supported in Linux.");
+                }
                 return -1;
             }
         }
@@ -906,13 +905,18 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
 
     /* IB ports start with 1, not 0 */
     if (ibv_query_port(verbs, 1, &port_attr)) {
-        ERROR(errp, "Could not query initial IB port");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: Could not query initial IB port");
+        }
         return -1;
     }
 
     if (port_attr.link_layer == IBV_LINK_LAYER_ETHERNET) {
-        ERROR(errp, "Linux kernel's RoCE / iWARP does not support IPv6 "
-                    "(but patches on linux-rdma in progress)");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: "
+                       "Linux kernel's RoCE / iWARP does not support IPv6 "
+                       "(but patches on linux-rdma in progress)");
+        }
         return -1;
     }
 
@@ -936,21 +940,27 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
     struct rdma_addrinfo *e;
 
     if (rdma->host == NULL || !strcmp(rdma->host, "")) {
-        ERROR(errp, "RDMA hostname has not been set");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: RDMA hostname has not been set");
+        }
         return -1;
     }
 
     /* create CM channel */
     rdma->channel = rdma_create_event_channel();
     if (!rdma->channel) {
-        ERROR(errp, "could not create CM channel");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not create CM channel");
+        }
         return -1;
     }
 
     /* create CM id */
     ret = rdma_create_id(rdma->channel, &rdma->cm_id, NULL, RDMA_PS_TCP);
     if (ret < 0) {
-        ERROR(errp, "could not create channel id");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not create channel id");
+        }
         goto err_resolve_create_id;
     }
 
@@ -959,7 +969,10 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
 
     ret = rdma_getaddrinfo(rdma->host, port_str, NULL, &res);
     if (ret) {
-        ERROR(errp, "could not rdma_getaddrinfo address %s", rdma->host);
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not rdma_getaddrinfo address %s",
+                       rdma->host);
+        }
         goto err_resolve_get_addr;
     }
 
@@ -982,7 +995,10 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
     }
 
     rdma_freeaddrinfo(res);
-    ERROR(errp, "could not resolve address %s", rdma->host);
+    if (errp && !*errp) {
+        error_setg(errp, "RDMA ERROR: could not resolve address %s",
+                   rdma->host);
+    }
     goto err_resolve_get_addr;
 
 route:
@@ -991,13 +1007,18 @@ route:
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
     if (ret < 0) {
-        ERROR(errp, "could not perform event_addr_resolved");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not perform event_addr_resolved");
+        }
         goto err_resolve_get_addr;
     }
 
     if (cm_event->event != RDMA_CM_EVENT_ADDR_RESOLVED) {
-        ERROR(errp, "result not equal to event_addr_resolved %s",
-                rdma_event_str(cm_event->event));
+        if (errp && !*errp) {
+            error_setg(errp,
+                       "RDMA ERROR: result not equal to event_addr_resolved %s",
+                       rdma_event_str(cm_event->event));
+        }
         error_report("rdma_resolve_addr");
         rdma_ack_cm_event(cm_event);
         goto err_resolve_get_addr;
@@ -1007,18 +1028,25 @@ route:
     /* resolve route */
     ret = rdma_resolve_route(rdma->cm_id, RDMA_RESOLVE_TIMEOUT_MS);
     if (ret < 0) {
-        ERROR(errp, "could not resolve rdma route");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not resolve rdma route");
+        }
         goto err_resolve_get_addr;
     }
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
     if (ret < 0) {
-        ERROR(errp, "could not perform event_route_resolved");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not perform event_route_resolved");
+        }
         goto err_resolve_get_addr;
     }
     if (cm_event->event != RDMA_CM_EVENT_ROUTE_RESOLVED) {
-        ERROR(errp, "result not equal to event_route_resolved: %s",
-                        rdma_event_str(cm_event->event));
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: "
+                       "result not equal to event_route_resolved: %s",
+                       rdma_event_str(cm_event->event));
+        }
         rdma_ack_cm_event(cm_event);
         goto err_resolve_get_addr;
     }
@@ -2469,15 +2497,20 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
 
     ret = qemu_rdma_alloc_pd_cq(rdma);
     if (ret < 0) {
-        ERROR(errp, "rdma migration: error allocating pd and cq! Your mlock()"
-                    " limits may be too low. Please check $ ulimit -a # and "
-                    "search for 'ulimit -l' in the output");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: "
+                       "rdma migration: error allocating pd and cq! Your mlock()"
+                       " limits may be too low. Please check $ ulimit -a # and "
+                       "search for 'ulimit -l' in the output");
+        }
         goto err_rdma_source_init;
     }
 
     ret = qemu_rdma_alloc_qp(rdma);
     if (ret < 0) {
-        ERROR(errp, "rdma migration: error allocating qp!");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: rdma migration: error allocating qp!");
+        }
         goto err_rdma_source_init;
     }
 
@@ -2494,8 +2527,11 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
     for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
         ret = qemu_rdma_reg_control(rdma, idx);
         if (ret < 0) {
-            ERROR(errp, "rdma migration: error registering %d control!",
-                                                            idx);
+            if (errp && !*errp) {
+                error_setg(errp,
+                           "RDMA ERROR: rdma migration: error registering %d control!",
+                           idx);
+            }
             goto err_rdma_source_init;
         }
     }
@@ -2523,19 +2559,29 @@ static int qemu_get_cm_event_timeout(RDMAContext *rdma,
     } while (ret < 0 && errno == EINTR);
 
     if (ret == 0) {
-        ERROR(errp, "poll cm event timeout");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: poll cm event timeout");
+        }
         return -1;
     } else if (ret < 0) {
-        ERROR(errp, "failed to poll cm event, errno=%i", errno);
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: failed to poll cm event, errno=%i",
+                       errno);
+        }
         return -1;
     } else if (poll_fd.revents & POLLIN) {
         if (rdma_get_cm_event(rdma->channel, cm_event) < 0) {
-            ERROR(errp, "failed to get cm event");
+            if (errp && !*errp) {
+                error_setg(errp, "RDMA ERROR: failed to get cm event");
+            }
             return -1;
         }
         return 0;
     } else {
-        ERROR(errp, "no POLLIN event, revent=%x", poll_fd.revents);
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: no POLLIN event, revent=%x",
+                       poll_fd.revents);
+        }
         return -1;
     }
 }
@@ -2568,14 +2614,18 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
 
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
     if (ret < 0) {
-        ERROR(errp, "posting second control recv");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: posting second control recv");
+        }
         goto err_rdma_source_connect;
     }
 
     ret = rdma_connect(rdma->cm_id, &conn_param);
     if (ret < 0) {
         perror("rdma_connect");
-        ERROR(errp, "connecting to destination!");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: connecting to destination!");
+        }
         goto err_rdma_source_connect;
     }
 
@@ -2584,7 +2634,9 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
     } else {
         ret = rdma_get_cm_event(rdma->channel, &cm_event);
         if (ret < 0) {
-            ERROR(errp, "failed to get cm event");
+            if (errp && !*errp) {
+                error_setg(errp, "RDMA ERROR: failed to get cm event");
+            }
         }
     }
     if (ret < 0) {
@@ -2599,7 +2651,9 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
 
     if (cm_event->event != RDMA_CM_EVENT_ESTABLISHED) {
         error_report("rdma_get_cm_event != EVENT_ESTABLISHED after rdma_connect");
-        ERROR(errp, "connecting to destination!");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: connecting to destination!");
+        }
         rdma_ack_cm_event(cm_event);
         goto err_rdma_source_connect;
     }
@@ -2646,14 +2700,18 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
     }
 
     if (!rdma->host || !rdma->host[0]) {
-        ERROR(errp, "RDMA host is not set!");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: RDMA host is not set!");
+        }
         rdma->errored = true;
         return -1;
     }
     /* create CM channel */
     rdma->channel = rdma_create_event_channel();
     if (!rdma->channel) {
-        ERROR(errp, "could not create rdma event channel");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not create rdma event channel");
+        }
         rdma->errored = true;
         return -1;
     }
@@ -2661,7 +2719,9 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
     /* create CM id */
     ret = rdma_create_id(rdma->channel, &listen_id, NULL, RDMA_PS_TCP);
     if (ret < 0) {
-        ERROR(errp, "could not create cm_id!");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not create cm_id!");
+        }
         goto err_dest_init_create_listen_id;
     }
 
@@ -2670,14 +2730,19 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
 
     ret = rdma_getaddrinfo(rdma->host, port_str, NULL, &res);
     if (ret) {
-        ERROR(errp, "could not rdma_getaddrinfo address %s", rdma->host);
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not rdma_getaddrinfo address %s",
+                       rdma->host);
+        }
         goto err_dest_init_bind_addr;
     }
 
     ret = rdma_set_option(listen_id, RDMA_OPTION_ID, RDMA_OPTION_ID_REUSEADDR,
                           &reuse, sizeof reuse);
     if (ret < 0) {
-        ERROR(errp, "Error: could not set REUSEADDR option");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: Error: could not set REUSEADDR option");
+        }
         goto err_dest_init_bind_addr;
     }
     for (e = res; e != NULL; e = e->ai_next) {
@@ -2699,7 +2764,9 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
 
     rdma_freeaddrinfo(res);
     if (!e) {
-        ERROR(errp, "Error: could not rdma_bind_addr!");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: Error: could not rdma_bind_addr!");
+        }
         goto err_dest_init_bind_addr;
     }
 
@@ -2751,7 +2818,10 @@ static RDMAContext *qemu_rdma_data_init(const char *host_port, Error **errp)
         rdma->host = g_strdup(addr->host);
         rdma->host_port = g_strdup(host_port);
     } else {
-        ERROR(errp, "bad RDMA migration address '%s'", host_port);
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: bad RDMA migration address '%s'",
+                       host_port);
+        }
         g_free(rdma);
         rdma = NULL;
     }
@@ -4137,7 +4207,9 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
     ret = rdma_listen(rdma->listen_id, 5);
 
     if (ret < 0) {
-        ERROR(errp, "listening on socket!");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: listening on socket!");
+        }
         goto cleanup_rdma;
     }
 
-- 
2.41.0




* [PATCH v2 33/53] migration/rdma: Fix error handling around rdma_getaddrinfo()
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (31 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 32/53] migration/rdma: Retire " Markus Armbruster
@ 2023-09-28 13:19 ` Markus Armbruster
  2023-10-04 16:51   ` Juan Quintela
  2023-09-28 13:20 ` [PATCH v2 34/53] migration/rdma: Drop "@errp is clear" guards around error_setg() Markus Armbruster
                   ` (20 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

qemu_rdma_resolve_host() and qemu_rdma_dest_init() iterate over
addresses to find one that works, holding onto the first Error from
qemu_rdma_broken_ipv6_kernel() for use when no address works.  Issues:

1. If @errp was &error_abort or &error_fatal, we'd terminate instead
   of trying the next address.  Can't actually happen, since no caller
   passes these arguments.

2. When @errp is a pointer to a variable containing NULL, and
   qemu_rdma_broken_ipv6_kernel() fails, the variable no longer
   contains NULL.  Subsequent iterations pass it again, violating
   Error usage rules.  Dangerous, as setting an error would then trip
   error_setv()'s assertion.  Works only because
   qemu_rdma_broken_ipv6_kernel() and the code following the loops
   carefully avoid setting a second error.

3. If qemu_rdma_broken_ipv6_kernel() fails, and then a later iteration
   finds a working address, @errp still holds the first error from
   qemu_rdma_broken_ipv6_kernel().  If we then run into another error,
   we report the qemu_rdma_broken_ipv6_kernel() failure instead.

4. If we don't run into another error, we leak the Error object.

Use a local error variable, and propagate to @errp.  This fixes 3 and
also cleans up 1 and partly 2.

Free this error when we have a working address.  This fixes 4.

Pass the local error variable to qemu_rdma_broken_ipv6_kernel() only
until it fails.  Pass null on any later iterations.  This cleans up
the remainder of 2.
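
The pattern described above can be sketched outside QEMU roughly as
follows.  This is only an illustration: the toy_* names and the toy
Error type are hypothetical stand-ins, not QEMU's Error API, and
toy_probe() merely plays the role of qemu_rdma_broken_ipv6_kernel().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for QEMU's Error type; hypothetical, illustration only. */
typedef struct Error { char msg[64]; } Error;

static void toy_error_setg(Error **errp, const char *msg)
{
    if (!errp) {
        return;                 /* caller is not interested in the error */
    }
    assert(!*errp);             /* never overwrite an error already set */
    *errp = calloc(1, sizeof(**errp));
    assert(*errp);
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

static void toy_error_free(Error *err)
{
    free(err);                  /* free(NULL) is a no-op */
}

static void toy_error_propagate(Error **errp, Error *err)
{
    if (errp && !*errp) {
        *errp = err;            /* hand ownership to the caller */
    } else {
        toy_error_free(err);
    }
}

/* Hypothetical probe: odd candidates are "broken", even ones work. */
static int toy_probe(int candidate, Error **errp)
{
    if (candidate % 2) {
        char buf[64];

        snprintf(buf, sizeof(buf), "candidate %d is broken", candidate);
        toy_error_setg(errp, buf);
        return -1;
    }
    return 0;
}

/* Return the index of the first working candidate, or -1 with @errp set. */
static int toy_pick(const int *cand, int n, Error **errp)
{
    Error *err = NULL;

    /* Try all candidates, saving the first error in @err */
    for (int i = 0; i < n; i++) {
        /* Pass @err only until it has been set; null afterwards */
        Error **local_errp = err ? NULL : &err;

        if (toy_probe(cand[i], local_errp) < 0) {
            continue;
        }
        toy_error_free(err);    /* success: drop any saved error */
        return i;
    }

    if (err) {
        toy_error_propagate(errp, err);
    } else {
        toy_error_setg(errp, "no usable candidate");
    }
    return -1;
}
```

With candidates { 1, 3, 4 } the sketch returns index 2 and leaves the
caller's error clear; with { 1, 3 } it fails with the first saved
message, "candidate 1 is broken".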

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index cbb6822dda..4fec6dbf86 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -932,6 +932,7 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
  */
 static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
 {
+    Error *err = NULL;
     int ret;
     struct rdma_addrinfo *res;
     char port_str[16];
@@ -976,7 +977,10 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
         goto err_resolve_get_addr;
     }
 
+    /* Try all addresses, saving the first error in @err */
     for (e = res; e != NULL; e = e->ai_next) {
+        Error **local_errp = err ? NULL : &err;
+
         inet_ntop(e->ai_family,
             &((struct sockaddr_in *) e->ai_dst_addr)->sin_addr, ip, sizeof ip);
         trace_qemu_rdma_resolve_host_trying(rdma->host, ip);
@@ -985,17 +989,21 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
                 RDMA_RESOLVE_TIMEOUT_MS);
         if (ret >= 0) {
             if (e->ai_family == AF_INET6) {
-                ret = qemu_rdma_broken_ipv6_kernel(rdma->cm_id->verbs, errp);
+                ret = qemu_rdma_broken_ipv6_kernel(rdma->cm_id->verbs,
+                                                   local_errp);
                 if (ret < 0) {
                     continue;
                 }
             }
+            error_free(err);
             goto route;
         }
     }
 
     rdma_freeaddrinfo(res);
-    if (errp && !*errp) {
+    if (err) {
+        error_propagate(errp, err);
+    } else {
         error_setg(errp, "RDMA ERROR: could not resolve address %s",
                    rdma->host);
     }
@@ -2687,6 +2695,7 @@ err_rdma_source_connect:
 
 static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
 {
+    Error *err = NULL;
     int ret, idx;
     struct rdma_cm_id *listen_id;
     char ip[40] = "unknown";
@@ -2745,7 +2754,11 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
         }
         goto err_dest_init_bind_addr;
     }
+
+    /* Try all addresses, saving the first error in @err */
     for (e = res; e != NULL; e = e->ai_next) {
+        Error **local_errp = err ? NULL : &err;
+
         inet_ntop(e->ai_family,
             &((struct sockaddr_in *) e->ai_dst_addr)->sin_addr, ip, sizeof ip);
         trace_qemu_rdma_dest_init_trying(rdma->host, ip);
@@ -2754,17 +2767,21 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
             continue;
         }
         if (e->ai_family == AF_INET6) {
-            ret = qemu_rdma_broken_ipv6_kernel(listen_id->verbs, errp);
+            ret = qemu_rdma_broken_ipv6_kernel(listen_id->verbs,
+                                               local_errp);
             if (ret < 0) {
                 continue;
             }
         }
+        error_free(err);
         break;
     }
 
     rdma_freeaddrinfo(res);
     if (!e) {
-        if (errp && !*errp) {
+        if (err) {
+            error_propagate(errp, err);
+        } else {
             error_setg(errp, "RDMA ERROR: Error: could not rdma_bind_addr!");
         }
         goto err_dest_init_bind_addr;
-- 
2.41.0




* [PATCH v2 34/53] migration/rdma: Drop "@errp is clear" guards around error_setg()
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (32 preceding siblings ...)
  2023-09-28 13:19 ` [PATCH v2 33/53] migration/rdma: Fix error handling around rdma_getaddrinfo() Markus Armbruster
@ 2023-09-28 13:20 ` Markus Armbruster
  2023-10-04 16:52   ` Juan Quintela
  2023-09-28 13:20 ` [PATCH v2 35/53] migration/rdma: Convert qemu_rdma_exchange_recv() to Error Markus Armbruster
                   ` (19 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

These guards are all redundant now.
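
A rough sketch of why, using a toy stand-in rather than QEMU's real
Error API (all toy_* names are hypothetical): the setter itself
tolerates a null @errp, and it asserts that *errp is still clear, so
the "errp && !*errp" guard adds nothing as long as callers pass a
clear @errp.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for QEMU's Error type; hypothetical, illustration only. */
typedef struct Error { char msg[64]; } Error;

/*
 * Toy setter with the two properties the guards duplicated:
 * a null @errp is accepted, and *errp must still be clear.
 */
static void toy_error_setg(Error **errp, const char *msg)
{
    if (!errp) {
        return;                 /* error is silently dropped */
    }
    assert(!*errp);             /* setting a second error is a bug */
    *errp = calloc(1, sizeof(**errp));
    assert(*errp);
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

/* Old style: guard around the setter */
static int toy_fail_guarded(Error **errp)
{
    if (errp && !*errp) {
        toy_error_setg(errp, "could not create CM channel");
    }
    return -1;
}

/* New style: call the setter unconditionally */
static int toy_fail_unguarded(Error **errp)
{
    toy_error_setg(errp, "could not create CM channel");
    return -1;
}
```

The two variants differ only when *errp is already set: the guarded
one silently keeps the old error, while the unguarded one trips the
assertion and so exposes the misuse.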

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 164 +++++++++++++++--------------------------------
 1 file changed, 51 insertions(+), 113 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 4fec6dbf86..45e55178a8 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -858,10 +858,8 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
 
             if (ibv_query_port(verbs, 1, &port_attr)) {
                 ibv_close_device(verbs);
-                if (errp && !*errp) {
-                    error_setg(errp,
-                               "RDMA ERROR: Could not query initial IB port");
-                }
+                error_setg(errp,
+                           "RDMA ERROR: Could not query initial IB port");
                 return -1;
             }
 
@@ -884,12 +882,10 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
                                 " migrate over the IB fabric until the kernel "
                                 " fixes the bug.\n");
             } else {
-                if (errp && !*errp) {
-                    error_setg(errp, "RDMA ERROR: "
-                               "You only have RoCE / iWARP devices in your systems"
-                               " and your management software has specified '[::]'"
-                               ", but IPv6 over RoCE / iWARP is not supported in Linux.");
-                }
+                error_setg(errp, "RDMA ERROR: "
+                           "You only have RoCE / iWARP devices in your systems"
+                           " and your management software has specified '[::]'"
+                           ", but IPv6 over RoCE / iWARP is not supported in Linux.");
                 return -1;
             }
         }
@@ -905,18 +901,14 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
 
     /* IB ports start with 1, not 0 */
     if (ibv_query_port(verbs, 1, &port_attr)) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: Could not query initial IB port");
-        }
+        error_setg(errp, "RDMA ERROR: Could not query initial IB port");
         return -1;
     }
 
     if (port_attr.link_layer == IBV_LINK_LAYER_ETHERNET) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: "
-                       "Linux kernel's RoCE / iWARP does not support IPv6 "
-                       "(but patches on linux-rdma in progress)");
-        }
+        error_setg(errp, "RDMA ERROR: "
+                   "Linux kernel's RoCE / iWARP does not support IPv6 "
+                   "(but patches on linux-rdma in progress)");
         return -1;
     }
 
@@ -941,27 +933,21 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
     struct rdma_addrinfo *e;
 
     if (rdma->host == NULL || !strcmp(rdma->host, "")) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: RDMA hostname has not been set");
-        }
+        error_setg(errp, "RDMA ERROR: RDMA hostname has not been set");
         return -1;
     }
 
     /* create CM channel */
     rdma->channel = rdma_create_event_channel();
     if (!rdma->channel) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not create CM channel");
-        }
+        error_setg(errp, "RDMA ERROR: could not create CM channel");
         return -1;
     }
 
     /* create CM id */
     ret = rdma_create_id(rdma->channel, &rdma->cm_id, NULL, RDMA_PS_TCP);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not create channel id");
-        }
+        error_setg(errp, "RDMA ERROR: could not create channel id");
         goto err_resolve_create_id;
     }
 
@@ -970,10 +956,8 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
 
     ret = rdma_getaddrinfo(rdma->host, port_str, NULL, &res);
     if (ret) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not rdma_getaddrinfo address %s",
-                       rdma->host);
-        }
+        error_setg(errp, "RDMA ERROR: could not rdma_getaddrinfo address %s",
+                   rdma->host);
         goto err_resolve_get_addr;
     }
 
@@ -1015,18 +999,14 @@ route:
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not perform event_addr_resolved");
-        }
+        error_setg(errp, "RDMA ERROR: could not perform event_addr_resolved");
         goto err_resolve_get_addr;
     }
 
     if (cm_event->event != RDMA_CM_EVENT_ADDR_RESOLVED) {
-        if (errp && !*errp) {
-            error_setg(errp,
-                       "RDMA ERROR: result not equal to event_addr_resolved %s",
-                       rdma_event_str(cm_event->event));
-        }
+        error_setg(errp,
+                   "RDMA ERROR: result not equal to event_addr_resolved %s",
+                   rdma_event_str(cm_event->event));
         error_report("rdma_resolve_addr");
         rdma_ack_cm_event(cm_event);
         goto err_resolve_get_addr;
@@ -1036,25 +1016,19 @@ route:
     /* resolve route */
     ret = rdma_resolve_route(rdma->cm_id, RDMA_RESOLVE_TIMEOUT_MS);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not resolve rdma route");
-        }
+        error_setg(errp, "RDMA ERROR: could not resolve rdma route");
         goto err_resolve_get_addr;
     }
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not perform event_route_resolved");
-        }
+        error_setg(errp, "RDMA ERROR: could not perform event_route_resolved");
         goto err_resolve_get_addr;
     }
     if (cm_event->event != RDMA_CM_EVENT_ROUTE_RESOLVED) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: "
-                       "result not equal to event_route_resolved: %s",
-                       rdma_event_str(cm_event->event));
-        }
+        error_setg(errp, "RDMA ERROR: "
+                   "result not equal to event_route_resolved: %s",
+                   rdma_event_str(cm_event->event));
         rdma_ack_cm_event(cm_event);
         goto err_resolve_get_addr;
     }
@@ -2505,20 +2479,16 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
 
     ret = qemu_rdma_alloc_pd_cq(rdma);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: "
-                       "rdma migration: error allocating pd and cq! Your mlock()"
-                       " limits may be too low. Please check $ ulimit -a # and "
-                       "search for 'ulimit -l' in the output");
-        }
+        error_setg(errp, "RDMA ERROR: "
+                   "rdma migration: error allocating pd and cq! Your mlock()"
+                   " limits may be too low. Please check $ ulimit -a # and "
+                   "search for 'ulimit -l' in the output");
         goto err_rdma_source_init;
     }
 
     ret = qemu_rdma_alloc_qp(rdma);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: rdma migration: error allocating qp!");
-        }
+        error_setg(errp, "RDMA ERROR: rdma migration: error allocating qp!");
         goto err_rdma_source_init;
     }
 
@@ -2535,11 +2505,9 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
     for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
         ret = qemu_rdma_reg_control(rdma, idx);
         if (ret < 0) {
-            if (errp && !*errp) {
-                error_setg(errp,
-                           "RDMA ERROR: rdma migration: error registering %d control!",
-                           idx);
-            }
+            error_setg(errp,
+                       "RDMA ERROR: rdma migration: error registering %d control!",
+                       idx);
             goto err_rdma_source_init;
         }
     }
@@ -2567,29 +2535,21 @@ static int qemu_get_cm_event_timeout(RDMAContext *rdma,
     } while (ret < 0 && errno == EINTR);
 
     if (ret == 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: poll cm event timeout");
-        }
+        error_setg(errp, "RDMA ERROR: poll cm event timeout");
         return -1;
     } else if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: failed to poll cm event, errno=%i",
-                       errno);
-        }
+        error_setg(errp, "RDMA ERROR: failed to poll cm event, errno=%i",
+                   errno);
         return -1;
     } else if (poll_fd.revents & POLLIN) {
         if (rdma_get_cm_event(rdma->channel, cm_event) < 0) {
-            if (errp && !*errp) {
-                error_setg(errp, "RDMA ERROR: failed to get cm event");
-            }
+            error_setg(errp, "RDMA ERROR: failed to get cm event");
             return -1;
         }
         return 0;
     } else {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: no POLLIN event, revent=%x",
-                       poll_fd.revents);
-        }
+        error_setg(errp, "RDMA ERROR: no POLLIN event, revent=%x",
+                   poll_fd.revents);
         return -1;
     }
 }
@@ -2622,18 +2582,14 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
 
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: posting second control recv");
-        }
+        error_setg(errp, "RDMA ERROR: posting second control recv");
         goto err_rdma_source_connect;
     }
 
     ret = rdma_connect(rdma->cm_id, &conn_param);
     if (ret < 0) {
         perror("rdma_connect");
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: connecting to destination!");
-        }
+        error_setg(errp, "RDMA ERROR: connecting to destination!");
         goto err_rdma_source_connect;
     }
 
@@ -2642,9 +2598,7 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
     } else {
         ret = rdma_get_cm_event(rdma->channel, &cm_event);
         if (ret < 0) {
-            if (errp && !*errp) {
-                error_setg(errp, "RDMA ERROR: failed to get cm event");
-            }
+            error_setg(errp, "RDMA ERROR: failed to get cm event");
         }
     }
     if (ret < 0) {
@@ -2659,9 +2613,7 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
 
     if (cm_event->event != RDMA_CM_EVENT_ESTABLISHED) {
         error_report("rdma_get_cm_event != EVENT_ESTABLISHED after rdma_connect");
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: connecting to destination!");
-        }
+        error_setg(errp, "RDMA ERROR: connecting to destination!");
         rdma_ack_cm_event(cm_event);
         goto err_rdma_source_connect;
     }
@@ -2709,18 +2661,14 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
     }
 
     if (!rdma->host || !rdma->host[0]) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: RDMA host is not set!");
-        }
+        error_setg(errp, "RDMA ERROR: RDMA host is not set!");
         rdma->errored = true;
         return -1;
     }
     /* create CM channel */
     rdma->channel = rdma_create_event_channel();
     if (!rdma->channel) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not create rdma event channel");
-        }
+        error_setg(errp, "RDMA ERROR: could not create rdma event channel");
         rdma->errored = true;
         return -1;
     }
@@ -2728,9 +2676,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
     /* create CM id */
     ret = rdma_create_id(rdma->channel, &listen_id, NULL, RDMA_PS_TCP);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not create cm_id!");
-        }
+        error_setg(errp, "RDMA ERROR: could not create cm_id!");
         goto err_dest_init_create_listen_id;
     }
 
@@ -2739,19 +2685,15 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
 
     ret = rdma_getaddrinfo(rdma->host, port_str, NULL, &res);
     if (ret) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not rdma_getaddrinfo address %s",
-                       rdma->host);
-        }
+        error_setg(errp, "RDMA ERROR: could not rdma_getaddrinfo address %s",
+                   rdma->host);
         goto err_dest_init_bind_addr;
     }
 
     ret = rdma_set_option(listen_id, RDMA_OPTION_ID, RDMA_OPTION_ID_REUSEADDR,
                           &reuse, sizeof reuse);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: Error: could not set REUSEADDR option");
-        }
+        error_setg(errp, "RDMA ERROR: Error: could not set REUSEADDR option");
         goto err_dest_init_bind_addr;
     }
 
@@ -2835,10 +2777,8 @@ static RDMAContext *qemu_rdma_data_init(const char *host_port, Error **errp)
         rdma->host = g_strdup(addr->host);
         rdma->host_port = g_strdup(host_port);
     } else {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: bad RDMA migration address '%s'",
-                       host_port);
-        }
+        error_setg(errp, "RDMA ERROR: bad RDMA migration address '%s'",
+                   host_port);
         g_free(rdma);
         rdma = NULL;
     }
@@ -4224,9 +4164,7 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
     ret = rdma_listen(rdma->listen_id, 5);
 
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: listening on socket!");
-        }
+        error_setg(errp, "RDMA ERROR: listening on socket!");
         goto cleanup_rdma;
     }
 
-- 
2.41.0




* [PATCH v2 35/53] migration/rdma: Convert qemu_rdma_exchange_recv() to Error
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (33 preceding siblings ...)
  2023-09-28 13:20 ` [PATCH v2 34/53] migration/rdma: Drop "@errp is clear" guards around error_setg() Markus Armbruster
@ 2023-09-28 13:20 ` Markus Armbruster
  2023-10-04 16:53   ` Juan Quintela
  2023-09-28 13:20 ` [PATCH v2 36/53] migration/rdma: Convert qemu_rdma_exchange_send() " Markus Armbruster
                   ` (18 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qio_channel_rdma_readv() violates this principle: it calls
error_report() via qemu_rdma_exchange_recv().  I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.

Clean this up by converting qemu_rdma_exchange_recv() to Error.

Necessitates setting an error when qemu_rdma_exchange_get_response()
failed.  Since this error will go away later in this series, simply
use "FIXME temporary error message" there.
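
The division of labor can be sketched like this.  The toy_* names
and the toy Error type are hypothetical stand-ins for illustration,
not QEMU's API: the callee only sets the error, and each caller
decides whether to report it or to recover silently.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for QEMU's Error type; hypothetical, illustration only. */
typedef struct Error { char msg[64]; } Error;

static int toy_reports;         /* counts messages shown to the user */

static void toy_error_setg(Error **errp, const char *msg)
{
    if (!errp) {
        return;
    }
    assert(!*errp);
    *errp = calloc(1, sizeof(**errp));
    assert(*errp);
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

/* Report to the user and free, loosely like error_report_err() */
static void toy_error_report_err(Error *err)
{
    fprintf(stderr, "%s\n", err->msg);
    toy_reports++;
    free(err);
}

/* Callee: fails on negative input; sets an error, prints nothing */
static int toy_recv(int input, Error **errp)
{
    if (input < 0) {
        toy_error_setg(errp, "Failed to send control buffer!");
        return -1;
    }
    return 0;
}

/* Caller with nowhere to pass the error: reports it exactly once */
static int toy_handle(int input)
{
    Error *err = NULL;

    if (toy_recv(input, &err) < 0) {
        toy_error_report_err(err);
        return -1;
    }
    return 0;
}

/* Caller that recovers: drops the error, so nothing is reported */
static int toy_handle_with_fallback(int input, int fallback)
{
    Error *err = NULL;

    if (toy_recv(input, &err) < 0) {
        free(err);
        return toy_recv(fallback, NULL);
    }
    return 0;
}
```

Had toy_recv() printed the message itself, toy_handle() would report
it twice, and toy_handle_with_fallback() would show an error even
though it recovered.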

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 45e55178a8..e0101422d9 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1978,7 +1978,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
  * control-channel message.
  */
 static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
-                                   uint32_t expecting)
+                                   uint32_t expecting, Error **errp)
 {
     RDMAControlHeader ready = {
                                 .len = 0,
@@ -1993,7 +1993,7 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
     ret = qemu_rdma_post_send_control(rdma, NULL, &ready);
 
     if (ret < 0) {
-        error_report("Failed to send control buffer!");
+        error_setg(errp, "Failed to send control buffer!");
         return -1;
     }
 
@@ -2004,6 +2004,7 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
                                           expecting, RDMA_WRID_READY);
 
     if (ret < 0) {
+        error_setg(errp, "FIXME temporary error message");
         return -1;
     }
 
@@ -2014,7 +2015,7 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
      */
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
     if (ret < 0) {
-        error_report("rdma migration: error posting second control recv!");
+        error_setg(errp, "rdma migration: error posting second control recv!");
         return -1;
     }
 
@@ -2938,11 +2939,11 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
         /* We've got nothing at all, so lets wait for
          * more to arrive
          */
-        ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_QEMU_FILE);
+        ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_QEMU_FILE,
+                                      errp);
 
         if (ret < 0) {
             rdma->errored = true;
-            error_setg(errp, "qemu_rdma_exchange_recv failed");
             return -1;
         }
 
@@ -3567,6 +3568,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
     RDMAControlHeader blocks = { .type = RDMA_CONTROL_RAM_BLOCKS_RESULT,
                                  .repeat = 1 };
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
+    Error *err = NULL;
     RDMAContext *rdma;
     RDMALocalBlocks *local;
     RDMAControlHeader head;
@@ -3596,9 +3598,10 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
     do {
         trace_qemu_rdma_registration_handle_wait();
 
-        ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_NONE);
+        ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_NONE, &err);
 
         if (ret < 0) {
+            error_report_err(err);
             break;
         }
 
-- 
2.41.0




* [PATCH v2 36/53] migration/rdma: Convert qemu_rdma_exchange_send() to Error
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (34 preceding siblings ...)
  2023-09-28 13:20 ` [PATCH v2 35/53] migration/rdma: Convert qemu_rdma_exchange_recv() to Error Markus Armbruster
@ 2023-09-28 13:20 ` Markus Armbruster
  2023-10-04 16:55   ` Juan Quintela
  2023-09-28 13:20 ` [PATCH v2 37/53] migration/rdma: Convert qemu_rdma_exchange_get_response() " Markus Armbruster
                   ` (17 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qio_channel_rdma_writev() violates this principle: it calls
error_report() via qemu_rdma_exchange_send().  I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.

Clean this up by converting qemu_rdma_exchange_send() to Error.

Necessitates setting an error when qemu_rdma_post_recv_control(),
callback(), or qemu_rdma_exchange_get_response() failed.  Since these
errors will go away later in this series, simply use "FIXME temporary
error message" there.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 40 +++++++++++++++++++++++++++-------------
 1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index e0101422d9..f77bf1d453 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -518,7 +518,8 @@ static void network_to_result(RDMARegisterResult *result)
 static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                    uint8_t *data, RDMAControlHeader *resp,
                                    int *resp_idx,
-                                   int (*callback)(RDMAContext *rdma));
+                                   int (*callback)(RDMAContext *rdma),
+                                   Error **errp);
 
 static inline uint64_t ram_chunk_index(const uint8_t *start,
                                        const uint8_t *host)
@@ -1376,6 +1377,8 @@ static int qemu_rdma_reg_control(RDMAContext *rdma, int idx)
  */
 static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
 {
+    Error *err = NULL;
+
     while (rdma->unregistrations[rdma->unregister_current]) {
         int ret;
         uint64_t wr_id = rdma->unregistrations[rdma->unregister_current];
@@ -1438,8 +1441,9 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
         reg.key.chunk = chunk;
         register_to_network(rdma, &reg);
         ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
-                                &resp, NULL, NULL);
+                                      &resp, NULL, NULL, &err);
         if (ret < 0) {
+            error_report_err(err);
             return -1;
         }
 
@@ -1893,7 +1897,8 @@ static void qemu_rdma_move_header(RDMAContext *rdma, int idx,
 static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                    uint8_t *data, RDMAControlHeader *resp,
                                    int *resp_idx,
-                                   int (*callback)(RDMAContext *rdma))
+                                   int (*callback)(RDMAContext *rdma),
+                                   Error **errp)
 {
     int ret;
 
@@ -1906,6 +1911,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
         ret = qemu_rdma_exchange_get_response(rdma,
                                     &resp, RDMA_CONTROL_READY, RDMA_WRID_READY);
         if (ret < 0) {
+            error_setg(errp, "FIXME temporary error message");
             return -1;
         }
     }
@@ -1916,7 +1922,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
     if (resp) {
         ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_DATA);
         if (ret < 0) {
-            error_report("rdma migration: error posting"
+            error_setg(errp, "rdma migration: error posting"
                     " extra control recv for anticipated result!");
             return -1;
         }
@@ -1927,7 +1933,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
      */
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
     if (ret < 0) {
-        error_report("rdma migration: error posting first control recv!");
+        error_setg(errp, "rdma migration: error posting first control recv!");
         return -1;
     }
 
@@ -1937,7 +1943,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
     ret = qemu_rdma_post_send_control(rdma, data, head);
 
     if (ret < 0) {
-        error_report("Failed to send control buffer!");
+        error_setg(errp, "Failed to send control buffer!");
         return -1;
     }
 
@@ -1949,6 +1955,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
             trace_qemu_rdma_exchange_send_issue_callback();
             ret = callback(rdma);
             if (ret < 0) {
+                error_setg(errp, "FIXME temporary error message");
                 return -1;
             }
         }
@@ -1958,6 +1965,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                               resp->type, RDMA_WRID_DATA);
 
         if (ret < 0) {
+            error_setg(errp, "FIXME temporary error message");
             return -1;
         }
 
@@ -2032,6 +2040,7 @@ static int qemu_rdma_write_one(QEMUFile *f, RDMAContext *rdma,
                                int current_index, uint64_t current_addr,
                                uint64_t length)
 {
+    Error *err = NULL;
     struct ibv_sge sge;
     struct ibv_send_wr send_wr = { 0 };
     struct ibv_send_wr *bad_wr;
@@ -2117,9 +2126,10 @@ retry:
 
                 compress_to_network(rdma, &comp);
                 ret = qemu_rdma_exchange_send(rdma, &head,
-                                (uint8_t *) &comp, NULL, NULL, NULL);
+                                (uint8_t *) &comp, NULL, NULL, NULL, &err);
 
                 if (ret < 0) {
+                    error_report_err(err);
                     return -1;
                 }
 
@@ -2145,8 +2155,9 @@ retry:
 
             register_to_network(rdma, &reg);
             ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
-                                    &resp, &reg_result_idx, NULL);
+                                    &resp, &reg_result_idx, NULL, &err);
             if (ret < 0) {
+                error_report_err(err);
                 return -1;
             }
 
@@ -2845,11 +2856,11 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
             head.len = len;
             head.type = RDMA_CONTROL_QEMU_FILE;
 
-            ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);
+            ret = qemu_rdma_exchange_send(rdma, &head,
+                                          data, NULL, NULL, NULL, errp);
 
             if (ret < 0) {
                 rdma->errored = true;
-                error_setg(errp, "qemu_rdma_exchange_send failed");
                 return -1;
             }
 
@@ -3917,6 +3928,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
                                        uint64_t flags, void *data)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
+    Error *err = NULL;
     RDMAContext *rdma;
     RDMAControlHeader head = { .len = 0, .repeat = 1 };
     int ret;
@@ -3960,9 +3972,10 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
          */
         ret = qemu_rdma_exchange_send(rdma, &head, NULL, &resp,
                     &reg_result_idx, rdma->pin_all ?
-                    qemu_rdma_reg_whole_ram_blocks : NULL);
+                    qemu_rdma_reg_whole_ram_blocks : NULL,
+                    &err);
         if (ret < 0) {
-            fprintf(stderr, "receiving remote info!");
+            error_report_err(err);
             return -1;
         }
 
@@ -4013,9 +4026,10 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
     trace_qemu_rdma_registration_stop(flags);
 
     head.type = RDMA_CONTROL_REGISTER_FINISHED;
-    ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL);
+    ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL, &err);
 
     if (ret < 0) {
+        error_report_err(err);
         goto err;
     }
 
-- 
2.41.0




* [PATCH v2 37/53] migration/rdma: Convert qemu_rdma_exchange_get_response() to Error
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (35 preceding siblings ...)
  2023-09-28 13:20 ` [PATCH v2 36/53] migration/rdma: Convert qemu_rdma_exchange_send() " Markus Armbruster
@ 2023-09-28 13:20 ` Markus Armbruster
  2023-10-04 16:55   ` Juan Quintela
  2023-09-28 13:20 ` [PATCH v2 38/53] migration/rdma: Convert qemu_rdma_reg_whole_ram_blocks() " Markus Armbruster
                   ` (16 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_exchange_send() and qemu_rdma_exchange_recv() violate this
principle: they call error_report() via
qemu_rdma_exchange_get_response().  I elected not to investigate how
callers handle the error, i.e. precise impact is not known.

Clean this up by converting qemu_rdma_exchange_get_response() to
Error.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
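A minimal sketch of the principle stated above, for readers unfamiliar
with the convention.  It does not use QEMU's real Error API: DemoError,
demo_error_set() and demo_error_report() are hypothetical stand-ins, and
the other demo_* functions only mimic the call chain's shape.  The
callee stores the error and stays silent; only the top-level caller
reports it, so each failure is shown exactly once.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an error object; not QEMU's real Error. */
typedef struct DemoError {
    char msg[128];
} DemoError;

static int demo_reports;    /* counts messages shown to the user */

/* Store an error in *errp.  Never prints anything. */
static void demo_error_set(DemoError **errp, const char *msg)
{
    if (!errp) {
        return;             /* caller does not want the details */
    }
    assert(*errp == NULL);  /* an error must not be set twice */
    *errp = calloc(1, sizeof(**errp));
    assert(*errp);
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

/* Report to the user and free the error: the caller's job. */
static void demo_error_report(DemoError *err)
{
    fprintf(stderr, "error: %s\n", err->msg);
    demo_reports++;
    free(err);
}

/* Callee: on failure, sets *errp and returns -1 without reporting. */
static int demo_get_response(int fail, DemoError **errp)
{
    if (fail) {
        demo_error_set(errp, "recv polling control error");
        return -1;
    }
    return 0;
}

/* Intermediate function: passes errp through, adds no report. */
static int demo_exchange_send(int fail, DemoError **errp)
{
    if (demo_get_response(fail, errp) < 0) {
        return -1;
    }
    return 0;
}

/* Top-level caller: owns the error and reports it exactly once. */
static int demo_caller(int fail)
{
    DemoError *err = NULL;

    if (demo_exchange_send(fail, &err) < 0) {
        demo_error_report(err);
        return -1;
    }
    return 0;
}
```

Had demo_get_response() also printed the message itself, a failing
demo_caller() would show it twice, and a caller that recovers from the
failure would still show it once for no reason.
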
 migration/rdma.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index f77bf1d453..2f6e22e1f2 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1824,14 +1824,15 @@ static int qemu_rdma_post_recv_control(RDMAContext *rdma, int idx)
  * Block and wait for a RECV control channel message to arrive.
  */
 static int qemu_rdma_exchange_get_response(RDMAContext *rdma,
-                RDMAControlHeader *head, uint32_t expecting, int idx)
+                RDMAControlHeader *head, uint32_t expecting, int idx,
+                Error **errp)
 {
     uint32_t byte_len;
     int ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RECV_CONTROL + idx,
                                        &byte_len);
 
     if (ret < 0) {
-        error_report("rdma migration: recv polling control error!");
+        error_setg(errp, "rdma migration: recv polling control error!");
         return -1;
     }
 
@@ -1844,7 +1845,7 @@ static int qemu_rdma_exchange_get_response(RDMAContext *rdma,
         trace_qemu_rdma_exchange_get_response_none(control_desc(head->type),
                                              head->type);
     } else if (head->type != expecting || head->type == RDMA_CONTROL_ERROR) {
-        error_report("Was expecting a %s (%d) control message"
+        error_setg(errp, "Was expecting a %s (%d) control message"
                 ", but got: %s (%d), length: %d",
                 control_desc(expecting), expecting,
                 control_desc(head->type), head->type, head->len);
@@ -1854,11 +1855,12 @@ static int qemu_rdma_exchange_get_response(RDMAContext *rdma,
         return -1;
     }
     if (head->len > RDMA_CONTROL_MAX_BUFFER - sizeof(*head)) {
-        error_report("too long length: %d", head->len);
+        error_setg(errp, "too long length: %d", head->len);
         return -1;
     }
     if (sizeof(*head) + head->len != byte_len) {
-        error_report("Malformed length: %d byte_len %d", head->len, byte_len);
+        error_setg(errp, "Malformed length: %d byte_len %d",
+                   head->len, byte_len);
         return -1;
     }
 
@@ -1908,10 +1910,10 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
      */
     if (rdma->control_ready_expected) {
         RDMAControlHeader resp;
-        ret = qemu_rdma_exchange_get_response(rdma,
-                                    &resp, RDMA_CONTROL_READY, RDMA_WRID_READY);
+        ret = qemu_rdma_exchange_get_response(rdma, &resp,
+                                              RDMA_CONTROL_READY,
+                                              RDMA_WRID_READY, errp);
         if (ret < 0) {
-            error_setg(errp, "FIXME temporary error message");
             return -1;
         }
     }
@@ -1962,10 +1964,10 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
 
         trace_qemu_rdma_exchange_send_waiting(control_desc(resp->type));
         ret = qemu_rdma_exchange_get_response(rdma, resp,
-                                              resp->type, RDMA_WRID_DATA);
+                                              resp->type, RDMA_WRID_DATA,
+                                              errp);
 
         if (ret < 0) {
-            error_setg(errp, "FIXME temporary error message");
             return -1;
         }
 
@@ -2009,10 +2011,9 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
      * Block and wait for the message.
      */
     ret = qemu_rdma_exchange_get_response(rdma, head,
-                                          expecting, RDMA_WRID_READY);
+                                          expecting, RDMA_WRID_READY, errp);
 
     if (ret < 0) {
-        error_setg(errp, "FIXME temporary error message");
         return -1;
     }
 
-- 
2.41.0




* [PATCH v2 38/53] migration/rdma: Convert qemu_rdma_reg_whole_ram_blocks() to Error
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (36 preceding siblings ...)
  2023-09-28 13:20 ` [PATCH v2 37/53] migration/rdma: Convert qemu_rdma_exchange_get_response() " Markus Armbruster
@ 2023-09-28 13:20 ` Markus Armbruster
  2023-10-04 16:56   ` Juan Quintela
  2023-09-28 13:20 ` [PATCH v2 39/53] migration/rdma: Convert qemu_rdma_write_flush() " Markus Armbruster
                   ` (15 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_exchange_send() violates this principle: it calls
error_report() via callback qemu_rdma_reg_whole_ram_blocks().  I
elected not to investigate how callers handle the error, i.e. precise
impact is not known.

Clean this up by converting the callback to Error.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
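A minimal sketch of the control-flow change in
qemu_rdma_reg_whole_ram_blocks(), under simplifying assumptions: the
loop jumps to an unwind label on the first failure instead of breaking
out and testing the loop index afterwards.  DemoBlocks, demo_register()
and demo_reg_all() are hypothetical; a bool stands in for a registered
memory region and a plain string pointer stands in for the Error object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NB_BLOCKS 4

/* Hypothetical resource table; 'registered' stands in for block[i].mr. */
typedef struct DemoBlocks {
    bool registered[DEMO_NB_BLOCKS];
    int fail_at;                /* index whose registration fails, or -1 */
    int total_registrations;
} DemoBlocks;

/* Pretend to register block i; fails only at index fail_at. */
static bool demo_register(const DemoBlocks *b, int i)
{
    return i != b->fail_at;
}

/* Register all blocks, or none: on failure, undo earlier registrations. */
static int demo_reg_all(DemoBlocks *b, const char **errmsg)
{
    int i;

    for (i = 0; i < DEMO_NB_BLOCKS; i++) {
        b->registered[i] = demo_register(b, i);
        if (!b->registered[i]) {
            if (errmsg) {
                *errmsg = "Failed to register local dest ram block!";
            }
            goto err;           /* set the error where it happens */
        }
        b->total_registrations++;
    }

    return 0;                   /* success path needs no index check */

err:
    /* Block i failed, so unwind blocks i-1 down to 0. */
    for (i--; i >= 0; i--) {
        b->registered[i] = false;
        b->total_registrations--;
    }
    return -1;
}
```

With the explicit label, the success path is a plain return and the
error is set at the point of failure, which is where the details are
still available.
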
 migration/rdma.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 2f6e22e1f2..fa15b1f6ce 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -518,7 +518,8 @@ static void network_to_result(RDMARegisterResult *result)
 static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                    uint8_t *data, RDMAControlHeader *resp,
                                    int *resp_idx,
-                                   int (*callback)(RDMAContext *rdma),
+                                   int (*callback)(RDMAContext *rdma,
+                                                   Error **errp),
                                    Error **errp);
 
 static inline uint64_t ram_chunk_index(const uint8_t *start,
@@ -1177,7 +1178,7 @@ static void qemu_rdma_advise_prefetch_mr(struct ibv_pd *pd, uint64_t addr,
 #endif
 }
 
-static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
+static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma, Error **errp)
 {
     int i;
     RDMALocalBlocks *local = &rdma->local_ram_blocks;
@@ -1217,16 +1218,16 @@ static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
         }
 
         if (!local->block[i].mr) {
-            perror("Failed to register local dest ram block!");
-            break;
+            error_setg_errno(errp, errno,
+                             "Failed to register local dest ram block!");
+            goto err;
         }
         rdma->total_registrations++;
     }
 
-    if (i >= local->nb_blocks) {
-        return 0;
-    }
+    return 0;
 
+err:
     for (i--; i >= 0; i--) {
         ibv_dereg_mr(local->block[i].mr);
         local->block[i].mr = NULL;
@@ -1899,7 +1900,8 @@ static void qemu_rdma_move_header(RDMAContext *rdma, int idx,
 static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                    uint8_t *data, RDMAControlHeader *resp,
                                    int *resp_idx,
-                                   int (*callback)(RDMAContext *rdma),
+                                   int (*callback)(RDMAContext *rdma,
+                                                   Error **errp),
                                    Error **errp)
 {
     int ret;
@@ -1955,9 +1957,8 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
     if (resp) {
         if (callback) {
             trace_qemu_rdma_exchange_send_issue_callback();
-            ret = callback(rdma);
+            ret = callback(rdma, errp);
             if (ret < 0) {
-                error_setg(errp, "FIXME temporary error message");
                 return -1;
             }
         }
@@ -3664,10 +3665,9 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
             }
 
             if (rdma->pin_all) {
-                ret = qemu_rdma_reg_whole_ram_blocks(rdma);
+                ret = qemu_rdma_reg_whole_ram_blocks(rdma, &err);
                 if (ret < 0) {
-                    error_report("rdma migration: error dest "
-                                    "registering ram blocks");
+                    error_report_err(err);
                     goto err;
                 }
             }
-- 
2.41.0




* [PATCH v2 39/53] migration/rdma: Convert qemu_rdma_write_flush() to Error
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (37 preceding siblings ...)
  2023-09-28 13:20 ` [PATCH v2 38/53] migration/rdma: Convert qemu_rdma_reg_whole_ram_blocks() " Markus Armbruster
@ 2023-09-28 13:20 ` Markus Armbruster
  2023-10-04 16:56   ` Juan Quintela
  2023-09-28 13:20 ` [PATCH v2 40/53] migration/rdma: Convert qemu_rdma_write_one() " Markus Armbruster
                   ` (14 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qio_channel_rdma_writev() violates this principle: it calls
error_report() via qemu_rdma_write_flush().  I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.

Clean this up by converting qemu_rdma_write_flush() to Error.

This necessitates setting an error when qemu_rdma_write_one() fails.
Since that error will go away later in this series, simply use "FIXME
temporary error message" there.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index fa15b1f6ce..feed8712bb 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2264,7 +2264,8 @@ retry:
  * We support sending out multiple chunks at the same time.
  * Not all of them need to get signaled in the completion queue.
  */
-static int qemu_rdma_write_flush(QEMUFile *f, RDMAContext *rdma)
+static int qemu_rdma_write_flush(QEMUFile *f, RDMAContext *rdma,
+                                 Error **errp)
 {
     int ret;
 
@@ -2276,6 +2277,7 @@ static int qemu_rdma_write_flush(QEMUFile *f, RDMAContext *rdma)
             rdma->current_index, rdma->current_addr, rdma->current_length);
 
     if (ret < 0) {
+        error_setg(errp, "FIXME temporary error message");
         return -1;
     }
 
@@ -2349,6 +2351,7 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
                            uint64_t block_offset, uint64_t offset,
                            uint64_t len)
 {
+    Error *err = NULL;
     uint64_t current_addr = block_offset + offset;
     uint64_t index = rdma->current_index;
     uint64_t chunk = rdma->current_chunk;
@@ -2356,8 +2359,9 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
 
     /* If we cannot merge it, we flush the current buffer first. */
     if (!qemu_rdma_buffer_mergeable(rdma, current_addr, len)) {
-        ret = qemu_rdma_write_flush(f, rdma);
+        ret = qemu_rdma_write_flush(f, rdma, &err);
         if (ret < 0) {
+            error_report_err(err);
             return -1;
         }
         rdma->current_length = 0;
@@ -2374,7 +2378,10 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
 
     /* flush it if buffer is too large */
     if (rdma->current_length >= RDMA_MERGE_MAX) {
-        return qemu_rdma_write_flush(f, rdma);
+        if (qemu_rdma_write_flush(f, rdma, &err) < 0) {
+            error_report_err(err);
+            return -1;
+        }
     }
 
     return 0;
@@ -2839,10 +2846,9 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
      * Push out any writes that
      * we're queued up for VM's ram.
      */
-    ret = qemu_rdma_write_flush(f, rdma);
+    ret = qemu_rdma_write_flush(f, rdma, errp);
     if (ret < 0) {
         rdma->errored = true;
-        error_setg(errp, "qemu_rdma_write_flush failed");
         return -1;
     }
 
@@ -2984,9 +2990,11 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
  */
 static int qemu_rdma_drain_cq(QEMUFile *f, RDMAContext *rdma)
 {
+    Error *err = NULL;
     int ret;
 
-    if (qemu_rdma_write_flush(f, rdma) < 0) {
+    if (qemu_rdma_write_flush(f, rdma, &err) < 0) {
+        error_report_err(err);
         return -1;
     }
 
-- 
2.41.0




* [PATCH v2 40/53] migration/rdma: Convert qemu_rdma_write_one() to Error
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (38 preceding siblings ...)
  2023-09-28 13:20 ` [PATCH v2 39/53] migration/rdma: Convert qemu_rdma_write_flush() " Markus Armbruster
@ 2023-09-28 13:20 ` Markus Armbruster
  2023-10-04 16:56   ` Juan Quintela
  2023-09-28 13:20 ` [PATCH v2 41/53] migration/rdma: Convert qemu_rdma_write() " Markus Armbruster
                   ` (13 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_write_flush() violates this principle: it calls
error_report() via qemu_rdma_write_one().  I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.

Clean this up by converting qemu_rdma_write_one() to Error.  Bonus:
resolves a FIXME about problematic use of errno.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
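A minimal sketch of the errno point, under stated assumptions: when a
call hands back its error code as a positive return value, the message
should be built from that value rather than from the global errno,
which the call may not have set.  demo_post() and demo_write_one() are
hypothetical, and a caller-supplied buffer stands in for the Error
object.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical posting call: returns 0 on success or a positive
 * errno-style code on failure, and leaves the global errno alone.
 */
static int demo_post(int should_fail)
{
    return should_fail ? ENOMEM : 0;
}

/* Build the message from the returned code, not from errno. */
static int demo_write_one(int should_fail, char *buf, size_t len)
{
    int ret = demo_post(should_fail);

    if (ret > 0) {
        snprintf(buf, len, "rdma migration: post rdma write failed: %s",
                 strerror(ret));
        return -1;
    }
    return 0;
}
```

A perror() at the same spot would describe whatever errno happened to
hold, which may be unrelated to the failure.
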
 migration/rdma.c | 31 ++++++++++++-------------------
 1 file changed, 12 insertions(+), 19 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index feed8712bb..928d09d177 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2040,9 +2040,8 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
  */
 static int qemu_rdma_write_one(QEMUFile *f, RDMAContext *rdma,
                                int current_index, uint64_t current_addr,
-                               uint64_t length)
+                               uint64_t length, Error **errp)
 {
-    Error *err = NULL;
     struct ibv_sge sge;
     struct ibv_send_wr send_wr = { 0 };
     struct ibv_send_wr *bad_wr;
@@ -2096,7 +2095,7 @@ retry:
         ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RDMA_WRITE, NULL);
 
         if (ret < 0) {
-            error_report("Failed to Wait for previous write to complete "
+            error_setg(errp, "Failed to Wait for previous write to complete "
                     "block %d chunk %" PRIu64
                     " current %" PRIu64 " len %" PRIu64 " %d",
                     current_index, chunk, sge.addr, length, rdma->nb_sent);
@@ -2128,10 +2127,9 @@ retry:
 
                 compress_to_network(rdma, &comp);
                 ret = qemu_rdma_exchange_send(rdma, &head,
-                                (uint8_t *) &comp, NULL, NULL, NULL, &err);
+                                (uint8_t *) &comp, NULL, NULL, NULL, errp);
 
                 if (ret < 0) {
-                    error_report_err(err);
                     return -1;
                 }
 
@@ -2157,9 +2155,8 @@ retry:
 
             register_to_network(rdma, &reg);
             ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
-                                    &resp, &reg_result_idx, NULL, &err);
+                                    &resp, &reg_result_idx, NULL, errp);
             if (ret < 0) {
-                error_report_err(err);
                 return -1;
             }
 
@@ -2167,7 +2164,7 @@ retry:
             if (qemu_rdma_register_and_get_keys(rdma, block, sge.addr,
                                                 &sge.lkey, NULL, chunk,
                                                 chunk_start, chunk_end)) {
-                error_report("cannot get lkey");
+                error_setg(errp, "cannot get lkey");
                 return -1;
             }
 
@@ -2186,7 +2183,7 @@ retry:
             if (qemu_rdma_register_and_get_keys(rdma, block, sge.addr,
                                                 &sge.lkey, NULL, chunk,
                                                 chunk_start, chunk_end)) {
-                error_report("cannot get lkey!");
+                error_setg(errp, "cannot get lkey!");
                 return -1;
             }
         }
@@ -2198,7 +2195,7 @@ retry:
         if (qemu_rdma_register_and_get_keys(rdma, block, sge.addr,
                                                      &sge.lkey, NULL, chunk,
                                                      chunk_start, chunk_end)) {
-            error_report("cannot get lkey!");
+            error_setg(errp, "cannot get lkey!");
             return -1;
         }
     }
@@ -2232,7 +2229,7 @@ retry:
         trace_qemu_rdma_write_one_queue_full();
         ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RDMA_WRITE, NULL);
         if (ret < 0) {
-            error_report("rdma migration: failed to make "
+            error_setg(errp, "rdma migration: failed to make "
                          "room in full send queue!");
             return -1;
         }
@@ -2240,12 +2237,8 @@ retry:
         goto retry;
 
     } else if (ret > 0) {
-        /*
-         * FIXME perror() is problematic, because whether
-         * ibv_post_send() sets errno is unclear.  Will go away later
-         * in this series.
-         */
-        perror("rdma migration: post rdma write failed");
+        error_setg_errno(errp, ret,
+                         "rdma migration: post rdma write failed");
         return -1;
     }
 
@@ -2274,10 +2267,10 @@ static int qemu_rdma_write_flush(QEMUFile *f, RDMAContext *rdma,
     }
 
     ret = qemu_rdma_write_one(f, rdma,
-            rdma->current_index, rdma->current_addr, rdma->current_length);
+            rdma->current_index, rdma->current_addr, rdma->current_length,
+            errp);
 
     if (ret < 0) {
-        error_setg(errp, "FIXME temporary error message");
         return -1;
     }
 
-- 
2.41.0




* [PATCH v2 41/53] migration/rdma: Convert qemu_rdma_write() to Error
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (39 preceding siblings ...)
  2023-09-28 13:20 ` [PATCH v2 40/53] migration/rdma: Convert qemu_rdma_write_one() " Markus Armbruster
@ 2023-09-28 13:20 ` Markus Armbruster
  2023-10-04 17:23   ` Juan Quintela
  2023-09-28 13:20 ` [PATCH v2 42/53] migration/rdma: Convert qemu_rdma_post_send_control() " Markus Armbruster
                   ` (12 subsequent siblings)
  53 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

Just for consistency with qemu_rdma_write_one() and
qemu_rdma_write_flush(), and for slightly simpler code.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 928d09d177..528f468dfb 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2342,9 +2342,8 @@ static inline bool qemu_rdma_buffer_mergeable(RDMAContext *rdma,
  */
 static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
                            uint64_t block_offset, uint64_t offset,
-                           uint64_t len)
+                           uint64_t len, Error **errp)
 {
-    Error *err = NULL;
     uint64_t current_addr = block_offset + offset;
     uint64_t index = rdma->current_index;
     uint64_t chunk = rdma->current_chunk;
@@ -2352,9 +2351,8 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
 
     /* If we cannot merge it, we flush the current buffer first. */
     if (!qemu_rdma_buffer_mergeable(rdma, current_addr, len)) {
-        ret = qemu_rdma_write_flush(f, rdma, &err);
+        ret = qemu_rdma_write_flush(f, rdma, errp);
         if (ret < 0) {
-            error_report_err(err);
             return -1;
         }
         rdma->current_length = 0;
@@ -2371,10 +2369,7 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
 
     /* flush it if buffer is too large */
     if (rdma->current_length >= RDMA_MERGE_MAX) {
-        if (qemu_rdma_write_flush(f, rdma, &err) < 0) {
-            error_report_err(err);
-            return -1;
-        }
+        return qemu_rdma_write_flush(f, rdma, errp);
     }
 
     return 0;
@@ -3275,6 +3270,7 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
                                   size_t size, uint64_t *bytes_sent)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
+    Error *err = NULL;
     RDMAContext *rdma;
     int ret;
 
@@ -3300,9 +3296,9 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
      * is full, or the page doesn't belong to the current chunk,
      * an actual RDMA write will occur and a new chunk will be formed.
      */
-    ret = qemu_rdma_write(f, rdma, block_offset, offset, size);
+    ret = qemu_rdma_write(f, rdma, block_offset, offset, size, &err);
     if (ret < 0) {
-        error_report("rdma migration: write error");
+        error_report_err(err);
         goto err;
     }
 
-- 
2.41.0




* [PATCH v2 42/53] migration/rdma: Convert qemu_rdma_post_send_control() to Error
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (40 preceding siblings ...)
  2023-09-28 13:20 ` [PATCH v2 41/53] migration/rdma: Convert qemu_rdma_write() " Markus Armbruster
@ 2023-09-28 13:20 ` Markus Armbruster
  2023-09-28 13:20 ` [PATCH v2 43/53] migration/rdma: Convert qemu_rdma_post_recv_control() " Markus Armbruster
                   ` (11 subsequent siblings)
  53 siblings, 0 replies; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_exchange_send() violates this principle: it calls
error_report() via qemu_rdma_post_send_control().  I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.

Clean this up by converting qemu_rdma_post_send_control() to Error.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 528f468dfb..ce56ba9b40 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1741,7 +1741,8 @@ err_block_for_wrid:
  * containing some data and block until the post completes.
  */
 static int qemu_rdma_post_send_control(RDMAContext *rdma, uint8_t *buf,
-                                       RDMAControlHeader *head)
+                                       RDMAControlHeader *head,
+                                       Error **errp)
 {
     int ret;
     RDMAWorkRequestData *wr = &rdma->wr_data[RDMA_WRID_CONTROL];
@@ -1781,13 +1782,13 @@ static int qemu_rdma_post_send_control(RDMAContext *rdma, uint8_t *buf,
     ret = ibv_post_send(rdma->qp, &send_wr, &bad_wr);
 
     if (ret > 0) {
-        error_report("Failed to use post IB SEND for control");
+        error_setg(errp, "Failed to use post IB SEND for control");
         return -1;
     }
 
     ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_SEND_CONTROL, NULL);
     if (ret < 0) {
-        error_report("rdma migration: send polling control error");
+        error_setg(errp, "rdma migration: send polling control error");
         return -1;
     }
 
@@ -1944,10 +1945,9 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
     /*
      * Deliver the control message that was requested.
      */
-    ret = qemu_rdma_post_send_control(rdma, data, head);
+    ret = qemu_rdma_post_send_control(rdma, data, head, errp);
 
     if (ret < 0) {
-        error_setg(errp, "Failed to send control buffer!");
         return -1;
     }
 
@@ -2001,10 +2001,9 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
     /*
      * Inform the source that we're ready to receive a message.
      */
-    ret = qemu_rdma_post_send_control(rdma, NULL, &ready);
+    ret = qemu_rdma_post_send_control(rdma, NULL, &ready, errp);
 
     if (ret < 0) {
-        error_setg(errp, "Failed to send control buffer!");
         return -1;
     }
 
@@ -2377,6 +2376,7 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
 
 static void qemu_rdma_cleanup(RDMAContext *rdma)
 {
+    Error *err = NULL;
     int idx;
 
     if (rdma->cm_id && rdma->connected) {
@@ -2388,7 +2388,9 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
                                        .repeat = 1,
                                      };
             error_report("Early error. Sending error.");
-            qemu_rdma_post_send_control(rdma, NULL, &head);
+            if (qemu_rdma_post_send_control(rdma, NULL, &head, &err) < 0) {
+                error_report_err(err);
+            }
         }
 
         rdma_disconnect(rdma->cm_id);
@@ -3700,10 +3702,11 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
 
 
             ret = qemu_rdma_post_send_control(rdma,
-                                        (uint8_t *) rdma->dest_blocks, &blocks);
+                                    (uint8_t *) rdma->dest_blocks, &blocks,
+                                    &err);
 
             if (ret < 0) {
-                error_report("rdma migration: error sending remote info");
+                error_report_err(err);
                 goto err;
             }
 
@@ -3778,10 +3781,10 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
             }
 
             ret = qemu_rdma_post_send_control(rdma,
-                            (uint8_t *) results, &reg_resp);
+                            (uint8_t *) results, &reg_resp, &err);
 
             if (ret < 0) {
-                error_report("Failed to send control buffer");
+                error_report_err(err);
                 goto err;
             }
             break;
@@ -3813,10 +3816,10 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                                                        reg->key.chunk);
             }
 
-            ret = qemu_rdma_post_send_control(rdma, NULL, &unreg_resp);
+            ret = qemu_rdma_post_send_control(rdma, NULL, &unreg_resp, &err);
 
             if (ret < 0) {
-                error_report("Failed to send control buffer");
+                error_report_err(err);
                 goto err;
             }
             break;
-- 
2.41.0




* [PATCH v2 43/53] migration/rdma: Convert qemu_rdma_post_recv_control() to Error
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (41 preceding siblings ...)
  2023-09-28 13:20 ` [PATCH v2 42/53] migration/rdma: Convert qemu_rdma_post_send_control() " Markus Armbruster
@ 2023-09-28 13:20 ` Markus Armbruster
  2023-09-28 13:20 ` [PATCH v2 44/53] migration/rdma: Convert qemu_rdma_alloc_pd_cq() " Markus Armbruster
                   ` (10 subsequent siblings)
  53 siblings, 0 replies; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

Just for symmetry with qemu_rdma_post_send_control().  Error messages
lose detail I consider of no use to users.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index ce56ba9b40..336a960006 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1799,7 +1799,8 @@ static int qemu_rdma_post_send_control(RDMAContext *rdma, uint8_t *buf,
  * Post a RECV work request in anticipation of some future receipt
  * of data on the control channel.
  */
-static int qemu_rdma_post_recv_control(RDMAContext *rdma, int idx)
+static int qemu_rdma_post_recv_control(RDMAContext *rdma, int idx,
+                                       Error **errp)
 {
     struct ibv_recv_wr *bad_wr;
     struct ibv_sge sge = {
@@ -1816,6 +1817,7 @@ static int qemu_rdma_post_recv_control(RDMAContext *rdma, int idx)
 
 
     if (ibv_post_recv(rdma->qp, &recv_wr, &bad_wr)) {
+        error_setg(errp, "error posting control recv");
         return -1;
     }
 
@@ -1925,10 +1927,8 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
      * If the user is expecting a response, post a WR in anticipation of it.
      */
     if (resp) {
-        ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_DATA);
+        ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_DATA, errp);
         if (ret < 0) {
-            error_setg(errp, "rdma migration: error posting"
-                    " extra control recv for anticipated result!");
             return -1;
         }
     }
@@ -1936,9 +1936,8 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
     /*
      * Post a WR to replace the one we just consumed for the READY message.
      */
-    ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
+    ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY, errp);
     if (ret < 0) {
-        error_setg(errp, "rdma migration: error posting first control recv!");
         return -1;
     }
 
@@ -2022,9 +2021,8 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
     /*
      * Post a new RECV work request to replace the one we just consumed.
      */
-    ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
+    ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY, errp);
     if (ret < 0) {
-        error_setg(errp, "rdma migration: error posting second control recv!");
         return -1;
     }
 
@@ -2591,9 +2589,8 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
 
     caps_to_network(&cap);
 
-    ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
+    ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY, errp);
     if (ret < 0) {
-        error_setg(errp, "RDMA ERROR: posting second control recv");
         goto err_rdma_source_connect;
     }
 
@@ -3397,6 +3394,7 @@ static void rdma_cm_poll_handler(void *opaque)
 
 static int qemu_rdma_accept(RDMAContext *rdma)
 {
+    Error *err = NULL;
     RDMACapabilities cap;
     struct rdma_conn_param conn_param = {
                                             .responder_resources = 2,
@@ -3533,9 +3531,9 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     rdma_ack_cm_event(cm_event);
     rdma->connected = true;
 
-    ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
+    ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY, &err);
     if (ret < 0) {
-        error_report("rdma migration: error posting second control recv");
+        error_report_err(err);
         goto err_rdma_dest_wait;
     }
 
-- 
2.41.0




* [PATCH v2 44/53] migration/rdma: Convert qemu_rdma_alloc_pd_cq() to Error
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (42 preceding siblings ...)
  2023-09-28 13:20 ` [PATCH v2 43/53] migration/rdma: Convert qemu_rdma_post_recv_control() " Markus Armbruster
@ 2023-09-28 13:20 ` Markus Armbruster
  2023-09-28 13:20 ` [PATCH v2 45/53] migration/rdma: Silence qemu_rdma_resolve_host() Markus Armbruster
                   ` (9 subsequent siblings)
  53 siblings, 0 replies; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
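
The rule above can be sketched in a few lines of self-contained C.
This is only an illustration under stated assumptions: DemoError,
demo_error_setg(), demo_error_report_err(), demo_alloc_resource() and
demo_start_migration() are hypothetical stand-ins, not QEMU's Error
API from qapi/error.h, and the real error_setg() takes a format
string.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for QEMU's Error object. */
typedef struct DemoError {
    char msg[128];
} DemoError;

static int demo_reports;        /* counts messages shown to the user */

/* Simplified setter: store the message, never print it. */
static void demo_error_setg(DemoError **errp, const char *msg)
{
    if (!errp) {
        return;                 /* caller does not want the details */
    }
    assert(*errp == NULL);      /* an error must not be set twice */
    *errp = calloc(1, sizeof(**errp));
    assert(*errp != NULL);
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

/* Simplified reporter: print exactly once, then free the error. */
static void demo_error_report_err(DemoError *err)
{
    fprintf(stderr, "error: %s\n", err->msg);
    demo_reports++;
    free(err);
}

/* Hypothetical callee: on failure it only sets *errp, it prints nothing. */
static int demo_alloc_resource(int fail, DemoError **errp)
{
    if (fail) {
        demo_error_setg(errp, "failed to allocate protection domain");
        return -1;
    }
    return 0;
}

/* Hypothetical top-level caller: the single place that reports. */
static int demo_start_migration(int fail)
{
    DemoError *err = NULL;

    if (demo_alloc_resource(fail, &err) < 0) {
        demo_error_report_err(err);
        return -1;
    }
    return 0;
}
```

With this split, a failure yields exactly one message, and a caller
that passes NULL for errp gets none at all.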

qemu_rdma_source_init() violates this principle: it calls
error_report() via qemu_rdma_alloc_pd_cq().  I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.

Clean this up by converting qemu_rdma_alloc_pd_cq() to Error.

The conversion loses a piece of advice on one of two failure paths:

    Your mlock() limits may be too low. Please check $ ulimit -a # and search for 'ulimit -l' in the output

Not worth retaining.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 336a960006..44d8202ad0 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1052,19 +1052,19 @@ err_resolve_create_id:
 /*
  * Create protection domain and completion queues
  */
-static int qemu_rdma_alloc_pd_cq(RDMAContext *rdma)
+static int qemu_rdma_alloc_pd_cq(RDMAContext *rdma, Error **errp)
 {
     /* allocate pd */
     rdma->pd = ibv_alloc_pd(rdma->verbs);
     if (!rdma->pd) {
-        error_report("failed to allocate protection domain");
+        error_setg(errp, "failed to allocate protection domain");
         return -1;
     }
 
     /* create receive completion channel */
     rdma->recv_comp_channel = ibv_create_comp_channel(rdma->verbs);
     if (!rdma->recv_comp_channel) {
-        error_report("failed to allocate receive completion channel");
+        error_setg(errp, "failed to allocate receive completion channel");
         goto err_alloc_pd_cq;
     }
 
@@ -1074,21 +1074,21 @@ static int qemu_rdma_alloc_pd_cq(RDMAContext *rdma)
     rdma->recv_cq = ibv_create_cq(rdma->verbs, (RDMA_SIGNALED_SEND_MAX * 3),
                                   NULL, rdma->recv_comp_channel, 0);
     if (!rdma->recv_cq) {
-        error_report("failed to allocate receive completion queue");
+        error_setg(errp, "failed to allocate receive completion queue");
         goto err_alloc_pd_cq;
     }
 
     /* create send completion channel */
     rdma->send_comp_channel = ibv_create_comp_channel(rdma->verbs);
     if (!rdma->send_comp_channel) {
-        error_report("failed to allocate send completion channel");
+        error_setg(errp, "failed to allocate send completion channel");
         goto err_alloc_pd_cq;
     }
 
     rdma->send_cq = ibv_create_cq(rdma->verbs, (RDMA_SIGNALED_SEND_MAX * 3),
                                   NULL, rdma->send_comp_channel, 0);
     if (!rdma->send_cq) {
-        error_report("failed to allocate send completion queue");
+        error_setg(errp, "failed to allocate send completion queue");
         goto err_alloc_pd_cq;
     }
 
@@ -2486,12 +2486,8 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
         goto err_rdma_source_init;
     }
 
-    ret = qemu_rdma_alloc_pd_cq(rdma);
+    ret = qemu_rdma_alloc_pd_cq(rdma, errp);
     if (ret < 0) {
-        error_setg(errp, "RDMA ERROR: "
-                   "rdma migration: error allocating pd and cq! Your mlock()"
-                   " limits may be too low. Please check $ ulimit -a # and "
-                   "search for 'ulimit -l' in the output");
         goto err_rdma_source_init;
     }
 
@@ -3477,9 +3473,9 @@ static int qemu_rdma_accept(RDMAContext *rdma)
 
     qemu_rdma_dump_id("dest_init", verbs);
 
-    ret = qemu_rdma_alloc_pd_cq(rdma);
+    ret = qemu_rdma_alloc_pd_cq(rdma, &err);
     if (ret < 0) {
-        error_report("rdma migration: error allocating pd and cq!");
+        error_report_err(err);
         goto err_rdma_dest_wait;
     }
 
-- 
2.41.0




* [PATCH v2 45/53] migration/rdma: Silence qemu_rdma_resolve_host()
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (43 preceding siblings ...)
  2023-09-28 13:20 ` [PATCH v2 44/53] migration/rdma: Convert qemu_rdma_alloc_pd_cq() " Markus Armbruster
@ 2023-09-28 13:20 ` Markus Armbruster
  2023-09-28 13:20 ` [PATCH v2 46/53] migration/rdma: Silence qemu_rdma_connect() Markus Armbruster
                   ` (8 subsequent siblings)
  53 siblings, 0 replies; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_resolve_host() violates this principle: it calls
error_report().

Clean this up: drop error_report().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 44d8202ad0..5e21dfca53 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1009,7 +1009,6 @@ route:
         error_setg(errp,
                    "RDMA ERROR: result not equal to event_addr_resolved %s",
                    rdma_event_str(cm_event->event));
-        error_report("rdma_resolve_addr");
         rdma_ack_cm_event(cm_event);
         goto err_resolve_get_addr;
     }
-- 
2.41.0




* [PATCH v2 46/53] migration/rdma: Silence qemu_rdma_connect()
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (44 preceding siblings ...)
  2023-09-28 13:20 ` [PATCH v2 45/53] migration/rdma: Silence qemu_rdma_resolve_host() Markus Armbruster
@ 2023-09-28 13:20 ` Markus Armbruster
  2023-09-28 13:20 ` [PATCH v2 47/53] migration/rdma: Silence qemu_rdma_reg_control() Markus Armbruster
                   ` (7 subsequent siblings)
  53 siblings, 0 replies; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_connect() violates this principle: it calls error_report()
and perror().  I elected not to investigate how callers handle the
error, i.e. precise impact is not known.

Clean this up: replace perror() by changing error_setg() to
error_setg_errno(), and drop error_report().  I believe the callers'
error reports suffice then.  If they don't, we need to convert to
Error instead.
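
To illustrate the idea of folding the errno text into the Error
instead of printing it separately with perror(), here is a small
self-contained sketch.  DemoError, demo_error_setg_errno() and
demo_connect() are hypothetical names, not QEMU code; the sketch
assumes the "message: strerror text" layout and takes a plain string
rather than a format.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object. */
typedef struct DemoError {
    char msg[256];
} DemoError;

/*
 * Simplified analogue of error_setg_errno(): the errno description
 * becomes part of the stored message, so nothing goes to stderr here.
 */
static void demo_error_setg_errno(DemoError **errp, int os_errno,
                                  const char *msg)
{
    if (!errp) {
        return;
    }
    assert(*errp == NULL);
    *errp = calloc(1, sizeof(**errp));
    assert(*errp != NULL);
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s: %s",
             msg, strerror(os_errno));
}

/* Hypothetical connect step; the failure is simulated. */
static int demo_connect(int fail, DemoError **errp)
{
    if (fail) {
        errno = ECONNREFUSED;   /* pretend the underlying call set errno */
        demo_error_setg_errno(errp, errno,
                              "RDMA ERROR: connecting to destination!");
        return -1;
    }
    return 0;
}
```

The caller then reports a single message that already carries the
errno description.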

Bonus: resolves a FIXME about problematic use of errno.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 5e21dfca53..b85d5e60cb 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2591,8 +2591,8 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
 
     ret = rdma_connect(rdma->cm_id, &conn_param);
     if (ret < 0) {
-        perror("rdma_connect");
-        error_setg(errp, "RDMA ERROR: connecting to destination!");
+        error_setg_errno(errp, errno,
+                         "RDMA ERROR: connecting to destination!");
         goto err_rdma_source_connect;
     }
 
@@ -2601,21 +2601,15 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
     } else {
         ret = rdma_get_cm_event(rdma->channel, &cm_event);
         if (ret < 0) {
-            error_setg(errp, "RDMA ERROR: failed to get cm event");
+            error_setg_errno(errp, errno,
+                             "RDMA ERROR: failed to get cm event");
         }
     }
     if (ret < 0) {
-        /*
-         * FIXME perror() is wrong, because
-         * qemu_get_cm_event_timeout() can fail without setting errno.
-         * Will go away later in this series.
-         */
-        perror("rdma_get_cm_event after rdma_connect");
         goto err_rdma_source_connect;
     }
 
     if (cm_event->event != RDMA_CM_EVENT_ESTABLISHED) {
-        error_report("rdma_get_cm_event != EVENT_ESTABLISHED after rdma_connect");
         error_setg(errp, "RDMA ERROR: connecting to destination!");
         rdma_ack_cm_event(cm_event);
         goto err_rdma_source_connect;
-- 
2.41.0




* [PATCH v2 47/53] migration/rdma: Silence qemu_rdma_reg_control()
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (45 preceding siblings ...)
  2023-09-28 13:20 ` [PATCH v2 46/53] migration/rdma: Silence qemu_rdma_connect() Markus Armbruster
@ 2023-09-28 13:20 ` Markus Armbruster
  2023-09-28 13:20 ` [PATCH v2 48/53] migration/rdma: Don't report received completion events as error Markus Armbruster
                   ` (6 subsequent siblings)
  53 siblings, 0 replies; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_source_init() and qemu_rdma_accept() violate this principle:
they call error_report() via qemu_rdma_reg_control().  I elected not
to investigate how callers handle the error, i.e. precise impact is
not known.

Clean this up by dropping the error reporting from
qemu_rdma_reg_control().  I believe the callers' error reports
suffice.  If they don't, we need to convert to Error instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index b85d5e60cb..1ef1e9f3a5 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1360,7 +1360,6 @@ static int qemu_rdma_reg_control(RDMAContext *rdma, int idx)
         rdma->total_registrations++;
         return 0;
     }
-    error_report("qemu_rdma_reg_control failed");
     return -1;
 }
 
-- 
2.41.0




* [PATCH v2 48/53] migration/rdma: Don't report received completion events as error
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (46 preceding siblings ...)
  2023-09-28 13:20 ` [PATCH v2 47/53] migration/rdma: Silence qemu_rdma_reg_control() Markus Armbruster
@ 2023-09-28 13:20 ` Markus Armbruster
  2023-09-28 13:20 ` [PATCH v2 49/53] migration/rdma: Silence qemu_rdma_block_for_wrid() Markus Armbruster
                   ` (5 subsequent siblings)
  53 siblings, 0 replies; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

When qemu_rdma_wait_comp_channel() receives a connection manager (CM)
event while waiting on the completion channel, it reports an error
"receive cm event while wait comp channel,cm event is T", where T is
the numeric event type.
However, the function fails only when T is a disconnect or device
removal.  Events other than these two are not actually an error, and
reporting them as an error is wrong.  If we need to report them to the
user, we should use something else, and what to use depends on why we
need to report them to the user.

For now, report this error only when the function actually fails.
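
A minimal sketch of "report only when the function actually fails",
using made-up event names: the demo_cm_event enum and both functions
are hypothetical and merely mirror the two fatal cases named above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical event types; only two of them are fatal. */
enum demo_cm_event {
    DEMO_EVENT_ESTABLISHED,
    DEMO_EVENT_DISCONNECTED,
    DEMO_EVENT_DEVICE_REMOVAL,
    DEMO_EVENT_OTHER,
};

static int demo_error_reports;  /* counts error messages emitted */

static bool demo_event_is_fatal(enum demo_cm_event ev)
{
    return ev == DEMO_EVENT_DISCONNECTED ||
           ev == DEMO_EVENT_DEVICE_REMOVAL;
}

/* Returns -1 and reports for fatal events; stays silent otherwise. */
static int demo_handle_cm_event(enum demo_cm_event ev)
{
    if (demo_event_is_fatal(ev)) {
        fprintf(stderr, "fatal cm event %d while waiting\n", (int)ev);
        demo_error_reports++;
        return -1;
    }
    return 0;
}
```

The report sits inside the failing branch, so non-fatal events no
longer produce a misleading error.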

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 1ef1e9f3a5..f4bb329671 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1582,11 +1582,11 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
                         return -1;
                     }
 
-                    error_report("receive cm event while wait comp channel,"
-                                 "cm event is %d", cm_event->event);
                     if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
                         cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
                         rdma_ack_cm_event(cm_event);
+                        error_report("receive cm event while wait comp channel,"
+                                     "cm event is %d", cm_event->event);
                         return -1;
                     }
                     rdma_ack_cm_event(cm_event);
-- 
2.41.0




* [PATCH v2 49/53] migration/rdma: Silence qemu_rdma_block_for_wrid()
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (47 preceding siblings ...)
  2023-09-28 13:20 ` [PATCH v2 48/53] migration/rdma: Don't report received completion events as error Markus Armbruster
@ 2023-09-28 13:20 ` Markus Armbruster
  2023-09-28 13:20 ` [PATCH v2 50/53] migration/rdma: Silence qemu_rdma_register_and_get_keys() Markus Armbruster
                   ` (4 subsequent siblings)
  53 siblings, 0 replies; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_post_send_control(), qemu_rdma_exchange_get_response(), and
qemu_rdma_write_one() violate this principle: they call
error_report(), fprintf(stderr, ...), and perror() via
qemu_rdma_block_for_wrid(), qemu_rdma_poll(), and
qemu_rdma_wait_comp_channel().  I elected not to investigate how
callers handle the error, i.e. precise impact is not known.

Clean this up by dropping the error reporting from qemu_rdma_poll(),
qemu_rdma_wait_comp_channel(), and qemu_rdma_block_for_wrid().  I
believe the callers' error reports suffice.  If they don't, we need to
convert to Error instead.

Bonus: resolves a FIXME about problematic use of errno.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index f4bb329671..6c63f9c269 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1483,17 +1483,12 @@ static int qemu_rdma_poll(RDMAContext *rdma, struct ibv_cq *cq,
     }
 
     if (ret < 0) {
-        error_report("ibv_poll_cq failed");
         return -1;
     }
 
     wr_id = wc.wr_id & RDMA_WRID_TYPE_MASK;
 
     if (wc.status != IBV_WC_SUCCESS) {
-        fprintf(stderr, "ibv_poll_cq wc.status=%d %s!\n",
-                        wc.status, ibv_wc_status_str(wc.status));
-        fprintf(stderr, "ibv_poll_cq wrid=%" PRIu64 "!\n", wr_id);
-
         return -1;
     }
 
@@ -1577,16 +1572,12 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
                 if (pfds[1].revents) {
                     ret = rdma_get_cm_event(rdma->channel, &cm_event);
                     if (ret < 0) {
-                        error_report("failed to get cm event while wait "
-                                     "completion channel");
                         return -1;
                     }
 
                     if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
                         cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
                         rdma_ack_cm_event(cm_event);
-                        error_report("receive cm event while wait comp channel,"
-                                     "cm event is %d", cm_event->event);
                         return -1;
                     }
                     rdma_ack_cm_event(cm_event);
@@ -1599,7 +1590,6 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
             default: /* Error of some type -
                       * I don't trust errno from qemu_poll_ns
                      */
-                error_report("%s: poll failed", __func__);
                 return -1;
             }
 
@@ -1683,12 +1673,6 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma,
 
         ret = ibv_get_cq_event(ch, &cq, &cq_ctx);
         if (ret < 0) {
-            /*
-             * FIXME perror() is problematic, because ibv_reg_mr() is
-             * not documented to set errno.  Will go away later in
-             * this series.
-             */
-            perror("ibv_get_cq_event");
             goto err_block_for_wrid;
         }
 
-- 
2.41.0




* [PATCH v2 50/53] migration/rdma: Silence qemu_rdma_register_and_get_keys()
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (48 preceding siblings ...)
  2023-09-28 13:20 ` [PATCH v2 49/53] migration/rdma: Silence qemu_rdma_block_for_wrid() Markus Armbruster
@ 2023-09-28 13:20 ` Markus Armbruster
  2023-09-28 13:20 ` [PATCH v2 51/53] migration/rdma: Downgrade qemu_rdma_cleanup() errors to warnings Markus Armbruster
                   ` (3 subsequent siblings)
  53 siblings, 0 replies; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_write_one() violates this principle: it reports errors to
stderr via qemu_rdma_register_and_get_keys().  I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.

Clean this up: silence qemu_rdma_register_and_get_keys().  I believe
the caller's error reports suffice.  If they don't, we need to convert
to Error instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 migration/rdma.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 6c63f9c269..4e4d818460 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1325,15 +1325,6 @@ static int qemu_rdma_register_and_get_keys(RDMAContext *rdma,
         }
     }
     if (!block->pmr[chunk]) {
-        perror("Failed to register chunk!");
-        fprintf(stderr, "Chunk details: block: %d chunk index %d"
-                        " start %" PRIuPTR " end %" PRIuPTR
-                        " host %" PRIuPTR
-                        " local %" PRIuPTR " registrations: %d\n",
-                        block->index, chunk, (uintptr_t)chunk_start,
-                        (uintptr_t)chunk_end, host_addr,
-                        (uintptr_t)block->local_host_addr,
-                        rdma->total_registrations);
         return -1;
     }
     rdma->total_registrations++;
-- 
2.41.0




* [PATCH v2 51/53] migration/rdma: Downgrade qemu_rdma_cleanup() errors to warnings
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (49 preceding siblings ...)
  2023-09-28 13:20 ` [PATCH v2 50/53] migration/rdma: Silence qemu_rdma_register_and_get_keys() Markus Armbruster
@ 2023-09-28 13:20 ` Markus Armbruster
  2023-09-29 15:29   ` Fabiano Rosas
                     ` (2 more replies)
  2023-09-28 13:20 ` [PATCH v2 52/53] migration/rdma: Use error_report() & friends instead of stderr Markus Armbruster
                   ` (2 subsequent siblings)
  53 siblings, 3 replies; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_source_init(), qemu_rdma_connect(),
rdma_start_incoming_migration(), and rdma_start_outgoing_migration()
violate this principle: they call error_report() via
qemu_rdma_cleanup().

Moreover, qemu_rdma_cleanup() can't fail.  It is called on error
paths, and on QIOChannel close and finalization.  Are the conditions
it reports really errors?  I doubt it.

Downgrade qemu_rdma_cleanup()'s errors to warnings.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 4e4d818460..54b59d12b1 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2358,9 +2358,9 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
                                        .type = RDMA_CONTROL_ERROR,
                                        .repeat = 1,
                                      };
-            error_report("Early error. Sending error.");
+            warn_report("Early error. Sending error.");
             if (qemu_rdma_post_send_control(rdma, NULL, &head, &err) < 0) {
-                error_report_err(err);
+                warn_report_err(err);
             }
         }
 
-- 
2.41.0




* [PATCH v2 52/53] migration/rdma: Use error_report() & friends instead of stderr
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (50 preceding siblings ...)
  2023-09-28 13:20 ` [PATCH v2 51/53] migration/rdma: Downgrade qemu_rdma_cleanup() errors to warnings Markus Armbruster
@ 2023-09-28 13:20 ` Markus Armbruster
  2023-09-29 15:36   ` Fabiano Rosas
                     ` (2 more replies)
  2023-09-28 13:20 ` [PATCH v2 53/53] migration/rdma: Replace flawed device detail dump by tracing Markus Armbruster
  2023-10-04 17:52 ` [PATCH v2 00/53] migration/rdma: Error handling fixes Juan Quintela
  53 siblings, 3 replies; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

error_report() obeys -msg, reports the current error location if any,
and reports to the current monitor if any.  Reporting to stderr
directly with fprintf() or perror() is wrong, because it loses all
this.

Fix the offenders.  Bonus: resolves a FIXME about problematic use of
errno.
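
Why perror() is a poor fit for some calls can be shown with a small
sketch.  It assumes a function that returns the error code itself
instead of setting errno; demo_dereg() and demo_format_failure() are
hypothetical and touch no real verbs API.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical function following the "return 0 or an errno value"
 * convention.  It deliberately leaves the global errno untouched.
 */
static int demo_dereg(int fail)
{
    return fail ? EINVAL : 0;
}

/*
 * Build the message from the returned code.  perror() would consult
 * errno instead, which the callee above never set.
 */
static int demo_format_failure(char *buf, size_t len, int ret)
{
    return snprintf(buf, len, "unregistration chunk failed: %s",
                    strerror(ret));
}
```

In real code the formatted text would be handed to error_report(), so
that it follows -msg and reaches the monitor.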

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 44 +++++++++++++++++++++-----------------------
 1 file changed, 21 insertions(+), 23 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 54b59d12b1..dba0802fca 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -877,12 +877,12 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
 
         if (roce_found) {
             if (ib_found) {
-                fprintf(stderr, "WARN: migrations may fail:"
-                                " IPv6 over RoCE / iWARP in linux"
-                                " is broken. But since you appear to have a"
-                                " mixed RoCE / IB environment, be sure to only"
-                                " migrate over the IB fabric until the kernel "
-                                " fixes the bug.\n");
+                warn_report("WARN: migrations may fail:"
+                            " IPv6 over RoCE / iWARP in linux"
+                            " is broken. But since you appear to have a"
+                            " mixed RoCE / IB environment, be sure to only"
+                            " migrate over the IB fabric until the kernel "
+                            " fixes the bug.");
             } else {
                 error_setg(errp, "RDMA ERROR: "
                            "You only have RoCE / iWARP devices in your systems"
@@ -1418,12 +1418,8 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
         block->remote_keys[chunk] = 0;
 
         if (ret != 0) {
-            /*
-             * FIXME perror() is problematic, bcause ibv_dereg_mr() is
-             * not documented to set errno.  Will go away later in
-             * this series.
-             */
-            perror("unregistration chunk failed");
+            error_report("unregistration chunk failed: %s",
+                         strerror(ret));
             return -1;
         }
         rdma->total_registrations--;
@@ -3767,7 +3763,8 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                 block->pmr[reg->key.chunk] = NULL;
 
                 if (ret != 0) {
-                    perror("rdma unregistration chunk failed");
+                    error_report("rdma unregistration chunk failed: %s",
+                                 strerror(errno));
                     goto err;
                 }
 
@@ -3956,10 +3953,10 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
          */
 
         if (local->nb_blocks != nb_dest_blocks) {
-            fprintf(stderr, "ram blocks mismatch (Number of blocks %d vs %d) "
-                    "Your QEMU command line parameters are probably "
-                    "not identical on both the source and destination.",
-                    local->nb_blocks, nb_dest_blocks);
+            error_report("ram blocks mismatch (Number of blocks %d vs %d)",
+                         local->nb_blocks, nb_dest_blocks);
+            error_printf("Your QEMU command line parameters are probably "
+                         "not identical on both the source and destination.");
             rdma->errored = true;
             return -1;
         }
@@ -3972,10 +3969,11 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
 
             /* We require that the blocks are in the same order */
             if (rdma->dest_blocks[i].length != local->block[i].length) {
-                fprintf(stderr, "Block %s/%d has a different length %" PRIu64
-                        "vs %" PRIu64, local->block[i].block_name, i,
-                        local->block[i].length,
-                        rdma->dest_blocks[i].length);
+                error_report("Block %s/%d has a different length %" PRIu64
+                             "vs %" PRIu64,
+                             local->block[i].block_name, i,
+                             local->block[i].length,
+                             rdma->dest_blocks[i].length);
                 rdma->errored = true;
                 return -1;
             }
@@ -4091,7 +4089,7 @@ static void rdma_accept_incoming_migration(void *opaque)
     ret = qemu_rdma_accept(rdma);
 
     if (ret < 0) {
-        fprintf(stderr, "RDMA ERROR: Migration initialization failed\n");
+        error_report("RDMA ERROR: Migration initialization failed");
         return;
     }
 
@@ -4103,7 +4101,7 @@ static void rdma_accept_incoming_migration(void *opaque)
 
     f = rdma_new_input(rdma);
     if (f == NULL) {
-        fprintf(stderr, "RDMA ERROR: could not open RDMA for input\n");
+        error_report("RDMA ERROR: could not open RDMA for input");
         qemu_rdma_cleanup(rdma);
         return;
     }
-- 
2.41.0




* [PATCH v2 53/53] migration/rdma: Replace flawed device detail dump by tracing
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (51 preceding siblings ...)
  2023-09-28 13:20 ` [PATCH v2 52/53] migration/rdma: Use error_report() & friends instead of stderr Markus Armbruster
@ 2023-09-28 13:20 ` Markus Armbruster
  2023-09-29 17:05   ` Fabiano Rosas
                     ` (2 more replies)
  2023-10-04 17:52 ` [PATCH v2 00/53] migration/rdma: Error handling fixes Juan Quintela
  53 siblings, 3 replies; 121+ messages in thread
From: Markus Armbruster @ 2023-09-28 13:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, peterx, leobras, farosas, lizhijian, eblake

qemu_rdma_dump_id() dumps RDMA device details to stdout.

rdma_start_outgoing_migration() calls it via qemu_rdma_source_init()
and qemu_rdma_resolve_host() to show source device details.
rdma_start_incoming_migration() arranges its call via
rdma_accept_incoming_migration() and qemu_rdma_accept() to show
destination device details.

Two issues:

1. rdma_start_outgoing_migration() can run in HMP context.  The
   information should arguably go to the monitor, not stdout.

2. ibv_query_port() failure is reported as error.  Its callers remain
   unaware of this failure (qemu_rdma_dump_id() can't fail), so
   reporting this to the user as an error is problematic.

Fixable, but the device detail dump is noise, except when
troubleshooting.  Tracing is a better fit.  Similar function
qemu_rdma_dump_gid() was converted to tracing in commit
733252deb8b (Tracify migration/rdma.c).

Convert qemu_rdma_dump_id(), too.

While there, touch up qemu_rdma_dump_gid()'s outdated comment.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c       | 23 ++++++++---------------
 migration/trace-events |  2 ++
 2 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index dba0802fca..07aef9a071 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -734,38 +734,31 @@ static void rdma_delete_block(RDMAContext *rdma, RDMALocalBlock *block)
 }
 
 /*
- * Put in the log file which RDMA device was opened and the details
- * associated with that device.
+ * Trace RDMA device open, with device details.
  */
 static void qemu_rdma_dump_id(const char *who, struct ibv_context *verbs)
 {
     struct ibv_port_attr port;
 
     if (ibv_query_port(verbs, 1, &port)) {
-        error_report("Failed to query port information");
+        trace_qemu_rdma_dump_id_failed(who);
         return;
     }
 
-    printf("%s RDMA Device opened: kernel name %s "
-           "uverbs device name %s, "
-           "infiniband_verbs class device path %s, "
-           "infiniband class device path %s, "
-           "transport: (%d) %s\n",
-                who,
+    trace_qemu_rdma_dump_id(who,
                 verbs->device->name,
                 verbs->device->dev_name,
                 verbs->device->dev_path,
                 verbs->device->ibdev_path,
                 port.link_layer,
-                (port.link_layer == IBV_LINK_LAYER_INFINIBAND) ? "Infiniband" :
-                 ((port.link_layer == IBV_LINK_LAYER_ETHERNET)
-                    ? "Ethernet" : "Unknown"));
+                port.link_layer == IBV_LINK_LAYER_INFINIBAND ? "Infiniband"
+                : port.link_layer == IBV_LINK_LAYER_ETHERNET ? "Ethernet"
+                : "Unknown");
 }
 
 /*
- * Put in the log file the RDMA gid addressing information,
- * useful for folks who have trouble understanding the
- * RDMA device hierarchy in the kernel.
+ * Trace RDMA gid addressing information.
+ * Useful for understanding the RDMA device hierarchy in the kernel.
  */
 static void qemu_rdma_dump_gid(const char *who, struct rdma_cm_id *id)
 {
diff --git a/migration/trace-events b/migration/trace-events
index d733107ec6..4ce16ae866 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -213,6 +213,8 @@ qemu_rdma_close(void) ""
 qemu_rdma_connect_pin_all_requested(void) ""
 qemu_rdma_connect_pin_all_outcome(bool pin) "%d"
 qemu_rdma_dest_init_trying(const char *host, const char *ip) "%s => %s"
+qemu_rdma_dump_id_failed(const char *who) "%s RDMA Device opened, but can't query port information"
+qemu_rdma_dump_id(const char *who, const char *name, const char *dev_name, const char *dev_path, const char *ibdev_path, int transport, const char *transport_name) "%s RDMA Device opened: kernel name %s uverbs device name %s, infiniband_verbs class device path %s, infiniband class device path %s, transport: (%d) %s"
 qemu_rdma_dump_gid(const char *who, const char *src, const char *dst) "%s Source GID: %s, Dest GID: %s"
 qemu_rdma_exchange_get_response_start(const char *desc) "CONTROL: %s receiving..."
 qemu_rdma_exchange_get_response_none(const char *desc, int type) "Surprise: got %s (%d)"
-- 
2.41.0




* Re: [PATCH v2 06/53] migration/rdma: Fix unwanted integer truncation
  2023-09-28 13:19 ` [PATCH v2 06/53] migration/rdma: Fix unwanted integer truncation Markus Armbruster
@ 2023-09-28 14:20   ` Fabiano Rosas
  2023-10-04 14:41   ` Juan Quintela
  2023-10-07  1:53   ` Zhijian Li (Fujitsu)
  2 siblings, 0 replies; 121+ messages in thread
From: Fabiano Rosas @ 2023-09-28 14:20 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel
  Cc: quintela, peterx, leobras, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> writes:

> qio_channel_rdma_readv() assigns the size_t value of qemu_rdma_fill()
> to an int variable before it adds it to @done / subtracts it from
> @want, both size_t.  Truncation when qemu_rdma_fill() copies more than
> INT_MAX bytes.  Seems vanishingly unlikely, but needs fixing all the
> same.
>
> Fixes: 6ddd2d76ca6f (migration: convert RDMA to use QIOChannel interface)
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  migration/rdma.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 4289346617..5f423f66f0 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2852,7 +2852,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>      RDMAControlHeader head;
>      int ret = 0;
>      ssize_t i;
> -    size_t done = 0;
> +    size_t done = 0, len;
>  
>      RCU_READ_LOCK_GUARD();
>      rdma = qatomic_rcu_read(&rioc->rdmain);
> @@ -2873,9 +2873,9 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>           * were given and dish out the bytes until we run
>           * out of bytes.
>           */
> -        ret = qemu_rdma_fill(rdma, data, want, 0);
> -        done += ret;
> -        want -= ret;
> +        len = qemu_rdma_fill(rdma, data, want, 0);
> +        done += len;
> +        want -= len;
>          /* Got what we needed, so go to next iovec */
>          if (want == 0) {
>              continue;
> @@ -2902,9 +2902,9 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>          /*
>           * SEND was received with new bytes, now try again.
>           */
> -        ret = qemu_rdma_fill(rdma, data, want, 0);
> -        done += ret;
> -        want -= ret;
> +        len = qemu_rdma_fill(rdma, data, want, 0);
> +        done += len;
> +        want -= len;
>  
>          /* Still didn't get enough, so lets just return */
>          if (want) {

Reviewed-by: Fabiano Rosas <farosas@suse.de>
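
For readers following along, the truncation described in the commit
message can be reproduced in isolation.  This is only a sketch:
fake_fill() is a hypothetical stand-in for a function returning a
size_t byte count, not the real qemu_rdma_fill(), and the out-of-range
conversion to int is implementation-defined (mainstream compilers wrap).
The results only differ where size_t is wider than int.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in: reports the number of bytes "copied" as size_t. */
static size_t fake_fill(size_t want)
{
    return want;
}

/* Old pattern: the size_t result is squeezed through an int first. */
static size_t consume_via_int(size_t want)
{
    size_t done = 0;
    int ret = (int)fake_fill(want);  /* implementation-defined if > INT_MAX */

    done += ret;                     /* ret is converted back to size_t */
    return done;
}

/* Fixed pattern: keep the value in a size_t all the way through. */
static size_t consume_via_size_t(size_t want)
{
    size_t done = 0;
    size_t len = fake_fill(want);

    done += len;
    return done;
}
```

For small counts both helpers agree.  Passing (size_t)INT_MAX + 1 is
where the int detour stops round-tripping on typical 64-bit hosts.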



* Re: [PATCH v2 16/53] migration/rdma: Fix or document problematic uses of errno
  2023-09-28 13:19 ` [PATCH v2 16/53] migration/rdma: Fix or document problematic uses of errno Markus Armbruster
@ 2023-09-29 15:09   ` Fabiano Rosas
  2023-10-04 11:12     ` Markus Armbruster
  2023-10-05  6:46   ` Juan Quintela
  2023-10-07  5:34   ` Zhijian Li (Fujitsu)
  2 siblings, 1 reply; 121+ messages in thread
From: Fabiano Rosas @ 2023-09-29 15:09 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel
  Cc: quintela, peterx, leobras, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> writes:

> We use errno after calling Libibverbs functions that are not
> documented to set errno (manual page does not mention errno), or where
> the documentation is unclear ("returns [...] the value of errno on
> failure").  While this could be read as "sets errno and returns it",
> a glance at the source code[*] kills that hope:
>
>     static inline int ibv_post_send(struct ibv_qp *qp, struct ibv_send_wr *wr,
>                                     struct ibv_send_wr **bad_wr)
>     {
>             return qp->context->ops.post_send(qp, wr, bad_wr);
>     }
>
> The callback can be
>
>     static int mana_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
>                               struct ibv_send_wr **bad)
>     {
>             /* This version of driver supports RAW QP only.
>              * Posting WR is done directly in the application.
>              */
>             return EOPNOTSUPP;
>     }
>
> Neither of them touches errno.
>
> One of these errno uses is easy to fix, so do that now.  Several more
> will go away later in the series; add temporary FIXME comments.
> Three will remain; add TODO comments.  TODO, not FIXME, because the
> bug might be in Libibverbs documentation.
>
> [*] https://github.com/linux-rdma/rdma-core.git
>     commit 55fa316b4b18f258d8ac1ceb4aa5a7a35b094dcf
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  migration/rdma.c | 45 +++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 39 insertions(+), 6 deletions(-)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 28097ce604..bba8c99fa9 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -853,6 +853,12 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
>  
>          for (x = 0; x < num_devices; x++) {
>              verbs = ibv_open_device(dev_list[x]);
> +            /*
> +             * ibv_open_device() is not documented to set errno.  If
> +             * it does, it's somebody else's doc bug.  If it doesn't,
> +             * the use of errno below is wrong.
> +             * TODO Find out whether ibv_open_device() sets errno.
> +             */
>              if (!verbs) {
>                  if (errno == EPERM) {
>                      continue;

This function can call into glibc, so it's not unreasonable to expect
errno to be set at some point. We're not relying on errno to be set,
just taking an action if it happens to be.

I don't think someone would just decide to handle EPERM at this point
for no reason, especially since the manual makes no mention of
errno. This was probably introduced after someone got bitten by it.

... indeed the commit 5b61d57521 ("rdma: Fix qemu crash when IPv6
address is used for migration") introduced this to fix a crash.




* Re: [PATCH v2 22/53] migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port
  2023-09-28 13:19 ` [PATCH v2 22/53] migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port Markus Armbruster
@ 2023-09-29 15:10   ` Fabiano Rosas
  2023-10-04 15:24   ` Juan Quintela
  2023-10-07  5:36   ` Zhijian Li (Fujitsu)
  2 siblings, 0 replies; 121+ messages in thread
From: Fabiano Rosas @ 2023-09-29 15:10 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel
  Cc: quintela, peterx, leobras, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> writes:

> qemu_rdma_data_init() neglects to set an Error when it fails because
> @host_port is null.  Fortunately, no caller passes null, so this is
> merely a latent bug.  Drop the flawed code handling null argument.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>



* Re: [PATCH v2 29/53] migration/rdma: Check negative error values the same way everywhere
  2023-09-28 13:19 ` [PATCH v2 29/53] migration/rdma: Check negative error values the same way everywhere Markus Armbruster
@ 2023-09-29 15:28   ` Fabiano Rosas
  2023-10-04 16:33   ` Juan Quintela
  1 sibling, 0 replies; 121+ messages in thread
From: Fabiano Rosas @ 2023-09-29 15:28 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel
  Cc: quintela, peterx, leobras, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> writes:

> When a function returns 0 on success, negative value on error,
> checking for non-zero suffices, but checking for negative is clearer.
> So do that.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>



* Re: [PATCH v2 51/53] migration/rdma: Downgrade qemu_rdma_cleanup() errors to warnings
  2023-09-28 13:20 ` [PATCH v2 51/53] migration/rdma: Downgrade qemu_rdma_cleanup() errors to warnings Markus Armbruster
@ 2023-09-29 15:29   ` Fabiano Rosas
  2023-10-04 17:47   ` Juan Quintela
  2023-10-07  3:50   ` Zhijian Li (Fujitsu)
  2 siblings, 0 replies; 121+ messages in thread
From: Fabiano Rosas @ 2023-09-29 15:29 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel
  Cc: quintela, peterx, leobras, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> writes:

> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
>
> qemu_rdma_source_init(), qemu_rdma_connect(),
> rdma_start_incoming_migration(), and rdma_start_outgoing_migration()
> violate this principle: they call error_report() via
> qemu_rdma_cleanup().
>
> Moreover, qemu_rdma_cleanup() can't fail.  It is called on error
> paths, and QIOChannel close and finalization.  Are the conditions it
> reports really errors?  I doubt it.
>
> Downgrade qemu_rdma_cleanup()'s errors to warnings.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>
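
The "reported twice" problem from the commit message can be sketched
without QEMU.  Everything below is hypothetical: MiniError, mini_setg()
and mini_report() are toy stand-ins invented for illustration, not
QEMU's Error API.  The sketch only counts how many reports the user
would see.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical toy error object; NOT QEMU's Error. */
typedef struct MiniError {
    char msg[128];
} MiniError;

static int report_count;    /* number of messages shown to the user */

/* Toy stand-in for reporting an error to the user. */
static void mini_report(const char *msg)
{
    report_count++;
    fprintf(stderr, "%s\n", msg);
}

/* Toy stand-in for storing an error in *errp. */
static void mini_setg(MiniError **errp, const char *msg)
{
    if (!errp) {
        return;
    }
    *errp = calloc(1, sizeof(**errp));
    if (*errp) {
        snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
    }
}

/* Violates the principle: returns the error AND reports it. */
static int connect_noisy(MiniError **errp)
{
    mini_report("connect failed");
    mini_setg(errp, "connect failed");
    return -1;
}

/* Follows the principle: only returns the error. */
static int connect_quiet(MiniError **errp)
{
    mini_setg(errp, "connect failed");
    return -1;
}

/* The caller owns reporting.  Returns how many reports were emitted. */
static int call_and_report(int (*fn)(MiniError **))
{
    MiniError *err = NULL;
    int before = report_count;

    if (fn(&err) < 0) {
        mini_report(err ? err->msg : "unknown error");
        free(err);
    }
    return report_count - before;
}
```

With connect_noisy() the user sees the failure twice; with
connect_quiet() exactly once.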



* Re: [PATCH v2 52/53] migration/rdma: Use error_report() & friends instead of stderr
  2023-09-28 13:20 ` [PATCH v2 52/53] migration/rdma: Use error_report() & friends instead of stderr Markus Armbruster
@ 2023-09-29 15:36   ` Fabiano Rosas
  2023-10-04 11:15     ` Markus Armbruster
  2023-10-05  7:24   ` Juan Quintela
  2023-10-07  3:56   ` Zhijian Li (Fujitsu)
  2 siblings, 1 reply; 121+ messages in thread
From: Fabiano Rosas @ 2023-09-29 15:36 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel
  Cc: quintela, peterx, leobras, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> writes:

> error_report() obeys -msg, reports the current error location if any,
> and reports to the current monitor if any.  Reporting to stderr
> directly with fprintf() or perror() is wrong, because it loses all
> this.
>
> Fix the offenders.  Bonus: resolves a FIXME about problematic use of
> errno.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  migration/rdma.c | 44 +++++++++++++++++++++-----------------------
>  1 file changed, 21 insertions(+), 23 deletions(-)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 54b59d12b1..dba0802fca 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -877,12 +877,12 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
>  
>          if (roce_found) {
>              if (ib_found) {
> -                fprintf(stderr, "WARN: migrations may fail:"
> -                                " IPv6 over RoCE / iWARP in linux"
> -                                " is broken. But since you appear to have a"
> -                                " mixed RoCE / IB environment, be sure to only"
> -                                " migrate over the IB fabric until the kernel "
> -                                " fixes the bug.\n");
> +                warn_report("WARN: migrations may fail:"
> +                            " IPv6 over RoCE / iWARP in linux"
> +                            " is broken. But since you appear to have a"
> +                            " mixed RoCE / IB environment, be sure to only"
> +                            " migrate over the IB fabric until the kernel "
> +                            " fixes the bug.");

Won't this become "warning: WARN:"?

>              } else {
>                  error_setg(errp, "RDMA ERROR: "
>                             "You only have RoCE / iWARP devices in your systems"
> @@ -1418,12 +1418,8 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
>          block->remote_keys[chunk] = 0;
>  
>          if (ret != 0) {
> -            /*
> -             * FIXME perror() is problematic, bcause ibv_dereg_mr() is
> -             * not documented to set errno.  Will go away later in
> -             * this series.
> -             */
> -            perror("unregistration chunk failed");
> +            error_report("unregistration chunk failed: %s",
> +                         strerror(ret));

Doesn't seem to fix the issue, ret might still not be an errno. Am I
missing something?

>              return -1;
>          }



* Re: [PATCH v2 53/53] migration/rdma: Replace flawed device detail dump by tracing
  2023-09-28 13:20 ` [PATCH v2 53/53] migration/rdma: Replace flawed device detail dump by tracing Markus Armbruster
@ 2023-09-29 17:05   ` Fabiano Rosas
  2023-10-04 17:50   ` Juan Quintela
  2023-10-07  3:57   ` Zhijian Li (Fujitsu)
  2 siblings, 0 replies; 121+ messages in thread
From: Fabiano Rosas @ 2023-09-29 17:05 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel
  Cc: quintela, peterx, leobras, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> writes:

> qemu_rdma_dump_id() dumps RDMA device details to stdout.
>
> rdma_start_outgoing_migration() calls it via qemu_rdma_source_init()
> and qemu_rdma_resolve_host() to show source device details.
> rdma_start_incoming_migration() arranges its call via
> rdma_accept_incoming_migration() and qemu_rdma_accept() to show
> destination device details.
>
> Two issues:
>
> 1. rdma_start_outgoing_migration() can run in HMP context.  The
>    information should arguably go to the monitor, not stdout.
>
> 2. ibv_query_port() failure is reported as error.  Its callers remain
>    unaware of this failure (qemu_rdma_dump_id() can't fail), so
>    reporting this to the user as an error is problematic.
>
> Fixable, but the device detail dump is noise, except when
> troubleshooting.  Tracing is a better fit.  Similar function
> qemu_rdma_dump_gid() was converted to tracing in commit
> 733252deb8b (Tracify migration/rdma.c).
>
> Convert qemu_rdma_dump_id(), too.
>
> While there, touch up qemu_rdma_dump_gid()'s outdated comment.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>



* Re: [PATCH v2 16/53] migration/rdma: Fix or document problematic uses of errno
  2023-09-29 15:09   ` Fabiano Rosas
@ 2023-10-04 11:12     ` Markus Armbruster
  0 siblings, 0 replies; 121+ messages in thread
From: Markus Armbruster @ 2023-10-04 11:12 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, quintela, peterx, leobras, lizhijian, eblake

Fabiano Rosas <farosas@suse.de> writes:

> Markus Armbruster <armbru@redhat.com> writes:
>
>> We use errno after calling Libibverbs functions that are not
>> documented to set errno (manual page does not mention errno), or where
>> the documentation is unclear ("returns [...] the value of errno on
>> failure").  While this could be read as "sets errno and returns it",
>> a glance at the source code[*] kills that hope:
>>
>>     static inline int ibv_post_send(struct ibv_qp *qp, struct ibv_send_wr *wr,
>>                                     struct ibv_send_wr **bad_wr)
>>     {
>>             return qp->context->ops.post_send(qp, wr, bad_wr);
>>     }
>>
>> The callback can be
>>
>>     static int mana_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
>>                               struct ibv_send_wr **bad)
>>     {
>>             /* This version of driver supports RAW QP only.
>>              * Posting WR is done directly in the application.
>>              */
>>             return EOPNOTSUPP;
>>     }
>>
>> Neither of them touches errno.
>>
>> One of these errno uses is easy to fix, so do that now.  Several more
>> will go away later in the series; add temporary FIXME comments.
>> Three will remain; add TODO comments.  TODO, not FIXME, because the
>> bug might be in Libibverbs documentation.
>>
>> [*] https://github.com/linux-rdma/rdma-core.git
>>     commit 55fa316b4b18f258d8ac1ceb4aa5a7a35b094dcf
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>  migration/rdma.c | 45 +++++++++++++++++++++++++++++++++++++++------
>>  1 file changed, 39 insertions(+), 6 deletions(-)
>>
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index 28097ce604..bba8c99fa9 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -853,6 +853,12 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
>>  
>>          for (x = 0; x < num_devices; x++) {
>>              verbs = ibv_open_device(dev_list[x]);
>> +            /*
>> +             * ibv_open_device() is not documented to set errno.  If
>> +             * it does, it's somebody else's doc bug.  If it doesn't,
>> +             * the use of errno below is wrong.
>> +             * TODO Find out whether ibv_open_device() sets errno.
>> +             */
>>              if (!verbs) {
>>                  if (errno == EPERM) {
>>                      continue;
>
> This function can call into glibc, so it's not unreasonable to expect
> errno to be set at some point. We're not relying on errno to be set,
> just taking an action if it happens to be.

errno may well be set on some failures.  But it needs to be set on *all*
failures to be reliable.  If it's not, then its value on such failures
comes from some unrelated, prior errno-setting failure.
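
To illustrate the stale-errno hazard in isolation: the sketch below
uses a hypothetical fake_open_device() that fails without ever
assigning errno.  It is not ibv_open_device(), whose behavior is
exactly what the TODO asks about.  Clearing errno before the call is
one common defensive idiom, shown here as an assumption, not as
something proposed in this series.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical library call: fails with NULL and never touches errno. */
static void *fake_open_device(void)
{
    return NULL;
}

/*
 * Fragile: if the callee did not set errno, this test looks at
 * whatever an unrelated earlier failure left behind.
 */
static int skip_device_fragile(void)
{
    void *dev = fake_open_device();

    return dev == NULL && errno == EPERM;
}

/*
 * Defensive idiom: clear errno first, so a stale EPERM cannot be
 * mistaken for this call's failure reason.  It does not make an
 * undocumented errno contract reliable.
 */
static int skip_device_cleared(void)
{
    void *dev;

    errno = 0;
    dev = fake_open_device();
    return dev == NULL && errno == EPERM;
}
```

With errno left at EPERM by some earlier call, the fragile variant
skips the device even though this failure had nothing to do with
permissions; the cleared variant does not.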

> I don't think someone would just decide to handle EPERM at this point
> for no reason, especially since the manual makes no mention of
> errno. This was probably introduced after someone got bitten by it.
>
> ... indeed the commit 5b61d57521 ("rdma: Fix qemu crash when IPv6
> address is used for migration") introduced this to fix a crash.

I don't doubt the error recovery added there works in the case described
by the commit message.  I suspect it can break on unrelated failures.




* Re: [PATCH v2 52/53] migration/rdma: Use error_report() & friends instead of stderr
  2023-09-29 15:36   ` Fabiano Rosas
@ 2023-10-04 11:15     ` Markus Armbruster
  2023-10-04 13:52       ` Fabiano Rosas
  0 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-10-04 11:15 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: Markus Armbruster, qemu-devel, quintela, peterx, leobras,
	lizhijian, eblake

Fabiano Rosas <farosas@suse.de> writes:

> Markus Armbruster <armbru@redhat.com> writes:
>
>> error_report() obeys -msg, reports the current error location if any,
>> and reports to the current monitor if any.  Reporting to stderr
>> directly with fprintf() or perror() is wrong, because it loses all
>> this.
>>
>> Fix the offenders.  Bonus: resolves a FIXME about problematic use of
>> errno.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>  migration/rdma.c | 44 +++++++++++++++++++++-----------------------
>>  1 file changed, 21 insertions(+), 23 deletions(-)
>>
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index 54b59d12b1..dba0802fca 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -877,12 +877,12 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
>>  
>>          if (roce_found) {
>>              if (ib_found) {
>> -                fprintf(stderr, "WARN: migrations may fail:"
>> -                                " IPv6 over RoCE / iWARP in linux"
>> -                                " is broken. But since you appear to have a"
>> -                                " mixed RoCE / IB environment, be sure to only"
>> -                                " migrate over the IB fabric until the kernel "
>> -                                " fixes the bug.\n");
>> +                warn_report("WARN: migrations may fail:"
>> +                            " IPv6 over RoCE / iWARP in linux"
>> +                            " is broken. But since you appear to have a"
>> +                            " mixed RoCE / IB environment, be sure to only"
>> +                            " migrate over the IB fabric until the kernel "
>> +                            " fixes the bug.");
>
> Won't this become "warning: WARN:"?

It will.  I'll drop the "WARN: " prefix.

>>              } else {
>>                  error_setg(errp, "RDMA ERROR: "
>>                             "You only have RoCE / iWARP devices in your systems"
>> @@ -1418,12 +1418,8 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
>>          block->remote_keys[chunk] = 0;
>>  
>>          if (ret != 0) {
>> -            /*
>> -             * FIXME perror() is problematic, bcause ibv_dereg_mr() is
>> -             * not documented to set errno.  Will go away later in
>> -             * this series.
>> -             */
>> -            perror("unregistration chunk failed");
>> +            error_report("unregistration chunk failed: %s",
>> +                         strerror(ret));
>
> Doesn't seem to fix the issue, ret might still not be an errno. Am I
> missing something?

Yes :)

ibv_dereg_mr(3) section RETURN VALUE has:

       ibv_dereg_mr()  returns  0 on success, or the value of errno on failure
       (which indicates the failure reason).

Clearer now?
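
A minimal sketch of that calling convention, assuming a hypothetical
fake_dereg() that behaves the way the manual page describes (error
code in the return value) without assigning the errno variable.  It
shows why strerror(ret) is the right argument and errno is not:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical function following "returns 0 on success, or the value
 * of errno on failure": the code travels in the return value, and the
 * errno variable itself is left alone.
 */
static int fake_dereg(int fail)
{
    return fail ? EOPNOTSUPP : 0;
}

/* Error text for a failed call, or NULL on success. */
static const char *dereg_error_text(int fail)
{
    int ret = fake_dereg(fail);

    return ret ? strerror(ret) : NULL;
}
```

After a failing fake_dereg() the errno variable is unchanged, so
perror() or strerror(errno) would describe some other, earlier error.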

>>              return -1;
>>          }




* Re: [PATCH v2 52/53] migration/rdma: Use error_report() & friends instead of stderr
  2023-10-04 11:15     ` Markus Armbruster
@ 2023-10-04 13:52       ` Fabiano Rosas
  0 siblings, 0 replies; 121+ messages in thread
From: Fabiano Rosas @ 2023-10-04 13:52 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Markus Armbruster, qemu-devel, quintela, peterx, leobras,
	lizhijian, eblake

Markus Armbruster <armbru@redhat.com> writes:

> Fabiano Rosas <farosas@suse.de> writes:
>
>> Markus Armbruster <armbru@redhat.com> writes:
>>
>>> error_report() obeys -msg, reports the current error location if any,
>>> and reports to the current monitor if any.  Reporting to stderr
>>> directly with fprintf() or perror() is wrong, because it loses all
>>> this.
>>>
>>> Fix the offenders.  Bonus: resolves a FIXME about problematic use of
>>> errno.
>>>
>>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>>> ---
>>>  migration/rdma.c | 44 +++++++++++++++++++++-----------------------
>>>  1 file changed, 21 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/migration/rdma.c b/migration/rdma.c
>>> index 54b59d12b1..dba0802fca 100644
>>> --- a/migration/rdma.c
>>> +++ b/migration/rdma.c
>>> @@ -877,12 +877,12 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
>>>  
>>>          if (roce_found) {
>>>              if (ib_found) {
>>> -                fprintf(stderr, "WARN: migrations may fail:"
>>> -                                " IPv6 over RoCE / iWARP in linux"
>>> -                                " is broken. But since you appear to have a"
>>> -                                " mixed RoCE / IB environment, be sure to only"
>>> -                                " migrate over the IB fabric until the kernel "
>>> -                                " fixes the bug.\n");
>>> +                warn_report("WARN: migrations may fail:"
>>> +                            " IPv6 over RoCE / iWARP in linux"
>>> +                            " is broken. But since you appear to have a"
>>> +                            " mixed RoCE / IB environment, be sure to only"
>>> +                            " migrate over the IB fabric until the kernel "
>>> +                            " fixes the bug.");
>>
>> Won't this become "warning: WARN:"?
>
> It will.  I'll drop the "WARN: " prefix.
>
>>>              } else {
>>>                  error_setg(errp, "RDMA ERROR: "
>>>                             "You only have RoCE / iWARP devices in your systems"
>>> @@ -1418,12 +1418,8 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
>>>          block->remote_keys[chunk] = 0;
>>>  
>>>          if (ret != 0) {
>>> -            /*
>>> -             * FIXME perror() is problematic, bcause ibv_dereg_mr() is
>>> -             * not documented to set errno.  Will go away later in
>>> -             * this series.
>>> -             */
>>> -            perror("unregistration chunk failed");
>>> +            error_report("unregistration chunk failed: %s",
>>> +                         strerror(ret));
>>
>> Doesn't seem to fix the issue, ret might still not be an errno. Am I
>> missing something?
>
> Yes :)
>
> ibv_dereg_mr(3) section RETURN VALUE has:
>
>        ibv_dereg_mr()  returns  0 on success, or the value of errno on failure
>        (which indicates the failure reason).
>
> Clearer now?

Yep, thank you. 
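
A minimal sketch of the convention quoted from ibv_dereg_mr(3) above:
the failure reason arrives in the return value, so it is the return
value that goes to strerror(), not errno.  fake_dereg() and
describe_dereg() are hypothetical stand-ins, not libibverbs or QEMU
functions.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for a call that, like ibv_dereg_mr(), returns 0
 * on success or an errno value on failure.
 */
int fake_dereg(int fail)
{
    return fail ? EINVAL : 0;
}

/* Build the failure message from the return value rather than errno. */
int describe_dereg(int fail, char *buf, size_t size)
{
    int ret = fake_dereg(fail);

    if (ret != 0) {
        snprintf(buf, size, "unregistration chunk failed: %s",
                 strerror(ret));
        return -1;
    }
    buf[0] = '\0';
    return 0;
}
```

perror() would consult errno instead, which such a call is not
documented to set.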



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 01/53] migration/rdma: Clean up qemu_rdma_poll()'s return type
  2023-09-28 13:19 ` [PATCH v2 01/53] migration/rdma: Clean up qemu_rdma_poll()'s return type Markus Armbruster
@ 2023-10-04 14:26   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 14:26 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> qemu_rdma_poll()'s return type is uint64_t, even though it returns 0,
> -1, or @ret, which is int.  Its callers assign the return value to int
> variables, then check whether it's negative.  Unclean.
>
> Return int instead.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

queued.
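
A minimal sketch of why the old return type was unclean, using
hypothetical function names rather than the real qemu_rdma_poll(): a -1
returned through uint64_t becomes a huge positive value, and only looks
negative again once a caller stores it in an int.

```c
#include <stdint.h>

/* Old shape: -1 is converted to UINT64_MAX on the way out. */
uint64_t poll_status_u64(int fail)
{
    return fail ? -1 : 0;
}

/* New shape: the return type matches what callers store and test. */
int poll_status_int(int fail)
{
    return fail ? -1 : 0;
}

/* A caller as described: assign to int, then check for a negative value. */
int caller_sees_failure(int fail)
{
    int ret = poll_status_int(fail);

    return ret < 0;
}
```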



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 02/53] migration/rdma: Clean up qemu_rdma_data_init()'s return type
  2023-09-28 13:19 ` [PATCH v2 02/53] migration/rdma: Clean up qemu_rdma_data_init()'s " Markus Armbruster
@ 2023-10-04 14:35   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 14:35 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> qemu_rdma_data_init()'s return type is void *.  It actually returns
> RDMAContext *, and all its callers assign the value to an
> RDMAContext *.  Unclean.
>
> Return RDMAContext * instead.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

queued



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 03/53] migration/rdma: Clean up rdma_delete_block()'s return type
  2023-09-28 13:19 ` [PATCH v2 03/53] migration/rdma: Clean up rdma_delete_block()'s " Markus Armbruster
@ 2023-10-04 14:36   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 14:36 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> rdma_delete_block() always returns 0, which its only caller ignores.
> Return void instead.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 04/53] migration/rdma: Drop fragile wr_id formatting
  2023-09-28 13:19 ` [PATCH v2 04/53] migration/rdma: Drop fragile wr_id formatting Markus Armbruster
@ 2023-10-04 14:38   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 14:38 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> wrid_desc[] uses 4001 pointers to map four integer values to strings.
>
> print_wrid() accesses wrid_desc[] out of bounds when passed a negative
> argument.  It returns null for values 2..1999 and 2001..3999.
>
> qemu_rdma_poll() and qemu_rdma_block_for_wrid() print wrid_desc[wr_id]
> and pass print_wrid(wr_id) to tracepoints.  Could conceivably crash
> trying to format a null string.  I believe access out of bounds is not
> possible.
>
> Not worth cleaning up.  Dumb down to show just numeric wr_id.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>
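
A minimal sketch of the sparse-table problem described above.  The
enumerator names and strings are hypothetical, spaced to match the
ranges in the commit message.  The switch-based lookup is one possible
alternative shown for contrast only; the patch itself simply prints the
numeric wr_id.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IDs with the same spacing as the ranges quoted above. */
enum {
    WRID_NONE = 0,
    WRID_WRITE = 1,
    WRID_SEND_CONTROL = 2000,
    WRID_RECV_CONTROL = 4000,
};

/* Designated initializers size the array by the largest index used. */
const char *const wrid_desc_sparse[] = {
    [WRID_NONE] = "NONE",
    [WRID_WRITE] = "WRITE",
    [WRID_SEND_CONTROL] = "CONTROL SEND",
    [WRID_RECV_CONTROL] = "CONTROL RECV",
};

/* Number of pointer slots the table occupies, gaps included. */
size_t wrid_desc_sparse_len(void)
{
    return sizeof(wrid_desc_sparse) / sizeof(wrid_desc_sparse[0]);
}

/* A switch needs no table and never hands back a null pointer. */
const char *wrid_name(uint64_t wr_id)
{
    switch (wr_id) {
    case WRID_NONE:
        return "NONE";
    case WRID_WRITE:
        return "WRITE";
    case WRID_SEND_CONTROL:
        return "CONTROL SEND";
    case WRID_RECV_CONTROL:
        return "CONTROL RECV";
    default:
        return "UNKNOWN";
    }
}
```

Every slot in the gaps of the sparse table holds a null pointer, which
is what a "%s" format would end up receiving.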



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 05/53] migration/rdma: Consistently use uint64_t for work request IDs
  2023-09-28 13:19 ` [PATCH v2 05/53] migration/rdma: Consistently use uint64_t for work request IDs Markus Armbruster
@ 2023-10-04 14:39   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 14:39 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> We use int instead of uint64_t in a few places.  Change them to
> uint64_t.
>
> This cleans up a comparison of signed qemu_rdma_block_for_wrid()
> parameter @wrid_requested with unsigned @wr_id.  Harmless, because the
> actual arguments are non-negative enumeration constants.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 06/53] migration/rdma: Fix unwanted integer truncation
  2023-09-28 13:19 ` [PATCH v2 06/53] migration/rdma: Fix unwanted integer truncation Markus Armbruster
  2023-09-28 14:20   ` Fabiano Rosas
@ 2023-10-04 14:41   ` Juan Quintela
  2023-10-07  1:53   ` Zhijian Li (Fujitsu)
  2 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 14:41 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> qio_channel_rdma_readv() assigns the size_t value of qemu_rdma_fill()
> to an int variable before it adds it to @done / subtracts it from
> @want, both size_t.  Truncation when qemu_rdma_fill() copies more than
> INT_MAX bytes.  Seems vanishingly unlikely, but needs fixing all the
> same.
>
> Fixes: 6ddd2d76ca6f (migration: convert RDMA to use QIOChannel interface)
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>
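
A minimal sketch of the truncation described above.  fill_stub() is a
hypothetical stand-in for a fill routine that reports the bytes copied
as size_t; it is not qemu_rdma_fill().  The conversion of an
out-of-range value to int is implementation-defined, so the buggy
variant's result above INT_MAX depends on the platform.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in: pretends to copy n bytes and says so. */
size_t fill_stub(size_t n)
{
    return n;
}

/* Buggy shape: the size_t result is squeezed through an int first. */
size_t done_via_int(size_t n)
{
    size_t done = 0;
    int len = (int)fill_stub(n);    /* out of range once n > INT_MAX */

    done += len;
    return done;
}

/* Fixed shape: the byte count stays size_t from start to finish. */
size_t done_via_size_t(size_t n)
{
    size_t done = 0;
    size_t len = fill_stub(n);

    done += len;
    return done;
}
```

For counts up to INT_MAX both variants agree, which is why the bug is
so unlikely to show up in practice.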



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 07/53] migration/rdma: Clean up two more harmless signed vs. unsigned issues
  2023-09-28 13:19 ` [PATCH v2 07/53] migration/rdma: Clean up two more harmless signed vs. unsigned issues Markus Armbruster
@ 2023-10-04 14:44   ` Juan Quintela
  2023-10-07  2:38   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 14:44 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> qemu_rdma_exchange_get_response() compares int parameter @expecting
> with uint32_t head->type.  Actual arguments are non-negative
> enumeration constants, RDMAControlHeader uint32_t member type, or
> qemu_rdma_exchange_recv() int parameter expecting.  Actual arguments
> for the latter are non-negative enumeration constants.  Change both
> parameters to uint32_t.
>
> In qio_channel_rdma_readv(), loop control variable @i is ssize_t, and
> counts from 0 up to @niov, which is size_t.  Change @i to size_t.
>
> While there, make qio_channel_rdma_readv() and
> qio_channel_rdma_writev() more consistent: change the former's @done
> to ssize_t, and delete the latter's useless initialization of @len.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Juan Quintela <quintela@redhat.com>
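
A minimal sketch of the signed vs. unsigned comparison being cleaned
up, with hypothetical function names.  The explicit cast spells out the
conversion the compiler would otherwise apply implicitly when int is 32
bits wide.

```c
#include <stdint.h>

/* Old shape: an int parameter compared against a uint32_t field. */
int type_matches_int(int expecting, uint32_t type)
{
    /* A negative @expecting wraps around to a large unsigned value. */
    return (uint32_t)expecting == type;
}

/* New shape: both sides are uint32_t, so no conversion takes place. */
int type_matches_u32(uint32_t expecting, uint32_t type)
{
    return expecting == type;
}
```

With non-negative enumeration constants as arguments, both versions
behave the same, which is why the issue is harmless.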



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 08/53] migration/rdma: Give qio_channel_rdma_source_funcs internal linkage
  2023-09-28 13:19 ` [PATCH v2 08/53] migration/rdma: Give qio_channel_rdma_source_funcs internal linkage Markus Armbruster
@ 2023-10-04 14:50   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 14:50 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 09/53] migration/rdma: Fix qemu_rdma_accept() to return failure on errors
  2023-09-28 13:19 ` [PATCH v2 09/53] migration/rdma: Fix qemu_rdma_accept() to return failure on errors Markus Armbruster
@ 2023-10-04 14:51   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 14:51 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> qemu_rdma_accept() returns 0 in some cases even when it didn't
> complete its job due to errors.  Impact is not obvious.  I figure the
> caller will soon fail again with a misleading error message.
>
> Fix it to return -1 on any failure.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 10/53] migration/rdma: Put @errp parameter last
  2023-09-28 13:19 ` [PATCH v2 10/53] migration/rdma: Put @errp parameter last Markus Armbruster
@ 2023-10-04 14:54   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 14:54 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> include/qapi/error.h demands:
>
>  * - Functions that use Error to report errors have an Error **errp
>  *   parameter.  It should be the last parameter, except for functions
>  *   taking variable arguments.
>
> qemu_rdma_connect() does not conform.  Clean it up.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 11/53] migration/rdma: Eliminate error_propagate()
  2023-09-28 13:19 ` [PATCH v2 11/53] migration/rdma: Eliminate error_propagate() Markus Armbruster
@ 2023-10-04 14:58   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 14:58 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> When all we do with an Error we receive into a local variable is
> propagating to somewhere else, we can just as well receive it there
> right away.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 12/53] migration/rdma: Drop rdma_add_block() error handling
  2023-09-28 13:19 ` [PATCH v2 12/53] migration/rdma: Drop rdma_add_block() error handling Markus Armbruster
@ 2023-10-04 14:58   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 14:58 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> rdma_add_block() can't fail.  Return void, and drop the unreachable
> error handling.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 13/53] migration/rdma: Drop qemu_rdma_search_ram_block() error handling
  2023-09-28 13:19 ` [PATCH v2 13/53] migration/rdma: Drop qemu_rdma_search_ram_block() " Markus Armbruster
@ 2023-10-04 15:00   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 15:00 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> qemu_rdma_search_ram_block() can't fail.  Return void, and drop the
> unreachable error handling.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 14/53] migration/rdma: Make qemu_rdma_buffer_mergeable() return bool
  2023-09-28 13:19 ` [PATCH v2 14/53] migration/rdma: Make qemu_rdma_buffer_mergeable() return bool Markus Armbruster
@ 2023-10-04 15:01   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 15:01 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> qemu_rdma_buffer_mergeable() is semantically a predicate.  It returns
> int 0 or 1.  Return bool instead, and fix the function name's
> spelling.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 17/53] migration/rdma: Ditch useless numeric error codes in error messages
  2023-09-28 13:19 ` [PATCH v2 17/53] migration/rdma: Ditch useless numeric error codes in error messages Markus Armbruster
@ 2023-10-04 15:06   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 15:06 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> Several error messages include numeric error codes returned by failed
> functions:
>
> * ibv_poll_cq() returns an unspecified negative value.  Useless.
>
> * rdma_accept() and rdma_get_cm_event() return -1.  Useless.
>
> * qemu_rdma_poll() returns either -1 or an unspecified negative
>   value.  Useless.
>
> * qemu_rdma_block_for_wrid(), qemu_rdma_write_flush(),
>   qemu_rdma_exchange_send(), qemu_rdma_exchange_recv(),
>   qemu_rdma_write() return a negative value that may or may not be an
>   errno value.  While reporting human-readable errno
>   information (which a number is not) can be useful, reporting an
>   error code that may or may not be an errno value is useless.
>
> Drop these error codes from the error messages.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

As I didn't catch the previous one (I was waiting for the discussion to
end), I had to fix one error_report() by hand, nothing complicated.



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 18/53] migration/rdma: Fix io_writev(), io_readv() methods to obey contract
  2023-09-28 13:19 ` [PATCH v2 18/53] migration/rdma: Fix io_writev(), io_readv() methods to obey contract Markus Armbruster
@ 2023-10-04 15:09   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 15:09 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> QIOChannelClass methods qio_channel_rdma_readv() and
> qio_channel_rdma_writev() violate their method contract when
> rdma->error_state is non-zero:
>
> 1. They return whatever is in rdma->error_state then.  Only -1 will be
>    fine.  -2 will be misinterpreted as "would block".  Anything less
>    than -2 isn't defined in the contract.  A positive value would be
>    misinterpreted as success, but I believe that's not actually
>    possible.
>
> 2. They neglect to set an error then.  If something up the call stack
>    dereferences the error when failure is returned, it will crash.  If
>    it ignores the return value and checks the error instead, it will
>    miss the error.
>
> Crap like this happens when return statements hide in macros,
> especially when their uses are far away from the definition.
>
> I elected not to investigate how callers are impacted.
>
> Expand the two bad macro uses, so we can set an error and return -1.
> The next commit will then get rid of the macro altogether.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>
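
A minimal sketch of the two contract violations listed above, under
loud assumptions: struct chan_ctx, CHAN_ERR_BLOCK and the const char **
error slot are hypothetical stand-ins for RDMAContext, the contract's
"would block" value and Error **.

```c
#include <stddef.h>

#define CHAN_ERR_BLOCK (-2)     /* assumed "would block" return value */

/* Hypothetical context carrying an integer error state. */
struct chan_ctx {
    int error_state;
};

/* Old shape: hands back error_state as is, and sets no error. */
long readv_old(const struct chan_ctx *ctx, const char **errp)
{
    (void)errp;                 /* the error slot is never filled in */
    if (ctx->error_state) {
        return ctx->error_state;
    }
    return 0;
}

/* Fixed shape: any stored error becomes -1 plus a set error. */
long readv_fixed(const struct chan_ctx *ctx, const char **errp)
{
    if (ctx->error_state) {
        if (errp) {
            *errp = "channel is in an error state";   /* made-up text */
        }
        return -1;
    }
    return 0;
}
```

With error_state equal to -2, the old shape is indistinguishable from
"would block" for the caller, and the error slot stays empty.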



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 19/53] migration/rdma: Replace dangerous macro CHECK_ERROR_STATE()
  2023-09-28 13:19 ` [PATCH v2 19/53] migration/rdma: Replace dangerous macro CHECK_ERROR_STATE() Markus Armbruster
@ 2023-10-04 15:10   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 15:10 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> Hiding return statements in macros is a bad idea.  Use a function
> instead, and open code the return part.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

I hated this macro, thanks.
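
A minimal sketch of the pattern being replaced, with hypothetical
names throughout (the real macro and its replacement are in the
patch): a macro that returns from its caller, next to a plain predicate
with the return written out at the call site.

```c
#include <stdbool.h>

/* Hypothetical context carrying an integer error state. */
struct rdma_ctx {
    int error_state;
};

/* Old shape: expanding the macro can return from the enclosing function. */
#define CHECK_STATE_OR_RETURN(ctx)          \
    do {                                    \
        if ((ctx)->error_state) {           \
            return (ctx)->error_state;      \
        }                                   \
    } while (0)

int step_with_macro(struct rdma_ctx *ctx)
{
    CHECK_STATE_OR_RETURN(ctx);     /* hidden return */
    return 0;
}

/* New shape: a function answers the question, the caller returns. */
bool ctx_has_error(const struct rdma_ctx *ctx)
{
    return ctx->error_state != 0;
}

int step_with_function(struct rdma_ctx *ctx)
{
    if (ctx_has_error(ctx)) {
        return -1;                  /* visible at the call site */
    }
    return 0;
}
```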



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 20/53] migration/rdma: Fix qemu_rdma_broken_ipv6_kernel() to set error
  2023-09-28 13:19 ` [PATCH v2 20/53] migration/rdma: Fix qemu_rdma_broken_ipv6_kernel() to set error Markus Armbruster
@ 2023-10-04 15:10   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 15:10 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> qemu_rdma_resolve_host() and qemu_rdma_dest_init() try addresses until
> they find one that works.  If none works, they return the first Error
> set by qemu_rdma_broken_ipv6_kernel(), or else return a generic one.
>
> qemu_rdma_broken_ipv6_kernel() neglects to set an Error when
> ibv_open_device() fails.  If a later address fails differently, we use
> that Error instead, or else the generic one.  Harmless enough, but
> needs fixing all the same.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 22/53] migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port
  2023-09-28 13:19 ` [PATCH v2 22/53] migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port Markus Armbruster
  2023-09-29 15:10   ` Fabiano Rosas
@ 2023-10-04 15:24   ` Juan Quintela
  2023-10-07  5:36   ` Zhijian Li (Fujitsu)
  2 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 15:24 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> qemu_rdma_data_init() neglects to set an Error when it fails because
> @host_port is null.  Fortunately, no caller passes null, so this is
> merely a latent bug.  Drop the flawed code handling null argument.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

As per your discussion with Peter, I think that this is ok.  The only other
thing we could do is add an assert(), but I am not a big fan either.



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 21/53] migration/rdma: Fix qemu_get_cm_event_timeout() to always set error
  2023-09-28 13:19 ` [PATCH v2 21/53] migration/rdma: Fix qemu_get_cm_event_timeout() to always " Markus Armbruster
@ 2023-10-04 15:25   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 15:25 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> qemu_get_cm_event_timeout() neglects to set an error when it fails
> because rdma_get_cm_event() fails.  Harmless, as its caller
> qemu_rdma_connect() substitutes a generic error then.  Fix it anyway.
>
> qemu_rdma_connect() also sets the generic error when its own call of
> rdma_get_cm_event() fails.  Make the error handling more obvious: set
> a specific error right after rdma_get_cm_event() fails.  Delete the
> generic error.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 23/53] migration/rdma: Fix QEMUFileHooks method return values
  2023-09-28 13:19 ` [PATCH v2 23/53] migration/rdma: Fix QEMUFileHooks method return values Markus Armbruster
@ 2023-10-04 15:28   ` Juan Quintela
  2023-10-04 16:22   ` Juan Quintela
  1 sibling, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 15:28 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> The QEMUFileHooks methods don't come with a written contract.  Digging
> through the code calling them, we find:
>
> * save_page():
>
>   Negative values RAM_SAVE_CONTROL_DELAYED and
>   RAM_SAVE_CONTROL_NOT_SUPP are special.  Any other negative value is
>   an unspecified error.
>
>   qemu_rdma_save_page() returns -EIO or rdma->error_state on error.  I
>   believe the latter is always negative.  Nothing stops either of them
>   from clashing with the special values, though.  Feels unlikely, but fix
>   it anyway to return only the special values and -1.
>
> * before_ram_iterate(), after_ram_iterate():
>
>   Negative value means error.  qemu_rdma_registration_start() and
>   qemu_rdma_registration_stop() comply as far as I can tell.  Make
>   them comply *obviously*, by returning -1 on error.
>
> * hook_ram_load:
>
>   Negative value means error.  rdma_load_hook() already returns -1 on
>   error.  Leave it alone.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

I "remove" QEMUFileHooks on this series:

https://lists.gnu.org/archive/html/qemu-devel/2023-10/msg01037.html

Do you mind waiting for that series to land before adding this one?

As those hooks are only used by rdma, I just changed them to be normal
functions, no hooks.

And yes, it was not fun.  I know you have felt the "pleasure" of hacking
this file O:-)

Thanks, Juan.



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 24/53] migration/rdma: Fix rdma_getaddrinfo() error checking
  2023-09-28 13:19 ` [PATCH v2 24/53] migration/rdma: Fix rdma_getaddrinfo() error checking Markus Armbruster
@ 2023-10-04 15:30   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 15:30 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> rdma_getaddrinfo() returns 0 on success.  On error, it returns one of
> the EAI_ error codes like getaddrinfo() does, or -1 with errno set.
> This is broken by design: POSIX implicitly specifies the EAI_ error
> codes to be non-zero, no more.  They could clash with -1.  Nothing we
> can do about this design flaw.
>
> Both callers of rdma_getaddrinfo() only recognize negative values as
> error.  Works only because systems elect to make the EAI_ error codes
> negative.
>
> Best not to rely on that: change the callers to treat any non-zero
> value as failure.  Also change them to return -1 instead of the value
> received from getaddrinfo() on failure, to avoid positive error
> values.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>
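
A minimal sketch of the check being changed, with hypothetical stub
resolvers standing in for rdma_getaddrinfo(): a caller that only treats
negative values as failure lets a positive error code through as
success.

```c
/* Hypothetical resolver shape: 0 on success, non-zero on failure. */
typedef int (*resolver_fn)(void);

int stub_ok(void)
{
    return 0;
}

int stub_negative(void)
{
    return -1;
}

int stub_positive(void)
{
    return 4;       /* made-up positive error code */
}

/* Old shape: only negative values count as failure. */
int resolve_old(resolver_fn resolve)
{
    int ret = resolve();

    if (ret < 0) {
        return ret;
    }
    return 0;
}

/* Fixed shape: any non-zero value is a failure, reported as -1. */
int resolve_fixed(resolver_fn resolve)
{
    if (resolve() != 0) {
        return -1;
    }
    return 0;
}
```

Returning -1 rather than the received value also keeps positive error
codes from leaking further up the call chain.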



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 15/53] migration/rdma: Use bool for two RDMAContext flags
  2023-09-28 13:19 ` [PATCH v2 15/53] migration/rdma: Use bool for two RDMAContext flags Markus Armbruster
@ 2023-10-04 15:56   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 15:56 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> @error_reported and @received_error are flags.  The latter is even
> assigned bool true.  Change them from int to bool.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 25/53] migration/rdma: Return -1 instead of negative errno code
  2023-09-28 13:19 ` [PATCH v2 25/53] migration/rdma: Return -1 instead of negative errno code Markus Armbruster
@ 2023-10-04 16:19   ` Juan Quintela
  2023-10-04 16:23   ` Juan Quintela
  1 sibling, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 16:19 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> Several functions return negative errno codes on failure.  Callers
> check for specific codes exactly never.  For some of the functions,
> callers couldn't check even if they wanted to, because the functions
> also return negative values that aren't errno codes, leaving readers
> confused on what the function actually returns.
>
> Clean up and simplify: return -1 instead of negative errno code.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 23/53] migration/rdma: Fix QEMUFileHooks method return values
  2023-09-28 13:19 ` [PATCH v2 23/53] migration/rdma: Fix QEMUFileHooks method return values Markus Armbruster
  2023-10-04 15:28   ` Juan Quintela
@ 2023-10-04 16:22   ` Juan Quintela
  2023-10-04 16:37     ` Markus Armbruster
  1 sibling, 1 reply; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 16:22 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> The QEMUFileHooks methods don't come with a written contract.  Digging
> through the code calling them, we find:
>
> * save_page():
>
>   Negative values RAM_SAVE_CONTROL_DELAYED and
>   RAM_SAVE_CONTROL_NOT_SUPP are special.  Any other negative value is
>   an unspecified error.
>
>   qemu_rdma_save_page() returns -EIO or rdma->error_state on error.  I
>   believe the latter is always negative.  Nothing stops either of them
>   from clashing with the special values, though.  Feels unlikely, but fix
>   it anyway to return only the special values and -1.
>
> * before_ram_iterate(), after_ram_iterate():
>
>   Negative value means error.  qemu_rdma_registration_start() and
>   qemu_rdma_registration_stop() comply as far as I can tell.  Make
>   them comply *obviously*, by returning -1 on error.
>
> * hook_ram_load:
>
>   Negative value means error.  rdma_load_hook() already returns -1 on
>   error.  Leave it alone.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

Changed my mind.  Will include this and rebase mine on top.



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 25/53] migration/rdma: Return -1 instead of negative errno code
  2023-09-28 13:19 ` [PATCH v2 25/53] migration/rdma: Return -1 instead of negative errno code Markus Armbruster
  2023-10-04 16:19   ` Juan Quintela
@ 2023-10-04 16:23   ` Juan Quintela
  1 sibling, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 16:23 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> Several functions return negative errno codes on failure.  Callers
> check for specific codes exactly never.  For some of the functions,
> callers couldn't check even if they wanted to, because the functions
> also return negative values that aren't errno codes, leaving readers
> confused on what the function actually returns.
>
> Clean up and simplify: return -1 instead of negative errno code.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 26/53] migration/rdma: Dumb down remaining int error values to -1
  2023-09-28 13:19 ` [PATCH v2 26/53] migration/rdma: Dumb down remaining int error values to -1 Markus Armbruster
@ 2023-10-04 16:25   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 16:25 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> This is just to make the error value more obvious.  Callers don't
> mind.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 27/53] migration/rdma: Replace int error_state by bool errored
  2023-09-28 13:19 ` [PATCH v2 27/53] migration/rdma: Replace int error_state by bool errored Markus Armbruster
@ 2023-10-04 16:25   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 16:25 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> All we do with the value of RDMAContext member @error_state is test
> whether it's zero.  Change to bool and rename to @errored.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>



^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 28/53] migration/rdma: Drop superfluous assignments to @ret
  2023-09-28 13:19 ` [PATCH v2 28/53] migration/rdma: Drop superfluous assignments to @ret Markus Armbruster
@ 2023-10-04 16:27   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 16:27 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>




* Re: [PATCH v2 30/53] migration/rdma: Plug a memory leak and improve a message
  2023-09-28 13:19 ` [PATCH v2 30/53] migration/rdma: Plug a memory leak and improve a message Markus Armbruster
@ 2023-10-04 16:27   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 16:27 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> When migration capability @rdma-pin-all is true, but the server cannot
> honor it, qemu_rdma_connect() calls macro ERROR(), then returns
> success.
>
> ERROR() sets an error.  Since qemu_rdma_connect() returns success, its
> caller rdma_start_outgoing_migration() duly assumes @errp is still
> clear.  The Error object leaks.
>
> ERROR() additionally reports the situation to the user as an error:
>
>     RDMA ERROR: Server cannot support pinning all memory. Will register memory dynamically.
>
> Is this an error or not?  It actually isn't; we disable @rdma-pin-all
> and carry on.  "Correcting" the user's configuration decisions that
> way feels problematic, but that's a topic for another day.
>
> Replace ERROR() by warn_report().  This plugs the memory leak, and
> emits a clearer message to the user.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>
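
For illustration only, here is a minimal, self-contained sketch of the
leak described in the commit message.  MockError, mock_error_set() and
the mock_connect_*() functions are hypothetical stand-ins invented for
this sketch; they are not QEMU's Error API or the actual rdma.c code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an error object; not QEMU's real Error. */
typedef struct MockError {
    char msg[96];
} MockError;

static int mock_live_errors;        /* errors allocated and not yet freed */
static MockError *mock_last_error;  /* kept only so a demo can clean up */

static void mock_error_set(MockError **errp, const char *msg)
{
    assert(errp && *errp == NULL);
    *errp = calloc(1, sizeof(**errp));
    assert(*errp);
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
    mock_last_error = *errp;
    mock_live_errors++;
}

static void mock_error_free(MockError *err)
{
    if (err) {
        free(err);
        mock_live_errors--;
    }
}

/* Buggy shape: sets an error, then returns success anyway. */
int mock_connect_buggy(MockError **errp)
{
    mock_error_set(errp, "Server cannot support pinning all memory");
    return 0;
}

/* Fixed shape: warn, carry on, and leave *errp untouched. */
int mock_connect_fixed(MockError **errp)
{
    (void)errp;
    fprintf(stderr, "warning: Server cannot support pinning all memory\n");
    return 0;
}

/* The caller looks at the error only when it is told about a failure. */
int mock_caller(int (*connect_fn)(MockError **))
{
    MockError *err = NULL;

    if (connect_fn(&err) < 0) {
        mock_error_free(err);
        return -1;
    }
    return 0;   /* on success, err is assumed to still be NULL */
}
```

After mock_caller(mock_connect_buggy) the counter mock_live_errors stays
at 1, because nobody owns the object any more.  With mock_connect_fixed
it stays at 0.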




* Re: [PATCH v2 29/53] migration/rdma: Check negative error values the same way everywhere
  2023-09-28 13:19 ` [PATCH v2 29/53] migration/rdma: Check negative error values the same way everywhere Markus Armbruster
  2023-09-29 15:28   ` Fabiano Rosas
@ 2023-10-04 16:33   ` Juan Quintela
  1 sibling, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 16:33 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> When a function returns 0 on success, negative value on error,
> checking for non-zero suffices, but checking for negative is clearer.
> So do that.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>
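
A tiny sketch of the convention the commit message argues for, assuming
a helper that returns 0 on success and a negative value on error.
mock_step() and mock_run() are hypothetical names made up for this
example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: 0 on success, negative value on error. */
static int mock_step(bool should_fail)
{
    return should_fail ? -1 : 0;
}

const char *mock_run(bool should_fail)
{
    int ret = mock_step(should_fail);

    /*
     * "if (ret)" would behave the same for this helper, but
     * "ret < 0" spells out the contract at the call site.
     */
    if (ret < 0) {
        return "failed";
    }
    return "ok";
}
```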




* Re: [PATCH v2 23/53] migration/rdma: Fix QEMUFileHooks method return values
  2023-10-04 16:22   ` Juan Quintela
@ 2023-10-04 16:37     ` Markus Armbruster
  0 siblings, 0 replies; 121+ messages in thread
From: Markus Armbruster @ 2023-10-04 16:37 UTC (permalink / raw)
  To: Juan Quintela; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Juan Quintela <quintela@redhat.com> writes:

> Markus Armbruster <armbru@redhat.com> wrote:
>> The QEMUFileHooks methods don't come with a written contract.  Digging
>> through the code calling them, we find:
>>
>> * save_page():
>>
>>   Negative values RAM_SAVE_CONTROL_DELAYED and
>>   RAM_SAVE_CONTROL_NOT_SUPP are special.  Any other negative value is
>>   an unspecified error.
>>
>>   qemu_rdma_save_page() returns -EIO or rdma->error_state on error.  I
>>   believe the latter is always negative.  Nothing stops either of them
>>   from clashing with the special values, though.  Feels unlikely, but
>>   fix it anyway to return only the special values and -1.
>>
>> * before_ram_iterate(), after_ram_iterate():
>>
>>   Negative value means error.  qemu_rdma_registration_start() and
>>   qemu_rdma_registration_stop() comply as far as I can tell.  Make
>>   them comply *obviously*, by returning -1 on error.
>>
>> * hook_ram_load:
>>
>>   Negative value means error.  rdma_load_hook() already returns -1 on
>>   error.  Leave it alone.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
>
> Changed my mind.  Will include this and rebase mine on top.

Thanks!
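
As a rough sketch of the save_page() return contract discussed above:
the method returns a non-negative value on success, one of two special
negative values, or plain -1 for any other failure.  The enum values
and all mock_* names below are placeholders chosen for this example, not
the definitions from QEMU's headers.

```c
#include <assert.h>

/* Placeholder values for illustration; the real special values are
 * defined in QEMU and are not restated here. */
enum {
    MOCK_CONTROL_NOT_SUPP = -1000,
    MOCK_CONTROL_DELAYED = -2000,
};

typedef enum MockOutcome {
    MOCK_OUTCOME_SENT,
    MOCK_OUTCOME_DELAYED,
    MOCK_OUTCOME_NOT_SUPPORTED,
    MOCK_OUTCOME_IO_ERROR,
} MockOutcome;

/* Hypothetical save_page()-like method. */
int mock_save_page(MockOutcome outcome)
{
    switch (outcome) {
    case MOCK_OUTCOME_SENT:
        return 0;
    case MOCK_OUTCOME_DELAYED:
        return MOCK_CONTROL_DELAYED;
    case MOCK_OUTCOME_NOT_SUPPORTED:
        return MOCK_CONTROL_NOT_SUPP;
    default:
        /*
         * Never return -errno or a saved error state here: either
         * could in principle collide with a special value.
         */
        return -1;
    }
}
```

Returning a fixed -1 for every ordinary failure means a caller can
compare against the special values without worrying about collisions.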




* Re: [PATCH v2 31/53] migration/rdma: Delete inappropriate error_report() in macro ERROR()
  2023-09-28 13:19 ` [PATCH v2 31/53] migration/rdma: Delete inappropriate error_report() in macro ERROR() Markus Armbruster
@ 2023-10-04 16:50   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 16:50 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
>
> Macro ERROR() violates this principle.  Delete the error_report()
> there.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
> Tested-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>
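
To make the double-reporting problem concrete, here is a small sketch
that counts reports.  MockError, mock_report() and the mock_op_*()
functions are hypothetical and only imitate the pattern; they are not
QEMU's Error API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an error object; not QEMU's real Error. */
typedef struct MockError {
    char msg[96];
} MockError;

static int mock_reports;    /* how many times the user was told */

static void mock_report(const char *msg)
{
    fprintf(stderr, "error: %s\n", msg);
    mock_reports++;
}

static void mock_error_set(MockError **errp, const char *msg)
{
    assert(errp && *errp == NULL);
    *errp = calloc(1, sizeof(**errp));
    assert(*errp);
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

/* Violates the principle: returns the error AND reports it. */
int mock_op_noisy(MockError **errp)
{
    mock_report("operation failed");
    mock_error_set(errp, "operation failed");
    return -1;
}

/* Follows the principle: only returns the error. */
int mock_op_quiet(MockError **errp)
{
    mock_error_set(errp, "operation failed");
    return -1;
}

/* Reporting is the caller's job. */
void mock_caller(int (*op)(MockError **))
{
    MockError *err = NULL;

    if (op(&err) < 0) {
        mock_report(err->msg);
        free(err);
    }
}
```

With mock_op_quiet the user sees one report per failure.  With
mock_op_noisy the same failure is reported twice.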




* Re: [PATCH v2 32/53] migration/rdma: Retire macro ERROR()
  2023-09-28 13:19 ` [PATCH v2 32/53] migration/rdma: Retire " Markus Armbruster
@ 2023-10-04 16:50   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 16:50 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> ERROR() has become "error_setg() unless an error has been set
> already".  Hiding the conditional in the macro is in the way of
> further work.  Replace the macro uses by their expansion, and delete
> the macro.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>
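
A sketch of what "error_setg() unless an error has been set already"
looks like once expanded.  MOCK_ERROR() and mock_error_set() are
hypothetical imitations written for this example; the real macro and
error_setg() are not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an error object; not QEMU's real Error. */
typedef struct MockError {
    char msg[96];
} MockError;

/* Plain setter: a non-null errp must point to a clear error. */
static void mock_error_set(MockError **errp, const char *msg)
{
    if (!errp) {
        return;
    }
    assert(*errp == NULL);
    *errp = calloc(1, sizeof(**errp));
    assert(*errp);
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

/* Macro form: the "only if not already set" condition is hidden. */
#define MOCK_ERROR(errp, msg)                   \
    do {                                        \
        if ((errp) && *(errp) == NULL) {        \
            mock_error_set((errp), (msg));      \
        }                                       \
    } while (0)

void mock_fail_with_macro(MockError **errp)
{
    MOCK_ERROR(errp, "first");
    MOCK_ERROR(errp, "second");     /* silently does nothing */
}

/* Same behavior with the macro replaced by its expansion. */
void mock_fail_expanded(MockError **errp)
{
    if (errp && *errp == NULL) {
        mock_error_set(errp, "first");
    }
    if (errp && *errp == NULL) {    /* the guard is now visible */
        mock_error_set(errp, "second");
    }
}
```

Both functions leave "first" in the error object.  In the expanded form
the guards are out in the open, so redundant ones can be spotted and
removed one by one.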




* Re: [PATCH v2 33/53] migration/rdma: Fix error handling around rdma_getaddrinfo()
  2023-09-28 13:19 ` [PATCH v2 33/53] migration/rdma: Fix error handling around rdma_getaddrinfo() Markus Armbruster
@ 2023-10-04 16:51   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 16:51 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> qemu_rdma_resolve_host() and qemu_rdma_dest_init() iterate over
> addresses to find one that works, holding onto the first Error from
> qemu_rdma_broken_ipv6_kernel() for use when no address works.  Issues:
>
> 1. If @errp was &error_abort or &error_fatal, we'd terminate instead
>    of trying the next address.  Can't actually happen, since no caller
>    passes these arguments.
>
> 2. When @errp is a pointer to a variable containing NULL, and
>    qemu_rdma_broken_ipv6_kernel() fails, the variable no longer
>    contains NULL.  Subsequent iterations pass it again, violating
>    Error usage rules.  Dangerous, as setting an error would then trip
>    error_setv()'s assertion.  Works only because
>    qemu_rdma_broken_ipv6_kernel() and the code following the loops
>    carefully avoid setting a second error.
>
> 3. If qemu_rdma_broken_ipv6_kernel() fails, and then a later iteration
>    finds a working address, @errp still holds the first error from
>    qemu_rdma_broken_ipv6_kernel().  If we then run into another error,
>    we report the qemu_rdma_broken_ipv6_kernel() failure instead.
>
> 4. If we don't run into another error, we leak the Error object.
>
> Use a local error variable, and propagate to @errp.  This fixes 3. and
> also cleans up 1 and partly 2.
>
> Free this error when we have a working address.  This fixes 4.
>
> Pass the local error variable to qemu_rdma_broken_ipv6_kernel() only
> until it fails.  Pass null on any later iterations.  This cleans up
> the remainder of 2.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>
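
The approach in the commit message (local error variable, pass it only
until it is set, free it once an address works, otherwise propagate)
could look roughly like the sketch below.  All mock_* names are
hypothetical; the loop only imitates the shape of the address iteration
and does no real address resolution.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an error object; not QEMU's real Error. */
typedef struct MockError {
    char msg[96];
} MockError;

static void mock_error_set(MockError **errp, const char *msg)
{
    if (!errp) {
        return;                 /* caller does not want details */
    }
    assert(*errp == NULL);      /* setting a second error is a bug */
    *errp = calloc(1, sizeof(**errp));
    assert(*errp);
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

static void mock_error_propagate(MockError **dst, MockError *err)
{
    if (!dst) {
        free(err);
        return;
    }
    assert(*dst == NULL);
    *dst = err;
}

/* Hypothetical per-address check; fails when ok is zero. */
static int mock_check(int ok, MockError **errp)
{
    if (!ok) {
        mock_error_set(errp, "address check failed");
        return -1;
    }
    return 0;
}

/* Returns the index of the first working address, or -1 with an
 * error stored through errp. */
int mock_pick_address(const int *ok, size_t n, MockError **errp)
{
    MockError *local_err = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        /* Hand out &local_err only until it holds an error. */
        if (mock_check(ok[i], local_err ? NULL : &local_err) < 0) {
            continue;
        }
        free(local_err);        /* stale error from an earlier address */
        return (int)i;
    }

    if (local_err) {
        mock_error_propagate(errp, local_err);
    } else {
        mock_error_set(errp, "no address to try");
    }
    return -1;
}
```

Because the caller's errp is written at most once, at the very end, it
does not matter whether the caller passed a plain variable or something
with special semantics.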




* Re: [PATCH v2 34/53] migration/rdma: Drop "@errp is clear" guards around error_setg()
  2023-09-28 13:20 ` [PATCH v2 34/53] migration/rdma: Drop "@errp is clear" guards around error_setg() Markus Armbruster
@ 2023-10-04 16:52   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 16:52 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> These guards are all redundant now.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>




* Re: [PATCH v2 35/53] migration/rdma: Convert qemu_rdma_exchange_recv() to Error
  2023-09-28 13:20 ` [PATCH v2 35/53] migration/rdma: Convert qemu_rdma_exchange_recv() to Error Markus Armbruster
@ 2023-10-04 16:53   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 16:53 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
>
> qio_channel_rdma_readv() violates this principle: it calls
> error_report() via qemu_rdma_exchange_recv().  I elected not to
> investigate how callers handle the error, i.e. precise impact is not
> known.
>
> Clean this up by converting qemu_rdma_exchange_recv() to Error.
>
> Necessitates setting an error when qemu_rdma_exchange_get_response()
> failed.  Since this error will go away later in this series, simply
> use "FIXME temporary error message" there.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>




* Re: [PATCH v2 36/53] migration/rdma: Convert qemu_rdma_exchange_send() to Error
  2023-09-28 13:20 ` [PATCH v2 36/53] migration/rdma: Convert qemu_rdma_exchange_send() " Markus Armbruster
@ 2023-10-04 16:55   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 16:55 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
>
> qio_channel_rdma_writev() violates this principle: it calls
> error_report() via qemu_rdma_exchange_send().  I elected not to
> investigate how callers handle the error, i.e. precise impact is not
> known.
>
> Clean this up by converting qemu_rdma_exchange_send() to Error.
>
> Necessitates setting an error when qemu_rdma_post_recv_control(),
> callback(), or qemu_rdma_exchange_get_response() failed.  Since these
> errors will go away later in this series, simply use "FIXME temporary
> error message" there.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>




* Re: [PATCH v2 37/53] migration/rdma: Convert qemu_rdma_exchange_get_response() to Error
  2023-09-28 13:20 ` [PATCH v2 37/53] migration/rdma: Convert qemu_rdma_exchange_get_response() " Markus Armbruster
@ 2023-10-04 16:55   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 16:55 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
>
> qemu_rdma_exchange_send() and qemu_rdma_exchange_recv() violate this
> principle: they call error_report() via
> qemu_rdma_exchange_get_response().  I elected not to investigate how
> callers handle the error, i.e. precise impact is not known.
>
> Clean this up by converting qemu_rdma_exchange_get_response() to
> Error.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>




* Re: [PATCH v2 38/53] migration/rdma: Convert qemu_rdma_reg_whole_ram_blocks() to Error
  2023-09-28 13:20 ` [PATCH v2 38/53] migration/rdma: Convert qemu_rdma_reg_whole_ram_blocks() " Markus Armbruster
@ 2023-10-04 16:56   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 16:56 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
>
> qemu_rdma_exchange_send() violates this principle: it calls
> error_report() via callback qemu_rdma_reg_whole_ram_blocks().  I
> elected not to investigate how callers handle the error, i.e. precise
> impact is not known.
>
> Clean this up by converting the callback to Error.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>




* Re: [PATCH v2 39/53] migration/rdma: Convert qemu_rdma_write_flush() to Error
  2023-09-28 13:20 ` [PATCH v2 39/53] migration/rdma: Convert qemu_rdma_write_flush() " Markus Armbruster
@ 2023-10-04 16:56   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 16:56 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
>
> qio_channel_rdma_writev() violates this principle: it calls
> error_report() via qemu_rdma_write_flush().  I elected not to
> investigate how callers handle the error, i.e. precise impact is not
> known.
>
> Clean this up by converting qemu_rdma_write_flush() to Error.
>
> Necessitates setting an error when qemu_rdma_write_one() failed.
> Since this error will go away later in this series, simply use "FIXME
> temporary error message" there.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>




* Re: [PATCH v2 40/53] migration/rdma: Convert qemu_rdma_write_one() to Error
  2023-09-28 13:20 ` [PATCH v2 40/53] migration/rdma: Convert qemu_rdma_write_one() " Markus Armbruster
@ 2023-10-04 16:56   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 16:56 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
>
> qemu_rdma_write_flush() violates this principle: it calls
> error_report() via qemu_rdma_write_one().  I elected not to
> investigate how callers handle the error, i.e. precise impact is not
> known.
>
> Clean this up by converting qemu_rdma_write_one() to Error.  Bonus:
> resolves a FIXME about problematic use of errno.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>




* Re: [PATCH v2 41/53] migration/rdma: Convert qemu_rdma_write() to Error
  2023-09-28 13:20 ` [PATCH v2 41/53] migration/rdma: Convert qemu_rdma_write() " Markus Armbruster
@ 2023-10-04 17:23   ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 17:23 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> Just for consistency with qemu_rdma_write_one() and
> qemu_rdma_write_flush(), and for slightly simpler code.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>




* Re: [PATCH v2 51/53] migration/rdma: Downgrade qemu_rdma_cleanup() errors to warnings
  2023-09-28 13:20 ` [PATCH v2 51/53] migration/rdma: Downgrade qemu_rdma_cleanup() errors to warnings Markus Armbruster
  2023-09-29 15:29   ` Fabiano Rosas
@ 2023-10-04 17:47   ` Juan Quintela
  2023-10-07  3:50   ` Zhijian Li (Fujitsu)
  2 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 17:47 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
>
> qemu_rdma_source_init(), qemu_rdma_connect(),
> rdma_start_incoming_migration(), and rdma_start_outgoing_migration()
> violate this principle: they call error_report() via
> qemu_rdma_cleanup().
>
> Moreover, qemu_rdma_cleanup() can't fail.  It is called on error
> paths, and QIOChannel close and finalization.  Are the conditions it
> reports really errors?  I doubt it.
>
> Downgrade qemu_rdma_cleanup()'s errors to warnings.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>




* Re: [PATCH v2 53/53] migration/rdma: Replace flawed device detail dump by tracing
  2023-09-28 13:20 ` [PATCH v2 53/53] migration/rdma: Replace flawed device detail dump by tracing Markus Armbruster
  2023-09-29 17:05   ` Fabiano Rosas
@ 2023-10-04 17:50   ` Juan Quintela
  2023-10-07  3:57   ` Zhijian Li (Fujitsu)
  2 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 17:50 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> qemu_rdma_dump_id() dumps RDMA device details to stdout.
>
> rdma_start_outgoing_migration() calls it via qemu_rdma_source_init()
> and qemu_rdma_resolve_host() to show source device details.
> rdma_start_incoming_migration() arranges its call via
> rdma_accept_incoming_migration() and qemu_rdma_accept() to show
> destination device details.
>
> Two issues:
>
> 1. rdma_start_outgoing_migration() can run in HMP context.  The
>    information should arguably go to the monitor, not stdout.
>
> 2. ibv_query_port() failure is reported as error.  Its callers remain
>    unaware of this failure (qemu_rdma_dump_id() can't fail), so
>    reporting this to the user as an error is problematic.
>
> Fixable, but the device detail dump is noise, except when
> troubleshooting.  Tracing is a better fit.  Similar function
> qemu_rdma_dump_gid() was converted to tracing in commit
> 733252deb8b (Tracify migration/rdma.c).
>
> Convert qemu_rdma_dump_id(), too.
>
> While there, touch up qemu_rdma_dump_gid()'s outdated comment.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>




* Re: [PATCH v2 00/53] migration/rdma: Error handling fixes
  2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
                   ` (52 preceding siblings ...)
  2023-09-28 13:20 ` [PATCH v2 53/53] migration/rdma: Replace flawed device detail dump by tracing Markus Armbruster
@ 2023-10-04 17:52 ` Juan Quintela
  2023-10-05  5:07   ` Markus Armbruster
  53 siblings, 1 reply; 121+ messages in thread
From: Juan Quintela @ 2023-10-04 17:52 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> Oh dear, where to start.  There's so much wrong, and in pretty obvious
> ways.  This code should never have passed review.  I'm refraining from
> saying more; see the commit messages instead.
>
> Issues remaining after this series include:
>
> * Terrible error messages
>
> * Some error message cascades remain
>
> * There is no written contract for QEMUFileHooks, and the
>   responsibility for reporting errors is unclear
>
> * There seem to be no tests whatsoever
>
> PATCH 29 is arguably a matter of taste.  I made my case for it during
> review of v1.  If maintainers don't want it, I'll drop it.
>
> Related: [PATCH 1/7] migration/rdma: Fix save_page method to fail on
> polling error

Hi Markus

I integrated everything except:

>   migration/rdma: Fix or document problematic uses of errno

Most of them are dropped on following patches.

>   migration/rdma: Use error_report() & friends instead of stderr

You said you have to resend this one.

There were some conflicts, I was careful, but one never knows.  So you
are welcome to take a look when the PULL comes to the tree.

Thanks, Juan.




* Re: [PATCH v2 00/53] migration/rdma: Error handling fixes
  2023-10-04 17:52 ` [PATCH v2 00/53] migration/rdma: Error handling fixes Juan Quintela
@ 2023-10-05  5:07   ` Markus Armbruster
  2023-10-05  6:37     ` Juan Quintela
  0 siblings, 1 reply; 121+ messages in thread
From: Markus Armbruster @ 2023-10-05  5:07 UTC (permalink / raw)
  To: Juan Quintela; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Juan Quintela <quintela@redhat.com> writes:

> Markus Armbruster <armbru@redhat.com> wrote:
>> Oh dear, where to start.  There's so much wrong, and in pretty obvious
>> ways.  This code should never have passed review.  I'm refraining from
>> saying more; see the commit messages instead.
>>
>> Issues remaining after this series include:
>>
>> * Terrible error messages
>>
>> * Some error message cascades remain
>>
>> * There is no written contract for QEMUFileHooks, and the
>>   responsibility for reporting errors is unclear
>>
>> * There seem to be no tests whatsoever
>>
>> PATCH 29 is arguably a matter of taste.  I made my case for it during
>> review of v1.  If maintainers don't want it, I'll drop it.
>>
>> Related: [PATCH 1/7] migration/rdma: Fix save_page method to fail on
>> polling error
>
> Hi Markus
>
> I integrated everything except:
>
>>   migration/rdma: Fix or document problematic uses of errno
>
> Most of them are dropped on following patches.

The hunks that are dropped in later patches are:

* Four FIXME comments about incorrect or problematic use of perror().

  If you drop the patch, you have to adjust the later patches that
  remove these hunks.  Resolving the conflicts is *not* enough; you also
  have to correct the commit messages.

The hunks that are not dropped are:

* Three comments about bugs (either library doc bug or incorrect use of
  @errno here).  I'd hate to lose them.

* One bug fix, in qemu_rdma_advise_prefetch_mr().  Losing this one would
  be foolish.

Please consider keeping the patch.

>>   migration/rdma: Use error_report() & friends instead of stderr
>
> You said you have to resend this one.

Can do, but since the change is trivial, perhaps you could make it in
your tree without a resend.  Change the line

                warn_report("WARN: migrations may fail:"

to

                warn_report("migrations may fail:"

> There were some conflicts, I was careful, but one never knows.  So you
> are welcome to take a look when the PULL comes to the tree.
>
> Thanks, Juan.




* Re: [PATCH v2 00/53] migration/rdma: Error handling fixes
  2023-10-05  5:07   ` Markus Armbruster
@ 2023-10-05  6:37     ` Juan Quintela
  0 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-05  6:37 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> Juan Quintela <quintela@redhat.com> writes:
>
>> Markus Armbruster <armbru@redhat.com> wrote:
>>> Oh dear, where to start.  There's so much wrong, and in pretty obvious
>>> ways.  This code should never have passed review.  I'm refraining from
>>> saying more; see the commit messages instead.
>>>
>>> Issues remaining after this series include:
>>>
>>> * Terrible error messages
>>>
>>> * Some error message cascades remain
>>>
>>> * There is no written contract for QEMUFileHooks, and the
>>>   responsibility for reporting errors is unclear
>>>
>>> * There seem to be no tests whatsoever
>>>
>>> PATCH 29 is arguably a matter of taste.  I made my case for it during
>>> review of v1.  If maintainers don't want it, I'll drop it.
>>>
>>> Related: [PATCH 1/7] migration/rdma: Fix save_page method to fail on
>>> polling error
>>
>> Hi Markus
>>
>> I integrated everything except:
>>
>>>   migration/rdma: Fix or document problematic uses of errno
>>
>> Most of them are dropped on following patches.
>
> The hunks that are dropped in later patches are:
>
> * Four FIXME comments about incorrect or problematic use of perror().
>
>   If you drop the patch, you have to adjust the later patches that
>   remove these hunks.  Resolving the conflicts is *not* enough; you also
>   have to correct the commit messages.
>
> The hunks that are not dropped are:
>
> * Three comments about bugs (either library doc bug or incorrect use of
>   @errno here).  I'd hate to lose them.
>
> * One bug fix, in qemu_rdma_advise_prefetch_mr().  Losing this one would
>   be foolish.
>
> Please consider keeping the patch.

And here I am, having to redo the merge from this patch O:-)

>>>   migration/rdma: Use error_report() & friends instead of stderr
>>
>> You said you have to resend this one.
>
> Can do, but since the change is trivial, perhaps you could make it in
> your tree without a resend.  Change the line
>
>                 warn_report("WARN: migrations may fail:"
>
> to
>
>                 warn_report("migrations may fail:"

I thought it was more complicated.

Ok doing both.

Thanks, Juan.




* Re: [PATCH v2 16/53] migration/rdma: Fix or document problematic uses of errno
  2023-09-28 13:19 ` [PATCH v2 16/53] migration/rdma: Fix or document problematic uses of errno Markus Armbruster
  2023-09-29 15:09   ` Fabiano Rosas
@ 2023-10-05  6:46   ` Juan Quintela
  2023-10-07  5:34   ` Zhijian Li (Fujitsu)
  2 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-05  6:46 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> We use errno after calling Libibverbs functions that are not
> documented to set errno (manual page does not mention errno), or where
> the documentation is unclear ("returns [...] the value of errno on
> failure").  While this could be read as "sets errno and returns it",
> a glance at the source code[*] kills that hope:
>
>     static inline int ibv_post_send(struct ibv_qp *qp, struct ibv_send_wr *wr,
>                                     struct ibv_send_wr **bad_wr)
>     {
>             return qp->context->ops.post_send(qp, wr, bad_wr);
>     }
>
> The callback can be
>
>     static int mana_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
>                               struct ibv_send_wr **bad)
>     {
>             /* This version of driver supports RAW QP only.
>              * Posting WR is done directly in the application.
>              */
>             return EOPNOTSUPP;
>     }
>
> Neither of them touches errno.
>
> One of these errno uses is easy to fix, so do that now.  Several more
> will go away later in the series; add temporary FIXME comments.
> Three will remain; add TODO comments.  TODO, not FIXME, because the
> bug might be in Libibverbs documentation.
>
> [*] https://github.com/linux-rdma/rdma-core.git
>     commit 55fa316b4b18f258d8ac1ceb4aa5a7a35b094dcf
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

And here I am, re-merging from this patch O:-)
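
To illustrate the errno point from the commit message: when a function
returns an errno-style code and leaves errno itself alone, the message
has to be built from the return value.  mock_post_send() below is a
hypothetical stand-in that behaves like the callback quoted above; it
is not the Libibverbs function.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for a verbs-style call: returns an errno code
 * directly and never writes to errno. */
static int mock_post_send(int supported)
{
    return supported ? 0 : EOPNOTSUPP;
}

/* Returns a description of the failure, or NULL on success. */
const char *mock_post_send_error(int supported)
{
    int ret = mock_post_send(supported);

    if (ret == 0) {
        return NULL;
    }
    assert(ret > 0);    /* the code is a positive errno value */
    /*
     * strerror(errno) would describe whatever happened to set errno
     * last, which may be an unrelated earlier failure.
     */
    return strerror(ret);
}
```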




* Re: [PATCH v2 52/53] migration/rdma: Use error_report() & friends instead of stderr
  2023-09-28 13:20 ` [PATCH v2 52/53] migration/rdma: Use error_report() & friends instead of stderr Markus Armbruster
  2023-09-29 15:36   ` Fabiano Rosas
@ 2023-10-05  7:24   ` Juan Quintela
  2023-10-07  3:56   ` Zhijian Li (Fujitsu)
  2 siblings, 0 replies; 121+ messages in thread
From: Juan Quintela @ 2023-10-05  7:24 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, peterx, leobras, farosas, lizhijian, eblake

Markus Armbruster <armbru@redhat.com> wrote:
> error_report() obeys -msg, reports the current error location if any,
> and reports to the current monitor if any.  Reporting to stderr
> directly with fprintf() or perror() is wrong, because it loses all
> this.
>
> Fix the offenders.  Bonus: resolves a FIXME about problematic use of
> errno.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

I fixed the WARN issue by hand.

Reviewed-by: Juan Quintela <quintela@redhat.com>




* Re: [PATCH v2 06/53] migration/rdma: Fix unwanted integer truncation
  2023-09-28 13:19 ` [PATCH v2 06/53] migration/rdma: Fix unwanted integer truncation Markus Armbruster
  2023-09-28 14:20   ` Fabiano Rosas
  2023-10-04 14:41   ` Juan Quintela
@ 2023-10-07  1:53   ` Zhijian Li (Fujitsu)
  2 siblings, 0 replies; 121+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-10-07  1:53 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras, farosas, eblake



On 28/09/2023 21:19, Markus Armbruster wrote:
> qio_channel_rdma_readv() assigns the size_t value of qemu_rdma_fill()
> to an int variable before it adds it to @done / subtracts it from
> @want, both size_t.  The value is truncated when qemu_rdma_fill()
> copies more than INT_MAX bytes.  Seems vanishingly unlikely, but needs
> fixing all the same.
> 
> Fixes: 6ddd2d76ca6f (migration: convert RDMA to use QIOChannel interface)
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
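
For readers following along, the truncation described in the commit message can be sketched in isolation.  This is a minimal illustration, not the QEMU code: fits_in_int(), fill_chunk() and read_all() are hypothetical names made up here.  The only point is that the per-iteration byte count keeps the same type (size_t) as the totals it feeds, so nothing narrows.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if a size_t byte count fits in an int. */
static bool fits_in_int(size_t n)
{
    return n <= (size_t)INT_MAX;
}

/*
 * Hypothetical stand-in for a "fill" step that reports how many bytes
 * it copied: at most @chunk per call.
 */
static size_t fill_chunk(size_t want, size_t chunk)
{
    return want < chunk ? want : chunk;
}

/* Accumulate with size_t throughout, so no value is ever narrowed. */
static size_t read_all(size_t want, size_t chunk)
{
    size_t done = 0;

    assert(chunk > 0);
    while (want > 0) {
        size_t len = fill_chunk(want, chunk);   /* size_t, not int */

        done += len;
        want -= len;
    }
    return done;
}
```

Had len been declared int, any count for which fits_in_int() is false would have been narrowed before reaching done and want.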


* Re: [PATCH v2 07/53] migration/rdma: Clean up two more harmless signed vs. unsigned issues
  2023-09-28 13:19 ` [PATCH v2 07/53] migration/rdma: Clean up two more harmless signed vs. unsigned issues Markus Armbruster
  2023-10-04 14:44   ` Juan Quintela
@ 2023-10-07  2:38   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 121+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-10-07  2:38 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras, farosas, eblake



On 28/09/2023 21:19, Markus Armbruster wrote:
> qemu_rdma_exchange_get_response() compares int parameter @expecting
> with uint32_t head->type.  Actual arguments are non-negative
> enumeration constants, RDMAControlHeader uint32_t member @type, or
> qemu_rdma_exchange_recv() int parameter @expecting.  Actual arguments
> for the latter are non-negative enumeration constants.  Change both
> parameters to uint32_t.
> 
> In qio_channel_rdma_readv(), loop control variable @i is ssize_t, and
> counts from 0 up to @niov, which is size_t.  Change @i to size_t.
> 
> While there, make qio_channel_rdma_readv() and
> qio_channel_rdma_writev() more consistent: change the former's @done
> to ssize_t, and delete the latter's useless initialization of @len.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
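
A small sketch of why mixing int and uint32_t in a comparison is worth cleaning up even when, as the commit message says, it is harmless in practice.  All names here are hypothetical and not taken from migration/rdma.c.  The conversion is written out as an explicit cast so the behavior does not depend on the width of int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Mixed-sign comparison with the conversion spelled out: a negative
 * @expecting would wrap to a large unsigned value before comparing.
 */
static bool sketch_matches_mixed(int expecting, uint32_t type)
{
    return (uint32_t)expecting == type;
}

/* Same-type comparison: no conversion, nothing to reason about. */
static bool sketch_matches(uint32_t expecting, uint32_t type)
{
    return expecting == type;
}

/*
 * Convert a value that is claimed to be a non-negative enumeration
 * constant, checking the claim that makes the conversion harmless.
 */
static uint32_t sketch_type_from_int(int expecting)
{
    assert(expecting >= 0);
    return (uint32_t)expecting;
}
```

With both parameters declared uint32_t, the non-negativity argument no longer has to be made at every call site.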


* Re: [PATCH v2 51/53] migration/rdma: Downgrade qemu_rdma_cleanup() errors to warnings
  2023-09-28 13:20 ` [PATCH v2 51/53] migration/rdma: Downgrade qemu_rdma_cleanup() errors to warnings Markus Armbruster
  2023-09-29 15:29   ` Fabiano Rosas
  2023-10-04 17:47   ` Juan Quintela
@ 2023-10-07  3:50   ` Zhijian Li (Fujitsu)
  2 siblings, 0 replies; 121+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-10-07  3:50 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras, farosas, eblake



On 28/09/2023 21:20, Markus Armbruster wrote:
> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
> 
> qemu_rdma_source_init(), qemu_rdma_connect(),
> rdma_start_incoming_migration(), and rdma_start_outgoing_migration()
> violate this principle: they call error_report() via
> qemu_rdma_cleanup().
> 
> Moreover, qemu_rdma_cleanup() can't fail.  It is called on error
> paths, and on QIOChannel close and finalization.  Are the conditions
> it reports really errors?  I doubt it.
> 
> Downgrade qemu_rdma_cleanup()'s errors to warnings.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>


> ---
>   migration/rdma.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 4e4d818460..54b59d12b1 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2358,9 +2358,9 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>                                          .type = RDMA_CONTROL_ERROR,
>                                          .repeat = 1,
>                                        };
> -            error_report("Early error. Sending error.");
> +            warn_report("Early error. Sending error.");
>               if (qemu_rdma_post_send_control(rdma, NULL, &head, &err) < 0) {
> -                error_report_err(err);
> +                warn_report_err(err);
>               }
>           }
>   
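
The principle stated in the commit message (a function that hands an error to its caller should not also report it) can be sketched without any QEMU API.  SketchError and the sketch_*() functions below are hypothetical stand-ins invented for this illustration; they are not QEMU's Error type or its error_report() family.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, much simplified stand-in for an error object. */
typedef struct {
    char msg[128];
    int set;
} SketchError;

static int reports;     /* counts how often the user was told */

static void sketch_report(const SketchError *err)
{
    fprintf(stderr, "error: %s\n", err->msg);
    reports++;
}

/* Callee: stores the error for the caller, reports nothing itself. */
static int sketch_connect(int should_fail, SketchError *err)
{
    assert(err != NULL);
    if (should_fail) {
        snprintf(err->msg, sizeof(err->msg), "connect failed");
        err->set = 1;
        return -1;
    }
    return 0;
}

/* Caller that gives up: it reports, exactly once. */
static int sketch_start(int should_fail)
{
    SketchError err = { .set = 0 };

    if (sketch_connect(should_fail, &err) < 0) {
        sketch_report(&err);
        return -1;
    }
    return 0;
}

/* Caller that recovers: nothing is reported at all. */
static int sketch_start_with_fallback(void)
{
    SketchError err = { .set = 0 };

    if (sketch_connect(1, &err) < 0) {
        return sketch_connect(0, &err);     /* retry succeeds */
    }
    return 0;
}
```

If sketch_connect() also called sketch_report(), the first caller would report twice and the second would report a failure it had already recovered from.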


* Re: [PATCH v2 52/53] migration/rdma: Use error_report() & friends instead of stderr
  2023-09-28 13:20 ` [PATCH v2 52/53] migration/rdma: Use error_report() & friends instead of stderr Markus Armbruster
  2023-09-29 15:36   ` Fabiano Rosas
  2023-10-05  7:24   ` Juan Quintela
@ 2023-10-07  3:56   ` Zhijian Li (Fujitsu)
  2 siblings, 0 replies; 121+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-10-07  3:56 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras, farosas, eblake



On 28/09/2023 21:20, Markus Armbruster wrote:
> error_report() obeys -msg, reports the current error location if any,
> and reports to the current monitor if any.  Reporting to stderr
> directly with fprintf() or perror() is wrong, because it loses all
> this.
> 
> Fix the offenders.  Bonus: resolves a FIXME about problematic use of
> errno.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>


> ---
>   migration/rdma.c | 44 +++++++++++++++++++++-----------------------
>   1 file changed, 21 insertions(+), 23 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 54b59d12b1..dba0802fca 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -877,12 +877,12 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
>   
>           if (roce_found) {
>               if (ib_found) {
> -                fprintf(stderr, "WARN: migrations may fail:"
> -                                " IPv6 over RoCE / iWARP in linux"
> -                                " is broken. But since you appear to have a"
> -                                " mixed RoCE / IB environment, be sure to only"
> -                                " migrate over the IB fabric until the kernel "
> -                                " fixes the bug.\n");
> +                warn_report("WARN: migrations may fail:"
> +                            " IPv6 over RoCE / iWARP in linux"
> +                            " is broken. But since you appear to have a"
> +                            " mixed RoCE / IB environment, be sure to only"
> +                            " migrate over the IB fabric until the kernel "
> +                            " fixes the bug.");
>               } else {
>                   error_setg(errp, "RDMA ERROR: "
>                              "You only have RoCE / iWARP devices in your systems"
> @@ -1418,12 +1418,8 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
>           block->remote_keys[chunk] = 0;
>   
>           if (ret != 0) {
> -            /*
> -             * FIXME perror() is problematic, bcause ibv_dereg_mr() is
> -             * not documented to set errno.  Will go away later in
> -             * this series.
> -             */
> -            perror("unregistration chunk failed");
> +            error_report("unregistration chunk failed: %s",
> +                         strerror(ret));
>               return -1;
>           }
>           rdma->total_registrations--;
> @@ -3767,7 +3763,8 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>                   block->pmr[reg->key.chunk] = NULL;
>   
>                   if (ret != 0) {
> -                    perror("rdma unregistration chunk failed");
> +                    error_report("rdma unregistration chunk failed: %s",
> +                                 strerror(errno));
>                       goto err;
>                   }
>   
> @@ -3956,10 +3953,10 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
>            */
>   
>           if (local->nb_blocks != nb_dest_blocks) {
> -            fprintf(stderr, "ram blocks mismatch (Number of blocks %d vs %d) "
> -                    "Your QEMU command line parameters are probably "
> -                    "not identical on both the source and destination.",
> -                    local->nb_blocks, nb_dest_blocks);
> +            error_report("ram blocks mismatch (Number of blocks %d vs %d)",
> +                         local->nb_blocks, nb_dest_blocks);
> +            error_printf("Your QEMU command line parameters are probably "
> +                         "not identical on both the source and destination.");
>               rdma->errored = true;
>               return -1;
>           }
> @@ -3972,10 +3969,11 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
>   
>               /* We require that the blocks are in the same order */
>               if (rdma->dest_blocks[i].length != local->block[i].length) {
> -                fprintf(stderr, "Block %s/%d has a different length %" PRIu64
> -                        "vs %" PRIu64, local->block[i].block_name, i,
> -                        local->block[i].length,
> -                        rdma->dest_blocks[i].length);
> +                error_report("Block %s/%d has a different length %" PRIu64
> +                             "vs %" PRIu64,
> +                             local->block[i].block_name, i,
> +                             local->block[i].length,
> +                             rdma->dest_blocks[i].length);
>                   rdma->errored = true;
>                   return -1;
>               }
> @@ -4091,7 +4089,7 @@ static void rdma_accept_incoming_migration(void *opaque)
>       ret = qemu_rdma_accept(rdma);
>   
>       if (ret < 0) {
> -        fprintf(stderr, "RDMA ERROR: Migration initialization failed\n");
> +        error_report("RDMA ERROR: Migration initialization failed");
>           return;
>       }
>   
> @@ -4103,7 +4101,7 @@ static void rdma_accept_incoming_migration(void *opaque)
>   
>       f = rdma_new_input(rdma);
>       if (f == NULL) {
> -        fprintf(stderr, "RDMA ERROR: could not open RDMA for input\n");
> +        error_report("RDMA ERROR: could not open RDMA for input");
>           qemu_rdma_cleanup(rdma);
>           return;
>       }


* Re: [PATCH v2 53/53] migration/rdma: Replace flawed device detail dump by tracing
  2023-09-28 13:20 ` [PATCH v2 53/53] migration/rdma: Replace flawed device detail dump by tracing Markus Armbruster
  2023-09-29 17:05   ` Fabiano Rosas
  2023-10-04 17:50   ` Juan Quintela
@ 2023-10-07  3:57   ` Zhijian Li (Fujitsu)
  2 siblings, 0 replies; 121+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-10-07  3:57 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras, farosas, eblake



On 28/09/2023 21:20, Markus Armbruster wrote:
> qemu_rdma_dump_id() dumps RDMA device details to stdout.
> 
> rdma_start_outgoing_migration() calls it via qemu_rdma_source_init()
> and qemu_rdma_resolve_host() to show source device details.
> rdma_start_incoming_migration() arranges its call via
> rdma_accept_incoming_migration() and qemu_rdma_accept() to show
> destination device details.
> 
> Two issues:
> 
> 1. rdma_start_outgoing_migration() can run in HMP context.  The
>     information should arguably go to the monitor, not stdout.
> 
> 2. ibv_query_port() failure is reported as error.  Its callers remain
>     unaware of this failure (qemu_rdma_dump_id() can't fail), so
>     reporting this to the user as an error is problematic.
> 
> Fixable, but the device detail dump is noise, except when
> troubleshooting.  Tracing is a better fit.  Similar function
> qemu_rdma_dump_gid() was converted to tracing in commit
> 733252deb8b (Tracify migration/rdma.c).
> 
> Convert qemu_rdma_dump_id(), too.
> 
> While there, touch up qemu_rdma_dump_gid()'s outdated comment.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>


> ---
>   migration/rdma.c       | 23 ++++++++---------------
>   migration/trace-events |  2 ++
>   2 files changed, 10 insertions(+), 15 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index dba0802fca..07aef9a071 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -734,38 +734,31 @@ static void rdma_delete_block(RDMAContext *rdma, RDMALocalBlock *block)
>   }
>   
>   /*
> - * Put in the log file which RDMA device was opened and the details
> - * associated with that device.
> + * Trace RDMA device open, with device details.
>    */
>   static void qemu_rdma_dump_id(const char *who, struct ibv_context *verbs)
>   {
>       struct ibv_port_attr port;
>   
>       if (ibv_query_port(verbs, 1, &port)) {
> -        error_report("Failed to query port information");
> +        trace_qemu_rdma_dump_id_failed(who);
>           return;
>       }
>   
> -    printf("%s RDMA Device opened: kernel name %s "
> -           "uverbs device name %s, "
> -           "infiniband_verbs class device path %s, "
> -           "infiniband class device path %s, "
> -           "transport: (%d) %s\n",
> -                who,
> +    trace_qemu_rdma_dump_id(who,
>                   verbs->device->name,
>                   verbs->device->dev_name,
>                   verbs->device->dev_path,
>                   verbs->device->ibdev_path,
>                   port.link_layer,
> -                (port.link_layer == IBV_LINK_LAYER_INFINIBAND) ? "Infiniband" :
> -                 ((port.link_layer == IBV_LINK_LAYER_ETHERNET)
> -                    ? "Ethernet" : "Unknown"));
> +                port.link_layer == IBV_LINK_LAYER_INFINIBAND ? "Infiniband"
> +                : port.link_layer == IBV_LINK_LAYER_ETHERNET ? "Ethernet"
> +                : "Unknown");
>   }
>   
>   /*
> - * Put in the log file the RDMA gid addressing information,
> - * useful for folks who have trouble understanding the
> - * RDMA device hierarchy in the kernel.
> + * Trace RDMA gid addressing information.
> + * Useful for understanding the RDMA device hierarchy in the kernel.
>    */
>   static void qemu_rdma_dump_gid(const char *who, struct rdma_cm_id *id)
>   {
> diff --git a/migration/trace-events b/migration/trace-events
> index d733107ec6..4ce16ae866 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -213,6 +213,8 @@ qemu_rdma_close(void) ""
>   qemu_rdma_connect_pin_all_requested(void) ""
>   qemu_rdma_connect_pin_all_outcome(bool pin) "%d"
>   qemu_rdma_dest_init_trying(const char *host, const char *ip) "%s => %s"
> +qemu_rdma_dump_id_failed(const char *who) "%s RDMA Device opened, but can't query port information"
> +qemu_rdma_dump_id(const char *who, const char *name, const char *dev_name, const char *dev_path, const char *ibdev_path, int transport, const char *transport_name) "%s RDMA Device opened: kernel name %s uverbs device name %s, infiniband_verbs class device path %s, infiniband class device path %s, transport: (%d) %s"
>   qemu_rdma_dump_gid(const char *who, const char *src, const char *dst) "%s Source GID: %s, Dest GID: %s"
>   qemu_rdma_exchange_get_response_start(const char *desc) "CONTROL: %s receiving..."
>   qemu_rdma_exchange_get_response_none(const char *desc, int type) "Surprise: got %s (%d)"


* Re: [PATCH v2 16/53] migration/rdma: Fix or document problematic uses of errno
  2023-09-28 13:19 ` [PATCH v2 16/53] migration/rdma: Fix or document problematic uses of errno Markus Armbruster
  2023-09-29 15:09   ` Fabiano Rosas
  2023-10-05  6:46   ` Juan Quintela
@ 2023-10-07  5:34   ` Zhijian Li (Fujitsu)
  2 siblings, 0 replies; 121+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-10-07  5:34 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras, farosas, eblake



On 28/09/2023 21:19, Markus Armbruster wrote:
> We use errno after calling Libibverbs functions that are not
> documented to set errno (manual page does not mention errno), or where
> the documentation is unclear ("returns [...] the value of errno on
> failure").  While this could be read as "sets errno and returns it",
> a glance at the source code[*] kills that hope:
> 
>      static inline int ibv_post_send(struct ibv_qp *qp, struct ibv_send_wr *wr,
>                                      struct ibv_send_wr **bad_wr)
>      {
>              return qp->context->ops.post_send(qp, wr, bad_wr);
>      }
> 
> The callback can be
> 
>      static int mana_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
>                                struct ibv_send_wr **bad)
>      {
>              /* This version of driver supports RAW QP only.
>               * Posting WR is done directly in the application.
>               */
>              return EOPNOTSUPP;
>      }
> 
> Neither of them touches errno.
> 
> One of these errno uses is easy to fix, so do that now.  Several more
> will go away later in the series; add temporary FIXME comments.
> Three will remain; add TODO comments.  TODO, not FIXME, because the
> bug might be in Libibverbs documentation.
> 
> [*] https://github.com/linux-rdma/rdma-core.git
>      commit 55fa316b4b18f258d8ac1ceb4aa5a7a35b094dcf
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>


Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>


> ---
>   migration/rdma.c | 45 +++++++++++++++++++++++++++++++++++++++------
>   1 file changed, 39 insertions(+), 6 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 28097ce604..bba8c99fa9 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -853,6 +853,12 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
>   
>           for (x = 0; x < num_devices; x++) {
>               verbs = ibv_open_device(dev_list[x]);
> +            /*
> +             * ibv_open_device() is not documented to set errno.  If
> +             * it does, it's somebody else's doc bug.  If it doesn't,
> +             * the use of errno below is wrong.
> +             * TODO Find out whether ibv_open_device() sets errno.
> +             */
>               if (!verbs) {
>                   if (errno == EPERM) {
>                       continue;
> @@ -1162,11 +1168,7 @@ static void qemu_rdma_advise_prefetch_mr(struct ibv_pd *pd, uint64_t addr,
>       ret = ibv_advise_mr(pd, advice,
>                           IBV_ADVISE_MR_FLAG_FLUSH, &sg_list, 1);
>       /* ignore the error */
> -    if (ret) {
> -        trace_qemu_rdma_advise_mr(name, len, addr, strerror(errno));
> -    } else {
> -        trace_qemu_rdma_advise_mr(name, len, addr, "successed");
> -    }
> +    trace_qemu_rdma_advise_mr(name, len, addr, strerror(ret));
>   #endif
>   }
>   
> @@ -1183,7 +1185,12 @@ static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
>                       local->block[i].local_host_addr,
>                       local->block[i].length, access
>                       );
> -
> +        /*
> +         * ibv_reg_mr() is not documented to set errno.  If it does,
> +         * it's somebody else's doc bug.  If it doesn't, the use of
> +         * errno below is wrong.
> +         * TODO Find out whether ibv_reg_mr() sets errno.
> +         */
>           if (!local->block[i].mr &&
>               errno == ENOTSUP && rdma_support_odp(rdma->verbs)) {
>                   access |= IBV_ACCESS_ON_DEMAND;
> @@ -1291,6 +1298,12 @@ static int qemu_rdma_register_and_get_keys(RDMAContext *rdma,
>           trace_qemu_rdma_register_and_get_keys(len, chunk_start);
>   
>           block->pmr[chunk] = ibv_reg_mr(rdma->pd, chunk_start, len, access);
> +        /*
> +         * ibv_reg_mr() is not documented to set errno.  If it does,
> +         * it's somebody else's doc bug.  If it doesn't, the use of
> +         * errno below is wrong.
> +         * TODO Find out whether ibv_reg_mr() sets errno.
> +         */
>           if (!block->pmr[chunk] &&
>               errno == ENOTSUP && rdma_support_odp(rdma->verbs)) {
>               access |= IBV_ACCESS_ON_DEMAND;
> @@ -1408,6 +1421,11 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
>           block->remote_keys[chunk] = 0;
>   
>           if (ret != 0) {
> +            /*
> +             * FIXME perror() is problematic, bcause ibv_dereg_mr() is
> +             * not documented to set errno.  Will go away later in
> +             * this series.
> +             */
>               perror("unregistration chunk failed");
>               return -ret;
>           }
> @@ -1658,6 +1676,11 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma,
>   
>           ret = ibv_get_cq_event(ch, &cq, &cq_ctx);
>           if (ret) {
> +            /*
> +             * FIXME perror() is problematic, because ibv_reg_mr() is
> +             * not documented to set errno.  Will go away later in
> +             * this series.
> +             */
>               perror("ibv_get_cq_event");
>               goto err_block_for_wrid;
>           }
> @@ -2199,6 +2222,11 @@ retry:
>           goto retry;
>   
>       } else if (ret > 0) {
> +        /*
> +         * FIXME perror() is problematic, because whether
> +         * ibv_post_send() sets errno is unclear.  Will go away later
> +         * in this series.
> +         */
>           perror("rdma migration: post rdma write failed");
>           return -ret;
>       }
> @@ -2559,6 +2587,11 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
>           ret = rdma_get_cm_event(rdma->channel, &cm_event);
>       }
>       if (ret) {
> +        /*
> +         * FIXME perror() is wrong, because
> +         * qemu_get_cm_event_timeout() can fail without setting errno.
> +         * Will go away later in this series.
> +         */
>           perror("rdma_get_cm_event after rdma_connect");
>           ERROR(errp, "connecting to destination!");
>           goto err_rdma_source_connect;

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 22/53] migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port
  2023-09-28 13:19 ` [PATCH v2 22/53] migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port Markus Armbruster
  2023-09-29 15:10   ` Fabiano Rosas
  2023-10-04 15:24   ` Juan Quintela
@ 2023-10-07  5:36   ` Zhijian Li (Fujitsu)
  2 siblings, 0 replies; 121+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-10-07  5:36 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: quintela, peterx, leobras, farosas, eblake



On 28/09/2023 21:19, Markus Armbruster wrote:
> qemu_rdma_data_init() neglects to set an Error when it fails because
> @host_port is null.  Fortunately, no caller passes null, so this is
> merely a latent bug.  Drop the flawed code handling a null argument.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>


> ---
>   migration/rdma.c | 29 +++++++++++++----------------
>   1 file changed, 13 insertions(+), 16 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 1a0ad44411..1ae2f87906 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2747,25 +2747,22 @@ static RDMAContext *qemu_rdma_data_init(const char *host_port, Error **errp)
>       RDMAContext *rdma = NULL;
>       InetSocketAddress *addr;
>   
> -    if (host_port) {
> -        rdma = g_new0(RDMAContext, 1);
> -        rdma->current_index = -1;
> -        rdma->current_chunk = -1;
> +    rdma = g_new0(RDMAContext, 1);
> +    rdma->current_index = -1;
> +    rdma->current_chunk = -1;
>   
> -        addr = g_new(InetSocketAddress, 1);
> -        if (!inet_parse(addr, host_port, NULL)) {
> -            rdma->port = atoi(addr->port);
> -            rdma->host = g_strdup(addr->host);
> -            rdma->host_port = g_strdup(host_port);
> -        } else {
> -            ERROR(errp, "bad RDMA migration address '%s'", host_port);
> -            g_free(rdma);
> -            rdma = NULL;
> -        }
> -
> -        qapi_free_InetSocketAddress(addr);
> +    addr = g_new(InetSocketAddress, 1);
> +    if (!inet_parse(addr, host_port, NULL)) {
> +        rdma->port = atoi(addr->port);
> +        rdma->host = g_strdup(addr->host);
> +        rdma->host_port = g_strdup(host_port);
> +    } else {
> +        ERROR(errp, "bad RDMA migration address '%s'", host_port);
> +        g_free(rdma);
> +        rdma = NULL;
>       }
>   
> +    qapi_free_InetSocketAddress(addr);
>       return rdma;
>   }
>   

^ permalink raw reply	[flat|nested] 121+ messages in thread

end of thread, newest message: 2023-10-07  5:36 UTC

Thread overview: 121+ messages
2023-09-28 13:19 [PATCH v2 00/53] migration/rdma: Error handling fixes Markus Armbruster
2023-09-28 13:19 ` [PATCH v2 01/53] migration/rdma: Clean up qemu_rdma_poll()'s return type Markus Armbruster
2023-10-04 14:26   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 02/53] migration/rdma: Clean up qemu_rdma_data_init()'s " Markus Armbruster
2023-10-04 14:35   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 03/53] migration/rdma: Clean up rdma_delete_block()'s " Markus Armbruster
2023-10-04 14:36   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 04/53] migration/rdma: Drop fragile wr_id formatting Markus Armbruster
2023-10-04 14:38   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 05/53] migration/rdma: Consistently use uint64_t for work request IDs Markus Armbruster
2023-10-04 14:39   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 06/53] migration/rdma: Fix unwanted integer truncation Markus Armbruster
2023-09-28 14:20   ` Fabiano Rosas
2023-10-04 14:41   ` Juan Quintela
2023-10-07  1:53   ` Zhijian Li (Fujitsu)
2023-09-28 13:19 ` [PATCH v2 07/53] migration/rdma: Clean up two more harmless signed vs. unsigned issues Markus Armbruster
2023-10-04 14:44   ` Juan Quintela
2023-10-07  2:38   ` Zhijian Li (Fujitsu)
2023-09-28 13:19 ` [PATCH v2 08/53] migration/rdma: Give qio_channel_rdma_source_funcs internal linkage Markus Armbruster
2023-10-04 14:50   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 09/53] migration/rdma: Fix qemu_rdma_accept() to return failure on errors Markus Armbruster
2023-10-04 14:51   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 10/53] migration/rdma: Put @errp parameter last Markus Armbruster
2023-10-04 14:54   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 11/53] migration/rdma: Eliminate error_propagate() Markus Armbruster
2023-10-04 14:58   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 12/53] migration/rdma: Drop rdma_add_block() error handling Markus Armbruster
2023-10-04 14:58   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 13/53] migration/rdma: Drop qemu_rdma_search_ram_block() " Markus Armbruster
2023-10-04 15:00   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 14/53] migration/rdma: Make qemu_rdma_buffer_mergeable() return bool Markus Armbruster
2023-10-04 15:01   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 15/53] migration/rdma: Use bool for two RDMAContext flags Markus Armbruster
2023-10-04 15:56   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 16/53] migration/rdma: Fix or document problematic uses of errno Markus Armbruster
2023-09-29 15:09   ` Fabiano Rosas
2023-10-04 11:12     ` Markus Armbruster
2023-10-05  6:46   ` Juan Quintela
2023-10-07  5:34   ` Zhijian Li (Fujitsu)
2023-09-28 13:19 ` [PATCH v2 17/53] migration/rdma: Ditch useless numeric error codes in error messages Markus Armbruster
2023-10-04 15:06   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 18/53] migration/rdma: Fix io_writev(), io_readv() methods to obey contract Markus Armbruster
2023-10-04 15:09   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 19/53] migration/rdma: Replace dangerous macro CHECK_ERROR_STATE() Markus Armbruster
2023-10-04 15:10   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 20/53] migration/rdma: Fix qemu_rdma_broken_ipv6_kernel() to set error Markus Armbruster
2023-10-04 15:10   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 21/53] migration/rdma: Fix qemu_get_cm_event_timeout() to always " Markus Armbruster
2023-10-04 15:25   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 22/53] migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port Markus Armbruster
2023-09-29 15:10   ` Fabiano Rosas
2023-10-04 15:24   ` Juan Quintela
2023-10-07  5:36   ` Zhijian Li (Fujitsu)
2023-09-28 13:19 ` [PATCH v2 23/53] migration/rdma: Fix QEMUFileHooks method return values Markus Armbruster
2023-10-04 15:28   ` Juan Quintela
2023-10-04 16:22   ` Juan Quintela
2023-10-04 16:37     ` Markus Armbruster
2023-09-28 13:19 ` [PATCH v2 24/53] migration/rdma: Fix rdma_getaddrinfo() error checking Markus Armbruster
2023-10-04 15:30   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 25/53] migration/rdma: Return -1 instead of negative errno code Markus Armbruster
2023-10-04 16:19   ` Juan Quintela
2023-10-04 16:23   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 26/53] migration/rdma: Dumb down remaining int error values to -1 Markus Armbruster
2023-10-04 16:25   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 27/53] migration/rdma: Replace int error_state by bool errored Markus Armbruster
2023-10-04 16:25   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 28/53] migration/rdma: Drop superfluous assignments to @ret Markus Armbruster
2023-10-04 16:27   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 29/53] migration/rdma: Check negative error values the same way everywhere Markus Armbruster
2023-09-29 15:28   ` Fabiano Rosas
2023-10-04 16:33   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 30/53] migration/rdma: Plug a memory leak and improve a message Markus Armbruster
2023-10-04 16:27   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 31/53] migration/rdma: Delete inappropriate error_report() in macro ERROR() Markus Armbruster
2023-10-04 16:50   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 32/53] migration/rdma: Retire " Markus Armbruster
2023-10-04 16:50   ` Juan Quintela
2023-09-28 13:19 ` [PATCH v2 33/53] migration/rdma: Fix error handling around rdma_getaddrinfo() Markus Armbruster
2023-10-04 16:51   ` Juan Quintela
2023-09-28 13:20 ` [PATCH v2 34/53] migration/rdma: Drop "@errp is clear" guards around error_setg() Markus Armbruster
2023-10-04 16:52   ` Juan Quintela
2023-09-28 13:20 ` [PATCH v2 35/53] migration/rdma: Convert qemu_rdma_exchange_recv() to Error Markus Armbruster
2023-10-04 16:53   ` Juan Quintela
2023-09-28 13:20 ` [PATCH v2 36/53] migration/rdma: Convert qemu_rdma_exchange_send() " Markus Armbruster
2023-10-04 16:55   ` Juan Quintela
2023-09-28 13:20 ` [PATCH v2 37/53] migration/rdma: Convert qemu_rdma_exchange_get_response() " Markus Armbruster
2023-10-04 16:55   ` Juan Quintela
2023-09-28 13:20 ` [PATCH v2 38/53] migration/rdma: Convert qemu_rdma_reg_whole_ram_blocks() " Markus Armbruster
2023-10-04 16:56   ` Juan Quintela
2023-09-28 13:20 ` [PATCH v2 39/53] migration/rdma: Convert qemu_rdma_write_flush() " Markus Armbruster
2023-10-04 16:56   ` Juan Quintela
2023-09-28 13:20 ` [PATCH v2 40/53] migration/rdma: Convert qemu_rdma_write_one() " Markus Armbruster
2023-10-04 16:56   ` Juan Quintela
2023-09-28 13:20 ` [PATCH v2 41/53] migration/rdma: Convert qemu_rdma_write() " Markus Armbruster
2023-10-04 17:23   ` Juan Quintela
2023-09-28 13:20 ` [PATCH v2 42/53] migration/rdma: Convert qemu_rdma_post_send_control() " Markus Armbruster
2023-09-28 13:20 ` [PATCH v2 43/53] migration/rdma: Convert qemu_rdma_post_recv_control() " Markus Armbruster
2023-09-28 13:20 ` [PATCH v2 44/53] migration/rdma: Convert qemu_rdma_alloc_pd_cq() " Markus Armbruster
2023-09-28 13:20 ` [PATCH v2 45/53] migration/rdma: Silence qemu_rdma_resolve_host() Markus Armbruster
2023-09-28 13:20 ` [PATCH v2 46/53] migration/rdma: Silence qemu_rdma_connect() Markus Armbruster
2023-09-28 13:20 ` [PATCH v2 47/53] migration/rdma: Silence qemu_rdma_reg_control() Markus Armbruster
2023-09-28 13:20 ` [PATCH v2 48/53] migration/rdma: Don't report received completion events as error Markus Armbruster
2023-09-28 13:20 ` [PATCH v2 49/53] migration/rdma: Silence qemu_rdma_block_for_wrid() Markus Armbruster
2023-09-28 13:20 ` [PATCH v2 50/53] migration/rdma: Silence qemu_rdma_register_and_get_keys() Markus Armbruster
2023-09-28 13:20 ` [PATCH v2 51/53] migration/rdma: Downgrade qemu_rdma_cleanup() errors to warnings Markus Armbruster
2023-09-29 15:29   ` Fabiano Rosas
2023-10-04 17:47   ` Juan Quintela
2023-10-07  3:50   ` Zhijian Li (Fujitsu)
2023-09-28 13:20 ` [PATCH v2 52/53] migration/rdma: Use error_report() & friends instead of stderr Markus Armbruster
2023-09-29 15:36   ` Fabiano Rosas
2023-10-04 11:15     ` Markus Armbruster
2023-10-04 13:52       ` Fabiano Rosas
2023-10-05  7:24   ` Juan Quintela
2023-10-07  3:56   ` Zhijian Li (Fujitsu)
2023-09-28 13:20 ` [PATCH v2 53/53] migration/rdma: Replace flawed device detail dump by tracing Markus Armbruster
2023-09-29 17:05   ` Fabiano Rosas
2023-10-04 17:50   ` Juan Quintela
2023-10-07  3:57   ` Zhijian Li (Fujitsu)
2023-10-04 17:52 ` [PATCH v2 00/53] migration/rdma: Error handling fixes Juan Quintela
2023-10-05  5:07   ` Markus Armbruster
2023-10-05  6:37     ` Juan Quintela
