From: Li Zhijian <lizhijian@cn.fujitsu.com>
To: <quintela@redhat.com>, <dgilbert@redhat.com>, <qemu-devel@nongnu.org>
Cc: Li Zhijian <lizhijian@cn.fujitsu.com>
Subject: [PATCH 2/2] migration/rdma: advise prefetch write for ODP region
Date: Sat, 31 Jul 2021 22:03:32 +0800
Message-ID: <20210731140332.8701-3-lizhijian@cn.fujitsu.com>
In-Reply-To: <20210731140332.8701-1-lizhijian@cn.fujitsu.com>

A responder MR registered with ODP sends an RNR NAK back to the
requester when an access hits a page fault. If that happens often
enough, the requester exhausts its RNR retry count and the RDMA write
fails:
---------
ibv_poll_cq wc.status=13 RNR retry counter exceeded!
ibv_poll_cq wrid=WRITE RDMA!
---------
ibv_advise_mr(3) makes the pages present before the actual I/O is
issued, so that the responder page-faults as little as possible.

Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
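Note (not part of the commit message): a minimal sketch of the two
decisions the new helper makes, picking the advice from the `wr` flag and
turning the call's result into a trace string. The `sketch_*` names and the
enum are hypothetical stand-ins so the snippet builds without libibverbs;
the real constants come from <infiniband/verbs.h>. It assumes, going by
ibv_advise_mr(3), that the call returns 0 on success or an errno value on
failure.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical stand-ins for IBV_ADVISE_MR_ADVICE_PREFETCH and
 * IBV_ADVISE_MR_ADVICE_PREFETCH_WRITE; the numeric values here are
 * arbitrary and only serve the illustration.
 */
enum sketch_advice {
    SKETCH_ADVICE_PREFETCH,
    SKETCH_ADVICE_PREFETCH_WRITE,
};

/* Ask for write prefetch when the region will be a target of RDMA writes. */
static enum sketch_advice sketch_pick_advice(bool wr)
{
    return wr ? SKETCH_ADVICE_PREFETCH_WRITE : SKETCH_ADVICE_PREFETCH;
}

/*
 * Map an ibv_advise_mr()-style return value to a trace string.
 * Assumption: 0 means success, anything else is an errno value.
 */
static const char *sketch_advise_result(int ret)
{
    return ret == 0 ? "succeeded" : strerror(ret);
}
```

With a result of 0 the helper yields "succeeded"; any other value is handed
to strerror(), so the failure is only traced and never propagated.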
 migration/rdma.c       | 40 ++++++++++++++++++++++++++++++++++++++++
 migration/trace-events |  1 +
 2 files changed, 41 insertions(+)

diff --git a/migration/rdma.c b/migration/rdma.c
index 8784b5f22a6..a2ad00d665f 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1117,6 +1117,30 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma)
     return 0;
 }
 
+/*
+ * Call ibv_advise_mr() to avoid RNR NAK errors as far as possible.
+ * A responder MR registered with ODP will send an RNR NAK back to
+ * the requester when it hits a page fault.
+ */
+static void qemu_rdma_advise_prefetch_write_mr(struct ibv_pd *pd, uint64_t addr,
+                                               uint32_t len, uint32_t lkey,
+                                               const char *name, bool wr)
+{
+    int ret;
+    int advice = wr ? IBV_ADVISE_MR_ADVICE_PREFETCH_WRITE :
+                 IBV_ADVISE_MR_ADVICE_PREFETCH;
+    struct ibv_sge sg_list = {.lkey = lkey, .addr = addr, .length = len};
+
+    ret = ibv_advise_mr(pd, advice,
+                        IB_UVERBS_ADVISE_MR_FLAG_FLUSH, &sg_list, 1);
+    /* ignore the error */
+    if (ret) {
+        trace_qemu_rdma_advise_mr(name, len, addr, strerror(ret));
+    } else {
+        trace_qemu_rdma_advise_mr(name, len, addr, "succeeded");
+    }
+}
+
 static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
 {
     int i;
@@ -1140,6 +1164,17 @@ on_demand:
             perror("Failed to register local dest ram block!\n");
             break;
         }
+
+        if (access & IBV_ACCESS_ON_DEMAND) {
+            qemu_rdma_advise_prefetch_write_mr(rdma->pd,
+                                               (uintptr_t)
+                                               local->block[i].local_host_addr,
+                                               local->block[i].length,
+                                               local->block[i].mr->lkey,
+                                               local->block[i].block_name,
+                                               true);
+        }
+
         rdma->total_registrations++;
     }
 
@@ -1244,6 +1279,11 @@ on_demand:
                             rdma->total_registrations);
             return -1;
         }
+        if (access & IBV_ACCESS_ON_DEMAND) {
+            qemu_rdma_advise_prefetch_write_mr(rdma->pd, (uintptr_t)chunk_start,
+                                               len, block->pmr[chunk]->lkey,
+                                               block->block_name, rkey);
+        }
         rdma->total_registrations++;
     }
 
diff --git a/migration/trace-events b/migration/trace-events
index 5f6aa580def..901c1d54c12 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -213,6 +213,7 @@ qemu_rdma_poll_other(const char *compstr, int64_t comp, int left) "other complet
 qemu_rdma_post_send_control(const char *desc) "CONTROL: sending %s.."
 qemu_rdma_register_and_get_keys(uint64_t len, void *start) "Registering %" PRIu64 " bytes @ %p"
 qemu_rdma_register_odp_mr(const char *name) "Try to register On-Demand Paging memory region: %s"
+qemu_rdma_advise_mr(const char *name, uint32_t len, uint64_t addr, const char *res) "Try to advise block %s prefetch write at %" PRIu32 "@0x%" PRIx64 ": %s"
 qemu_rdma_registration_handle_compress(int64_t length, int index, int64_t offset) "Zapping zero chunk: %" PRId64 " bytes, index %d, offset %" PRId64
 qemu_rdma_registration_handle_finished(void) ""
 qemu_rdma_registration_handle_ram_blocks(void) ""
-- 
2.31.1
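
For readers decoding the completion error quoted in the commit message:
status 13 corresponds to IBV_WC_RNR_RETRY_EXC_ERR in enum ibv_wc_status.
A minimal sketch follows, with a local mirror of just two values so it
builds without libibverbs. The `sketch_*` names are hypothetical; real code
should use the enum and ibv_wc_status_str() from <infiniband/verbs.h>.

```c
#include <stdbool.h>
#include <string.h>

/* Local mirror of two enum ibv_wc_status values (illustration only). */
enum sketch_wc_status {
    SKETCH_WC_SUCCESS = 0,
    SKETCH_WC_RNR_RETRY_EXC_ERR = 13,
};

/* True when the requester gave up after too many RNR NAKs. */
static bool sketch_is_rnr_exhausted(int status)
{
    return status == SKETCH_WC_RNR_RETRY_EXC_ERR;
}

/* Describe the two mirrored status codes; anything else is "other". */
static const char *sketch_wc_status_str(int status)
{
    switch (status) {
    case SKETCH_WC_SUCCESS:
        return "success";
    case SKETCH_WC_RNR_RETRY_EXC_ERR:
        return "RNR retry counter exceeded";
    default:
        return "other";
    }
}
```

A completion with this status on a WRITE work request is the failure that
prefetching the ODP pages is meant to make less likely.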





Thread overview: 9+ messages
2021-07-31 14:03 [PATCH 0/2] enable fsdax rdma migration Li Zhijian
2021-07-31 14:03 ` [PATCH 1/2] migration/rdma: Try to register On-Demand Paging memory region Li Zhijian
2021-08-22  8:53   ` Marcel Apfelbaum
2021-08-23  1:41     ` lizhijian
2021-08-23  7:03       ` Marcel Apfelbaum
2021-07-31 14:03 ` Li Zhijian [this message]
2021-08-22  8:39   ` [PATCH 2/2] migration/rdma: advise prefetch write for ODP region Marcel Apfelbaum
2021-08-23  1:21     ` lizhijian
2021-08-16  2:10 ` [PATCH 0/2] enable fsdax rdma migration lizhijian
