From: Li Zhijian <lizhijian@cn.fujitsu.com>
To: <quintela@redhat.com>, <dgilbert@redhat.com>, <qemu-devel@nongnu.org>
Cc: Li Zhijian <lizhijian@cn.fujitsu.com>
Subject: [PATCH 2/2] migration/rdma: advise prefetch write for ODP region
Date: Sat, 31 Jul 2021 22:03:32 +0800
Message-ID: <20210731140332.8701-3-lizhijian@cn.fujitsu.com>
In-Reply-To: <20210731140332.8701-1-lizhijian@cn.fujitsu.com>
A responder MR registered with ODP sends an RNR NAK back to the
requester when it hits a page fault:
---------
ibv_poll_cq wc.status=13 RNR retry counter exceeded!
ibv_poll_cq wrid=WRITE RDMA!
---------
ibv_advise_mr(3) helps make pages present before the actual I/O is
performed, so that the responder takes as few page faults as possible.
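To make the intended control flow concrete, here is a minimal, self-contained sketch. It does not call libibverbs: the enum `sketch_advice` and the stub `sketch_advise_mr()` are hypothetical stand-ins for the IBV_ADVISE_MR_ADVICE_* values and ibv_advise_mr(), so it compiles without an RDMA stack. It shows the two decisions involved: PREFETCH_WRITE vs PREFETCH is chosen from a `wr` flag, and a failed advise is only reported, never treated as fatal. The status text uses strerror(ret), assuming (per ibv_advise_mr(3)) that the verb returns the errno value on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for IBV_ADVISE_MR_ADVICE_PREFETCH{,_WRITE}. */
enum sketch_advice { SKETCH_PREFETCH, SKETCH_PREFETCH_WRITE };

/*
 * Hypothetical stub for ibv_advise_mr(): returns 0 on success or an
 * errno value on failure; here the result is injected by the caller.
 */
static int sketch_advise_mr(enum sketch_advice advice, int injected_ret)
{
    (void)advice;
    return injected_ret;
}

/*
 * Pick the advice, issue the (stubbed) call and record the outcome.
 * A failure is only reported: the region stays usable without the
 * prefetch, it just takes page faults (and possibly RNR NAKs) later.
 */
static enum sketch_advice sketch_prefetch(bool wr, int injected_ret,
                                          char *status, size_t status_len)
{
    enum sketch_advice advice = wr ? SKETCH_PREFETCH_WRITE : SKETCH_PREFETCH;
    int ret = sketch_advise_mr(advice, injected_ret);

    assert(status != NULL && status_len > 0);
    snprintf(status, status_len, "%s", ret ? strerror(ret) : "succeeded");
    return advice;
}

/* A non-NULL pointer passed to a bool parameter converts to true. */
static enum sketch_advice sketch_chunk_advice(const uint32_t *rkey)
{
    char status[64];

    return sketch_prefetch(rkey, 0, status, sizeof(status));
}
```

sketch_chunk_advice() mirrors how the chunk-registration call site passes `rkey` (a pointer) into the `bool wr` parameter: C converts a non-NULL pointer to true, so a write prefetch is requested whenever the caller asked for an rkey, and a plain prefetch otherwise.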
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
migration/rdma.c | 40 ++++++++++++++++++++++++++++++++++++++++
migration/trace-events | 1 +
2 files changed, 41 insertions(+)
diff --git a/migration/rdma.c b/migration/rdma.c
index 8784b5f22a6..a2ad00d665f 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1117,6 +1117,30 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma)
return 0;
}
+/*
+ * Use ibv_advise_mr to avoid RNR NAK errors as far as possible.
+ * A responder MR registered with ODP sends an RNR NAK back to
+ * the requester when it hits a page fault.
+ */
+static void qemu_rdma_advise_prefetch_write_mr(struct ibv_pd *pd, uint64_t addr,
+                                               uint32_t len, uint32_t lkey,
+                                               const char *name, bool wr)
+{
+    int ret;
+    int advice = wr ? IBV_ADVISE_MR_ADVICE_PREFETCH_WRITE :
+                 IBV_ADVISE_MR_ADVICE_PREFETCH;
+    struct ibv_sge sg_list = {.lkey = lkey, .addr = addr, .length = len};
+
+    ret = ibv_advise_mr(pd, advice,
+                        IBV_ADVISE_MR_FLAG_FLUSH, &sg_list, 1);
+    /* Ignore the error: advising is only an optimization. */
+    if (ret) {
+        trace_qemu_rdma_advise_mr(name, len, addr, strerror(ret));
+    } else {
+        trace_qemu_rdma_advise_mr(name, len, addr, "succeeded");
+    }
+}
+
static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
{
int i;
@@ -1140,6 +1164,17 @@ on_demand:
perror("Failed to register local dest ram block!\n");
break;
}
+
+        if (access & IBV_ACCESS_ON_DEMAND) {
+            qemu_rdma_advise_prefetch_write_mr(rdma->pd,
+                                               (uintptr_t)
+                                               local->block[i].local_host_addr,
+                                               local->block[i].length,
+                                               local->block[i].mr->lkey,
+                                               local->block[i].block_name,
+                                               true);
+        }
+
rdma->total_registrations++;
}
@@ -1244,6 +1279,11 @@ on_demand:
rdma->total_registrations);
return -1;
}
+        if (access & IBV_ACCESS_ON_DEMAND) {
+            qemu_rdma_advise_prefetch_write_mr(rdma->pd, (uintptr_t)chunk_start,
+                                               len, block->pmr[chunk]->lkey,
+                                               block->block_name, rkey);
+        }
rdma->total_registrations++;
}
diff --git a/migration/trace-events b/migration/trace-events
index 5f6aa580def..901c1d54c12 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -213,6 +213,7 @@ qemu_rdma_poll_other(const char *compstr, int64_t comp, int left) "other complet
qemu_rdma_post_send_control(const char *desc) "CONTROL: sending %s.."
qemu_rdma_register_and_get_keys(uint64_t len, void *start) "Registering %" PRIu64 " bytes @ %p"
qemu_rdma_register_odp_mr(const char *name) "Try to register On-Demand Paging memory region: %s"
+qemu_rdma_advise_mr(const char *name, uint32_t len, uint64_t addr, const char *res) "Try to advise block %s prefetch at %" PRIu32 "@0x%" PRIx64 ": %s"
qemu_rdma_registration_handle_compress(int64_t length, int index, int64_t offset) "Zapping zero chunk: %" PRId64 " bytes, index %d, offset %" PRId64
qemu_rdma_registration_handle_finished(void) ""
qemu_rdma_registration_handle_ram_blocks(void) ""
--
2.31.1
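The new trace event splices the PRIu32 and PRIx64 macros from <inttypes.h> into its format string, so the uint32_t length and uint64_t address print correctly on every host. A minimal sketch of that mechanism using plain snprintf(); `sketch_format_range()` is a hypothetical helper for illustration, not part of QEMU's trace backend.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Render a "<len>@0x<addr>" fragment the way the trace-event format
 * string does: PRIu32/PRIx64 expand to the correct printf conversion
 * for uint32_t/uint64_t on the current platform.
 * Returns what snprintf() returns (characters that would be written).
 */
static int sketch_format_range(char *buf, size_t buf_len,
                               uint32_t len, uint64_t addr)
{
    assert(buf != NULL && buf_len > 0);
    return snprintf(buf, buf_len, "%" PRIu32 "@0x%" PRIx64, len, addr);
}
```

For example, a 4096-byte chunk at 0x7f0000000000 renders as 4096@0x7f0000000000.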
Thread overview: 9+ messages
2021-07-31 14:03 [PATCH 0/2] enable fsdax rdma migration Li Zhijian
2021-07-31 14:03 ` [PATCH 1/2] migration/rdma: Try to register On-Demand Paging memory region Li Zhijian
2021-08-22 8:53 ` Marcel Apfelbaum
2021-08-23 1:41 ` lizhijian
2021-08-23 7:03 ` Marcel Apfelbaum
2021-07-31 14:03 ` Li Zhijian [this message]
2021-08-22 8:39 ` [PATCH 2/2] migration/rdma: advise prefetch write for ODP region Marcel Apfelbaum
2021-08-23 1:21 ` lizhijian
2021-08-16 2:10 ` [PATCH 0/2] enable fsdax rdma migration lizhijian