QEMU-Devel Archive on lore.kernel.org
* [Qemu-devel] [PATCH 0/2] rdma: Utilize ibv_reg_mr_iova for memory registration
@ 2019-08-18 13:21 Yuval Shaia
  2019-08-18 13:21 ` [Qemu-devel] [PATCH 1/2] configure: Check if we can use ibv_reg_mr_iova Yuval Shaia
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Yuval Shaia @ 2019-08-18 13:21 UTC (permalink / raw)
  To: berrange, thuth, philmd, pbonzini, laurent, qemu-devel,
	yuval.shaia, marcel.apfelbaum

The virtual address that the guest provides in post_send and
post_recv operations belongs to the guest address space. This address
space is unknown to the HCA, which resides on the host, so these
operations need an extra step to translate the address to a host
virtual address.

This step is done in the data path and therefore affects performance.

An enhanced version of MR registration introduced here
https://patchwork.kernel.org/patch/11044467/ can be used so that the
guest virtual address space for this MR is known to the HCA on the host.

This will save the data-path adjustment.
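
For illustration only, the adjustment that the legacy path performs for
every scatter/gather entry can be sketched as below. The struct and
function names are hypothetical stand-ins; only the arithmetic mirrors the
mr->virt, mr->start and guest address fields used in patch #2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the MR fields the patch relies on. */
struct guest_mr {
    void    *virt;    /* host virtual address the MR is mapped at */
    uint64_t start;   /* guest virtual address where the MR begins */
    size_t   length;  /* MR length in bytes */
};

/* Legacy path: rebase a guest address onto the host mapping, once per SGE. */
static inline uint64_t legacy_sge_addr(const struct guest_mr *mr,
                                       uint64_t guest_addr)
{
    /* The guest address must fall inside the registered region. */
    assert(guest_addr >= mr->start && guest_addr - mr->start < mr->length);
    return (uint64_t)(uintptr_t)mr->virt + (guest_addr - mr->start);
}
```

With an MR registered through ibv_reg_mr_iova, this per-SGE rebasing is
what the data path no longer has to do.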

Patch #1 detects whether the library installed on the host supports
this function.
Patch #2 enhances the data-path ops by utilizing the new function.

Yuval Shaia (2):
  configure: Check if we can use ibv_reg_mr_iova
  hw/rdma: Utilize ibv_reg_mr_iova for memory registration

 configure                 | 28 ++++++++++++++++++++++++++++
 hw/rdma/rdma_backend.c    | 13 +++++++++++++
 hw/rdma/rdma_backend.h    |  5 +++++
 hw/rdma/rdma_rm.c         |  5 +++++
 hw/rdma/vmw/pvrdma_main.c |  6 ++++++
 5 files changed, 57 insertions(+)

-- 
2.20.1




* [Qemu-devel] [PATCH 1/2] configure: Check if we can use ibv_reg_mr_iova
  2019-08-18 13:21 [Qemu-devel] [PATCH 0/2] rdma: Utilize ibv_reg_mr_iova for memory registration Yuval Shaia
@ 2019-08-18 13:21 ` Yuval Shaia
  2019-08-31 19:28   ` Marcel Apfelbaum
  2019-08-18 13:21 ` [Qemu-devel] [PATCH 2/2] hw/rdma: Utilize ibv_reg_mr_iova for memory registration Yuval Shaia
  2019-08-31 19:26 ` [Qemu-devel] [PATCH 0/2] rdma: " Marcel Apfelbaum
  2 siblings, 1 reply; 8+ messages in thread
From: Yuval Shaia @ 2019-08-18 13:21 UTC (permalink / raw)
  To: berrange, thuth, philmd, pbonzini, laurent, qemu-devel,
	yuval.shaia, marcel.apfelbaum

The function ibv_reg_mr_iova is an enhanced version of the ibv_reg_mr
function that makes it easy to register and use the guest's MRs.

Add a check in the 'configure' phase to detect whether we have a
libibverbs with this support.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
---
 configure | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/configure b/configure
index 714e7fb6a1..e8e4a57b6f 100755
--- a/configure
+++ b/configure
@@ -3205,6 +3205,34 @@ else
     pvrdma="no"
 fi
 
+# Let's see if enhanced reg_mr is supported
+if test "$pvrdma" = "yes" ; then
+
+cat > $TMPC <<EOF &&
+#include <infiniband/verbs.h>
+
+int
+main(void)
+{
+    struct ibv_mr *mr;
+    struct ibv_pd *pd = NULL;
+    size_t length = 10;
+    uint64_t iova = 0;
+    int access = 0;
+    void *addr = NULL;
+
+    mr = ibv_reg_mr_iova(pd, addr, length, iova, access);
+
+    ibv_dereg_mr(mr);
+
+    return 0;
+}
+EOF
+    if ! compile_prog "" "-libverbs"; then
+        QEMU_CFLAGS="$QEMU_CFLAGS -DLEGACY_RDMA_REG_MR"
+    fi
+fi
+
 ##########################################
 # VNC SASL detection
 if test "$vnc" = "yes" && test "$vnc_sasl" != "no" ; then
-- 
2.20.1




* [Qemu-devel] [PATCH 2/2] hw/rdma: Utilize ibv_reg_mr_iova for memory registration
  2019-08-18 13:21 [Qemu-devel] [PATCH 0/2] rdma: Utilize ibv_reg_mr_iova for memory registration Yuval Shaia
  2019-08-18 13:21 ` [Qemu-devel] [PATCH 1/2] configure: Check if we can use ibv_reg_mr_iova Yuval Shaia
@ 2019-08-18 13:21 ` Yuval Shaia
  2019-08-31 19:31   ` Marcel Apfelbaum
  2019-08-31 19:26 ` [Qemu-devel] [PATCH 0/2] rdma: " Marcel Apfelbaum
  2 siblings, 1 reply; 8+ messages in thread
From: Yuval Shaia @ 2019-08-18 13:21 UTC (permalink / raw)
  To: berrange, thuth, philmd, pbonzini, laurent, qemu-devel,
	yuval.shaia, marcel.apfelbaum

The virtual address that the guest provides in post_send and
post_recv operations belongs to the guest address space. This address
space is unknown to the HCA, which resides on the host, so these
operations need an extra step to translate the address to a host
virtual address.

This step is done in the data path and therefore affects performance.

An enhanced version of MR registration introduced here
https://patchwork.kernel.org/patch/11044467/ can be used so that the
guest virtual address space for this MR is known to the HCA on the host.

This will save the data-path adjustment.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
---
 hw/rdma/rdma_backend.c    | 13 +++++++++++++
 hw/rdma/rdma_backend.h    |  5 +++++
 hw/rdma/rdma_rm.c         |  5 +++++
 hw/rdma/vmw/pvrdma_main.c |  6 ++++++
 4 files changed, 29 insertions(+)

diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index c39051068d..c346407cd3 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -391,7 +391,11 @@ static int build_host_sge_array(RdmaDeviceResources *rdma_dev_res,
             return VENDOR_ERR_INVLKEY | ssge[ssge_idx].lkey;
         }
 
+#ifdef LEGACY_RDMA_REG_MR
         dsge->addr = (uintptr_t)mr->virt + ssge[ssge_idx].addr - mr->start;
+#else
+        dsge->addr = ssge[ssge_idx].addr;
+#endif
         dsge->length = ssge[ssge_idx].length;
         dsge->lkey = rdma_backend_mr_lkey(&mr->backend_mr);
 
@@ -735,10 +739,19 @@ void rdma_backend_destroy_pd(RdmaBackendPD *pd)
     }
 }
 
+#ifdef LEGACY_RDMA_REG_MR
 int rdma_backend_create_mr(RdmaBackendMR *mr, RdmaBackendPD *pd, void *addr,
                            size_t length, int access)
+#else
+int rdma_backend_create_mr(RdmaBackendMR *mr, RdmaBackendPD *pd, void *addr,
+                           size_t length, uint64_t guest_start, int access)
+#endif
 {
+#ifdef LEGACY_RDMA_REG_MR
     mr->ibmr = ibv_reg_mr(pd->ibpd, addr, length, access);
+#else
+    mr->ibmr = ibv_reg_mr_iova(pd->ibpd, addr, length, guest_start, access);
+#endif
     if (!mr->ibmr) {
         rdma_error_report("ibv_reg_mr fail, errno=%d", errno);
         return -EIO;
diff --git a/hw/rdma/rdma_backend.h b/hw/rdma/rdma_backend.h
index 7c1a19a2b5..127f96e2d5 100644
--- a/hw/rdma/rdma_backend.h
+++ b/hw/rdma/rdma_backend.h
@@ -78,8 +78,13 @@ int rdma_backend_query_port(RdmaBackendDev *backend_dev,
 int rdma_backend_create_pd(RdmaBackendDev *backend_dev, RdmaBackendPD *pd);
 void rdma_backend_destroy_pd(RdmaBackendPD *pd);
 
+#ifdef LEGACY_RDMA_REG_MR
 int rdma_backend_create_mr(RdmaBackendMR *mr, RdmaBackendPD *pd, void *addr,
                            size_t length, int access);
+#else
+int rdma_backend_create_mr(RdmaBackendMR *mr, RdmaBackendPD *pd, void *addr,
+                           size_t length, uint64_t guest_start, int access);
+#endif
 void rdma_backend_destroy_mr(RdmaBackendMR *mr);
 
 int rdma_backend_create_cq(RdmaBackendDev *backend_dev, RdmaBackendCQ *cq,
diff --git a/hw/rdma/rdma_rm.c b/hw/rdma/rdma_rm.c
index 1927f85472..1524dfaeaa 100644
--- a/hw/rdma/rdma_rm.c
+++ b/hw/rdma/rdma_rm.c
@@ -227,8 +227,13 @@ int rdma_rm_alloc_mr(RdmaDeviceResources *dev_res, uint32_t pd_handle,
         mr->length = guest_length;
         mr->virt += (mr->start & (TARGET_PAGE_SIZE - 1));
 
+#ifdef LEGACY_RDMA_REG_MR
         ret = rdma_backend_create_mr(&mr->backend_mr, &pd->backend_pd, mr->virt,
                                      mr->length, access_flags);
+#else
+        ret = rdma_backend_create_mr(&mr->backend_mr, &pd->backend_pd, mr->virt,
+                                     mr->length, guest_start, access_flags);
+#endif
         if (ret) {
             ret = -EIO;
             goto out_dealloc_mr;
diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
index 3e36e13013..18075285f6 100644
--- a/hw/rdma/vmw/pvrdma_main.c
+++ b/hw/rdma/vmw/pvrdma_main.c
@@ -664,6 +664,12 @@ static void pvrdma_realize(PCIDevice *pdev, Error **errp)
     dev->shutdown_notifier.notify = pvrdma_shutdown_notifier;
     qemu_register_shutdown_notifier(&dev->shutdown_notifier);
 
+#ifdef LEGACY_RDMA_REG_MR
+    rdma_info_report("Using legacy reg_mr");
+#else
+    rdma_info_report("Using iova reg_mr");
+#endif
+
 out:
     if (rc) {
         pvrdma_fini(pdev);
-- 
2.20.1




* Re: [Qemu-devel] [PATCH 0/2] rdma: Utilize ibv_reg_mr_iova for memory registration
  2019-08-18 13:21 [Qemu-devel] [PATCH 0/2] rdma: Utilize ibv_reg_mr_iova for memory registration Yuval Shaia
  2019-08-18 13:21 ` [Qemu-devel] [PATCH 1/2] configure: Check if we can use ibv_reg_mr_iova Yuval Shaia
  2019-08-18 13:21 ` [Qemu-devel] [PATCH 2/2] hw/rdma: Utilize ibv_reg_mr_iova for memory registration Yuval Shaia
@ 2019-08-31 19:26 ` " Marcel Apfelbaum
  2 siblings, 0 replies; 8+ messages in thread
From: Marcel Apfelbaum @ 2019-08-31 19:26 UTC (permalink / raw)
  To: Yuval Shaia, berrange, thuth, philmd, pbonzini, laurent, qemu-devel

Hi Yuval,

On 8/18/19 4:21 PM, Yuval Shaia wrote:
> The virtual address that the guest provides in post_send and
> post_recv operations belongs to the guest address space. This address
> space is unknown to the HCA, which resides on the host, so these
> operations need an extra step to translate the address to a host
> virtual address.
>
> This step is done in the data path and therefore affects performance.
>
> An enhanced version of MR registration introduced here
> https://patchwork.kernel.org/patch/11044467/ can be used so that the
> guest virtual address space for this MR is known to the HCA on the host.

Nice work on the kernel side!
Thanks,
Marcel

>
> This will save the data-path adjustment.
>
> patch #1 deals with what is needed to detect if the library installed in
> the host supports this function
> patch #2 enhance the data-path ops by utilizing the new function
>
> Yuval Shaia (2):
>    configure: Check if we can use ibv_reg_mr_iova
>    hw/rdma: Utilize ibv_reg_mr_iova for memory registration
>
>   configure                 | 28 ++++++++++++++++++++++++++++
>   hw/rdma/rdma_backend.c    | 13 +++++++++++++
>   hw/rdma/rdma_backend.h    |  5 +++++
>   hw/rdma/rdma_rm.c         |  5 +++++
>   hw/rdma/vmw/pvrdma_main.c |  6 ++++++
>   5 files changed, 57 insertions(+)
>




* Re: [Qemu-devel] [PATCH 1/2] configure: Check if we can use ibv_reg_mr_iova
  2019-08-18 13:21 ` [Qemu-devel] [PATCH 1/2] configure: Check if we can use ibv_reg_mr_iova Yuval Shaia
@ 2019-08-31 19:28   ` Marcel Apfelbaum
  2019-09-01  9:25     ` Yuval Shaia
  0 siblings, 1 reply; 8+ messages in thread
From: Marcel Apfelbaum @ 2019-08-31 19:28 UTC (permalink / raw)
  To: Yuval Shaia, berrange, thuth, philmd, pbonzini, laurent, qemu-devel



On 8/18/19 4:21 PM, Yuval Shaia wrote:
> The function ibv_reg_mr_iova is an enhanced version of the ibv_reg_mr
> function that makes it easy to register and use the guest's MRs.
>
> Add a check in the 'configure' phase to detect whether we have a
> libibverbs with this support.
>
> Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
> ---
>   configure | 28 ++++++++++++++++++++++++++++
>   1 file changed, 28 insertions(+)
>
> diff --git a/configure b/configure
> index 714e7fb6a1..e8e4a57b6f 100755
> --- a/configure
> +++ b/configure
> @@ -3205,6 +3205,34 @@ else
>       pvrdma="no"
>   fi
>   
> +# Let's see if enhanced reg_mr is supported
> +if test "$pvrdma" = "yes" ; then
> +
> +cat > $TMPC <<EOF &&
> +#include <infiniband/verbs.h>
> +
> +int
> +main(void)
> +{
> +    struct ibv_mr *mr;
> +    struct ibv_pd *pd = NULL;
> +    size_t length = 10;
> +    uint64_t iova = 0;
> +    int access = 0;
> +    void *addr = NULL;
> +
> +    mr = ibv_reg_mr_iova(pd, addr, length, iova, access);

Here you check if the API is changed, right?
Can you query for a library version instead?

Thanks,
Marcel

> +
> +    ibv_dereg_mr(mr);
> +
> +    return 0;
> +}
> +EOF
> +    if ! compile_prog "" "-libverbs"; then
> +        QEMU_CFLAGS="$QEMU_CFLAGS -DLEGACY_RDMA_REG_MR"
> +    fi
> +fi
> +
>   ##########################################
>   # VNC SASL detection
>   if test "$vnc" = "yes" && test "$vnc_sasl" != "no" ; then




* Re: [Qemu-devel] [PATCH 2/2] hw/rdma: Utilize ibv_reg_mr_iova for memory registration
  2019-08-18 13:21 ` [Qemu-devel] [PATCH 2/2] hw/rdma: Utilize ibv_reg_mr_iova for memory registration Yuval Shaia
@ 2019-08-31 19:31   ` Marcel Apfelbaum
  2019-09-01  9:30     ` Yuval Shaia
  0 siblings, 1 reply; 8+ messages in thread
From: Marcel Apfelbaum @ 2019-08-31 19:31 UTC (permalink / raw)
  To: Yuval Shaia, berrange, thuth, philmd, pbonzini, laurent, qemu-devel

Hi Yuval,

On 8/18/19 4:21 PM, Yuval Shaia wrote:
> The virtual address that the guest provides in post_send and
> post_recv operations belongs to the guest address space. This address
> space is unknown to the HCA, which resides on the host, so these
> operations need an extra step to translate the address to a host
> virtual address.
>
> This step is done in the data path and therefore affects performance.
>
> An enhanced version of MR registration introduced here
> https://patchwork.kernel.org/patch/11044467/ can be used so that the
> guest virtual address space for this MR is known to the HCA on the host.
>
> This will save the data-path adjustment.
>
> Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
> ---
>   hw/rdma/rdma_backend.c    | 13 +++++++++++++
>   hw/rdma/rdma_backend.h    |  5 +++++
>   hw/rdma/rdma_rm.c         |  5 +++++
>   hw/rdma/vmw/pvrdma_main.c |  6 ++++++
>   4 files changed, 29 insertions(+)
>
> diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
> index c39051068d..c346407cd3 100644
> --- a/hw/rdma/rdma_backend.c
> +++ b/hw/rdma/rdma_backend.c
> @@ -391,7 +391,11 @@ static int build_host_sge_array(RdmaDeviceResources *rdma_dev_res,
>               return VENDOR_ERR_INVLKEY | ssge[ssge_idx].lkey;
>           }
>   
> +#ifdef LEGACY_RDMA_REG_MR
>           dsge->addr = (uintptr_t)mr->virt + ssge[ssge_idx].addr - mr->start;

This is the performance hit you are addressing? The address computation?

Thanks,
Marcel


> +#else
> +        dsge->addr = ssge[ssge_idx].addr;
> +#endif
>           dsge->length = ssge[ssge_idx].length;
>           dsge->lkey = rdma_backend_mr_lkey(&mr->backend_mr);
>   
> @@ -735,10 +739,19 @@ void rdma_backend_destroy_pd(RdmaBackendPD *pd)
>       }
>   }
>   
> +#ifdef LEGACY_RDMA_REG_MR
>   int rdma_backend_create_mr(RdmaBackendMR *mr, RdmaBackendPD *pd, void *addr,
>                              size_t length, int access)
> +#else
> +int rdma_backend_create_mr(RdmaBackendMR *mr, RdmaBackendPD *pd, void *addr,
> +                           size_t length, uint64_t guest_start, int access)
> +#endif
>   {
> +#ifdef LEGACY_RDMA_REG_MR
>       mr->ibmr = ibv_reg_mr(pd->ibpd, addr, length, access);
> +#else
> +    mr->ibmr = ibv_reg_mr_iova(pd->ibpd, addr, length, guest_start, access);
> +#endif
>       if (!mr->ibmr) {
>           rdma_error_report("ibv_reg_mr fail, errno=%d", errno);
>           return -EIO;
> diff --git a/hw/rdma/rdma_backend.h b/hw/rdma/rdma_backend.h
> index 7c1a19a2b5..127f96e2d5 100644
> --- a/hw/rdma/rdma_backend.h
> +++ b/hw/rdma/rdma_backend.h
> @@ -78,8 +78,13 @@ int rdma_backend_query_port(RdmaBackendDev *backend_dev,
>   int rdma_backend_create_pd(RdmaBackendDev *backend_dev, RdmaBackendPD *pd);
>   void rdma_backend_destroy_pd(RdmaBackendPD *pd);
>   
> +#ifdef LEGACY_RDMA_REG_MR
>   int rdma_backend_create_mr(RdmaBackendMR *mr, RdmaBackendPD *pd, void *addr,
>                              size_t length, int access);
> +#else
> +int rdma_backend_create_mr(RdmaBackendMR *mr, RdmaBackendPD *pd, void *addr,
> +                           size_t length, uint64_t guest_start, int access);
> +#endif
>   void rdma_backend_destroy_mr(RdmaBackendMR *mr);
>   
>   int rdma_backend_create_cq(RdmaBackendDev *backend_dev, RdmaBackendCQ *cq,
> diff --git a/hw/rdma/rdma_rm.c b/hw/rdma/rdma_rm.c
> index 1927f85472..1524dfaeaa 100644
> --- a/hw/rdma/rdma_rm.c
> +++ b/hw/rdma/rdma_rm.c
> @@ -227,8 +227,13 @@ int rdma_rm_alloc_mr(RdmaDeviceResources *dev_res, uint32_t pd_handle,
>           mr->length = guest_length;
>           mr->virt += (mr->start & (TARGET_PAGE_SIZE - 1));
>   
> +#ifdef LEGACY_RDMA_REG_MR
>           ret = rdma_backend_create_mr(&mr->backend_mr, &pd->backend_pd, mr->virt,
>                                        mr->length, access_flags);
> +#else
> +        ret = rdma_backend_create_mr(&mr->backend_mr, &pd->backend_pd, mr->virt,
> +                                     mr->length, guest_start, access_flags);
> +#endif
>           if (ret) {
>               ret = -EIO;
>               goto out_dealloc_mr;
> diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
> index 3e36e13013..18075285f6 100644
> --- a/hw/rdma/vmw/pvrdma_main.c
> +++ b/hw/rdma/vmw/pvrdma_main.c
> @@ -664,6 +664,12 @@ static void pvrdma_realize(PCIDevice *pdev, Error **errp)
>       dev->shutdown_notifier.notify = pvrdma_shutdown_notifier;
>       qemu_register_shutdown_notifier(&dev->shutdown_notifier);
>   
> +#ifdef LEGACY_RDMA_REG_MR
> +    rdma_info_report("Using legacy reg_mr");
> +#else
> +    rdma_info_report("Using iova reg_mr");
> +#endif
> +
>   out:
>       if (rc) {
>           pvrdma_fini(pdev);




* Re: [Qemu-devel] [PATCH 1/2] configure: Check if we can use ibv_reg_mr_iova
  2019-08-31 19:28   ` Marcel Apfelbaum
@ 2019-09-01  9:25     ` Yuval Shaia
  0 siblings, 0 replies; 8+ messages in thread
From: Yuval Shaia @ 2019-09-01  9:25 UTC (permalink / raw)
  To: Marcel Apfelbaum; +Cc: thuth, berrange, qemu-devel, laurent, pbonzini, philmd

On Sat, Aug 31, 2019 at 10:28:18PM +0300, Marcel Apfelbaum wrote:
> 
> 
> On 8/18/19 4:21 PM, Yuval Shaia wrote:
> > The function ibv_reg_mr_iova is an enhanced version of the ibv_reg_mr
> > function that makes it easy to register and use the guest's MRs.
> > 
> > Add a check in the 'configure' phase to detect whether we have a
> > libibverbs with this support.
> > 
> > Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
> > ---
> >   configure | 28 ++++++++++++++++++++++++++++
> >   1 file changed, 28 insertions(+)
> > 
> > diff --git a/configure b/configure
> > index 714e7fb6a1..e8e4a57b6f 100755
> > --- a/configure
> > +++ b/configure
> > @@ -3205,6 +3205,34 @@ else
> >       pvrdma="no"
> >   fi
> > +# Let's see if enhanced reg_mr is supported
> > +if test "$pvrdma" = "yes" ; then
> > +
> > +cat > $TMPC <<EOF &&
> > +#include <infiniband/verbs.h>
> > +
> > +int
> > +main(void)
> > +{
> > +    struct ibv_mr *mr;
> > +    struct ibv_pd *pd = NULL;
> > +    size_t length = 10;
> > +    uint64_t iova = 0;
> > +    int access = 0;
> > +    void *addr = NULL;
> > +
> > +    mr = ibv_reg_mr_iova(pd, addr, length, iova, access);
> 
> Here you check if the API is changed, right?

Yes.

> Can you query for a library version instead?

The library version is set in the spec file, which is the distros'
responsibility. I don't see a reason to depend on that, especially when
the check for support is so easy. In addition, this approach allows one to
download the latest upstream code and compile against it even when the
distro has not yet updated its repo.
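
As a rough sketch of how the probe result is consumed: the only thing the C
code sees is whether LEGACY_RDMA_REG_MR was added to QEMU_CFLAGS, so no
library version number is involved. The struct and the helper name below
are hypothetical; the patch itself places the #ifdef at each call site
rather than in a helper.

```c
#include <stdint.h>

/* Hypothetical MR fields, mirroring mr->virt and mr->start in patch 2/2. */
struct guest_mr {
    void    *virt;   /* host virtual address of the mapping */
    uint64_t start;  /* guest virtual address where the MR begins */
};

/*
 * LEGACY_RDMA_REG_MR is defined by 'configure' only when the test program
 * calling ibv_reg_mr_iova fails to build.
 */
static inline uint64_t sge_addr(const struct guest_mr *mr, uint64_t guest_addr)
{
#ifdef LEGACY_RDMA_REG_MR
    /* Old library: rebase the guest address onto the host mapping. */
    return (uint64_t)(uintptr_t)mr->virt + (guest_addr - mr->start);
#else
    /* New library: the MR was registered with the guest start as its iova. */
    (void)mr;
    return guest_addr;
#endif
}
```

Building the same file with and without -DLEGACY_RDMA_REG_MR selects one
branch or the other at compile time.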

> 
> Thanks,
> Marcel
> 
> > +
> > +    ibv_dereg_mr(mr);
> > +
> > +    return 0;
> > +}
> > +EOF
> > +    if ! compile_prog "" "-libverbs"; then
> > +        QEMU_CFLAGS="$QEMU_CFLAGS -DLEGACY_RDMA_REG_MR"
> > +    fi
> > +fi
> > +
> >   ##########################################
> >   # VNC SASL detection
> >   if test "$vnc" = "yes" && test "$vnc_sasl" != "no" ; then
> 
> 



* Re: [Qemu-devel] [PATCH 2/2] hw/rdma: Utilize ibv_reg_mr_iova for memory registration
  2019-08-31 19:31   ` Marcel Apfelbaum
@ 2019-09-01  9:30     ` Yuval Shaia
  0 siblings, 0 replies; 8+ messages in thread
From: Yuval Shaia @ 2019-09-01  9:30 UTC (permalink / raw)
  To: Marcel Apfelbaum; +Cc: thuth, berrange, qemu-devel, laurent, pbonzini, philmd

On Sat, Aug 31, 2019 at 10:31:57PM +0300, Marcel Apfelbaum wrote:
> Hi Yuval,
> 
> On 8/18/19 4:21 PM, Yuval Shaia wrote:
> > The virtual address that the guest provides in post_send and
> > post_recv operations belongs to the guest address space. This address
> > space is unknown to the HCA, which resides on the host, so these
> > operations need an extra step to translate the address to a host
> > virtual address.
> > 
> > This step is done in the data path and therefore affects performance.
> > 
> > An enhanced version of MR registration introduced here
> > https://patchwork.kernel.org/patch/11044467/ can be used so that the
> > guest virtual address space for this MR is known to the HCA on the host.
> > 
> > This will save the data-path adjustment.
> > 
> > Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
> > ---
> >   hw/rdma/rdma_backend.c    | 13 +++++++++++++
> >   hw/rdma/rdma_backend.h    |  5 +++++
> >   hw/rdma/rdma_rm.c         |  5 +++++
> >   hw/rdma/vmw/pvrdma_main.c |  6 ++++++
> >   4 files changed, 29 insertions(+)
> > 
> > diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
> > index c39051068d..c346407cd3 100644
> > --- a/hw/rdma/rdma_backend.c
> > +++ b/hw/rdma/rdma_backend.c
> > @@ -391,7 +391,11 @@ static int build_host_sge_array(RdmaDeviceResources *rdma_dev_res,
> >               return VENDOR_ERR_INVLKEY | ssge[ssge_idx].lkey;
> >           }
> > +#ifdef LEGACY_RDMA_REG_MR
> >           dsge->addr = (uintptr_t)mr->virt + ssge[ssge_idx].addr - mr->start;
> 
> This is the performance hit you are addressing? The address computation?

This is the support for the legacy library; see the enhancement below.

> 
> Thanks,
> Marcel
> 
> 
> > +#else
> > +        dsge->addr = ssge[ssge_idx].addr;

Here it is: no need to adjust to a host virtual address.

Please also note that this is a huge step toward virtio-rdma support, where
the emulation will be bypassed in the data path, so there is no chance for
address adjustment.
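
To make the equivalence concrete, here is a toy model (not libibverbs code;
all names are made up for illustration) of how a device resolves an SGE
address against an MR's base. With ibv_reg_mr the base is the host address,
so QEMU has to rebase first; with ibv_reg_mr_iova the base is the guest
start address, so the guest address can be posted unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a registered MR; not the libibverbs data structure. */
struct sim_mr {
    uint8_t *host_base; /* host mapping passed as addr at registration */
    uint64_t iova;      /* base the device matches SGE addresses against */
    size_t   length;    /* region length in bytes */
};

/* Device-side lookup: turn an SGE address into a host pointer. */
static inline uint8_t *sim_hca_resolve(const struct sim_mr *mr,
                                       uint64_t sge_addr)
{
    assert(sge_addr >= mr->iova && sge_addr - mr->iova < mr->length);
    return mr->host_base + (sge_addr - mr->iova);
}

/* Legacy registration: iova equals the host address, so rebase first. */
static inline uint8_t *legacy_path(uint8_t *host, uint64_t guest_start,
                                   size_t len, uint64_t guest_addr)
{
    struct sim_mr mr = { host, (uint64_t)(uintptr_t)host, len };
    uint64_t adjusted = (uint64_t)(uintptr_t)host + (guest_addr - guest_start);
    return sim_hca_resolve(&mr, adjusted);
}

/* IOVA registration: iova is the guest start, so post the address as-is. */
static inline uint8_t *iova_path(uint8_t *host, uint64_t guest_start,
                                 size_t len, uint64_t guest_addr)
{
    struct sim_mr mr = { host, guest_start, len };
    return sim_hca_resolve(&mr, guest_addr);
}
```

Both paths end up at the same host byte; the difference is only whether the
subtraction happens in QEMU for every SGE or once inside the registration.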

> > +#endif
> >           dsge->length = ssge[ssge_idx].length;
> >           dsge->lkey = rdma_backend_mr_lkey(&mr->backend_mr);
> > @@ -735,10 +739,19 @@ void rdma_backend_destroy_pd(RdmaBackendPD *pd)
> >       }
> >   }
> > +#ifdef LEGACY_RDMA_REG_MR
> >   int rdma_backend_create_mr(RdmaBackendMR *mr, RdmaBackendPD *pd, void *addr,
> >                              size_t length, int access)
> > +#else
> > +int rdma_backend_create_mr(RdmaBackendMR *mr, RdmaBackendPD *pd, void *addr,
> > +                           size_t length, uint64_t guest_start, int access)
> > +#endif
> >   {
> > +#ifdef LEGACY_RDMA_REG_MR
> >       mr->ibmr = ibv_reg_mr(pd->ibpd, addr, length, access);
> > +#else
> > +    mr->ibmr = ibv_reg_mr_iova(pd->ibpd, addr, length, guest_start, access);
> > +#endif
> >       if (!mr->ibmr) {
> >           rdma_error_report("ibv_reg_mr fail, errno=%d", errno);
> >           return -EIO;
> > diff --git a/hw/rdma/rdma_backend.h b/hw/rdma/rdma_backend.h
> > index 7c1a19a2b5..127f96e2d5 100644
> > --- a/hw/rdma/rdma_backend.h
> > +++ b/hw/rdma/rdma_backend.h
> > @@ -78,8 +78,13 @@ int rdma_backend_query_port(RdmaBackendDev *backend_dev,
> >   int rdma_backend_create_pd(RdmaBackendDev *backend_dev, RdmaBackendPD *pd);
> >   void rdma_backend_destroy_pd(RdmaBackendPD *pd);
> > +#ifdef LEGACY_RDMA_REG_MR
> >   int rdma_backend_create_mr(RdmaBackendMR *mr, RdmaBackendPD *pd, void *addr,
> >                              size_t length, int access);
> > +#else
> > +int rdma_backend_create_mr(RdmaBackendMR *mr, RdmaBackendPD *pd, void *addr,
> > +                           size_t length, uint64_t guest_start, int access);
> > +#endif
> >   void rdma_backend_destroy_mr(RdmaBackendMR *mr);
> >   int rdma_backend_create_cq(RdmaBackendDev *backend_dev, RdmaBackendCQ *cq,
> > diff --git a/hw/rdma/rdma_rm.c b/hw/rdma/rdma_rm.c
> > index 1927f85472..1524dfaeaa 100644
> > --- a/hw/rdma/rdma_rm.c
> > +++ b/hw/rdma/rdma_rm.c
> > @@ -227,8 +227,13 @@ int rdma_rm_alloc_mr(RdmaDeviceResources *dev_res, uint32_t pd_handle,
> >           mr->length = guest_length;
> >           mr->virt += (mr->start & (TARGET_PAGE_SIZE - 1));
> > +#ifdef LEGACY_RDMA_REG_MR
> >           ret = rdma_backend_create_mr(&mr->backend_mr, &pd->backend_pd, mr->virt,
> >                                        mr->length, access_flags);
> > +#else
> > +        ret = rdma_backend_create_mr(&mr->backend_mr, &pd->backend_pd, mr->virt,
> > +                                     mr->length, guest_start, access_flags);
> > +#endif
> >           if (ret) {
> >               ret = -EIO;
> >               goto out_dealloc_mr;
> > diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
> > index 3e36e13013..18075285f6 100644
> > --- a/hw/rdma/vmw/pvrdma_main.c
> > +++ b/hw/rdma/vmw/pvrdma_main.c
> > @@ -664,6 +664,12 @@ static void pvrdma_realize(PCIDevice *pdev, Error **errp)
> >       dev->shutdown_notifier.notify = pvrdma_shutdown_notifier;
> >       qemu_register_shutdown_notifier(&dev->shutdown_notifier);
> > +#ifdef LEGACY_RDMA_REG_MR
> > +    rdma_info_report("Using legacy reg_mr");
> > +#else
> > +    rdma_info_report("Using iova reg_mr");
> > +#endif
> > +
> >   out:
> >       if (rc) {
> >           pvrdma_fini(pdev);
> 
> 


