All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH V2 0/5] hw/pvrdma: PVRDMA device implementation
@ 2017-12-17 12:54 Marcel Apfelbaum
  2017-12-17 12:54 ` [Qemu-devel] [PATCH V2 1/5] pci/shpc: Move function to generic header file Marcel Apfelbaum
                   ` (5 more replies)
  0 siblings, 6 replies; 30+ messages in thread
From: Marcel Apfelbaum @ 2017-12-17 12:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: ehabkost, imammedo, yuval.shaia, marcel, pbonzini, mst

RFC -> V2:
 - Full implementation of the pvrdma device
 - Backend is an ibdevice interface, no need for the KDBR module


General description
===================
PVRDMA is the QEMU implementation of VMware's paravirtualized RDMA device.
It works with its Linux Kernel driver AS IS, no need for any special guest
modifications.

While it complies with the VMware device, it can also communicate with bare
metal RDMA-enabled machines and does not require an RDMA HCA in the host, it
can work with Soft-RoCE (rxe).

It does not require the whole guest RAM to be pinned allowing memory
over-commit and, even if not implemented yet, migration support will be
possible with some HW assistance.


 Design
 ======
 - Follows the behavior of VMware's pvrdma device, however is not tightly
   coupled with it and most of the code can be reused if we decide to
   continue to a Virtio based RDMA device.

 - It exposes 3 BARs:
    BAR 0 - MSIX, utilize 3 vectors for command ring, async events and
            completions
    BAR 1 - Configuration of registers
    BAR 2 - UAR, used to pass HW commands from driver.

 - The device performs internal management of the RDMA
   resources (PDs, CQs, QPs, ...), meaning the objects
   are not directly coupled to a physical RDMA device resources.

The pvrdma backend is an ibdevice interface that can be exposed
either by a Soft-RoCE(rxe) device on machines with no RDMA device,
or an HCA SRIOV function(VF/PF).
Note that ibdevice interfaces can't be shared between pvrdma devices,
each one requiring a separate instance (rxe or SRIOV VF).


Tests and performance
=====================
Tested with SoftRoCE backend (rxe)/Mellanox ConnectX3,
and Mellanox ConnectX4 HCAs with:
  - VMs in the same host
  - VMs in different hosts 
  - VMs to bare metal.

The best performance achieved with ConnectX HCAs and buffer size
bigger than 1MB which was the line rate ~ 50Gb/s.
The conclusion is that using the PVRDMA device there are no
actual performance penalties compared to bare metal for big enough
buffers (which is quite common when using RDMA), while allowing
memory overcommit.

Marcel Apfelbaum (3):
  mem: add share parameter to memory-backend-ram
  docs: add pvrdma device documentation.
  MAINTAINERS: add entry for hw/net/pvrdma

Yuval Shaia (2):
  pci/shpc: Move function to generic header file
  pvrdma: initial implementation

 MAINTAINERS                         |   7 +
 Makefile.objs                       |   1 +
 backends/hostmem-file.c             |  25 +-
 backends/hostmem-ram.c              |   4 +-
 backends/hostmem.c                  |  21 +
 configure                           |   9 +-
 default-configs/arm-softmmu.mak     |   2 +
 default-configs/i386-softmmu.mak    |   1 +
 default-configs/x86_64-softmmu.mak  |   1 +
 docs/pvrdma.txt                     | 145 ++++++
 exec.c                              |  26 +-
 hw/net/Makefile.objs                |   7 +
 hw/net/pvrdma/pvrdma.h              | 179 +++++++
 hw/net/pvrdma/pvrdma_backend.c      | 986 ++++++++++++++++++++++++++++++++++++
 hw/net/pvrdma/pvrdma_backend.h      |  74 +++
 hw/net/pvrdma/pvrdma_backend_defs.h |  68 +++
 hw/net/pvrdma/pvrdma_cmd.c          | 338 ++++++++++++
 hw/net/pvrdma/pvrdma_defs.h         | 121 +++++
 hw/net/pvrdma/pvrdma_dev_api.h      | 580 +++++++++++++++++++++
 hw/net/pvrdma/pvrdma_dev_ring.c     | 138 +++++
 hw/net/pvrdma/pvrdma_dev_ring.h     |  42 ++
 hw/net/pvrdma/pvrdma_ib_verbs.h     | 399 +++++++++++++++
 hw/net/pvrdma/pvrdma_main.c         | 664 ++++++++++++++++++++++++
 hw/net/pvrdma/pvrdma_qp_ops.c       | 187 +++++++
 hw/net/pvrdma/pvrdma_qp_ops.h       |  26 +
 hw/net/pvrdma/pvrdma_ring.h         | 134 +++++
 hw/net/pvrdma/pvrdma_rm.c           | 791 +++++++++++++++++++++++++++++
 hw/net/pvrdma/pvrdma_rm.h           |  54 ++
 hw/net/pvrdma/pvrdma_rm_defs.h      | 111 ++++
 hw/net/pvrdma/pvrdma_types.h        |  37 ++
 hw/net/pvrdma/pvrdma_utils.c        | 133 +++++
 hw/net/pvrdma/pvrdma_utils.h        |  41 ++
 hw/net/pvrdma/trace-events          |   9 +
 hw/pci/shpc.c                       |  11 +-
 include/exec/memory.h               |  23 +
 include/exec/ram_addr.h             |   3 +-
 include/hw/pci/pci_ids.h            |   3 +
 include/qemu/cutils.h               |  10 +
 include/qemu/osdep.h                |   2 +-
 include/sysemu/hostmem.h            |   2 +-
 include/sysemu/kvm.h                |   2 +-
 memory.c                            |  16 +-
 util/oslib-posix.c                  |   4 +-
 util/oslib-win32.c                  |   2 +-
 44 files changed, 5378 insertions(+), 61 deletions(-)
 create mode 100644 docs/pvrdma.txt
 create mode 100644 hw/net/pvrdma/pvrdma.h
 create mode 100644 hw/net/pvrdma/pvrdma_backend.c
 create mode 100644 hw/net/pvrdma/pvrdma_backend.h
 create mode 100644 hw/net/pvrdma/pvrdma_backend_defs.h
 create mode 100644 hw/net/pvrdma/pvrdma_cmd.c
 create mode 100644 hw/net/pvrdma/pvrdma_defs.h
 create mode 100644 hw/net/pvrdma/pvrdma_dev_api.h
 create mode 100644 hw/net/pvrdma/pvrdma_dev_ring.c
 create mode 100644 hw/net/pvrdma/pvrdma_dev_ring.h
 create mode 100644 hw/net/pvrdma/pvrdma_ib_verbs.h
 create mode 100644 hw/net/pvrdma/pvrdma_main.c
 create mode 100644 hw/net/pvrdma/pvrdma_qp_ops.c
 create mode 100644 hw/net/pvrdma/pvrdma_qp_ops.h
 create mode 100644 hw/net/pvrdma/pvrdma_ring.h
 create mode 100644 hw/net/pvrdma/pvrdma_rm.c
 create mode 100644 hw/net/pvrdma/pvrdma_rm.h
 create mode 100644 hw/net/pvrdma/pvrdma_rm_defs.h
 create mode 100644 hw/net/pvrdma/pvrdma_types.h
 create mode 100644 hw/net/pvrdma/pvrdma_utils.c
 create mode 100644 hw/net/pvrdma/pvrdma_utils.h
 create mode 100644 hw/net/pvrdma/trace-events

-- 
2.13.5

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2017-12-22  4:58 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-17 12:54 [Qemu-devel] [PATCH V2 0/5] hw/pvrdma: PVRDMA device implementation Marcel Apfelbaum
2017-12-17 12:54 ` [Qemu-devel] [PATCH V2 1/5] pci/shpc: Move function to generic header file Marcel Apfelbaum
2017-12-17 18:16   ` Philippe Mathieu-Daudé
2017-12-17 19:03     ` Yuval Shaia
2017-12-17 12:54 ` [Qemu-devel] [PATCH V2 2/5] mem: add share parameter to memory-backend-ram Marcel Apfelbaum
2017-12-17 12:54 ` [Qemu-devel] [PATCH V2 3/5] docs: add pvrdma device documentation Marcel Apfelbaum
2017-12-19 17:47   ` Michael S. Tsirkin
2017-12-20 14:45     ` Marcel Apfelbaum
2017-12-17 12:54 ` [Qemu-devel] [PATCH V2 4/5] pvrdma: initial implementation Marcel Apfelbaum
2017-12-19 16:12   ` Michael S. Tsirkin
2017-12-19 17:29     ` Marcel Apfelbaum
2017-12-19 17:48   ` Michael S. Tsirkin
2017-12-20 15:25     ` Yuval Shaia
2017-12-20 18:01       ` Michael S. Tsirkin
2017-12-19 19:13   ` Philippe Mathieu-Daudé
2017-12-20  4:08     ` Michael S. Tsirkin
2017-12-20 14:46       ` Marcel Apfelbaum
2017-12-17 12:54 ` [Qemu-devel] [PATCH V2 5/5] MAINTAINERS: add entry for hw/net/pvrdma Marcel Apfelbaum
2017-12-19 17:49   ` Michael S. Tsirkin
2017-12-19 18:05 ` [Qemu-devel] [PATCH V2 0/5] hw/pvrdma: PVRDMA device implementation Michael S. Tsirkin
2017-12-20 15:07   ` Marcel Apfelbaum
2017-12-21  0:05     ` Michael S. Tsirkin
2017-12-21  7:27       ` Yuval Shaia
2017-12-21 14:22         ` Michael S. Tsirkin
2017-12-21 15:59           ` Marcel Apfelbaum
2017-12-21 20:46             ` Michael S. Tsirkin
2017-12-21 22:30               ` Yuval Shaia
2017-12-22  4:58                 ` Marcel Apfelbaum
2017-12-20 17:56   ` Yuval Shaia
2017-12-20 18:05     ` Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.