* [Qemu-devel] [PATCH v1 0/6] Extend vhost-user to support VFIO based accelerators
@ 2018-01-25  4:03 ` Tiwei Bie
  0 siblings, 0 replies; 39+ messages in thread
From: Tiwei Bie @ 2018-01-25  4:03 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, alex.williamson, jasowang, pbonzini,
	stefanha
  Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang,
	tiwei.bie

This patch set makes some small extensions to the vhost-user protocol
to support VFIO based accelerators, and makes it possible to achieve
performance similar to that of VFIO based PCI passthrough while
keeping the virtio device emulation in QEMU.

How the accelerator accelerates the vhost data path
===================================================

Any virtio ring compatible device can potentially be used as a vhost
data path accelerator. We can set up the accelerator based on the
information (e.g. memory table, features, ring info, etc.) available
on the vhost backend, and the accelerator will then be able to use the
virtio ring provided by the virtio driver in the VM directly. So the
virtio driver in the VM can exchange e.g. network packets with the
accelerator directly via the virtio ring. That is to say, the
accelerator can be used to accelerate the vhost data path. We call
this vDPA: vhost Data Path Acceleration.

Note: although the accelerator can talk to the virtio driver in the VM
directly via the virtio ring, the control path events (e.g. device
start/stop) in the VM will still be trapped and handled by QEMU, and
QEMU will deliver such events to the vhost backend via the standard
vhost protocol.

The link below shows an example of how to set up such an environment
using nested VMs. In this case, the virtio device in the outer VM is
the accelerator and is used to accelerate the virtio device in the
inner VM. In reality, a virtio ring compatible hardware device could
be used as the accelerator.

http://dpdk.org/ml/archives/dev/2017-December/085044.html

The above example doesn't require any changes to QEMU, but its
performance is lower than that of traditional VFIO based PCI
passthrough. That is the problem this patch set aims to solve.

The performance issue of vDPA/vhost-user and solutions
======================================================

For a vhost-user backend, the critical issue in vDPA is that the data
path performance is relatively low and some host threads are needed
for the data path, because two necessary mechanisms are missing:

1) the guest driver notifying the device directly;
2) the device interrupting the guest directly.

So this patch set makes some small extensions to the vhost-user
protocol to make both of them possible. It leverages the same
mechanisms (e.g. EPT and posted interrupts on Intel platforms) as PCI
passthrough, as sketched below.
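
As a rough illustration of what these two mechanisms map to at the
VFIO level (a sketch under assumptions, not code from this series):
the doorbell page of the accelerator can be mmap()ed from the VFIO
device fd so that it can later be mapped straight into the guest via
EPT, and an eventfd can be handed to the device with
VFIO_DEVICE_SET_IRQS so that its interrupts can be routed to the guest
directly (e.g. via a KVM irqfd and posted interrupts). The region and
IRQ indexes below are device specific and assumed here.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/eventfd.h>
#include <linux/vfio.h>

/* Map the device's notify (doorbell) region so a guest write can reach
 * the hardware without a VM exit once the page is exposed to the guest. */
static void *map_notify_region(int device_fd, unsigned int region_index,
                               uint64_t *size_out)
{
    struct vfio_region_info info = {
        .argsz = sizeof(info),
        .index = region_index,
    };

    if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, &info) < 0) {
        return MAP_FAILED;
    }
    *size_out = info.size;
    return mmap(NULL, info.size, PROT_READ | PROT_WRITE, MAP_SHARED,
                device_fd, info.offset);
}

/* Hand an eventfd to the device; wiring it up as a KVM irqfd then lets
 * the device interrupt the guest without bouncing through a host thread. */
static int setup_direct_interrupt(int device_fd, unsigned int irq_index)
{
    int efd = eventfd(0, EFD_CLOEXEC);
    size_t argsz = sizeof(struct vfio_irq_set) + sizeof(int);
    struct vfio_irq_set *irq_set = calloc(1, argsz);

    irq_set->argsz = argsz;
    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
    irq_set->index = irq_index;
    irq_set->start = 0;
    irq_set->count = 1;
    memcpy(irq_set->data, &efd, sizeof(efd));

    ioctl(device_fd, VFIO_DEVICE_SET_IRQS, irq_set);
    free(irq_set);
    return efd;
}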

A new protocol feature bit is added to negotiate the accelerator
feature support. Two new slave message types are added to control the
notify region and queue interrupt passthrough for each queue. From the
point of view of the vhost-user protocol design, this is very
flexible: the passthrough can be enabled/disabled for each queue
individually, and each queue can be accelerated by a different device.
More design and implementation details can be found in the last patch.
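
Purely as an illustration of the shape such a slave request could take
(the names and values below are hypothetical, not the ones actually
added; see the last patch and the docs/interop/vhost-user.txt update
for the real definitions), it boils down to a small per-queue payload
plus a file descriptor passed as ancillary data:

/* Hypothetical sketch of a per-queue slave request enabling passthrough.
 * The real message types and payload layout are defined in patch 6. */
#include <stdint.h>

enum {
    SLAVE_VRING_NOTIFY_AREA_EXAMPLE = 100,  /* placeholder value */
    SLAVE_VRING_INTERRUPT_EXAMPLE   = 101,  /* placeholder value */
};

struct vring_area_example {
    uint64_t queue_index;  /* which virtqueue to (un)accelerate */
    uint64_t size;         /* size of the area to map; 0 disables it */
    uint64_t offset;       /* offset into the fd sent via SCM_RIGHTS */
};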

Difference between vDPA and PCI passthrough
===========================================

The key difference between PCI passthrough and vDPA is that, in vDPA,
only the data path of the device (e.g. DMA ring, notify region and
queue interrupt) is passed through to the VM; the device control path
(e.g. PCI configuration space and MMIO regions) is still defined and
emulated by QEMU.

The benefits of keeping the virtio device emulation in QEMU compared
with virtio device PCI passthrough include (but are not limited to):

- a consistent device interface for the guest OS in the VM;
- maximum flexibility in the hardware (i.e. accelerator) design;
- leveraging the existing virtio live-migration framework;

Why extend vhost-user for vDPA
==============================

We have already implemented various virtual switches (e.g. OVS-DPDK)
based on vhost-user for VMs in the cloud. They are purely software
running on CPU cores. When accelerators become available for such NFVi
applications, it is ideal if the applications can keep using the
original interface (i.e. the vhost-user netdev) with QEMU, and the
infrastructure is able to decide when and how to switch between CPU
and accelerator behind that interface. The switching (i.e. between CPU
and accelerator) can then be done flexibly and quickly inside the
applications.

More details about this can be found in Cunming's discussions on the
RFC patch set.

The previous links:
RFC: http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg04844.html

RFC -> v1:
- Add some details about how vDPA works in cover letter (Alexey)
- Add some details about the OVS offload use-case in cover letter (Jason)
- Move PCI specific code out of vhost-user (Jason)
- Handle the virtual IOMMU case (Jason)
- Move VFIO group management code into vfio/common.c (Alex)
- Various refinements;
(approximately sorted by comment posting time)

Tiwei Bie (6):
  vhost-user: support receiving file descriptors in slave_read
  vhost-user: introduce shared vhost-user state
  virtio: support adding sub-regions for notify region
  vfio: support getting VFIOGroup from groupfd
  vfio: remove DPRINTF() definition from vfio-common.h
  vhost-user: add VFIO based accelerators support

 Makefile.target                 |   4 +
 docs/interop/vhost-user.txt     |  57 +++++++++
 hw/scsi/vhost-user-scsi.c       |   6 +-
 hw/vfio/common.c                |  96 ++++++++++++++-
 hw/virtio/vhost-user.c          | 250 +++++++++++++++++++++++++++++++++++++++-
 hw/virtio/virtio-pci.c          |  48 ++++++++
 hw/virtio/virtio-pci.h          |   5 +
 hw/virtio/virtio.c              |  39 +++++++
 include/hw/vfio/vfio-common.h   |  11 +-
 include/hw/virtio/vhost-user.h  |  34 ++++++
 include/hw/virtio/virtio-scsi.h |   6 +-
 include/hw/virtio/virtio.h      |   5 +
 include/qemu/osdep.h            |   1 +
 net/vhost-user.c                |  30 ++---
 14 files changed, 559 insertions(+), 33 deletions(-)
 create mode 100644 include/hw/virtio/vhost-user.h

-- 
2.13.3

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH v1 1/6] vhost-user: support receiving file descriptors in slave_read
  2018-01-25  4:03 ` [virtio-dev] " Tiwei Bie
@ 2018-01-25  4:03   ` Tiwei Bie
  -1 siblings, 0 replies; 39+ messages in thread
From: Tiwei Bie @ 2018-01-25  4:03 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, alex.williamson, jasowang, pbonzini,
	stefanha
  Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang,
	tiwei.bie

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 hw/virtio/vhost-user.c | 40 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 093675ed98..e7108138fd 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -614,14 +614,43 @@ static void slave_read(void *opaque)
     struct vhost_user *u = dev->opaque;
     VhostUserMsg msg = { 0, };
     int size, ret = 0;
+    struct iovec iov;
+    struct msghdr msgh;
+    int fd = -1;
+    size_t fdsize = sizeof(fd);
+    char control[CMSG_SPACE(fdsize)];
+    struct cmsghdr *cmsg;
+
+    memset(&msgh, 0, sizeof(msgh));
+    msgh.msg_iov = &iov;
+    msgh.msg_iovlen = 1;
+    msgh.msg_control = control;
+    msgh.msg_controllen = sizeof(control);
 
     /* Read header */
-    size = read(u->slave_fd, &msg, VHOST_USER_HDR_SIZE);
+    iov.iov_base = &msg;
+    iov.iov_len = VHOST_USER_HDR_SIZE;
+
+    size = recvmsg(u->slave_fd, &msgh, 0);
     if (size != VHOST_USER_HDR_SIZE) {
         error_report("Failed to read from slave.");
         goto err;
     }
 
+    if (msgh.msg_flags & MSG_CTRUNC) {
+        error_report("Truncated message.");
+        goto err;
+    }
+
+    for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL;
+         cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
+            if (cmsg->cmsg_level == SOL_SOCKET &&
+                cmsg->cmsg_type == SCM_RIGHTS) {
+                    memcpy(&fd, CMSG_DATA(cmsg), fdsize);
+                    break;
+            }
+    }
+
     if (msg.size > VHOST_USER_PAYLOAD_SIZE) {
         error_report("Failed to read msg header."
                 " Size %d exceeds the maximum %zu.", msg.size,
@@ -642,9 +671,15 @@ static void slave_read(void *opaque)
         break;
     default:
         error_report("Received unexpected msg type.");
+        if (fd != -1) {
+            close(fd);
+        }
         ret = -EINVAL;
     }
 
+    /* Message handlers need to make sure that fd will be consumed. */
+    fd = -1;
+
     /*
      * REPLY_ACK feature handling. Other reply types has to be managed
      * directly in their request handlers.
@@ -669,6 +704,9 @@ err:
     qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
     close(u->slave_fd);
     u->slave_fd = -1;
+    if (fd != -1) {
+        close(fd);
+    }
     return;
 }
 
-- 
2.13.3
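
For reference, below is a minimal sketch of the sending side that this
receive path pairs with: a vhost-user slave passing a file descriptor
as ancillary data on an already connected AF_UNIX slave socket. This
is a generic illustration of SCM_RIGHTS, not code from this series;
the function name and header layout are assumptions.

/* Sketch: send a message header plus one fd over a connected UNIX socket. */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

static int send_msg_with_fd(int sock, void *hdr, size_t hdr_len, int fd)
{
    struct iovec iov = { .iov_base = hdr, .iov_len = hdr_len };
    char control[CMSG_SPACE(sizeof(int))];
    struct msghdr msgh = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = control,
        .msg_controllen = sizeof(control),
    };
    struct cmsghdr *cmsg;

    memset(control, 0, sizeof(control));
    cmsg = CMSG_FIRSTHDR(&msgh);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msgh, 0) == (ssize_t)hdr_len ? 0 : -1;
}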

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH v1 2/6] vhost-user: introduce shared vhost-user state
  2018-01-25  4:03 ` [virtio-dev] " Tiwei Bie
@ 2018-01-25  4:03   ` Tiwei Bie
  -1 siblings, 0 replies; 39+ messages in thread
From: Tiwei Bie @ 2018-01-25  4:03 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, alex.williamson, jasowang, pbonzini,
	stefanha
  Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang,
	tiwei.bie

When multi-queue is enabled for virtio-net, each virtio
queue pair will have a vhost_dev, and the only thing they
share currently is the chardev. This patch introduces a
vhost-user state structure which will be shared by all
virtio queue pairs of the same virtio device.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 hw/scsi/vhost-user-scsi.c       |  6 +++---
 hw/virtio/vhost-user.c          |  9 +++++----
 include/hw/virtio/vhost-user.h  | 17 +++++++++++++++++
 include/hw/virtio/virtio-scsi.h |  6 +++++-
 net/vhost-user.c                | 30 ++++++++++++++++--------------
 5 files changed, 46 insertions(+), 22 deletions(-)
 create mode 100644 include/hw/virtio/vhost-user.h

diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index f7561e23fa..2c46c74128 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -73,7 +73,7 @@ static void vhost_user_scsi_realize(DeviceState *dev, Error **errp)
     Error *err = NULL;
     int ret;
 
-    if (!vs->conf.chardev.chr) {
+    if (!vs->conf.vhost_user.chr.chr) {
         error_setg(errp, "vhost-user-scsi: missing chardev");
         return;
     }
@@ -91,7 +91,7 @@ static void vhost_user_scsi_realize(DeviceState *dev, Error **errp)
     vsc->dev.vq_index = 0;
     vsc->dev.backend_features = 0;
 
-    ret = vhost_dev_init(&vsc->dev, (void *)&vs->conf.chardev,
+    ret = vhost_dev_init(&vsc->dev, (void *)&vs->conf.vhost_user,
                          VHOST_BACKEND_TYPE_USER, 0);
     if (ret < 0) {
         error_setg(errp, "vhost-user-scsi: vhost initialization failed: %s",
@@ -132,7 +132,7 @@ static uint64_t vhost_user_scsi_get_features(VirtIODevice *vdev,
 }
 
 static Property vhost_user_scsi_properties[] = {
-    DEFINE_PROP_CHR("chardev", VirtIOSCSICommon, conf.chardev),
+    DEFINE_PROP_CHR("chardev", VirtIOSCSICommon, conf.vhost_user.chr),
     DEFINE_PROP_UINT32("boot_tpgt", VirtIOSCSICommon, conf.boot_tpgt, 0),
     DEFINE_PROP_UINT32("num_queues", VirtIOSCSICommon, conf.num_queues, 1),
     DEFINE_PROP_UINT32("virtqueue_size", VirtIOSCSICommon, conf.virtqueue_size,
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index e7108138fd..3e308d0a62 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -12,6 +12,7 @@
 #include "qapi/error.h"
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-backend.h"
+#include "hw/virtio/vhost-user.h"
 #include "hw/virtio/virtio-net.h"
 #include "chardev/char-fe.h"
 #include "sysemu/kvm.h"
@@ -123,7 +124,7 @@ static VhostUserMsg m __attribute__ ((unused));
 #define VHOST_USER_VERSION    (0x1)
 
 struct vhost_user {
-    CharBackend *chr;
+    VhostUser *shared;
     int slave_fd;
 };
 
@@ -135,7 +136,7 @@ static bool ioeventfd_enabled(void)
 static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg)
 {
     struct vhost_user *u = dev->opaque;
-    CharBackend *chr = u->chr;
+    CharBackend *chr = &u->shared->chr;
     uint8_t *p = (uint8_t *) msg;
     int r, size = VHOST_USER_HDR_SIZE;
 
@@ -221,7 +222,7 @@ static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg,
                             int *fds, int fd_num)
 {
     struct vhost_user *u = dev->opaque;
-    CharBackend *chr = u->chr;
+    CharBackend *chr = &u->shared->chr;
     int ret, size = VHOST_USER_HDR_SIZE + msg->size;
 
     /*
@@ -767,7 +768,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
     assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
 
     u = g_new0(struct vhost_user, 1);
-    u->chr = opaque;
+    u->shared = opaque;
     u->slave_fd = -1;
     dev->opaque = u;
 
diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
new file mode 100644
index 0000000000..4f5a1477d1
--- /dev/null
+++ b/include/hw/virtio/vhost-user.h
@@ -0,0 +1,17 @@
+/*
+ * Copyright (c) 2017-2018 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef HW_VIRTIO_VHOST_USER_H
+#define HW_VIRTIO_VHOST_USER_H
+
+#include "chardev/char-fe.h"
+
+typedef struct VhostUser {
+    CharBackend chr;
+} VhostUser;
+
+#endif
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 4c0bcdb788..885c3e84b5 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -19,6 +19,7 @@
 #define VIRTIO_SCSI_SENSE_SIZE 0
 #include "standard-headers/linux/virtio_scsi.h"
 #include "hw/virtio/virtio.h"
+#include "hw/virtio/vhost-user.h"
 #include "hw/pci/pci.h"
 #include "hw/scsi/scsi.h"
 #include "chardev/char-fe.h"
@@ -54,7 +55,10 @@ struct VirtIOSCSIConf {
     char *vhostfd;
     char *wwpn;
 #endif
-    CharBackend chardev;
+    union {
+        VhostUser vhost_user;
+        CharBackend chardev;
+    };
     uint32_t boot_tpgt;
     IOThread *iothread;
 };
diff --git a/net/vhost-user.c b/net/vhost-user.c
index c23927c912..b398294074 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -12,6 +12,7 @@
 #include "clients.h"
 #include "net/vhost_net.h"
 #include "net/vhost-user.h"
+#include "hw/virtio/vhost-user.h"
 #include "chardev/char-fe.h"
 #include "qemu/config-file.h"
 #include "qemu/error-report.h"
@@ -20,7 +21,7 @@
 
 typedef struct VhostUserState {
     NetClientState nc;
-    CharBackend chr; /* only queue index 0 */
+    VhostUser vhost_user; /* only queue index 0 */
     VHostNetState *vhost_net;
     guint watch;
     uint64_t acked_features;
@@ -62,7 +63,7 @@ static void vhost_user_stop(int queues, NetClientState *ncs[])
     }
 }
 
-static int vhost_user_start(int queues, NetClientState *ncs[], CharBackend *be)
+static int vhost_user_start(int queues, NetClientState *ncs[], void *be)
 {
     VhostNetOptions options;
     struct vhost_net *net = NULL;
@@ -155,7 +156,7 @@ static void vhost_user_cleanup(NetClientState *nc)
             g_source_remove(s->watch);
             s->watch = 0;
         }
-        qemu_chr_fe_deinit(&s->chr, true);
+        qemu_chr_fe_deinit(&s->vhost_user.chr, true);
     }
 
     qemu_purge_queued_packets(nc);
@@ -189,7 +190,7 @@ static gboolean net_vhost_user_watch(GIOChannel *chan, GIOCondition cond,
 {
     VhostUserState *s = opaque;
 
-    qemu_chr_fe_disconnect(&s->chr);
+    qemu_chr_fe_disconnect(&s->vhost_user.chr);
 
     return TRUE;
 }
@@ -214,7 +215,8 @@ static void chr_closed_bh(void *opaque)
     qmp_set_link(name, false, &err);
     vhost_user_stop(queues, ncs);
 
-    qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, net_vhost_user_event,
+    qemu_chr_fe_set_handlers(&s->vhost_user.chr, NULL, NULL,
+                             net_vhost_user_event,
                              NULL, opaque, NULL, true);
 
     if (err) {
@@ -237,15 +239,15 @@ static void net_vhost_user_event(void *opaque, int event)
     assert(queues < MAX_QUEUE_NUM);
 
     s = DO_UPCAST(VhostUserState, nc, ncs[0]);
-    chr = qemu_chr_fe_get_driver(&s->chr);
+    chr = qemu_chr_fe_get_driver(&s->vhost_user.chr);
     trace_vhost_user_event(chr->label, event);
     switch (event) {
     case CHR_EVENT_OPENED:
-        if (vhost_user_start(queues, ncs, &s->chr) < 0) {
-            qemu_chr_fe_disconnect(&s->chr);
+        if (vhost_user_start(queues, ncs, &s->vhost_user) < 0) {
+            qemu_chr_fe_disconnect(&s->vhost_user.chr);
             return;
         }
-        s->watch = qemu_chr_fe_add_watch(&s->chr, G_IO_HUP,
+        s->watch = qemu_chr_fe_add_watch(&s->vhost_user.chr, G_IO_HUP,
                                          net_vhost_user_watch, s);
         qmp_set_link(name, true, &err);
         s->started = true;
@@ -261,8 +263,8 @@ static void net_vhost_user_event(void *opaque, int event)
 
             g_source_remove(s->watch);
             s->watch = 0;
-            qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
-                                     NULL, NULL, false);
+            qemu_chr_fe_set_handlers(&s->vhost_user.chr, NULL, NULL, NULL,
+                                     NULL, NULL, NULL, false);
 
             aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
         }
@@ -294,7 +296,7 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
         if (!nc0) {
             nc0 = nc;
             s = DO_UPCAST(VhostUserState, nc, nc);
-            if (!qemu_chr_fe_init(&s->chr, chr, &err)) {
+            if (!qemu_chr_fe_init(&s->vhost_user.chr, chr, &err)) {
                 error_report_err(err);
                 return -1;
             }
@@ -304,11 +306,11 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
 
     s = DO_UPCAST(VhostUserState, nc, nc0);
     do {
-        if (qemu_chr_fe_wait_connected(&s->chr, &err) < 0) {
+        if (qemu_chr_fe_wait_connected(&s->vhost_user.chr, &err) < 0) {
             error_report_err(err);
             return -1;
         }
-        qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
+        qemu_chr_fe_set_handlers(&s->vhost_user.chr, NULL, NULL,
                                  net_vhost_user_event, NULL, nc0->name, NULL,
                                  true);
     } while (!s->started);
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH v1 3/6] virtio: support adding sub-regions for notify region
  2018-01-25  4:03 ` [virtio-dev] " Tiwei Bie
@ 2018-01-25  4:03   ` Tiwei Bie
  -1 siblings, 0 replies; 39+ messages in thread
From: Tiwei Bie @ 2018-01-25  4:03 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, alex.williamson, jasowang, pbonzini,
	stefanha
  Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang,
	tiwei.bie

Provide APIs for querying whether page-per-vq is enabled and for
adding sub-regions to the notify region.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 Makefile.target            |  4 ++++
 hw/virtio/virtio-pci.c     | 48 ++++++++++++++++++++++++++++++++++++++++++++++
 hw/virtio/virtio-pci.h     |  5 +++++
 hw/virtio/virtio.c         | 39 +++++++++++++++++++++++++++++++++++++
 include/hw/virtio/virtio.h |  5 +++++
 include/qemu/osdep.h       |  1 +
 6 files changed, 102 insertions(+)

diff --git a/Makefile.target b/Makefile.target
index f9a9da7e7c..bdde6f94d2 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -39,6 +39,9 @@ STPFILES=
 config-target.h: config-target.h-timestamp
 config-target.h-timestamp: config-target.mak
 
+config-devices.h: config-devices.h-timestamp
+config-devices.h-timestamp: config-devices.mak
+
 ifdef CONFIG_TRACE_SYSTEMTAP
 stap: $(QEMU_PROG).stp-installed $(QEMU_PROG).stp $(QEMU_PROG)-simpletrace.stp
 
@@ -224,4 +227,5 @@ ifdef CONFIG_TRACE_SYSTEMTAP
 endif
 
 GENERATED_FILES += config-target.h
+GENERATED_FILES += config-devices.h
 Makefile: $(GENERATED_FILES)
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index e92837c42b..a7d790c411 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1534,6 +1534,54 @@ static void virtio_pci_modern_io_region_unmap(VirtIOPCIProxy *proxy,
                                 &region->mr);
 }
 
+static VirtIOPCIProxy *virtio_device_to_virtio_pci_proxy(VirtIODevice *vdev)
+{
+    VirtIOPCIProxy *proxy = NULL;
+
+    if (vdev->device_id == VIRTIO_ID_NET) {
+        VirtIONetPCI *d = container_of(vdev, VirtIONetPCI, vdev.parent_obj);
+        proxy = &d->parent_obj;
+    }
+
+    return proxy;
+}
+
+bool virtio_pci_page_per_vq_enabled(VirtIODevice *vdev)
+{
+    VirtIOPCIProxy *proxy = virtio_device_to_virtio_pci_proxy(vdev);
+
+    if (proxy == NULL) {
+        return false;
+    }
+
+    return !!(proxy->flags & VIRTIO_PCI_FLAG_PAGE_PER_VQ);
+}
+
+int virtio_pci_notify_region_map(VirtIODevice *vdev, int queue_idx,
+                                 MemoryRegion *mr)
+{
+    VirtIOPCIProxy *proxy = virtio_device_to_virtio_pci_proxy(vdev);
+    int offset;
+
+    if (proxy == NULL || !virtio_pci_modern(proxy)) {
+        return -1;
+    }
+
+    offset = virtio_pci_queue_mem_mult(proxy) * queue_idx;
+    memory_region_add_subregion(&proxy->notify.mr, offset, mr);
+
+    return 0;
+}
+
+void virtio_pci_notify_region_unmap(VirtIODevice *vdev, MemoryRegion *mr)
+{
+    VirtIOPCIProxy *proxy = virtio_device_to_virtio_pci_proxy(vdev);
+
+    if (proxy != NULL) {
+        memory_region_del_subregion(&proxy->notify.mr, mr);
+    }
+}
+
 static void virtio_pci_pre_plugged(DeviceState *d, Error **errp)
 {
     VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index 12d3a90686..41dbc3d274 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -209,6 +209,11 @@ static inline void virtio_pci_disable_modern(VirtIOPCIProxy *proxy)
     proxy->disable_modern = true;
 }
 
+bool virtio_pci_page_per_vq_enabled(VirtIODevice *vdev);
+int virtio_pci_notify_region_map(VirtIODevice *vdev, int queue_idx,
+                                 MemoryRegion *mr);
+void virtio_pci_notify_region_unmap(VirtIODevice *vdev, MemoryRegion *mr);
+
 /*
  * virtio-scsi-pci: This extends VirtioPCIProxy.
  */
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index ad564b0132..325349c8b9 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -22,6 +22,7 @@
 #include "qemu/atomic.h"
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
+#include "hw/virtio/virtio-pci.h"
 #include "sysemu/dma.h"
 
 /*
@@ -2661,6 +2662,44 @@ void virtio_device_release_ioeventfd(VirtIODevice *vdev)
     virtio_bus_release_ioeventfd(vbus);
 }
 
+bool virtio_device_parent_is_pci_device(VirtIODevice *vdev)
+{
+    BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
+    const char *typename = object_get_typename(OBJECT(qbus->parent));
+
+    return strstr(typename, "pci") != NULL;
+}
+
+bool virtio_device_page_per_vq_enabled(VirtIODevice *vdev)
+{
+#ifdef CONFIG_VIRTIO_PCI
+    if (virtio_device_parent_is_pci_device(vdev)) {
+        return virtio_pci_page_per_vq_enabled(vdev);
+    }
+#endif
+    return false;
+}
+
+int virtio_device_notify_region_map(VirtIODevice *vdev, int queue_idx,
+                                    MemoryRegion *mr)
+{
+#ifdef CONFIG_VIRTIO_PCI
+    if (virtio_device_parent_is_pci_device(vdev)) {
+        return virtio_pci_notify_region_map(vdev, queue_idx, mr);
+    }
+#endif
+    return -1;
+}
+
+void virtio_device_notify_region_unmap(VirtIODevice *vdev, MemoryRegion *mr)
+{
+#ifdef CONFIG_VIRTIO_PCI
+    if (virtio_device_parent_is_pci_device(vdev)) {
+        virtio_pci_notify_region_unmap(vdev, mr);
+    }
+#endif
+}
+
 static void virtio_device_class_init(ObjectClass *klass, void *data)
 {
     /* Set the default value here. */
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 098bdaaea3..b14accdb08 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -285,6 +285,11 @@ void virtio_device_stop_ioeventfd(VirtIODevice *vdev);
 int virtio_device_grab_ioeventfd(VirtIODevice *vdev);
 void virtio_device_release_ioeventfd(VirtIODevice *vdev);
 bool virtio_device_ioeventfd_enabled(VirtIODevice *vdev);
+bool virtio_device_parent_is_pci_device(VirtIODevice *vdev);
+bool virtio_device_page_per_vq_enabled(VirtIODevice *vdev);
+int virtio_device_notify_region_map(VirtIODevice *vdev, int queue_idx,
+                                    MemoryRegion *mr);
+void virtio_device_notify_region_unmap(VirtIODevice *vdev, MemoryRegion *mr);
 EventNotifier *virtio_queue_get_host_notifier(VirtQueue *vq);
 void virtio_queue_host_notifier_read(EventNotifier *n);
 void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq, AioContext *ctx,
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index e8568a0a54..1975db7dac 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -30,6 +30,7 @@
 #include "config-host.h"
 #ifdef NEED_CPU_H
 #include "config-target.h"
+#include "config-devices.h"
 #else
 #include "exec/poison.h"
 #endif
-- 
2.13.3
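
To show how these helpers are intended to be consumed (the caller
lives in the last patch of the series, which is not shown here), here
is a rough sketch under assumptions: the vhost-user code wraps a
host-mmap()ed accelerator doorbell page in a MemoryRegion and hangs it
under the device's notify region, so that guest kicks reach the
hardware directly. The function name and the addr/page_size parameters
are placeholders.

/* Sketch only: wrap an already mmap()ed doorbell page and map it as a
 * sub-region of the virtio notify region for one queue. "addr" and
 * "page_size" are assumed to come from VFIO region info plus mmap(). */
#include "qemu/osdep.h"
#include "exec/memory.h"
#include "hw/virtio/virtio.h"

static int map_queue_doorbell(VirtIODevice *vdev, int queue_idx,
                              void *addr, uint64_t page_size,
                              MemoryRegion *sub_mr)
{
    if (!virtio_device_parent_is_pci_device(vdev)) {
        return -1;
    }

    /* The sub-region is backed by device memory, not guest RAM. */
    memory_region_init_ram_device_ptr(sub_mr, OBJECT(vdev),
                                      "vhost-user-doorbell",
                                      page_size, addr);

    /* Overlay it at the per-queue offset inside the notify region, so a
     * guest kick becomes a direct write to the accelerator. */
    return virtio_device_notify_region_map(vdev, queue_idx, sub_mr);
}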

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH v1 4/6] vfio: support getting VFIOGroup from groupfd
  2018-01-25  4:03 ` [virtio-dev] " Tiwei Bie
@ 2018-01-25  4:03   ` Tiwei Bie
  -1 siblings, 0 replies; 39+ messages in thread
From: Tiwei Bie @ 2018-01-25  4:03 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, alex.williamson, jasowang, pbonzini,
	stefanha
  Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang,
	tiwei.bie

Add an API to support getting a VFIOGroup from a group fd. When the
group fd is shared by another process, the VFIOGroup may not have its
container and address space in QEMU.

Also add a reference counter to better support getting the same
VFIOGroup multiple times.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 hw/vfio/common.c              | 96 ++++++++++++++++++++++++++++++++++++++++++-
 include/hw/vfio/vfio-common.h |  2 +
 2 files changed, 97 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7b2924c0ef..027fa005c1 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -962,6 +962,11 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     int ret, fd;
     VFIOAddressSpace *space;
 
+    if (as == NULL) {
+        vfio_kvm_device_add_group(group);
+        return 0;
+    }
+
     space = vfio_get_address_space(as);
 
     QLIST_FOREACH(container, &space->containers, next) {
@@ -1153,6 +1158,10 @@ static void vfio_disconnect_container(VFIOGroup *group)
 {
     VFIOContainer *container = group->container;
 
+    if (container == NULL) {
+        return;
+    }
+
     if (ioctl(group->fd, VFIO_GROUP_UNSET_CONTAINER, &container->fd)) {
         error_report("vfio: error disconnecting group %d from container",
                      group->groupid);
@@ -1183,6 +1192,85 @@ static void vfio_disconnect_container(VFIOGroup *group)
     }
 }
 
+static int vfio_groupfd_to_groupid(int groupfd)
+{
+    char linkname[PATH_MAX];
+    char pathname[PATH_MAX];
+    char *filename;
+    int groupid, ret;
+
+    snprintf(linkname, sizeof(linkname), "/proc/self/fd/%d", groupfd);
+
+    ret = readlink(linkname, pathname, sizeof(pathname));
+    if (ret < 0) {
+        return -1;
+    }
+
+    filename = g_path_get_basename(pathname);
+    groupid = atoi(filename);
+    g_free(filename);
+
+    return groupid;
+}
+
+/*
+ * The @as param could be NULL. In this case, groupfd is shared by
+ * another process which will setup the DMA mapping for this group,
+ * and this group won't have container and address space in QEMU.
+ */
+VFIOGroup *vfio_get_group_from_fd(int groupfd, AddressSpace *as, Error **errp)
+{
+    VFIOGroup *group;
+    int groupid;
+
+    groupid = vfio_groupfd_to_groupid(groupfd);
+    if (groupid < 0) {
+        return NULL;
+    }
+
+    QLIST_FOREACH(group, &vfio_group_list, next) {
+        if (group->groupid == groupid) {
+            /* Found it.  Now is it already in the right context? */
+            if ((group->container == NULL && as == NULL) ||
+                (group->container && group->container->space->as == as)) {
+                    group->refcnt++;
+                    return group;
+            }
+            error_setg(errp, "group %d used in multiple address spaces",
+                       group->groupid);
+            return NULL;
+        }
+    }
+
+    group = g_malloc0(sizeof(*group));
+
+    group->fd = groupfd;
+    group->groupid = groupid;
+    group->refcnt = 1;
+
+    QLIST_INIT(&group->device_list);
+
+    if (vfio_connect_container(group, as, errp)) {
+        error_prepend(errp, "failed to setup container for group %d: ",
+                      groupid);
+        goto free_group_exit;
+    }
+
+    if (QLIST_EMPTY(&vfio_group_list)) {
+        qemu_register_reset(vfio_reset_handler, NULL);
+    }
+
+    QLIST_INSERT_HEAD(&vfio_group_list, group, next);
+
+    return group;
+
+free_group_exit:
+    g_free(group);
+
+    return NULL;
+}
+
+/* The @as param cannot be NULL. */
 VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
 {
     VFIOGroup *group;
@@ -1192,7 +1280,8 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
     QLIST_FOREACH(group, &vfio_group_list, next) {
         if (group->groupid == groupid) {
             /* Found it.  Now is it already in the right context? */
-            if (group->container->space->as == as) {
+            if (group->container && group->container->space->as == as) {
+                group->refcnt++;
                 return group;
             } else {
                 error_setg(errp, "group %d used in multiple address spaces",
@@ -1225,6 +1314,7 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
     }
 
     group->groupid = groupid;
+    group->refcnt = 1;
     QLIST_INIT(&group->device_list);
 
     if (vfio_connect_container(group, as, errp)) {
@@ -1256,6 +1346,10 @@ void vfio_put_group(VFIOGroup *group)
         return;
     }
 
+    if (--group->refcnt > 0) {
+        return;
+    }
+
     vfio_kvm_device_del_group(group);
     vfio_disconnect_container(group);
     QLIST_REMOVE(group, next);
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index f3a2ac9fee..2383ded670 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -136,6 +136,7 @@ struct VFIODeviceOps {
 typedef struct VFIOGroup {
     int fd;
     int groupid;
+    int refcnt;
     VFIOContainer *container;
     QLIST_HEAD(, VFIODevice) device_list;
     QLIST_ENTRY(VFIOGroup) next;
@@ -158,6 +159,7 @@ void vfio_region_exit(VFIORegion *region);
 void vfio_region_finalize(VFIORegion *region);
 void vfio_reset_handler(void *opaque);
 VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp);
+VFIOGroup *vfio_get_group_from_fd(int groupfd, AddressSpace *as, Error **errp);
 void vfio_put_group(VFIOGroup *group);
 int vfio_get_device(VFIOGroup *group, const char *name,
                     VFIODevice *vbasedev, Error **errp);
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 39+ messages in thread
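
A minimal usage sketch for the API added above may help show the intended
calling convention (illustration only: the caller name is invented, error
handling is abbreviated, and only functions introduced by this patch plus
close(2) are used). Passing a NULL address space means the peer process that
shared the group fd owns the container/DMA setup, which is how the vhost-user
patch later in this series consumes it:

/* Hypothetical caller, for illustration only. */
static VFIOGroup *example_take_group(int groupfd, Error **errp)
{
    /* NULL address space: the peer process owns the container/DMA setup. */
    VFIOGroup *group = vfio_get_group_from_fd(groupfd, NULL, errp);

    if (group && group->fd != groupfd) {
        /* An existing VFIOGroup matched; this duplicate fd isn't needed. */
        close(groupfd);
    }

    /* The reference is dropped later with vfio_put_group(group). */
    return group;
}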

* [Qemu-devel] [PATCH v1 5/6] vfio: remove DPRINTF() definition from vfio-common.h
  2018-01-25  4:03 ` [virtio-dev] " Tiwei Bie
@ 2018-01-25  4:03   ` Tiwei Bie
  -1 siblings, 0 replies; 39+ messages in thread
From: Tiwei Bie @ 2018-01-25  4:03 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, alex.williamson, jasowang, pbonzini,
	stefanha
  Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang,
	tiwei.bie

This macro isn't used by any VFIO code, and its name is
too generic. Since vfio-common.h (in include/hw/vfio) can
be included by other modules in QEMU, keeping this macro
there can introduce name conflicts.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 include/hw/vfio/vfio-common.h | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 2383ded670..3fe99636f3 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -33,15 +33,6 @@
 #define ERR_PREFIX "vfio error: %s: "
 #define WARN_PREFIX "vfio warning: %s: "
 
-/*#define DEBUG_VFIO*/
-#ifdef DEBUG_VFIO
-#define DPRINTF(fmt, ...) \
-    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
-#else
-#define DPRINTF(fmt, ...) \
-    do { } while (0)
-#endif
-
 enum {
     VFIO_DEVICE_TYPE_PCI = 0,
     VFIO_DEVICE_TYPE_PLATFORM = 1,
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH v1 6/6] vhost-user: add VFIO based accelerators support
  2018-01-25  4:03 ` [virtio-dev] " Tiwei Bie
@ 2018-01-25  4:03   ` Tiwei Bie
  -1 siblings, 0 replies; 39+ messages in thread
From: Tiwei Bie @ 2018-01-25  4:03 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, alex.williamson, jasowang, pbonzini,
	stefanha
  Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang,
	tiwei.bie

This patch does some small extensions to the vhost-user protocol
to support VFIO based accelerators, and makes it possible to get
performance similar to VFIO based PCI passthru while keeping the
virtio device emulation in QEMU.

Any virtio ring compatible device can potentially be used as a
vhost data path accelerator. We can set up the accelerator based
on the information (e.g. memory table, features, ring info, etc.)
available on the vhost backend, and the accelerator will be able
to use the virtio ring provided by the virtio driver in the VM
directly. So the virtio driver in the VM can exchange e.g. network
packets with the accelerator directly via the virtio ring.

But for vhost-user, the critical issue in this case is that the
data path performance is relatively low and some host threads are
needed for the data path, because some necessary mechanisms are
missing to support:

1) the guest driver notifying the device directly;
2) the device interrupting the guest directly;

So this patch does some small extensions to the vhost-user protocol
to make both of them possible. It leverages the same mechanisms
as the VFIO based PCI passthru.

A new protocol feature bit is added to negotiate the accelerator
feature support. Two new slave message types are added to control
the notify region and queue interrupt passthru for each queue.
From the viewpoint of vhost-user protocol design, it's very flexible:
the passthru can be enabled/disabled for each queue individually,
and it's possible to accelerate each queue with a different device.

The key difference from PCI passthru is that, in this case, only
the data path of the device (e.g. DMA ring, notify region and
queue interrupt) is passed through to the VM, while the device
control path (e.g. PCI configuration space and MMIO regions) is
still defined and emulated by QEMU.

The benefits of keeping the virtio device emulation in QEMU compared
with virtio device PCI passthru include (but are not limited to):

- consistent device interface for guest OS in the VM;
- max flexibility on the hardware (i.e. the accelerators) design;
- leveraging the existing virtio live-migration framework;

The virtual IOMMU isn't supported by the accelerators for now,
because vhost-user currently lacks an efficient way to share the
IOMMU table in the VM with the vhost backend. That's also why the
software implementation of virtual IOMMU support in the vhost-user
backend can't support dynamic mapping well. Once this problem is
solved in vhost-user, virtual IOMMU can be supported by the
accelerators too, and the IOMMU feature bit check in this patch
can be removed.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 docs/interop/vhost-user.txt    |  57 ++++++++++++
 hw/virtio/vhost-user.c         | 201 +++++++++++++++++++++++++++++++++++++++++
 include/hw/virtio/vhost-user.h |  17 ++++
 3 files changed, 275 insertions(+)

diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index 954771d0d8..15e917019a 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -116,6 +116,15 @@ Depending on the request type, payload can be:
     - 3: IOTLB invalidate
     - 4: IOTLB access fail
 
+ * Vring area description
+   -----------------------
+   | u64 | size | offset |
+   -----------------------
+
+   u64: a 64-bit unsigned integer
+   Size: a 64-bit size
+   Offset: a 64-bit offset
+
 In QEMU the vhost-user message is implemented with the following struct:
 
 typedef struct VhostUserMsg {
@@ -129,6 +138,7 @@ typedef struct VhostUserMsg {
         VhostUserMemory memory;
         VhostUserLog log;
         struct vhost_iotlb_msg iotlb;
+        VhostUserVringArea area;
     };
 } QEMU_PACKED VhostUserMsg;
 
@@ -317,6 +327,17 @@ The fd is provided via VHOST_USER_SET_SLAVE_REQ_FD ancillary data.
 A slave may then send VHOST_USER_SLAVE_* messages to the master
 using this fd communication channel.
 
+VFIO based accelerators
+-----------------------
+
+The VFIO based accelerators feature is a protocol extension. It is supported
+when the protocol feature VHOST_USER_PROTOCOL_F_VFIO (bit 7) is set.
+
+The vhost-user backend will set the accelerator context via the slave channel,
+and QEMU just needs to handle those messages passively. The accelerator
+context will be set for each queue independently, so the page-per-vq property
+should also be enabled.
+
 Protocol features
 -----------------
 
@@ -327,6 +348,7 @@ Protocol features
 #define VHOST_USER_PROTOCOL_F_MTU            4
 #define VHOST_USER_PROTOCOL_F_SLAVE_REQ      5
 #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN   6
+#define VHOST_USER_PROTOCOL_F_VFIO           7
 
 Master message types
 --------------------
@@ -614,6 +636,41 @@ Slave message types
       This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature
       has been successfully negotiated.
 
+ * VHOST_USER_SLAVE_VRING_VFIO_GROUP_MSG
+
+      Id: 2
+      Equivalent ioctl: N/A
+      Slave payload: u64
+      Master payload: N/A
+
+      Sets the VFIO group file descriptor, which is passed as ancillary data,
+      for a specified queue (the queue index is carried in the u64 payload).
+      The slave sends this request to tell QEMU to add or delete a VFIO group.
+      QEMU will delete the current group, if any, for the specified queue when
+      the message is sent without a file descriptor. A VFIO group is actually
+      deleted only when its reference count reaches zero.
+      This request should be sent only when the VHOST_USER_PROTOCOL_F_VFIO
+      protocol feature has been successfully negotiated.
+
+ * VHOST_USER_SLAVE_VRING_NOTIFY_AREA_MSG
+
+      Id: 3
+      Equivalent ioctl: N/A
+      Slave payload: vring area description
+      Master payload: N/A
+
+      Sets the notify area for a specified queue (the queue index is carried
+      in the u64 field of the vring area description). A file descriptor is
+      passed as ancillary data (typically a VFIO device fd). QEMU can mmap
+      the file descriptor based on the information carried in the vring area
+      description.
+      The slave sends this request to tell QEMU to add or delete a MemoryRegion
+      for a specified queue's notify MMIO region. QEMU will delete the current
+      MemoryRegion, if any, for the specified queue when the message is sent
+      without a file descriptor.
+      This request should be sent only when the VHOST_USER_PROTOCOL_F_VFIO
+      protocol feature and VIRTIO_F_VERSION_1 have been successfully negotiated.
+
 VHOST_USER_PROTOCOL_F_REPLY_ACK:
 -------------------------------
 The original vhost-user specification only demands replies for certain
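
A short sketch of how a vhost-user slave might form the two new requests
documented above, assuming the VhostUserMsg layout added below and the
existing VHOST_USER_VRING_IDX_MASK/VHOST_USER_VRING_NOFD_MASK constants in
hw/virtio/vhost-user.c (a slave would carry its own copies of these
definitions). slave_send_fd() is a hypothetical helper standing in for the
backend's SCM_RIGHTS fd-passing code, and the flags/version header setup is
omitted for brevity:

/* Illustration only -- not part of the patch. */
static void sketch_send_vfio_group(int queue_idx, int group_fd)
{
    VhostUserMsg msg = {
        .request = VHOST_USER_SLAVE_VRING_VFIO_GROUP_MSG,
        .size = sizeof(msg.payload.u64),
    };

    /* Low bits carry the queue index; NOFD means "drop the current group". */
    msg.payload.u64 = queue_idx & VHOST_USER_VRING_IDX_MASK;
    if (group_fd < 0) {
        msg.payload.u64 |= VHOST_USER_VRING_NOFD_MASK;
    }

    slave_send_fd(&msg, group_fd);          /* hypothetical helper */
}

static void sketch_send_notify_area(int queue_idx, int device_fd,
                                    uint64_t mmap_offset, uint64_t size)
{
    VhostUserMsg msg = {
        .request = VHOST_USER_SLAVE_VRING_NOTIFY_AREA_MSG,
        .size = sizeof(msg.payload.area),
    };

    msg.payload.area.u64 = queue_idx & VHOST_USER_VRING_IDX_MASK;
    msg.payload.area.size = size;           /* at least one 4 KiB notify page */
    msg.payload.area.offset = mmap_offset;  /* mmap offset within device_fd */

    slave_send_fd(&msg, device_fd);         /* hypothetical helper */
}

In both cases the file descriptor travels as ancillary data on the slave
channel; sending the message with VHOST_USER_VRING_NOFD_MASK set (and no fd)
asks QEMU to drop the current group or MemoryRegion for that queue.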
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 3e308d0a62..ec83746bd5 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -35,6 +35,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_NET_MTU = 4,
     VHOST_USER_PROTOCOL_F_SLAVE_REQ = 5,
     VHOST_USER_PROTOCOL_F_CROSS_ENDIAN = 6,
+    VHOST_USER_PROTOCOL_F_VFIO = 7,
 
     VHOST_USER_PROTOCOL_F_MAX
 };
@@ -72,6 +73,8 @@ typedef enum VhostUserRequest {
 typedef enum VhostUserSlaveRequest {
     VHOST_USER_SLAVE_NONE = 0,
     VHOST_USER_SLAVE_IOTLB_MSG = 1,
+    VHOST_USER_SLAVE_VRING_VFIO_GROUP_MSG = 2,
+    VHOST_USER_SLAVE_VRING_NOTIFY_AREA_MSG = 3,
     VHOST_USER_SLAVE_MAX
 }  VhostUserSlaveRequest;
 
@@ -93,6 +96,12 @@ typedef struct VhostUserLog {
     uint64_t mmap_offset;
 } VhostUserLog;
 
+typedef struct VhostUserVringArea {
+    uint64_t u64;
+    uint64_t size;
+    uint64_t offset;
+} VhostUserVringArea;
+
 typedef struct VhostUserMsg {
     VhostUserRequest request;
 
@@ -110,6 +119,7 @@ typedef struct VhostUserMsg {
         VhostUserMemory memory;
         VhostUserLog log;
         struct vhost_iotlb_msg iotlb;
+        VhostUserVringArea area;
     } payload;
 } QEMU_PACKED VhostUserMsg;
 
@@ -415,9 +425,37 @@ static int vhost_user_set_vring_num(struct vhost_dev *dev,
     return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM, ring);
 }
 
+static void vhost_user_notify_region_remap(struct vhost_dev *dev, int queue_idx)
+{
+    struct vhost_user *u = dev->opaque;
+    VhostUserVFIOState *vfio = &u->shared->vfio;
+    VhostUserNotifyCtx *notify = &vfio->notify[queue_idx];
+    VirtIODevice *vdev = dev->vdev;
+
+    if (notify->addr && !notify->mapped) {
+        virtio_device_notify_region_map(vdev, queue_idx, &notify->mr);
+        notify->mapped = true;
+    }
+}
+
+static void vhost_user_notify_region_unmap(struct vhost_dev *dev, int queue_idx)
+{
+    struct vhost_user *u = dev->opaque;
+    VhostUserVFIOState *vfio = &u->shared->vfio;
+    VhostUserNotifyCtx *notify = &vfio->notify[queue_idx];
+    VirtIODevice *vdev = dev->vdev;
+
+    if (notify->addr && notify->mapped) {
+        virtio_device_notify_region_unmap(vdev, &notify->mr);
+        notify->mapped = false;
+    }
+}
+
 static int vhost_user_set_vring_base(struct vhost_dev *dev,
                                      struct vhost_vring_state *ring)
 {
+    vhost_user_notify_region_remap(dev, ring->index);
+
     return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE, ring);
 }
 
@@ -451,6 +489,8 @@ static int vhost_user_get_vring_base(struct vhost_dev *dev,
         .size = sizeof(msg.payload.state),
     };
 
+    vhost_user_notify_region_unmap(dev, ring->index);
+
     if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
         return -1;
     }
@@ -609,6 +649,136 @@ static int vhost_user_reset_device(struct vhost_dev *dev)
     return 0;
 }
 
+static int vhost_user_handle_vring_vfio_group(struct vhost_dev *dev,
+                                              uint64_t u64,
+                                              int groupfd)
+{
+    struct vhost_user *u = dev->opaque;
+    VhostUserVFIOState *vfio = &u->shared->vfio;
+    int queue_idx = u64 & VHOST_USER_VRING_IDX_MASK;
+    VirtIODevice *vdev = dev->vdev;
+    VFIOGroup *group;
+    int ret = 0;
+
+    qemu_mutex_lock(&vfio->lock);
+
+    if (!virtio_has_feature(dev->protocol_features,
+                            VHOST_USER_PROTOCOL_F_VFIO) ||
+        vdev == NULL ||
+        virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM) ||
+        queue_idx >= virtio_get_num_queues(vdev)) {
+        ret = -1;
+        goto out;
+    }
+
+    if (vfio->group[queue_idx]) {
+        vfio_put_group(vfio->group[queue_idx]);
+        vfio->group[queue_idx] = NULL;
+    }
+
+    if (u64 & VHOST_USER_VRING_NOFD_MASK) {
+        goto out;
+    }
+
+    group = vfio_get_group_from_fd(groupfd, NULL, NULL);
+    if (group == NULL) {
+        ret = -1;
+        goto out;
+    }
+
+    if (group->fd != groupfd) {
+        close(groupfd);
+    }
+
+    vfio->group[queue_idx] = group;
+
+out:
+    kvm_irqchip_commit_routes(kvm_state);
+    qemu_mutex_unlock(&vfio->lock);
+
+    if (ret != 0 && groupfd != -1) {
+        close(groupfd);
+    }
+
+    return ret;
+}
+
+#define NOTIFY_PAGE_SIZE 0x1000
+
+static int vhost_user_handle_vring_notify_area(struct vhost_dev *dev,
+                                               VhostUserVringArea *area,
+                                               int fd)
+{
+    struct vhost_user *u = dev->opaque;
+    VhostUserVFIOState *vfio = &u->shared->vfio;
+    int queue_idx = area->u64 & VHOST_USER_VRING_IDX_MASK;
+    VirtIODevice *vdev = dev->vdev;
+    VhostUserNotifyCtx *notify;
+    void *addr = NULL;
+    int ret = 0;
+    char *name;
+
+    qemu_mutex_lock(&vfio->lock);
+
+    if (!virtio_has_feature(dev->protocol_features,
+                            VHOST_USER_PROTOCOL_F_VFIO) ||
+        vdev == NULL || queue_idx >= virtio_get_num_queues(vdev) ||
+        virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM) ||
+        !virtio_device_page_per_vq_enabled(vdev)) {
+        ret = -1;
+        goto out;
+    }
+
+    notify = &vfio->notify[queue_idx];
+
+    if (notify->addr) {
+        virtio_device_notify_region_unmap(vdev, &notify->mr);
+        munmap(notify->addr, NOTIFY_PAGE_SIZE);
+        object_unparent(OBJECT(&notify->mr));
+        notify->addr = NULL;
+    }
+
+    if (area->u64 & VHOST_USER_VRING_NOFD_MASK) {
+        goto out;
+    }
+
+    if (area->size < NOTIFY_PAGE_SIZE) {
+        ret = -1;
+        goto out;
+    }
+
+    addr = mmap(NULL, NOTIFY_PAGE_SIZE, PROT_READ | PROT_WRITE,
+                MAP_SHARED, fd, area->offset);
+    if (addr == MAP_FAILED) {
+        error_report("Can't map notify region.");
+        ret = -1;
+        goto out;
+    }
+
+    name = g_strdup_printf("vhost-user/vfio@%p mmaps[%d]", vfio, queue_idx);
+    memory_region_init_ram_device_ptr(&notify->mr, OBJECT(vdev), name,
+                                      NOTIFY_PAGE_SIZE, addr);
+    g_free(name);
+
+    if (virtio_device_notify_region_map(vdev, queue_idx, &notify->mr)) {
+        ret = -1;
+        goto out;
+    }
+
+    notify->addr = addr;
+    notify->mapped = true;
+
+out:
+    if (ret < 0 && addr != NULL) {
+        munmap(addr, NOTIFY_PAGE_SIZE);
+    }
+    if (fd != -1) {
+        close(fd);
+    }
+    qemu_mutex_unlock(&vfio->lock);
+    return ret;
+}
+
 static void slave_read(void *opaque)
 {
     struct vhost_dev *dev = opaque;
@@ -670,6 +840,12 @@ static void slave_read(void *opaque)
     case VHOST_USER_SLAVE_IOTLB_MSG:
         ret = vhost_backend_handle_iotlb_msg(dev, &msg.payload.iotlb);
         break;
+    case VHOST_USER_SLAVE_VRING_VFIO_GROUP_MSG:
+        ret = vhost_user_handle_vring_vfio_group(dev, msg.payload.u64, fd);
+        break;
+    case VHOST_USER_SLAVE_VRING_NOTIFY_AREA_MSG:
+        ret = vhost_user_handle_vring_notify_area(dev, &msg.payload.area, fd);
+        break;
     default:
         error_report("Received unexpected msg type.");
         if (fd != -1) {
@@ -772,6 +948,10 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
     u->slave_fd = -1;
     dev->opaque = u;
 
+    if (dev->vq_index == 0) {
+        qemu_mutex_init(&u->shared->vfio.lock);
+    }
+
     err = vhost_user_get_features(dev, &features);
     if (err < 0) {
         return err;
@@ -832,6 +1012,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
 static int vhost_user_cleanup(struct vhost_dev *dev)
 {
     struct vhost_user *u;
+    int i;
 
     assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
 
@@ -841,6 +1022,26 @@ static int vhost_user_cleanup(struct vhost_dev *dev)
         close(u->slave_fd);
         u->slave_fd = -1;
     }
+
+    if (dev->vq_index == 0) {
+        VhostUserVFIOState *vfio = &u->shared->vfio;
+
+        for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
+            if (vfio->notify[i].addr) {
+                munmap(vfio->notify[i].addr, NOTIFY_PAGE_SIZE);
+                object_unparent(OBJECT(&vfio->notify[i].mr));
+                vfio->notify[i].addr = NULL;
+            }
+
+            if (vfio->group[i]) {
+                vfio_put_group(vfio->group[i]);
+                vfio->group[i] = NULL;
+            }
+        }
+
+        qemu_mutex_destroy(&u->shared->vfio.lock);
+    }
+
     g_free(u);
     dev->opaque = 0;
 
diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
index 4f5a1477d1..de8c647962 100644
--- a/include/hw/virtio/vhost-user.h
+++ b/include/hw/virtio/vhost-user.h
@@ -9,9 +9,26 @@
 #define HW_VIRTIO_VHOST_USER_H
 
 #include "chardev/char-fe.h"
+#include "hw/virtio/virtio.h"
+#include "hw/vfio/vfio-common.h"
+
+typedef struct VhostUserNotifyCtx {
+    void *addr;
+    MemoryRegion mr;
+    bool mapped;
+} VhostUserNotifyCtx;
+
+typedef struct VhostUserVFIOState {
+    /* The VFIO group associated with each queue */
+    VFIOGroup *group[VIRTIO_QUEUE_MAX];
+    /* The notify context of each queue */
+    VhostUserNotifyCtx notify[VIRTIO_QUEUE_MAX];
+    QemuMutex lock;
+} VhostUserVFIOState;
 
 typedef struct VhostUser {
     CharBackend chr;
+    VhostUserVFIOState vfio;
 } VhostUser;
 
 #endif
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 39+ messages in thread
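
To make the notify-area message above more concrete, here is a sketch of how
a backend could derive the vring area description for a queue from a VFIO
device fd. The notify BAR index and the per-queue notify offset are assumed
to come from parsing the device's virtio PCI notify capability (both are
assumptions here, not something this series defines); with a page-per-vq
style layout each queue's doorbell occupies its own 4 KiB page, which is what
the NOTIFY_PAGE_SIZE check in the patch relies on:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Illustration only: build the size/offset pair for one queue's doorbell. */
static int sketch_notify_area(int device_fd, int notify_bar,
                              uint64_t queue_notify_off,
                              uint64_t *size, uint64_t *offset)
{
    struct vfio_region_info info = {
        .argsz = sizeof(info),
        .index = VFIO_PCI_BAR0_REGION_INDEX + notify_bar,
    };

    if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, &info) < 0) {
        return -1;
    }

    *size = 0x1000;                          /* one notify page per queue */
    *offset = info.offset + queue_notify_off;
    return 0;
}

The resulting size/offset pair is what VHOST_USER_SLAVE_VRING_NOTIFY_AREA_MSG
carries, together with the device fd; QEMU then mmaps that range and overlays
it as a per-queue MemoryRegion, so guest doorbell writes go straight to the
device instead of being trapped and relayed to the vhost backend.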

* Re: [Qemu-devel] [PATCH v1 0/6] Extend vhost-user to support VFIO based accelerators
  2018-01-25  4:03 ` [virtio-dev] " Tiwei Bie
                   ` (6 preceding siblings ...)
  (?)
@ 2018-01-25 14:22 ` Stefan Hajnoczi
  2018-01-25 16:10     ` [virtio-dev] " Liang, Cunming
  2018-01-26  6:28   ` [virtio-dev] " Tiwei Bie
  -1 siblings, 2 replies; 39+ messages in thread
From: Stefan Hajnoczi @ 2018-01-25 14:22 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: qemu-devel, virtio-dev, mst, alex.williamson, jasowang, pbonzini,
	stefanha, jianfeng.tan, cunming.liang, xiao.w.wang, zhihong.wang,
	dan.daly

On Thu, Jan 25, 2018 at 12:03:22PM +0800, Tiwei Bie wrote:
> Why extend vhost-user for vDPA
> ==============================
> 
> We have already implemented various virtual switches (e.g. OVS-DPDK)
> based on vhost-user for VMs in the Cloud. They are purely software
> running on CPU cores. When we have accelerators for such NFVi applications,
> it's ideal if the applications could keep using the original interface
> (i.e. vhost-user netdev) with QEMU, and infrastructure is able to decide
> when and how to switch between CPU and accelerators within the interface.
> And the switching (i.e. switch between CPU and accelerators) can be done
> flexibly and quickly inside the applications.
> 
> More details about this can be found from the Cunming's discussions on
> the RFC patch set.
> 
> The previous links:
> RFC: http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg04844.html

Is vDPA also useful in the simpler use case where there is no NFVi
application?

In other words, you have virtio-net inside the guest and vhost-net on
the host.  You'd like to accelerate the virtio-net NIC using the
hardware's vDPA support.

DPDK requires dedicated logical cores and hugepages.  Those resources
will not be used (wasted) if you just want to enable vDPA but have no
DPDK packet processing application.

How can this use case be supported without wasting resources?

Stefan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH v1 0/6] Extend vhost-user to support VFIO based accelerators
  2018-01-25 14:22 ` [Qemu-devel] [PATCH v1 0/6] Extend vhost-user to support VFIO based accelerators Stefan Hajnoczi
@ 2018-01-25 16:10     ` Liang, Cunming
  2018-01-26  6:28   ` [virtio-dev] " Tiwei Bie
  1 sibling, 0 replies; 39+ messages in thread
From: Liang, Cunming @ 2018-01-25 16:10 UTC (permalink / raw)
  To: Stefan Hajnoczi, Bie, Tiwei
  Cc: qemu-devel, virtio-dev, mst, alex.williamson, jasowang, pbonzini,
	stefanha, Tan, Jianfeng, Wang, Xiao W, Wang, Zhihong, Daly, Dan



> -----Original Message-----
> From: Stefan Hajnoczi [mailto:stefanha@gmail.com]
> Sent: Thursday, January 25, 2018 10:22 PM
> To: Bie, Tiwei <tiwei.bie@intel.com>
> Cc: qemu-devel@nongnu.org; virtio-dev@lists.oasis-open.org; 
> mst@redhat.com; alex.williamson@redhat.com; jasowang@redhat.com; 
> pbonzini@redhat.com; stefanha@redhat.com; Tan, Jianfeng 
> <jianfeng.tan@intel.com>; Liang, Cunming <cunming.liang@intel.com>; 
> Wang, Xiao W <xiao.w.wang@intel.com>; Wang, Zhihong 
> <zhihong.wang@intel.com>; Daly, Dan <dan.daly@intel.com>
> Subject: Re: [Qemu-devel] [PATCH v1 0/6] Extend vhost-user to support 
> VFIO based accelerators
> 
> On Thu, Jan 25, 2018 at 12:03:22PM +0800, Tiwei Bie wrote:
> > Why extend vhost-user for vDPA
> > ==============================
> >
> > We have already implemented various virtual switches (e.g. OVS-DPDK) 
> > based on vhost-user for VMs in the Cloud. They are purely software 
> > running on CPU cores. When we have accelerators for such NFVi 
> > applications, it's ideal if the applications could keep using the 
> > original interface (i.e. vhost-user netdev) with QEMU, and 
> > infrastructure is able to decide when and how to switch between CPU 
> > and
> accelerators within the interface.
> > And the switching (i.e. switch between CPU and accelerators) can be 
> > done flexibly and quickly inside the applications.
> >
> > More details about this can be found from the Cunming's discussions 
> > on the RFC patch set.
> >
> > The previous links:
> > RFC:
> > http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg04844.htm
> > l
> 
> Is vDPA also useful in the simpler use case where there is no NFVi application?
There are separate patches to add non-DPDK vDPA (aka vhost-vfio, a newly proposed vhost backend) support, which is the scope of stage II as we mentioned in the community call.
Stay tuned for the RFC patch. The main idea is that the virtio compatible device driver registers as an mdev to talk with QEMU's vhost-vfio.

> 
> In other words, you have virtio-net inside the guest and vhost-net on the host.
> You'd like to accelerate the virtio-net NIC using the hardware's vDPA support.
> 
> DPDK requires dedicated logical cores and hugepages.  Those resources 
> will not be used (wasted) if you just want to enable vDPA but have no 
> DPDK packet processing application.
> 
> How can this use case be supported without wasting resources?
> 
> Stefan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH v1 6/6] vhost-user: add VFIO based accelerators support
  2018-01-25  4:03   ` [virtio-dev] " Tiwei Bie
@ 2018-01-25 23:59     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2018-01-25 23:59 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: qemu-devel, virtio-dev, alex.williamson, jasowang, pbonzini,
	stefanha, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, Jan 25, 2018 at 12:03:28PM +0800, Tiwei Bie wrote:
> This patch does some small extensions to vhost-user protocol to
> support VFIO based accelerators, and makes it possible to get the
> similar performance of VFIO based PCI passthru while keeping the
> virtio device emulation in QEMU.
> 
> Any virtio ring compatible devices potentially can be used as the
> vhost data path accelerators. We can setup the accelerator based
> on the informations (e.g. memory table, features, ring info, etc)
> available on the vhost backend. And accelerator will be able to use
> the virtio ring provided by the virtio driver in the VM directly.
> So the virtio driver in the VM can exchange e.g. network packets
> with the accelerator directly via the virtio ring.
> 
> But for vhost-user, the critical issue in this case is that the
> data path performance is relatively low and some host threads are
> needed for the data path, because some necessary mechanisms are
> missing to support:
> 
> 1) guest driver notifies the device directly;
> 2) device interrupts the guest directly;
> 
> So this patch does some small extensions to vhost-user protocol
> to make both of them possible. It leverages the same mechanisms
> as the VFIO based PCI passthru.
> 
> A new protocol feature bit is added to negotiate the accelerator
> feature support. Two new slave message types are added to control
> the notify region and queue interrupt passthru for each queue.
> From the view of vhost-user protocol design, it's very flexible.
> The passthru can be enabled/disabled for each queue individually,
> and it's possible to accelerate each queue by different devices.
> 
> The key difference with PCI passthru is that, in this case only
> the data path of the device (e.g. DMA ring, notify region and
> queue interrupt) is pass-throughed to the VM, the device control
> path (e.g. PCI configuration space and MMIO regions) is still
> defined and emulated by QEMU.
> 
> The benefits of keeping virtio device emulation in QEMU compared
> with virtio device PCI passthru include (but not limit to):
> 
> - consistent device interface for guest OS in the VM;
> - max flexibility on the hardware (i.e. the accelerators) design;
> - leveraging the existing virtio live-migration framework;
> 
> The virtual IOMMU isn't supported by the accelerators for now.
> Because vhost-user currently lacks of an efficient way to share
> the IOMMU table in VM to vhost backend. That's why the software
> implementation of virtual IOMMU support in vhost-user backend
> can't support dynamic mapping well.

What exactly is meant by that? vIOMMU seems to work for people;
it's not that fast if you change mappings all the time,
but e.g. DPDK within the guest doesn't do that.

> Once this problem is solved
> in vhost-user, virtual IOMMU can be supported by accelerators
> too, and the IOMMU feature bit checking in this patch can be
> removed.

Given it works with software backends right now, I suspect
this will be up to you guys to address.

> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>  docs/interop/vhost-user.txt    |  57 ++++++++++++
>  hw/virtio/vhost-user.c         | 201 +++++++++++++++++++++++++++++++++++++++++
>  include/hw/virtio/vhost-user.h |  17 ++++
>  3 files changed, 275 insertions(+)
> 
> diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> index 954771d0d8..15e917019a 100644
> --- a/docs/interop/vhost-user.txt
> +++ b/docs/interop/vhost-user.txt
> @@ -116,6 +116,15 @@ Depending on the request type, payload can be:
>      - 3: IOTLB invalidate
>      - 4: IOTLB access fail
>  
> + * Vring area description
> +   -----------------------
> +   | u64 | size | offset |
> +   -----------------------
> +
> +   u64: a 64-bit unsigned integer
> +   Size: a 64-bit size
> +   Offset: a 64-bit offset
> +
>  In QEMU the vhost-user message is implemented with the following struct:
>  
>  typedef struct VhostUserMsg {
> @@ -129,6 +138,7 @@ typedef struct VhostUserMsg {
>          VhostUserMemory memory;
>          VhostUserLog log;
>          struct vhost_iotlb_msg iotlb;
> +        VhostUserVringArea area;
>      };
>  } QEMU_PACKED VhostUserMsg;
>  
> @@ -317,6 +327,17 @@ The fd is provided via VHOST_USER_SET_SLAVE_REQ_FD ancillary data.
>  A slave may then send VHOST_USER_SLAVE_* messages to the master
>  using this fd communication channel.
>  
> +VFIO based accelerators
> +-----------------------
> +
> +The VFIO based accelerators feature is a protocol extension. It is supported
> +when the protocol feature VHOST_USER_PROTOCOL_F_VFIO (bit 7) is set.
> +
> +The vhost-user backend will set the accelerator context via slave channel,
> +and QEMU just needs to handle those messages passively. The accelerator
> +context will be set for each queue independently. So the page-per-vq property
> +should also be enabled.
> +
>  Protocol features
>  -----------------
>  
> @@ -327,6 +348,7 @@ Protocol features
>  #define VHOST_USER_PROTOCOL_F_MTU            4
>  #define VHOST_USER_PROTOCOL_F_SLAVE_REQ      5
>  #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN   6
> +#define VHOST_USER_PROTOCOL_F_VFIO           7
>  
>  Master message types
>  --------------------
> @@ -614,6 +636,41 @@ Slave message types
>        This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature
>        has been successfully negotiated.
>  
> + * VHOST_USER_SLAVE_VRING_VFIO_GROUP_MSG
> +
> +      Id: 2
> +      Equivalent ioctl: N/A
> +      Slave payload: u64
> +      Master payload: N/A
> +
> +      Sets the VFIO group file descriptor which is passed as ancillary data
> +      for a specified queue (queue index is carried in the u64 payload).
> +      Slave sends this request to tell QEMU to add or delete a VFIO group.
> +      QEMU will delete the current group if any for the specified queue when the
> +      message is sent without a file descriptor. A VFIO group will be actually
> +      deleted when its reference count reaches zero.
> +      This request should be sent only when VHOST_USER_PROTOCOL_F_VFIO protocol
> +      feature has been successfully negotiated.
> +
> + * VHOST_USER_SLAVE_VRING_NOTIFY_AREA_MSG
> +
> +      Id: 3
> +      Equivalent ioctl: N/A
> +      Slave payload: vring area description
> +      Master payload: N/A
> +
> +      Sets the notify area for a specified queue (queue index is carried
> +      in the u64 field of the vring area description). A file descriptor is
> +      passed as ancillary data (typically it's a VFIO device fd). QEMU can
> +      mmap the file descriptor based on the information carried in the vring
> +      area description.
> +      Slave sends this request to tell QEMU to add or delete a MemoryRegion
> +      for a specified queue's notify MMIO region. QEMU will delete the current
> +      MemoryRegion if any for the specified queue when the message is sent
> +      without a file descriptor.
> +      This request should be sent only when VHOST_USER_PROTOCOL_F_VFIO protocol
> +      feature and VIRTIO_F_VERSION_1 feature have been successfully negotiated.
> +
>  VHOST_USER_PROTOCOL_F_REPLY_ACK:
>  -------------------------------
>  The original vhost-user specification only demands replies for certain
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 3e308d0a62..ec83746bd5 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -35,6 +35,7 @@ enum VhostUserProtocolFeature {
>      VHOST_USER_PROTOCOL_F_NET_MTU = 4,
>      VHOST_USER_PROTOCOL_F_SLAVE_REQ = 5,
>      VHOST_USER_PROTOCOL_F_CROSS_ENDIAN = 6,
> +    VHOST_USER_PROTOCOL_F_VFIO = 7,
>  
>      VHOST_USER_PROTOCOL_F_MAX
>  };
> @@ -72,6 +73,8 @@ typedef enum VhostUserRequest {
>  typedef enum VhostUserSlaveRequest {
>      VHOST_USER_SLAVE_NONE = 0,
>      VHOST_USER_SLAVE_IOTLB_MSG = 1,
> +    VHOST_USER_SLAVE_VRING_VFIO_GROUP_MSG = 2,
> +    VHOST_USER_SLAVE_VRING_NOTIFY_AREA_MSG = 3,
>      VHOST_USER_SLAVE_MAX
>  }  VhostUserSlaveRequest;
>  
> @@ -93,6 +96,12 @@ typedef struct VhostUserLog {
>      uint64_t mmap_offset;
>  } VhostUserLog;
>  
> +typedef struct VhostUserVringArea {
> +    uint64_t u64;
> +    uint64_t size;
> +    uint64_t offset;
> +} VhostUserVringArea;
> +
>  typedef struct VhostUserMsg {
>      VhostUserRequest request;
>  
> @@ -110,6 +119,7 @@ typedef struct VhostUserMsg {
>          VhostUserMemory memory;
>          VhostUserLog log;
>          struct vhost_iotlb_msg iotlb;
> +        VhostUserVringArea area;
>      } payload;
>  } QEMU_PACKED VhostUserMsg;
>  
> @@ -415,9 +425,37 @@ static int vhost_user_set_vring_num(struct vhost_dev *dev,
>      return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM, ring);
>  }
>  
> +static void vhost_user_notify_region_remap(struct vhost_dev *dev, int queue_idx)
> +{
> +    struct vhost_user *u = dev->opaque;
> +    VhostUserVFIOState *vfio = &u->shared->vfio;
> +    VhostUserNotifyCtx *notify = &vfio->notify[queue_idx];
> +    VirtIODevice *vdev = dev->vdev;
> +
> +    if (notify->addr && !notify->mapped) {
> +        virtio_device_notify_region_map(vdev, queue_idx, &notify->mr);
> +        notify->mapped = true;
> +    }
> +}
> +
> +static void vhost_user_notify_region_unmap(struct vhost_dev *dev, int queue_idx)
> +{
> +    struct vhost_user *u = dev->opaque;
> +    VhostUserVFIOState *vfio = &u->shared->vfio;
> +    VhostUserNotifyCtx *notify = &vfio->notify[queue_idx];
> +    VirtIODevice *vdev = dev->vdev;
> +
> +    if (notify->addr && notify->mapped) {
> +        virtio_device_notify_region_unmap(vdev, &notify->mr);
> +        notify->mapped = false;
> +    }
> +}
> +
>  static int vhost_user_set_vring_base(struct vhost_dev *dev,
>                                       struct vhost_vring_state *ring)
>  {
> +    vhost_user_notify_region_remap(dev, ring->index);
> +
>      return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE, ring);
>  }
>  
> @@ -451,6 +489,8 @@ static int vhost_user_get_vring_base(struct vhost_dev *dev,
>          .size = sizeof(msg.payload.state),
>      };
>  
> +    vhost_user_notify_region_unmap(dev, ring->index);
> +
>      if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
>          return -1;
>      }
> @@ -609,6 +649,136 @@ static int vhost_user_reset_device(struct vhost_dev *dev)
>      return 0;
>  }
>  
> +static int vhost_user_handle_vring_vfio_group(struct vhost_dev *dev,
> +                                              uint64_t u64,
> +                                              int groupfd)
> +{
> +    struct vhost_user *u = dev->opaque;
> +    VhostUserVFIOState *vfio = &u->shared->vfio;
> +    int queue_idx = u64 & VHOST_USER_VRING_IDX_MASK;
> +    VirtIODevice *vdev = dev->vdev;
> +    VFIOGroup *group;
> +    int ret = 0;
> +
> +    qemu_mutex_lock(&vfio->lock);
> +
> +    if (!virtio_has_feature(dev->protocol_features,
> +                            VHOST_USER_PROTOCOL_F_VFIO) ||
> +        vdev == NULL ||
> +        virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM) ||
> +        queue_idx >= virtio_get_num_queues(vdev)) {
> +        ret = -1;
> +        goto out;
> +    }
> +
> +    if (vfio->group[queue_idx]) {
> +        vfio_put_group(vfio->group[queue_idx]);
> +        vfio->group[queue_idx] = NULL;
> +    }
> +
> +    if (u64 & VHOST_USER_VRING_NOFD_MASK) {
> +        goto out;
> +    }
> +
> +    group = vfio_get_group_from_fd(groupfd, NULL, NULL);
> +    if (group == NULL) {
> +        ret = -1;
> +        goto out;
> +    }
> +
> +    if (group->fd != groupfd) {
> +        close(groupfd);
> +    }
> +
> +    vfio->group[queue_idx] = group;
> +
> +out:
> +    kvm_irqchip_commit_routes(kvm_state);
> +    qemu_mutex_unlock(&vfio->lock);
> +
> +    if (ret != 0 && groupfd != -1) {
> +        close(groupfd);
> +    }
> +
> +    return ret;
> +}
> +
> +#define NOTIFY_PAGE_SIZE 0x1000
> +
> +static int vhost_user_handle_vring_notify_area(struct vhost_dev *dev,
> +                                               VhostUserVringArea *area,
> +                                               int fd)
> +{
> +    struct vhost_user *u = dev->opaque;
> +    VhostUserVFIOState *vfio = &u->shared->vfio;
> +    int queue_idx = area->u64 & VHOST_USER_VRING_IDX_MASK;
> +    VirtIODevice *vdev = dev->vdev;
> +    VhostUserNotifyCtx *notify;
> +    void *addr = NULL;
> +    int ret = 0;
> +    char *name;
> +
> +    qemu_mutex_lock(&vfio->lock);
> +
> +    if (!virtio_has_feature(dev->protocol_features,
> +                            VHOST_USER_PROTOCOL_F_VFIO) ||
> +        vdev == NULL || queue_idx >= virtio_get_num_queues(vdev) ||
> +        virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM) ||
> +        !virtio_device_page_per_vq_enabled(vdev)) {
> +        ret = -1;
> +        goto out;
> +    }
> +
> +    notify = &vfio->notify[queue_idx];
> +
> +    if (notify->addr) {
> +        virtio_device_notify_region_unmap(vdev, &notify->mr);
> +        munmap(notify->addr, NOTIFY_PAGE_SIZE);
> +        object_unparent(OBJECT(&notify->mr));
> +        notify->addr = NULL;
> +    }
> +
> +    if (area->u64 & VHOST_USER_VRING_NOFD_MASK) {
> +        goto out;
> +    }
> +
> +    if (area->size < NOTIFY_PAGE_SIZE) {
> +        ret = -1;
> +        goto out;
> +    }
> +
> +    addr = mmap(NULL, NOTIFY_PAGE_SIZE, PROT_READ | PROT_WRITE,
> +                MAP_SHARED, fd, area->offset);
> +    if (addr == MAP_FAILED) {
> +        error_report("Can't map notify region.");
> +        ret = -1;
> +        goto out;
> +    }
> +
> +    name = g_strdup_printf("vhost-user/vfio@%p mmaps[%d]", vfio, queue_idx);
> +    memory_region_init_ram_device_ptr(&notify->mr, OBJECT(vdev), name,
> +                                      NOTIFY_PAGE_SIZE, addr);
> +    g_free(name);
> +
> +    if (virtio_device_notify_region_map(vdev, queue_idx, &notify->mr)) {
> +        ret = -1;
> +        goto out;
> +    }
> +
> +    notify->addr = addr;
> +    notify->mapped = true;
> +
> +out:
> +    if (ret < 0 && addr != NULL) {
> +        munmap(addr, NOTIFY_PAGE_SIZE);
> +    }
> +    if (fd != -1) {
> +        close(fd);
> +    }
> +    qemu_mutex_unlock(&vfio->lock);
> +    return ret;
> +}
> +
>  static void slave_read(void *opaque)
>  {
>      struct vhost_dev *dev = opaque;
> @@ -670,6 +840,12 @@ static void slave_read(void *opaque)
>      case VHOST_USER_SLAVE_IOTLB_MSG:
>          ret = vhost_backend_handle_iotlb_msg(dev, &msg.payload.iotlb);
>          break;
> +    case VHOST_USER_SLAVE_VRING_VFIO_GROUP_MSG:
> +        ret = vhost_user_handle_vring_vfio_group(dev, msg.payload.u64, fd);
> +        break;
> +    case VHOST_USER_SLAVE_VRING_NOTIFY_AREA_MSG:
> +        ret = vhost_user_handle_vring_notify_area(dev, &msg.payload.area, fd);
> +        break;
>      default:
>          error_report("Received unexpected msg type.");
>          if (fd != -1) {
> @@ -772,6 +948,10 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
>      u->slave_fd = -1;
>      dev->opaque = u;
>  
> +    if (dev->vq_index == 0) {
> +        qemu_mutex_init(&u->shared->vfio.lock);
> +    }
> +
>      err = vhost_user_get_features(dev, &features);
>      if (err < 0) {
>          return err;
> @@ -832,6 +1012,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
>  static int vhost_user_cleanup(struct vhost_dev *dev)
>  {
>      struct vhost_user *u;
> +    int i;
>  
>      assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
>  
> @@ -841,6 +1022,26 @@ static int vhost_user_cleanup(struct vhost_dev *dev)
>          close(u->slave_fd);
>          u->slave_fd = -1;
>      }
> +
> +    if (dev->vq_index == 0) {
> +        VhostUserVFIOState *vfio = &u->shared->vfio;
> +
> +        for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> +            if (vfio->notify[i].addr) {
> +                munmap(vfio->notify[i].addr, NOTIFY_PAGE_SIZE);
> +                object_unparent(OBJECT(&vfio->notify[i].mr));
> +                vfio->notify[i].addr = NULL;
> +            }
> +
> +            if (vfio->group[i]) {
> +                vfio_put_group(vfio->group[i]);
> +                vfio->group[i] = NULL;
> +            }
> +        }
> +
> +        qemu_mutex_destroy(&u->shared->vfio.lock);
> +    }
> +
>      g_free(u);
>      dev->opaque = 0;
>  
> diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
> index 4f5a1477d1..de8c647962 100644
> --- a/include/hw/virtio/vhost-user.h
> +++ b/include/hw/virtio/vhost-user.h
> @@ -9,9 +9,26 @@
>  #define HW_VIRTIO_VHOST_USER_H
>  
>  #include "chardev/char-fe.h"
> +#include "hw/virtio/virtio.h"
> +#include "hw/vfio/vfio-common.h"
> +
> +typedef struct VhostUserNotifyCtx {
> +    void *addr;
> +    MemoryRegion mr;
> +    bool mapped;
> +} VhostUserNotifyCtx;
> +
> +typedef struct VhostUserVFIOState {
> +    /* The VFIO group associated with each queue */
> +    VFIOGroup *group[VIRTIO_QUEUE_MAX];
> +    /* The notify context of each queue */
> +    VhostUserNotifyCtx notify[VIRTIO_QUEUE_MAX];
> +    QemuMutex lock;
> +} VhostUserVFIOState;
>  
>  typedef struct VhostUser {
>      CharBackend chr;
> +    VhostUserVFIOState vfio;
>  } VhostUser;
>  
>  #endif
> -- 
> 2.13.3
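
For readers of the protocol text above, here is a minimal sketch of how
a slave might emit the VHOST_USER_SLAVE_VRING_NOTIFY_AREA_MSG described
in the vhost-user.txt hunk, passing the VFIO device fd as ancillary
data. It only mirrors the quoted message layout; slave_fd, the queue
index and the fd are assumed inputs, and send_notify_area() is an
illustration, not code from this series.

#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define VHOST_USER_SLAVE_VRING_NOTIFY_AREA_MSG 3

typedef struct {
    uint64_t u64;      /* queue index (NOFD bit set when removing) */
    uint64_t size;     /* length of the notify region */
    uint64_t offset;   /* offset of the region within the passed fd */
} VringArea;

typedef struct {
    uint32_t request;  /* slave request id */
    uint32_t flags;    /* low bits carry the vhost-user version */
    uint32_t size;     /* payload size */
    VringArea area;
} __attribute__((packed)) SlaveMsg;

/* Send the notify area of one queue, attaching the VFIO device fd as
 * SCM_RIGHTS ancillary data (a sketch, not the patch's implementation). */
static int send_notify_area(int slave_fd, int vfio_dev_fd, uint16_t queue_idx,
                            uint64_t size, uint64_t offset)
{
    SlaveMsg msg = {
        .request = VHOST_USER_SLAVE_VRING_NOTIFY_AREA_MSG,
        .flags   = 0x1,
        .size    = sizeof(VringArea),
        .area    = { .u64 = queue_idx, .size = size, .offset = offset },
    };
    char ctrl[CMSG_SPACE(sizeof(int))];
    struct iovec iov = { .iov_base = &msg, .iov_len = sizeof(msg) };
    struct msghdr mh = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl, .msg_controllen = sizeof(ctrl),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&mh);

    memset(ctrl, 0, sizeof(ctrl));
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type  = SCM_RIGHTS;
    cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &vfio_dev_fd, sizeof(int));

    return sendmsg(slave_fd, &mh, 0) == (ssize_t)sizeof(msg) ? 0 : -1;
}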

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v1 6/6] vhost-user: add VFIO based accelerators support
  2018-01-25 23:59     ` [virtio-dev] " Michael S. Tsirkin
@ 2018-01-26  3:41       ` Jason Wang
  -1 siblings, 0 replies; 39+ messages in thread
From: Jason Wang @ 2018-01-26  3:41 UTC (permalink / raw)
  To: Michael S. Tsirkin, Tiwei Bie
  Cc: qemu-devel, virtio-dev, alex.williamson, pbonzini, stefanha,
	cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang



On 2018-01-26 07:59, Michael S. Tsirkin wrote:
>> The virtual IOMMU isn't supported by the accelerators for now.
>> Because vhost-user currently lacks of an efficient way to share
>> the IOMMU table in VM to vhost backend. That's why the software
>> implementation of virtual IOMMU support in vhost-user backend
>> can't support dynamic mapping well.
> What exactly is meant by that? vIOMMU seems to work for people,
> it's not that fast if you change mappings all the time,
> but e.g. dpdk within guest doesn't.

Yes, the software implementation supports dynamic mapping for sure. I
think the point is that the current vhost-user backend can not program
the hardware IOMMU, so it can not let a hardware accelerator co-work
with the software vIOMMU. I think that's another argument for
implementing the offloaded path inside qemu, which has complete
support for VFIO co-operating with the vIOMMU.

Thanks

>
>> Once this problem is solved
>> in vhost-user, virtual IOMMU can be supported by accelerators
>> too, and the IOMMU feature bit checking in this patch can be
>> removed.
> Given it works with software backends right now, I suspect
> this will be up to you guys to address.
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v1 6/6] vhost-user: add VFIO based accelerators support
  2018-01-26  3:41       ` Jason Wang
@ 2018-01-26  5:57         ` Tiwei Bie
  -1 siblings, 0 replies; 39+ messages in thread
From: Tiwei Bie @ 2018-01-26  5:57 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, qemu-devel, virtio-dev, alex.williamson,
	pbonzini, stefanha, cunming.liang, dan.daly, jianfeng.tan,
	zhihong.wang, xiao.w.wang

On Fri, Jan 26, 2018 at 11:41:27AM +0800, Jason Wang wrote:
> On 2018-01-26 07:59, Michael S. Tsirkin wrote:
> > > The virtual IOMMU isn't supported by the accelerators for now.
> > > Because vhost-user currently lacks of an efficient way to share
> > > the IOMMU table in VM to vhost backend. That's why the software
> > > implementation of virtual IOMMU support in vhost-user backend
> > > can't support dynamic mapping well.
> > What exactly is meant by that? vIOMMU seems to work for people,
> > it's not that fast if you change mappings all the time,
> > but e.g. dpdk within guest doesn't.
> 
> Yes, software implementation support dynamic mapping for sure. I think the
> point is, current vhost-user backend can not program hardware IOMMU. So it
> can not let hardware accelerator to cowork with software vIOMMU.

Vhost-user backend can program hardware IOMMU. Currently
vhost-user backend (or more precisely the vDPA driver in
vhost-user backend) will use the memory table (delivered
by the VHOST_USER_SET_MEM_TABLE message) to program the
IOMMU via vfio, and that's why accelerators can use the
GPA (guest physical address) in descriptors directly.
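
To make the memory-table path concrete, below is a rough sketch (not
code from this series) of how the vDPA driver in the backend could turn
one VHOST_USER_SET_MEM_TABLE region into a host IOMMU mapping through
the standard VFIO type1 interface. container_fd and mmap_base are
assumptions standing in for the backend's VFIO container and its local
mapping of the region's fd.

#include <linux/vfio.h>
#include <sys/ioctl.h>
#include <stdint.h>

/* Mirrors one region entry of the VHOST_USER_SET_MEM_TABLE payload. */
struct mem_region {
    uint64_t guest_phys_addr;
    uint64_t memory_size;
    uint64_t userspace_addr;
    uint64_t mmap_offset;
};

/* Program IOVA == GPA for one region, which is why the accelerator can
 * use the GPAs found in the descriptors directly. */
static int map_region_for_device(int container_fd,
                                 const struct mem_region *r, void *mmap_base)
{
    struct vfio_iommu_type1_dma_map dma_map = {
        .argsz = sizeof(dma_map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
        .iova  = r->guest_phys_addr,
        .vaddr = (uintptr_t)mmap_base + r->mmap_offset,
        .size  = r->memory_size,
    };

    return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
}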

Theoretically, we can use the IOVA mapping info (delivered
by the VHOST_USER_IOTLB_MSG message) to program the IOMMU,
and accelerators will be able to use IOVA. But the problem
is that in vhost-user QEMU won't push all the IOVA mappings
to the backend directly. The backend needs to ask for that info
when it meets a new IOVA. Such a design and implementation
won't work well for dynamic mappings anyway and can't
be supported by hardware accelerators.
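
For contrast, the per-IOVA flow that makes the IOTLB path hard to
offload looks roughly like the sketch below. iotlb_cache_lookup() and
slave_send_iotlb_miss() are hypothetical backend helpers, not existing
APIs.

#include <linux/vhost.h>   /* struct vhost_iotlb_msg, VHOST_IOTLB_MISS */
#include <stdint.h>

struct backend;            /* the backend's private state (assumed) */
void *iotlb_cache_lookup(struct backend *be, uint64_t iova,
                         uint64_t len, uint8_t perm);
void slave_send_iotlb_miss(struct backend *be, struct vhost_iotlb_msg *msg);

/* Translate one IOVA found in a descriptor.  The synchronous miss/update
 * round trip on every new IOVA is what a hardware data path can't do. */
void *translate_iova(struct backend *be, uint64_t iova,
                     uint64_t len, uint8_t perm)
{
    void *uaddr = iotlb_cache_lookup(be, iova, len, perm);

    if (!uaddr) {
        struct vhost_iotlb_msg miss = {
            .iova = iova,
            .size = len,
            .perm = perm,              /* VHOST_ACCESS_RO/WO/RW */
            .type = VHOST_IOTLB_MISS,
        };

        /* Ask QEMU for the mapping and wait for the IOTLB update reply. */
        slave_send_iotlb_miss(be, &miss);
        uaddr = iotlb_cache_lookup(be, iova, len, perm);
    }
    return uaddr;
}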

> I think
> that's another call to implement the offloaded path inside qemu which has
> complete support for vIOMMU co-operated VFIO.

Yes, that's exactly what we want. After revisiting the
last paragraph in the commit message, I found it's not
really accurate. The practicability of dynamic mappings
support is a common issue for QEMU. It also exists for
vfio (hw/vfio in QEMU). If QEMU needs to trap all the
map/unmap events, the data path performance couldn't be
high. If we want to thoroughly fix this issue especially
for vfio (hw/vfio in QEMU), we need to have the offload
path Jason mentioned in QEMU. And I think accelerators
could use it too.

Best regards,
Tiwei Bie

> 
> Thanks
> 
> > 
> > > Once this problem is solved
> > > in vhost-user, virtual IOMMU can be supported by accelerators
> > > too, and the IOMMU feature bit checking in this patch can be
> > > removed.
> > Given it works with software backends right now, I suspect
> > this will be up to you guys to address.
> > 
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [virtio-dev] Re: [Qemu-devel] [PATCH v1 0/6] Extend vhost-user to support VFIO based accelerators
  2018-01-25 14:22 ` [Qemu-devel] [PATCH v1 0/6] Extend vhost-user to support VFIO based accelerators Stefan Hajnoczi
  2018-01-25 16:10     ` [virtio-dev] " Liang, Cunming
@ 2018-01-26  6:28   ` Tiwei Bie
  1 sibling, 0 replies; 39+ messages in thread
From: Tiwei Bie @ 2018-01-26  6:28 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, virtio-dev, mst, alex.williamson, jasowang, pbonzini,
	stefanha, jianfeng.tan, cunming.liang, xiao.w.wang, zhihong.wang,
	dan.daly

On Thu, Jan 25, 2018 at 02:22:20PM +0000, Stefan Hajnoczi wrote:
> On Thu, Jan 25, 2018 at 12:03:22PM +0800, Tiwei Bie wrote:
> > Why extend vhost-user for vDPA
> > ==============================
> > 
> > We have already implemented various virtual switches (e.g. OVS-DPDK)
> > based on vhost-user for VMs in the Cloud. They are purely software
> > running on CPU cores. When we have accelerators for such NFVi applications,
> > it's ideal if the applications could keep using the original interface
> > (i.e. vhost-user netdev) with QEMU, and infrastructure is able to decide
> > when and how to switch between CPU and accelerators within the interface.
> > And the switching (i.e. switch between CPU and accelerators) can be done
> > flexibly and quickly inside the applications.
> > 
> > More details about this can be found from the Cunming's discussions on
> > the RFC patch set.
> > 
> > The previous links:
> > RFC: http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg04844.html
> 
> Is vDPA also useful in the simpler use case where there is no NFVi
> application?
> 
> In other words, you have virtio-net inside the guest and vhost-net on
> the host.  You'd like to accelerate the virtio-net NIC using the
> hardware's vDPA support.
> 
> DPDK requires dedicated logical cores and hugepages.  Those resources
> will not be used (wasted) if you just want to enable vDPA but have no
> DPDK packet processing application.
> 
> How can this use case be supported without wasting resources?

DPDK applications don't have to keep polling. In the case
you described, the DPDK process just needs to listen on the
vhost socket; basically its CPU utilization is 0% if the
notify passthrough is set up correctly.

DPDK applications also don't have to allocate a lot of
memory. In the case you described, the DPDK process just
needs to allocate some memory to initialize the EAL. The
memory table provided by QEMU will be used to program the
IOMMU for the accelerator device. If the accelerator device
has a control vq, the DPDK process may need to allocate some
extra memory to talk with the device via the control vq.
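
As a rough illustration of that idle mode, a backend process that only
holds the vhost-user session open could look like the sketch below. The
socket path is an assumption and the EAL arguments are whatever the
deployment passes in; this is a sketch, not a tested application.

#include <rte_eal.h>
#include <rte_vhost.h>
#include <unistd.h>

static int new_device(int vid)      { (void)vid; return 0; }
static void destroy_device(int vid) { (void)vid; }

static const struct vhost_device_ops ops = {
    .new_device     = new_device,
    .destroy_device = destroy_device,
};

int main(int argc, char **argv)
{
    const char *path = "/tmp/vhost-user0.sock";   /* assumed socket path */

    if (rte_eal_init(argc, argv) < 0)
        return -1;

    /* Register and start the vhost-user server socket; the session is
     * handled by DPDK's event thread, no polling lcores are involved. */
    rte_vhost_driver_register(path, 0);
    rte_vhost_driver_callback_register(path, &ops);
    rte_vhost_driver_start(path);

    pause();   /* stays at ~0% CPU until vhost-user events arrive */
    return 0;
}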

Best regards,
Tiwei Bie

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH v1 0/6] Extend vhost-user to support VFIO based accelerators
  2018-01-25 16:10     ` [virtio-dev] " Liang, Cunming
  (?)
@ 2018-01-26  7:17     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 39+ messages in thread
From: Stefan Hajnoczi @ 2018-01-26  7:17 UTC (permalink / raw)
  To: Liang, Cunming
  Cc: Bie, Tiwei, qemu-devel, virtio-dev, mst, alex.williamson,
	jasowang, pbonzini, stefanha, Tan, Jianfeng, Wang, Xiao W, Wang,
	Zhihong, Daly, Dan

On Thu, Jan 25, 2018 at 4:10 PM, Liang, Cunming <cunming.liang@intel.com> wrote:
>> -----Original Message-----
>> From: Stefan Hajnoczi [mailto:stefanha@gmail.com]
>> Sent: Thursday, January 25, 2018 10:22 PM
>> To: Bie, Tiwei <tiwei.bie@intel.com>
>> Cc: qemu-devel@nongnu.org; virtio-dev@lists.oasis-open.org;
>> mst@redhat.com; alex.williamson@redhat.com; jasowang@redhat.com;
>> pbonzini@redhat.com; stefanha@redhat.com; Tan, Jianfeng
>> <jianfeng.tan@intel.com>; Liang, Cunming <cunming.liang@intel.com>;
>> Wang, Xiao W <xiao.w.wang@intel.com>; Wang, Zhihong
>> <zhihong.wang@intel.com>; Daly, Dan <dan.daly@intel.com>
>> Subject: Re: [Qemu-devel] [PATCH v1 0/6] Extend vhost-user to support
>> VFIO based accelerators
>>
>> On Thu, Jan 25, 2018 at 12:03:22PM +0800, Tiwei Bie wrote:
>> > Why extend vhost-user for vDPA
>> > ==============================
>> >
>> > We have already implemented various virtual switches (e.g. OVS-DPDK)
>> > based on vhost-user for VMs in the Cloud. They are purely software
>> > running on CPU cores. When we have accelerators for such NFVi
>> > applications, it's ideal if the applications could keep using the
>> > original interface (i.e. vhost-user netdev) with QEMU, and
>> > infrastructure is able to decide when and how to switch between CPU
>> > and
>> accelerators within the interface.
>> > And the switching (i.e. switch between CPU and accelerators) can be
>> > done flexibly and quickly inside the applications.
>> >
>> > More details about this can be found from the Cunming's discussions
>> > on the RFC patch set.
>> >
>> > The previous links:
>> > RFC:
>> > http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg04844.htm
>> > l
>>
>> Is vDPA also useful in the simpler use case where there is no NFVi application?
> There are separate patches to add non-DPDK vDPA support (aka vhost-vfio, a newly proposed vhost backend), which is the scope of stage II, as we mentioned in the community call.
> Stay tuned for the RFC patch. The main idea is that the virtio-compatible device driver registers as an mdev to talk with QEMU's vhost-vfio backend.

Thanks!

Stefan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v1 6/6] vhost-user: add VFIO based accelerators support
  2018-01-26  5:57         ` Tiwei Bie
@ 2018-02-04 21:49           ` Alexander Duyck
  -1 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2018-02-04 21:49 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: Jason Wang, Michael S. Tsirkin, qemu-devel, virtio-dev,
	Alex Williamson, Paolo Bonzini, stefanha, cunming.liang,
	dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang

On Thu, Jan 25, 2018 at 9:57 PM, Tiwei Bie <tiwei.bie@intel.com> wrote:
> On Fri, Jan 26, 2018 at 11:41:27AM +0800, Jason Wang wrote:
>> On 2018-01-26 07:59, Michael S. Tsirkin wrote:
>> > > The virtual IOMMU isn't supported by the accelerators for now.
>> > > Because vhost-user currently lacks of an efficient way to share
>> > > the IOMMU table in VM to vhost backend. That's why the software
>> > > implementation of virtual IOMMU support in vhost-user backend
>> > > can't support dynamic mapping well.
>> > What exactly is meant by that? vIOMMU seems to work for people,
>> > it's not that fast if you change mappings all the time,
>> > but e.g. dpdk within guest doesn't.
>>
>> Yes, software implementation support dynamic mapping for sure. I think the
>> point is, current vhost-user backend can not program hardware IOMMU. So it
>> can not let hardware accelerator to cowork with software vIOMMU.
>
> Vhost-user backend can program hardware IOMMU. Currently
> vhost-user backend (or more precisely the vDPA driver in
> vhost-user backend) will use the memory table (delivered
> by the VHOST_USER_SET_MEM_TABLE message) to program the
> IOMMU via vfio, and that's why accelerators can use the
> GPA (guest physical address) in descriptors directly.
>
> Theoretically, we can use the IOVA mapping info (delivered
> by the VHOST_USER_IOTLB_MSG message) to program the IOMMU,
> and accelerators will be able to use IOVA. But the problem
> is that in vhost-user QEMU won't push all the IOVA mappings
> to backend directly. Backend needs to ask for those info
> when it meets a new IOVA. Such design and implementation
> won't work well for dynamic mappings anyway and couldn't
> be supported by hardware accelerators.
>
>> I think
>> that's another call to implement the offloaded path inside qemu which has
>> complete support for vIOMMU co-operated VFIO.
>
> Yes, that's exactly what we want. After revisiting the
> last paragraph in the commit message, I found it's not
> really accurate. The practicability of dynamic mappings
> support is a common issue for QEMU. It also exists for
> vfio (hw/vfio in QEMU). If QEMU needs to trap all the
> map/unmap events, the data path performance couldn't be
> high. If we want to thoroughly fix this issue especially
> for vfio (hw/vfio in QEMU), we need to have the offload
> path Jason mentioned in QEMU. And I think accelerators
> could use it too.
>
> Best regards,
> Tiwei Bie

I wonder if we couldn't look at coming up with an altered security
model for the IOMMU drivers to address some of the performance issues
seen with typical hardware IOMMU?

In the case of most network devices, we seem to be moving toward a
model where the Rx pages are mapped for an extended period of time and
see a fairly high rate of reuse. As such, pages mapped as writable or
read/write by the device are left mapped for an extended period of
time, while Tx pages, which are read-only, are often mapped/unmapped
since they come from some other location in the kernel beyond the
driver's control.

If we were to somehow come up with a model where the read-only (Tx)
pages had access to a pre-allocated memory mapped address, and the
read/write (descriptor rings) and write-only (Rx) pages were provided
with dynamic addresses, we might be able to come up with a solution
that would allow for fairly high network performance while at least
protecting from memory corruption. The only issue it would open up is
that the device would have the ability to read any/all memory on the
guest. I was wondering about doing something like this with the vIOMMU
and VFIO for the Intel NICs, since an interface like igb, ixgbe,
ixgbevf, i40e, or i40evf would probably show pretty good performance
under such a model, as long as the writable pages were being tracked
by the vIOMMU. It could even allow for live migration support if the
vIOMMU provided the info needed for migratable/dirty page tracking and
we held off on migrating any of the dynamically mapped pages until
after they were either unmapped or an FLR reset the device.
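
Expressed with the VFIO type1 ioctls, the split policy described above
might look like the sketch below. The helper names and the single
long-lived read-only window are illustrations of the proposal, not
existing driver code.

#include <linux/vfio.h>
#include <sys/ioctl.h>
#include <stdint.h>

static int dma_map(int container, uint64_t iova, uint64_t vaddr,
                   uint64_t size, uint32_t flags)
{
    struct vfio_iommu_type1_dma_map m = {
        .argsz = sizeof(m), .flags = flags,
        .iova = iova, .vaddr = vaddr, .size = size,
    };
    return ioctl(container, VFIO_IOMMU_MAP_DMA, &m);
}

/* One long-lived, read-only window covering guest memory (Tx source). */
static int map_ro_window(int container, uint64_t gpa, uint64_t hva,
                         uint64_t len)
{
    return dma_map(container, gpa, hva, len, VFIO_DMA_MAP_FLAG_READ);
}

/* Short-lived read/write mappings for descriptor rings and Rx buffers. */
static int map_rw_buffer(int container, uint64_t iova, uint64_t hva,
                         uint64_t len)
{
    return dma_map(container, iova, hva, len,
                   VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE);
}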

Thanks.

- Alex

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [virtio-dev] [PATCH v1 6/6] vhost-user: add VFIO based accelerators support
  2018-01-25  4:03   ` [virtio-dev] " Tiwei Bie
@ 2018-02-05 17:47     ` Paolo Bonzini
  -1 siblings, 0 replies; 39+ messages in thread
From: Paolo Bonzini @ 2018-02-05 17:47 UTC (permalink / raw)
  To: Tiwei Bie, qemu-devel, virtio-dev, mst, alex.williamson,
	jasowang, stefanha
  Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang

On 25/01/2018 05:03, Tiwei Bie wrote:
> The key difference with PCI passthru is that, in this case only
> the data path of the device (e.g. DMA ring, notify region and
> queue interrupt) is pass-throughed to the VM, the device control
> path (e.g. PCI configuration space and MMIO regions) is still
> defined and emulated by QEMU.
> 
> The benefits of keeping virtio device emulation in QEMU compared
> with virtio device PCI passthru include (but not limit to):
> 
> - consistent device interface for guest OS in the VM;
> - max flexibility on the hardware (i.e. the accelerators) design;
> - leveraging the existing virtio live-migration framework;
> 
> The virtual IOMMU isn't supported by the accelerators for now.
> Because vhost-user currently lacks of an efficient way to share
> the IOMMU table in VM to vhost backend. That's why the software
> implementation of virtual IOMMU support in vhost-user backend
> can't support dynamic mapping well. Once this problem is solved
> in vhost-user, virtual IOMMU can be supported by accelerators
> too, and the IOMMU feature bit checking in this patch can be
> removed.

I don't understand why this would use vhost-user.  vhost-user is meant
for connecting to e.g. a user-space switch that is shared between
multiple virtual machines.

In this case, there would be one VFIO device per VM (because different
VMs must be in different VFIO groups).  So I don't understand the benefit
of configuring the control path of the VFIO device outside QEMU.

Paolo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [virtio-dev] [PATCH v1 6/6] vhost-user: add VFIO based accelerators support
  2018-02-05 17:47     ` Paolo Bonzini
@ 2018-02-06  4:40       ` Tiwei Bie
  -1 siblings, 0 replies; 39+ messages in thread
From: Tiwei Bie @ 2018-02-06  4:40 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, virtio-dev, mst, alex.williamson, jasowang, stefanha,
	cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang

On Mon, Feb 05, 2018 at 06:47:51PM +0100, Paolo Bonzini wrote:
> On 25/01/2018 05:03, Tiwei Bie wrote:
> > The key difference with PCI passthru is that, in this case only
> > the data path of the device (e.g. DMA ring, notify region and
> > queue interrupt) is pass-throughed to the VM, the device control
> > path (e.g. PCI configuration space and MMIO regions) is still
> > defined and emulated by QEMU.
> > 
> > The benefits of keeping virtio device emulation in QEMU compared
> > with virtio device PCI passthru include (but not limit to):
> > 
> > - consistent device interface for guest OS in the VM;
> > - max flexibility on the hardware (i.e. the accelerators) design;
> > - leveraging the existing virtio live-migration framework;
> > 
> > The virtual IOMMU isn't supported by the accelerators for now.
> > Because vhost-user currently lacks of an efficient way to share
> > the IOMMU table in VM to vhost backend. That's why the software
> > implementation of virtual IOMMU support in vhost-user backend
> > can't support dynamic mapping well. Once this problem is solved
> > in vhost-user, virtual IOMMU can be supported by accelerators
> > too, and the IOMMU feature bit checking in this patch can be
> > removed.
> 
> I don't understand why this would use vhost-user.  vhost-user is meant
> for connecting to e.g. a user-space switch that is shared between
> multiple virtual machines.

Yeah, you're right!

The commit log you quoted is talking about the benefits
of vDPA (i.e. only passthru the data path), which is not
related to vhost-user.

The usage of vhost-user you described is exactly why we
want to use vhost-user. In our case, the accelerator for
each VM is a PCI VF device, and the PCI card has vswitch
logic (the VFs are the ports of the switch that connect
the VMs). So the card is a vswitch accelerator that will
be shared between multiple VMs. If we extend vhost-user,
QEMU can keep using the vhost-user interface to connect
to the user-space switch, which has an optional
accelerator.

More details can be found in the "Why extend vhost-user
for vDPA" section of the cover letter:

----- START -----

Why extend vhost-user for vDPA
==============================

We have already implemented various virtual switches (e.g. OVS-DPDK)
based on vhost-user for VMs in the Cloud. They are purely software
running on CPU cores. When we have accelerators for such NFVi applications,
it's ideal if the applications could keep using the original interface
(i.e. vhost-user netdev) with QEMU, and infrastructure is able to decide
when and how to switch between CPU and accelerators within the interface.
And the switching (i.e. switch between CPU and accelerators) can be done
flexibly and quickly inside the applications.

----- END -----

I'll try to add this info to the commit log. Thanks!

Best regards,
Tiwei Bie

> 
> In this case, there would be one VFIO device per VM (because different
> VM must be in different VFIO groups).  So I don't understand the benefit
> of configuring the control path of the VFIO device outside QEMU.
> 
> Paolo
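
For anyone who wants to try the setup described above, the QEMU side
looks the same as for any other vhost-user netdev; only the backend
behind the socket changes. A minimal example invocation is sketched
below - the socket path and memory sizes are placeholders, and guest
memory has to be a shared file-backed region so the backend can map it:

  qemu-system-x86_64 -m 1024 \
      -object memory-backend-file,id=mem0,size=1024M,mem-path=/dev/hugepages,share=on \
      -numa node,memdev=mem0 \
      -chardev socket,id=chr0,path=/tmp/vhost-user0 \
      -netdev type=vhost-user,id=net0,chardev=chr0 \
      -device virtio-net-pci,netdev=net0

Whether the traffic is then handled by CPU cores in the vswitch or
offloaded to a VF is entirely up to the backend, which is the point of
keeping the vhost-user interface unchanged.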

* Re: [Qemu-devel] [virtio-dev] [PATCH v1 6/6] vhost-user: add VFIO based accelerators support
  2018-02-06  4:40       ` Tiwei Bie
@ 2018-02-07 15:23         ` Paolo Bonzini
  -1 siblings, 0 replies; 39+ messages in thread
From: Paolo Bonzini @ 2018-02-07 15:23 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: qemu-devel, virtio-dev, mst, alex.williamson, jasowang, stefanha,
	cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang

On 06/02/2018 05:40, Tiwei Bie wrote:
> In our case, the accelerator for
> each VM is a PCI VF device and the PCI card has vswitch
> logic (the VFs are the ports of switch to connect VMs).

Ok, this makes a lot more sense now. :)

Paolo

> So the card is a vswitch accelerator which will be shared
> between multiple VMs.

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v1 6/6] vhost-user: add VFIO based accelerators support
  2018-02-04 21:49           ` Alexander Duyck
@ 2018-02-07 16:43             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2018-02-07 16:43 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Tiwei Bie, Jason Wang, qemu-devel, virtio-dev, Alex Williamson,
	Paolo Bonzini, stefanha, cunming.liang, dan.daly, jianfeng.tan,
	zhihong.wang, xiao.w.wang

On Sun, Feb 04, 2018 at 01:49:46PM -0800, Alexander Duyck wrote:
> On Thu, Jan 25, 2018 at 9:57 PM, Tiwei Bie <tiwei.bie@intel.com> wrote:
> > On Fri, Jan 26, 2018 at 11:41:27AM +0800, Jason Wang wrote:
> >> On 2018年01月26日 07:59, Michael S. Tsirkin wrote:
> >> > > The virtual IOMMU isn't supported by the accelerators for now.
> >> > > Because vhost-user currently lacks of an efficient way to share
> >> > > the IOMMU table in VM to vhost backend. That's why the software
> >> > > implementation of virtual IOMMU support in vhost-user backend
> >> > > can't support dynamic mapping well.
> >> > What exactly is meant by that? vIOMMU seems to work for people,
> >> > it's not that fast if you change mappings all the time,
> >> > but e.g. dpdk within guest doesn't.
> >>
> >> Yes, software implementation support dynamic mapping for sure. I think the
> >> point is, current vhost-user backend can not program hardware IOMMU. So it
> >> can not let hardware accelerator to cowork with software vIOMMU.
> >
> > Vhost-user backend can program hardware IOMMU. Currently
> > vhost-user backend (or more precisely the vDPA driver in
> > vhost-user backend) will use the memory table (delivered
> > by the VHOST_USER_SET_MEM_TABLE message) to program the
> > IOMMU via vfio, and that's why accelerators can use the
> > GPA (guest physical address) in descriptors directly.
> >
> > Theoretically, we can use the IOVA mapping info (delivered
> > by the VHOST_USER_IOTLB_MSG message) to program the IOMMU,
> > and accelerators will be able to use IOVA. But the problem
> > is that in vhost-user QEMU won't push all the IOVA mappings
> > to backend directly. Backend needs to ask for those info
> > when it meets a new IOVA. Such design and implementation
> > won't work well for dynamic mappings anyway and couldn't
> > be supported by hardware accelerators.
> >
> >> I think
> >> that's another call to implement the offloaded path inside qemu which has
> >> complete support for vIOMMU co-operated VFIO.
> >
> > Yes, that's exactly what we want. After revisiting the
> > last paragraph in the commit message, I found it's not
> > really accurate. The practicability of dynamic mappings
> > support is a common issue for QEMU. It also exists for
> > vfio (hw/vfio in QEMU). If QEMU needs to trap all the
> > map/unmap events, the data path performance couldn't be
> > high. If we want to thoroughly fix this issue especially
> > for vfio (hw/vfio in QEMU), we need to have the offload
> > path Jason mentioned in QEMU. And I think accelerators
> > could use it too.
> >
> > Best regards,
> > Tiwei Bie
> 
> I wonder if we couldn't look at coming up with an altered security
> model for the IOMMU drivers to address some of the performance issues
> seen with typical hardware IOMMU?
> 
> In the case of most network devices, we seem to be moving toward a
> model where the Rx pages are mapped for an extended period of time and
> see a fairly high rate of reuse. As such pages mapped as being
> writable or read/write by the device are left mapped for an extended
> period of time while Tx pages, which are read only, are often
> mapped/unmapped since they are coming from some other location in the
> kernel beyond the driver's control.
> 
> If we were to somehow come up with a model where the read-only(Tx)
> pages had access to a pre-allocated memory mapped address, and the
> read/write(descriptor rings), write-only(Rx) pages were provided with
> dynamic addresses we might be able to come up with a solution that
> would allow for fairly high network performance while at least
> protecting from memory corruption. The only issue it would open up is
> that the device would have the ability to read any/all memory on the
> guest. I was wondering about doing something like this with the vIOMMU
> with VFIO for the Intel NICs this way since an interface like igb,
> ixgbe, ixgbevf, i40e, or i40evf would probably show pretty good
> performance under such a model and as long as the writable pages were
> being tracked by the vIOMMU. It could even allow for live migration
> support if the vIOMMU provided the info needed for migratable/dirty
> page tracking and we held off on migrating any of the dynamically
> mapped pages until after they were either unmapped or an FLR reset the
> device.
> 
> Thanks.
> 
> - Alex



It might be a good idea to change the IOMMU instead - how about a
variant of strict mode in the Intel IOMMU driver which forces an IOTLB
flush after invalidating a writeable mapping but not an RO mapping?
Not sure what the name would be - relaxed-ro?

This is probably easier than poking at the drivers and net core.

Keeping the RX pages mapped in the IOMMU was envisioned for XDP.
That might be a good place to start.

-- 
MST
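
To make the quoted point about the memory table concrete, below is a
minimal sketch of how a vDPA backend might program the VF's IOMMU
through VFIO once it has the guest memory layout, so that the device
can use the guest physical addresses found in the descriptors
directly. The ioctl and structure are the standard ones from
<linux/vfio.h>; the vdpa_mem_region type and the function name are
invented for illustration.

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

struct vdpa_mem_region {
    uint64_t gpa;         /* guest physical address of the region    */
    uint64_t size;        /* region size in bytes                    */
    void    *host_vaddr;  /* where the QEMU-shared memory is mmapped */
};

/* Map each region with IOVA == GPA so the device can use the guest
 * physical addresses found in the virtio descriptors without any
 * software translation.  (vdpa_mem_region and this function name are
 * made up; the ioctl usage itself follows <linux/vfio.h>.) */
static int vdpa_map_guest_memory(int container_fd,
                                 const struct vdpa_mem_region *regions,
                                 int nregions)
{
    for (int i = 0; i < nregions; i++) {
        struct vfio_iommu_type1_dma_map map;

        memset(&map, 0, sizeof(map));
        map.argsz = sizeof(map);
        map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
        map.vaddr = (uintptr_t)regions[i].host_vaddr;
        map.iova  = regions[i].gpa;     /* identity mapping: IOVA == GPA */
        map.size  = regions[i].size;

        if (ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map) < 0)
            return -1;
    }
    return 0;
}

The crucial property is that every region is mapped up front. The
on-demand style of VHOST_USER_IOTLB_MSG has no equivalent step, which
is why dynamic vIOMMU mappings remain the hard case for hardware.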

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v1 6/6] vhost-user: add VFIO based accelerators support
  2018-02-07 16:43             ` Michael S. Tsirkin
@ 2018-02-07 18:02               ` Alexander Duyck
  -1 siblings, 0 replies; 39+ messages in thread
From: Alexander Duyck @ 2018-02-07 18:02 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Tiwei Bie, Jason Wang, qemu-devel, virtio-dev, Alex Williamson,
	Paolo Bonzini, stefanha, cunming.liang, dan.daly, jianfeng.tan,
	zhihong.wang, xiao.w.wang

On Wed, Feb 7, 2018 at 8:43 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Sun, Feb 04, 2018 at 01:49:46PM -0800, Alexander Duyck wrote:
>> On Thu, Jan 25, 2018 at 9:57 PM, Tiwei Bie <tiwei.bie@intel.com> wrote:
>> > On Fri, Jan 26, 2018 at 11:41:27AM +0800, Jason Wang wrote:
>> >> On 2018年01月26日 07:59, Michael S. Tsirkin wrote:
>> >> > > The virtual IOMMU isn't supported by the accelerators for now.
>> >> > > Because vhost-user currently lacks of an efficient way to share
>> >> > > the IOMMU table in VM to vhost backend. That's why the software
>> >> > > implementation of virtual IOMMU support in vhost-user backend
>> >> > > can't support dynamic mapping well.
>> >> > What exactly is meant by that? vIOMMU seems to work for people,
>> >> > it's not that fast if you change mappings all the time,
>> >> > but e.g. dpdk within guest doesn't.
>> >>
>> >> Yes, software implementation support dynamic mapping for sure. I think the
>> >> point is, current vhost-user backend can not program hardware IOMMU. So it
>> >> can not let hardware accelerator to cowork with software vIOMMU.
>> >
>> > Vhost-user backend can program hardware IOMMU. Currently
>> > vhost-user backend (or more precisely the vDPA driver in
>> > vhost-user backend) will use the memory table (delivered
>> > by the VHOST_USER_SET_MEM_TABLE message) to program the
>> > IOMMU via vfio, and that's why accelerators can use the
>> > GPA (guest physical address) in descriptors directly.
>> >
>> > Theoretically, we can use the IOVA mapping info (delivered
>> > by the VHOST_USER_IOTLB_MSG message) to program the IOMMU,
>> > and accelerators will be able to use IOVA. But the problem
>> > is that in vhost-user QEMU won't push all the IOVA mappings
>> > to backend directly. Backend needs to ask for those info
>> > when it meets a new IOVA. Such design and implementation
>> > won't work well for dynamic mappings anyway and couldn't
>> > be supported by hardware accelerators.
>> >
>> >> I think
>> >> that's another call to implement the offloaded path inside qemu which has
>> >> complete support for vIOMMU co-operated VFIO.
>> >
>> > Yes, that's exactly what we want. After revisiting the
>> > last paragraph in the commit message, I found it's not
>> > really accurate. The practicability of dynamic mappings
>> > support is a common issue for QEMU. It also exists for
>> > vfio (hw/vfio in QEMU). If QEMU needs to trap all the
>> > map/unmap events, the data path performance couldn't be
>> > high. If we want to thoroughly fix this issue especially
>> > for vfio (hw/vfio in QEMU), we need to have the offload
>> > path Jason mentioned in QEMU. And I think accelerators
>> > could use it too.
>> >
>> > Best regards,
>> > Tiwei Bie
>>
>> I wonder if we couldn't look at coming up with an altered security
>> model for the IOMMU drivers to address some of the performance issues
>> seen with typical hardware IOMMU?
>>
>> In the case of most network devices, we seem to be moving toward a
>> model where the Rx pages are mapped for an extended period of time and
>> see a fairly high rate of reuse. As such pages mapped as being
>> writable or read/write by the device are left mapped for an extended
>> period of time while Tx pages, which are read only, are often
>> mapped/unmapped since they are coming from some other location in the
>> kernel beyond the driver's control.
>>
>> If we were to somehow come up with a model where the read-only(Tx)
>> pages had access to a pre-allocated memory mapped address, and the
>> read/write(descriptor rings), write-only(Rx) pages were provided with
>> dynamic addresses we might be able to come up with a solution that
>> would allow for fairly high network performance while at least
>> protecting from memory corruption. The only issue it would open up is
>> that the device would have the ability to read any/all memory on the
>> guest. I was wondering about doing something like this with the vIOMMU
>> with VFIO for the Intel NICs this way since an interface like igb,
>> ixgbe, ixgbevf, i40e, or i40evf would probably show pretty good
>> performance under such a model and as long as the writable pages were
>> being tracked by the vIOMMU. It could even allow for live migration
>> support if the vIOMMU provided the info needed for migratable/dirty
>> page tracking and we held off on migrating any of the dynamically
>> mapped pages until after they were either unmapped or an FLR reset the
>> device.
>>
>> Thanks.
>>
>> - Alex
>
>
>
> It might be a good idea to change the iommu instead - how about a
> variant of strict in intel iommu which forces an IOTLB flush after
> invalidating a writeable mapping but not a RO mapping?  Not sure what the
> name would be - relaxed-ro?
>
> This is probably easier than poking at the drivers and net core.
>
> Keeping the RX pages mapped in the IOMMU was envisioned for XDP.
> That might be a good place to start.

My plan is to update the Intel IOMMU driver first, since that seems
like something that shouldn't require too much expertise in the
operation of the IOMMU to accomplish. My idea was more along the lines
of something like an "iommu=read-only-pt" or maybe "iommu=pt-ro"
option, where the Tx data would be identity mapped, and the descriptor
rings and Rx data could be in the dynamic mapping setup. The idea is
loosely based on the existing "iommu=pt" option that is normally used
on the host if you want to avoid the cost of dynamic mapping.
Basically we just need to keep an eye on the number of mappings that
the device can write to. Ideally, if we leave the Tx as identity
mapped, we never have to actually write to update any mapping, which
means not having to jump into the hypervisor to deal with the update.
The fact that most of the drivers already leave the Rx buffers and
descriptor rings statically mapped should essentially take care of the
rest for us. What this would become is a version of "iommu=pt" where
the user cares about preventing the device from possibly corrupting
memory, but would still like better performance at the cost of the
device being able to read any/all memory on the system.

As far as whether it is strict or not, I don't know how much we would
need to worry about that for the migration case. Essentially, a
deferred IOTLB flush would result in us having extra pages marked as
dirty and non-migratable, but we would need to weigh how much overhead
there is in the migration to deal with those extra pages against the
cost of having to do an IOTLB flush on every unmap call.

Anyway, this is an idea that just occurred to me the other day, so I
still need to do some more research into how easy or difficult
implementing a solution like this would be.

Thanks.

- Alex
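
As a rough illustration of where the "pt-ro" policy described above
would sit in a dma_map()-style path, here is a self-contained sketch.
All of the names below are invented (none of this exists in the kernel
today): buffers the device may only read are identity mapped with no
page-table update, while anything the device can write to gets a
dynamically allocated IOVA.

#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* Invented stand-ins for the real DMA-API / IOMMU driver plumbing. */

enum dma_dir { DMA_TO_DEVICE, DMA_FROM_DEVICE, DMA_BIDIRECTIONAL };

typedef uint64_t dma_addr_t;
typedef uint64_t phys_addr_t;

static dma_addr_t next_iova = 1ULL << 32;   /* toy IOVA allocator base */

/* Stand-in for allocating an IOVA and writing the IOMMU page tables;
 * a real version would also record the mapping for later invalidation
 * and dirty-page tracking. */
static dma_addr_t iova_alloc_and_map(phys_addr_t phys, size_t len, bool writable)
{
    dma_addr_t iova = next_iova;

    next_iova += (len + 4095) & ~4095ULL;
    (void)phys;
    (void)writable;
    return iova;
}

/* "iommu=pt-ro" policy: identity map what the device may only read,
 * dynamically map (and track) everything the device can write. */
static dma_addr_t pt_ro_map(phys_addr_t phys, size_t len, enum dma_dir dir)
{
    if (dir == DMA_TO_DEVICE) {
        /* Tx data: read-only for the device, so use the identity
         * mapping - no page-table write, no IOTLB flush on unmap. */
        return (dma_addr_t)phys;
    }

    /* Descriptor rings and Rx buffers: device-writable, so give them a
     * real, dynamically allocated IOVA that the IOMMU enforces. */
    return iova_alloc_and_map(phys, len, /*writable=*/true);
}

The attraction is that the hot Tx path never touches the IOMMU at all,
while the set of device-writable mappings stays small and trackable,
which is what the migration argument above relies on.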

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v1 6/6] vhost-user: add VFIO based accelerators support
  2018-02-07 18:02               ` Alexander Duyck
@ 2018-02-07 21:59                 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2018-02-07 21:59 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Tiwei Bie, Jason Wang, qemu-devel, virtio-dev, Alex Williamson,
	Paolo Bonzini, stefanha, cunming.liang, dan.daly, jianfeng.tan,
	zhihong.wang, xiao.w.wang

On Wed, Feb 07, 2018 at 10:02:24AM -0800, Alexander Duyck wrote:
> On Wed, Feb 7, 2018 at 8:43 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Sun, Feb 04, 2018 at 01:49:46PM -0800, Alexander Duyck wrote:
> >> On Thu, Jan 25, 2018 at 9:57 PM, Tiwei Bie <tiwei.bie@intel.com> wrote:
> >> > On Fri, Jan 26, 2018 at 11:41:27AM +0800, Jason Wang wrote:
> >> >> On 2018年01月26日 07:59, Michael S. Tsirkin wrote:
> >> >> > > The virtual IOMMU isn't supported by the accelerators for now.
> >> >> > > Because vhost-user currently lacks of an efficient way to share
> >> >> > > the IOMMU table in VM to vhost backend. That's why the software
> >> >> > > implementation of virtual IOMMU support in vhost-user backend
> >> >> > > can't support dynamic mapping well.
> >> >> > What exactly is meant by that? vIOMMU seems to work for people,
> >> >> > it's not that fast if you change mappings all the time,
> >> >> > but e.g. dpdk within guest doesn't.
> >> >>
> >> >> Yes, software implementation support dynamic mapping for sure. I think the
> >> >> point is, current vhost-user backend can not program hardware IOMMU. So it
> >> >> can not let hardware accelerator to cowork with software vIOMMU.
> >> >
> >> > Vhost-user backend can program hardware IOMMU. Currently
> >> > vhost-user backend (or more precisely the vDPA driver in
> >> > vhost-user backend) will use the memory table (delivered
> >> > by the VHOST_USER_SET_MEM_TABLE message) to program the
> >> > IOMMU via vfio, and that's why accelerators can use the
> >> > GPA (guest physical address) in descriptors directly.
> >> >
> >> > Theoretically, we can use the IOVA mapping info (delivered
> >> > by the VHOST_USER_IOTLB_MSG message) to program the IOMMU,
> >> > and accelerators will be able to use IOVA. But the problem
> >> > is that in vhost-user QEMU won't push all the IOVA mappings
> >> > to backend directly. Backend needs to ask for those info
> >> > when it meets a new IOVA. Such design and implementation
> >> > won't work well for dynamic mappings anyway and couldn't
> >> > be supported by hardware accelerators.
> >> >
> >> >> I think
> >> >> that's another call to implement the offloaded path inside qemu which has
> >> >> complete support for vIOMMU co-operated VFIO.
> >> >
> >> > Yes, that's exactly what we want. After revisiting the
> >> > last paragraph in the commit message, I found it's not
> >> > really accurate. The practicability of dynamic mappings
> >> > support is a common issue for QEMU. It also exists for
> >> > vfio (hw/vfio in QEMU). If QEMU needs to trap all the
> >> > map/unmap events, the data path performance couldn't be
> >> > high. If we want to thoroughly fix this issue especially
> >> > for vfio (hw/vfio in QEMU), we need to have the offload
> >> > path Jason mentioned in QEMU. And I think accelerators
> >> > could use it too.
> >> >
> >> > Best regards,
> >> > Tiwei Bie
> >>
> >> I wonder if we couldn't look at coming up with an altered security
> >> model for the IOMMU drivers to address some of the performance issues
> >> seen with typical hardware IOMMU?
> >>
> >> In the case of most network devices, we seem to be moving toward a
> >> model where the Rx pages are mapped for an extended period of time and
> >> see a fairly high rate of reuse. As such pages mapped as being
> >> writable or read/write by the device are left mapped for an extended
> >> period of time while Tx pages, which are read only, are often
> >> mapped/unmapped since they are coming from some other location in the
> >> kernel beyond the driver's control.
> >>
> >> If we were to somehow come up with a model where the read-only(Tx)
> >> pages had access to a pre-allocated memory mapped address, and the
> >> read/write(descriptor rings), write-only(Rx) pages were provided with
> >> dynamic addresses we might be able to come up with a solution that
> >> would allow for fairly high network performance while at least
> >> protecting from memory corruption. The only issue it would open up is
> >> that the device would have the ability to read any/all memory on the
> >> guest. I was wondering about doing something like this with the vIOMMU
> >> with VFIO for the Intel NICs this way since an interface like igb,
> >> ixgbe, ixgbevf, i40e, or i40evf would probably show pretty good
> >> performance under such a model and as long as the writable pages were
> >> being tracked by the vIOMMU. It could even allow for live migration
> >> support if the vIOMMU provided the info needed for migratable/dirty
> >> page tracking and we held off on migrating any of the dynamically
> >> mapped pages until after they were either unmapped or an FLR reset the
> >> device.
> >>
> >> Thanks.
> >>
> >> - Alex
> >
> >
> >
> > It might be a good idea to change the iommu instead - how about a
> > variant of strict in intel iommu which forces an IOTLB flush after
> > invalidating a writeable mapping but not a RO mapping?  Not sure what the
> > name would be - relaxed-ro?
> >
> > This is probably easier than poking at the drivers and net core.
> >
> > Keeping the RX pages mapped in the IOMMU was envisioned for XDP.
> > That might be a good place to start.
> 
> My plan is to update the Intel IOMMU driver first since it seems like
> something that shouldn't require too much expertise in the operation
> of the IOMMU to accomplish. My idea was more along the lines of
> something like a "iommu=read-only-pt" or maybe "iommu=pt-ro" where the
> Tx data would be identity mapped, and the descriptor rings and Rx data
> could be in the dynamic mapping setup. The idea is loosely based on
> the existing "iommu=pt" option that is normally used on the host if
> you want to avoid the cost for dynamic mapping. Basically we just need
> to keep an eye on the number of mappings that the device can write to.
> Ideally if we leave the Tx as identity mapped that means we never have
> to actually write to update any mapping which would mean no having to
> jump into the hypervisor to deal with the update.

Just noting that updating page tables does not, by itself, require
jumping to the hypervisor. Only invalidation requires that.

> The fact that most
> of the drivers already leave the Rx buffers and descriptor rings
> statically mapped should essentially take care of the rest for us.
> What this would become is a version of "iommu=pt" where the user cares
> about preventing the device from possibly corrupting memory, but would
> still like better performance at the cost of the device being able to
> ready and/all memory on the system.
> 
> As far as if it is strict or not I don't know how much we would need
> to worry about that for the migration case. Essentially a deferred
> IOTLB flush would result in us having extra pages marked as dirty and
> non-migratable, but we would need to see how much overhead there is in
> the migration to deal with those extra pages versus the cost of having
> to do an IOTLB flush on every unmap call.
> 
> Anyway this is an idea that just occurred to me the other day so I
> still need to do some more research into how easy/difficult
> implementing a solution like this would be.
> 
> Thanks.
> 
> - Alex

Right. And I think if you do a straight pt, then this is not so much a
security feature as a robustness feature. I guess both have a place
under the sun.

-- 
MST
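
For illustration, here is a self-contained sketch of the "relaxed-ro"
unmap policy discussed in this sub-thread, using stand-in helpers
rather than the real intel-iommu code paths: the IOTLB is invalidated
synchronously only when the mapping being torn down was writable by
the device, and the flushes for read-only mappings are batched.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in types and helpers - not the real intel-iommu code paths. */

struct dma_mapping {
    uint64_t iova;
    size_t   len;
    bool     device_writable;
};

/* Synchronous IOTLB invalidation (the expensive part; in a guest this
 * is what ends up trapping to the hypervisor). */
static void iotlb_flush_now(const struct dma_mapping *m)
{
    (void)m;
}

/* Queue the invalidation for a later batched flush. */
static void iotlb_flush_deferred(const struct dma_mapping *m)
{
    (void)m;
}

/* "relaxed-ro" unmap policy: after clearing the page-table entry
 * (omitted here), only device-writable mappings pay for an immediate
 * flush; read-only ones can wait for the next batch. */
static void relaxed_ro_unmap(const struct dma_mapping *m)
{
    if (m->device_writable) {
        /* A stale writable entry would let the device keep writing to
         * memory that may already be reused, so flush right away. */
        iotlb_flush_now(m);
    } else {
        /* A stale read-only entry only lets the device read data it
         * could already read, so the flush can be deferred. */
        iotlb_flush_deferred(m);
    }
}

Under this policy a stale read-only entry only lets the device read
data it could already read, whereas a stale writable entry could
corrupt reused memory - which is why only the latter pays for a
synchronous flush.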

Thread overview: 39+ messages
2018-01-25  4:03 [Qemu-devel] [PATCH v1 0/6] Extend vhost-user to support VFIO based accelerators Tiwei Bie
2018-01-25  4:03 ` [virtio-dev] " Tiwei Bie
2018-01-25  4:03 ` [Qemu-devel] [PATCH v1 1/6] vhost-user: support receiving file descriptors in slave_read Tiwei Bie
2018-01-25  4:03   ` [virtio-dev] " Tiwei Bie
2018-01-25  4:03 ` [Qemu-devel] [PATCH v1 2/6] vhost-user: introduce shared vhost-user state Tiwei Bie
2018-01-25  4:03   ` [virtio-dev] " Tiwei Bie
2018-01-25  4:03 ` [Qemu-devel] [PATCH v1 3/6] virtio: support adding sub-regions for notify region Tiwei Bie
2018-01-25  4:03   ` [virtio-dev] " Tiwei Bie
2018-01-25  4:03 ` [Qemu-devel] [PATCH v1 4/6] vfio: support getting VFIOGroup from groupfd Tiwei Bie
2018-01-25  4:03   ` [virtio-dev] " Tiwei Bie
2018-01-25  4:03 ` [Qemu-devel] [PATCH v1 5/6] vfio: remove DPRINTF() definition from vfio-common.h Tiwei Bie
2018-01-25  4:03   ` [virtio-dev] " Tiwei Bie
2018-01-25  4:03 ` [Qemu-devel] [PATCH v1 6/6] vhost-user: add VFIO based accelerators support Tiwei Bie
2018-01-25  4:03   ` [virtio-dev] " Tiwei Bie
2018-01-25 23:59   ` [Qemu-devel] " Michael S. Tsirkin
2018-01-25 23:59     ` [virtio-dev] " Michael S. Tsirkin
2018-01-26  3:41     ` [Qemu-devel] " Jason Wang
2018-01-26  3:41       ` Jason Wang
2018-01-26  5:57       ` [Qemu-devel] " Tiwei Bie
2018-01-26  5:57         ` Tiwei Bie
2018-02-04 21:49         ` [Qemu-devel] " Alexander Duyck
2018-02-04 21:49           ` Alexander Duyck
2018-02-07 16:43           ` [Qemu-devel] " Michael S. Tsirkin
2018-02-07 16:43             ` Michael S. Tsirkin
2018-02-07 18:02             ` [Qemu-devel] " Alexander Duyck
2018-02-07 18:02               ` Alexander Duyck
2018-02-07 21:59               ` [Qemu-devel] " Michael S. Tsirkin
2018-02-07 21:59                 ` Michael S. Tsirkin
2018-02-05 17:47   ` [Qemu-devel] [virtio-dev] " Paolo Bonzini
2018-02-05 17:47     ` Paolo Bonzini
2018-02-06  4:40     ` [Qemu-devel] " Tiwei Bie
2018-02-06  4:40       ` Tiwei Bie
2018-02-07 15:23       ` [Qemu-devel] " Paolo Bonzini
2018-02-07 15:23         ` Paolo Bonzini
2018-01-25 14:22 ` [Qemu-devel] [PATCH v1 0/6] Extend vhost-user to support VFIO based accelerators Stefan Hajnoczi
2018-01-25 16:10   ` Liang, Cunming
2018-01-25 16:10     ` [virtio-dev] " Liang, Cunming
2018-01-26  7:17     ` Stefan Hajnoczi
2018-01-26  6:28   ` [virtio-dev] " Tiwei Bie
