* [Qemu-devel] [PATCH v2 0/6] Extend vhost-user to support VFIO based accelerators
From: Tiwei Bie @ 2018-03-19  7:15 UTC
  To: qemu-devel, virtio-dev, mst, alex.williamson, jasowang, pbonzini,
	stefanha
  Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang,
	tiwei.bie

This patch set makes some small extensions to the vhost-user protocol
to support VFIO based accelerators, and makes it possible to get
performance similar to that of VFIO based PCI passthrough while
keeping the virtio device emulation in QEMU.

How an accelerator accelerates the vhost data path
==================================================

Any virtio-ring-compatible device can potentially be used as a vhost
data path accelerator. The accelerator is set up based on the
information (e.g. memory table, features, ring addresses) available
on the vhost backend, after which it can directly use the virtio ring
provided by the virtio driver in the VM. The virtio driver in the VM
can then exchange e.g. network packets with the accelerator directly
via the virtio ring. That is, the accelerator takes over the vhost
data path. We call this vDPA: vhost Data Path Acceleration.
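
As a rough illustration, a vhost backend could program such an
accelerator from the negotiated vhost-user state along these lines
(a minimal sketch; struct accel_dev, accel_features_compatible() and
accel_program_vring() are hypothetical names, not part of this
series):

    #include <stdint.h>

    struct accel_dev;   /* hypothetical handle for the accelerator */
    int accel_features_compatible(struct accel_dev *dev, uint64_t f);
    int accel_program_vring(struct accel_dev *dev, int qid,
                            uint64_t desc, uint64_t avail,
                            uint64_t used, uint16_t num);

    /* Per-virtqueue state collected from vhost-user messages. */
    struct vring_state {
        uint64_t desc;    /* from VHOST_USER_SET_VRING_ADDR */
        uint64_t avail;
        uint64_t used;
        uint16_t num;     /* from VHOST_USER_SET_VRING_NUM */
        int kickfd;       /* from VHOST_USER_SET_VRING_KICK */
        int callfd;       /* from VHOST_USER_SET_VRING_CALL */
    };

    /* Point the accelerator's DMA engine at the guest's virtio
     * ring, so descriptors are processed without host threads. */
    static int accel_setup_vring(struct accel_dev *dev, int qid,
                                 const struct vring_state *vs,
                                 uint64_t negotiated_features)
    {
        if (!accel_features_compatible(dev, negotiated_features)) {
            return -1;
        }
        return accel_program_vring(dev, qid, vs->desc, vs->avail,
                                   vs->used, vs->num);
    }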

Note: although the accelerator can talk to the virtio driver in the
VM directly via the virtio ring, the control path events (e.g. device
start/stop) in the VM will still be trapped and handled by QEMU, and
QEMU will deliver such events to the vhost backend via the standard
vhost protocol.

The link below shows an example of how to set up such an environment
using nested VMs. In that case, the virtio device in the outer VM is
the accelerator, and it is used to accelerate the virtio device in
the inner VM. In practice, any virtio-ring-compatible hardware device
could serve as the accelerator.

http://dpdk.org/ml/archives/dev/2017-December/085044.html

The above example requires no changes to QEMU, but its performance is
lower than that of traditional VFIO based PCI passthrough. That is
the problem this patch set aims to solve.

The performance issue of vDPA/vhost-user and solutions
======================================================

For the vhost-user backend, the critical issue in vDPA is that the
data path performance is relatively low and extra host threads are
needed for the data path, because the mechanisms necessary to support
the following are missing:

1) the guest driver notifying the device directly;
2) the device interrupting the guest directly.

So this patch set makes some small extensions to the vhost-user
protocol to make both of them possible. It leverages the same
mechanisms (e.g. EPT and posted interrupts on Intel platforms) as
PCI passthrough. For contrast, the sketch below shows the slow path
that these extensions eliminate.
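
Without these mechanisms, a software vhost backend has to relay both
directions through the kick/call eventfds, which costs host threads
and latency. A minimal sketch of that slow path, reusing the
illustrative struct vring_state from the sketch above (process_ring()
is a hypothetical ring-processing routine):

    #include <stdint.h>
    #include <unistd.h>

    void process_ring(struct vring_state *vs);  /* hypothetical */

    /* A host thread relays guest kicks and device interrupts
     * through the kickfd/callfd eventfds set up via vhost-user. */
    static void *vring_worker(void *arg)
    {
        struct vring_state *vs = arg;
        uint64_t ev;

        for (;;) {
            /* The guest's doorbell write is trapped by KVM and
             * signalled on the kick eventfd. */
            if (read(vs->kickfd, &ev, sizeof(ev)) != sizeof(ev)) {
                break;
            }
            process_ring(vs);
            ev = 1;
            /* Interrupt the guest indirectly via the call eventfd. */
            if (write(vs->callfd, &ev, sizeof(ev)) != sizeof(ev)) {
                break;
            }
        }
        return NULL;
    }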

A new protocol feature bit is added to negotiate the accelerator
feature support, and two new slave message types are added to control
the notify region and queue interrupt passthrough for each queue.
From the point of view of the vhost-user protocol design, this is
very flexible: the passthrough can be enabled/disabled for each queue
individually, and each queue can be accelerated by a different
device. More design and implementation details can be found in the
last patch.
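
For illustration only (the authoritative definitions live in the last
patch of this series, which is not quoted here), the slave-channel
additions amount to something like:

    /* Illustrative names and values; see the last patch and
     * docs/interop/vhost-user.txt for the actual protocol. */
    #define VHOST_USER_PROTOCOL_F_VFIO  8   /* assumed bit number */

    typedef enum VhostUserSlaveRequest {
        VHOST_USER_SLAVE_NONE = 0,
        VHOST_USER_SLAVE_IOTLB_MSG = 1,
        /* carries a VFIO group fd for QEMU to register with KVM */
        VHOST_USER_SLAVE_VRING_VFIO_GROUP_MSG = 2,
        /* carries an fd + offset of the device's per-queue notify
         * area, which QEMU maps into the virtio notify region */
        VHOST_USER_SLAVE_VRING_NOTIFY_AREA_MSG = 3,
    } VhostUserSlaveRequest;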

Difference between vDPA and PCI passthrough
===========================================

The key difference between PCI passthrough and vDPA is that in vDPA
only the data path of the device (e.g. the DMA ring, the notify
region and the queue interrupt) is passed through to the VM, while
the device control path (e.g. the PCI configuration space and MMIO
regions) is still defined and emulated by QEMU.

The benefits of keeping the virtio device emulation in QEMU, compared
with virtio device PCI passthrough, include (but are not limited to):

- a consistent device interface for the guest OS in the VM;
- maximum flexibility in the hardware (i.e. accelerator) design;
- reuse of the existing virtio live-migration framework.

Why extend vhost-user for vDPA
==============================

We have already implemented various virtual switches (e.g. OVS-DPDK)
based on vhost-user for VMs in the cloud. They are purely software
running on CPU cores. When accelerators become available for such
NFVi applications, it is ideal if the applications can keep using the
original interface (i.e. the vhost-user netdev) with QEMU, while the
infrastructure decides when and how to switch between CPUs and
accelerators behind that interface. The switching can then be done
flexibly and quickly inside the applications.

More details about this can be found in Cunming's discussion on the
RFC patch set.

Update notes
============

The IOMMU feature bit check is removed in this version, because:

The IOMMU feature is negotiable: when an accelerator that does not
support a virtual IOMMU is used, its driver simply will not offer
this feature bit when the vhost library queries its features; if it
does support the virtual IOMMU, its driver can offer the bit. It is
therefore not reasonable to hard-code this limitation in this patch
set.
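
Concretely, the negotiation boils down to feature masking of roughly
this shape (a sketch; accel_hw_features() and accel_supports_viommu()
are hypothetical driver hooks, while VIRTIO_F_IOMMU_PLATFORM is the
standard virtio feature bit gating IOMMU use):

    #include <stdint.h>

    #define VIRTIO_F_IOMMU_PLATFORM  33  /* standard virtio bit */

    struct accel_dev;                            /* hypothetical */
    uint64_t accel_hw_features(struct accel_dev *dev);
    int accel_supports_viommu(struct accel_dev *dev);

    /* What the accelerator's driver reports when the vhost library
     * queries its features: a driver that cannot honour a virtual
     * IOMMU leaves the bit out, so it is never negotiated. */
    static uint64_t backend_get_features(struct accel_dev *dev)
    {
        uint64_t features = accel_hw_features(dev);

        if (!accel_supports_viommu(dev)) {
            features &= ~(1ULL << VIRTIO_F_IOMMU_PLATFORM);
        }
        return features;
    }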

Previous versions:
RFC: http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg04844.html
v1:  http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg06028.html

v1 -> v2:
- Add explanations about why vhost-user is extended to the commit log (Paolo);
- Fix a bug in slave_read() following Stefan's fix in DPDK;
- Remove the IOMMU feature check and the related commit log;
- Some minor refinements;
- Rebase on the latest QEMU;

RFC -> v1:
- Add details about how vDPA works to the cover letter (Alexey)
- Add details about the OVS offload use case to the cover letter (Jason)
- Move PCI specific stuff out of vhost-user (Jason)
- Handle the virtual IOMMU case (Jason)
- Move the VFIO group management code into vfio/common.c (Alex)
- Various refinements;
(approximately sorted by comment posting time)

Tiwei Bie (6):
  vhost-user: support receiving file descriptors in slave_read
  vhost-user: introduce shared vhost-user state
  virtio: support adding sub-regions for notify region
  vfio: support getting VFIOGroup from groupfd
  vfio: remove DPRINTF() definition from vfio-common.h
  vhost-user: add VFIO based accelerators support

 Makefile.target                 |   4 +
 docs/interop/vhost-user.txt     |  57 +++++++++
 hw/scsi/vhost-user-scsi.c       |   6 +-
 hw/vfio/common.c                |  97 +++++++++++++++-
 hw/virtio/vhost-user.c          | 248 +++++++++++++++++++++++++++++++++++++++-
 hw/virtio/virtio-pci.c          |  48 ++++++++
 hw/virtio/virtio-pci.h          |   5 +
 hw/virtio/virtio.c              |  39 +++++++
 include/hw/vfio/vfio-common.h   |  11 +-
 include/hw/virtio/vhost-user.h  |  34 ++++++
 include/hw/virtio/virtio-scsi.h |   6 +-
 include/hw/virtio/virtio.h      |   5 +
 include/qemu/osdep.h            |   1 +
 net/vhost-user.c                |  30 ++---
 scripts/create_config           |   3 +
 15 files changed, 561 insertions(+), 33 deletions(-)
 create mode 100644 include/hw/virtio/vhost-user.h

-- 
2.11.0

* [Qemu-devel] [PATCH v2 1/6] vhost-user: support receiving file descriptors in slave_read
From: Tiwei Bie @ 2018-03-19  7:15 UTC
  To: qemu-devel, virtio-dev, mst, alex.williamson, jasowang, pbonzini,
	stefanha
  Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang,
	tiwei.bie

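Convert slave_read() from read(2) to recvmsg(2) so that file
descriptors sent by the vhost-user slave as SCM_RIGHTS ancillary data
can be received together with the message header; the last patch of
this series relies on this to receive fds over the slave channel. For
reference, the peer (sending) side of this pattern, which is not part
of this patch, looks roughly like the following sketch:

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Send a message header and one fd over a UNIX domain socket. */
    static ssize_t send_msg_with_fd(int sockfd, const void *hdr,
                                    size_t hdrlen, int fd)
    {
        struct iovec iov = { .iov_base = (void *)hdr,
                             .iov_len = hdrlen };
        char control[CMSG_SPACE(sizeof(fd))];
        struct msghdr msgh = {
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = control, .msg_controllen = sizeof(control),
        };
        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msgh);

        memset(control, 0, sizeof(control));
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));

        return sendmsg(sockfd, &msgh, 0);
    }
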
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 hw/virtio/vhost-user.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 41ff5cff41..1ad6caa6a3 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -674,14 +674,44 @@ static void slave_read(void *opaque)
     VhostUserHeader hdr = { 0, };
     VhostUserPayload payload = { 0, };
     int size, ret = 0;
+    struct iovec iov;
+    struct msghdr msgh;
+    int fd = -1;
+    size_t fdsize = sizeof(fd);
+    char control[CMSG_SPACE(fdsize)];
+    struct cmsghdr *cmsg;
+
+    memset(&msgh, 0, sizeof(msgh));
+    msgh.msg_iov = &iov;
+    msgh.msg_iovlen = 1;
+    msgh.msg_control = control;
+    msgh.msg_controllen = sizeof(control);
 
     /* Read header */
-    size = read(u->slave_fd, &hdr, VHOST_USER_HDR_SIZE);
+    iov.iov_base = &hdr;
+    iov.iov_len = VHOST_USER_HDR_SIZE;
+
+    size = recvmsg(u->slave_fd, &msgh, 0);
     if (size != VHOST_USER_HDR_SIZE) {
         error_report("Failed to read from slave.");
         goto err;
     }
 
+    if (msgh.msg_flags & MSG_CTRUNC) {
+        error_report("Truncated message.");
+        goto err;
+    }
+
+    for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL;
+         cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
+            if (cmsg->cmsg_level == SOL_SOCKET &&
+                cmsg->cmsg_type == SCM_RIGHTS) {
+                    fdsize = cmsg->cmsg_len - CMSG_LEN(0);
+                    memcpy(&fd, CMSG_DATA(cmsg), fdsize);
+                    break;
+            }
+    }
+
     if (hdr.size > VHOST_USER_PAYLOAD_SIZE) {
         error_report("Failed to read msg header."
                 " Size %d exceeds the maximum %zu.", hdr.size,
@@ -705,9 +735,15 @@ static void slave_read(void *opaque)
         break;
     default:
         error_report("Received unexpected msg type.");
+        if (fd != -1) {
+            close(fd);
+        }
         ret = -EINVAL;
     }
 
+    /* Message handlers need to make sure that fd will be consumed. */
+    fd = -1;
+
     /*
      * REPLY_ACK feature handling. Other reply types has to be managed
      * directly in their request handlers.
@@ -740,6 +776,9 @@ err:
     qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
     close(u->slave_fd);
     u->slave_fd = -1;
+    if (fd != -1) {
+        close(fd);
+    }
     return;
 }
 
-- 
2.11.0

* [Qemu-devel] [PATCH v2 2/6] vhost-user: introduce shared vhost-user state
From: Tiwei Bie @ 2018-03-19  7:15 UTC
  To: qemu-devel, virtio-dev, mst, alex.williamson, jasowang, pbonzini,
	stefanha
  Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang,
	tiwei.bie

When multi-queue is enabled for virtio-net, each virtio
queue pair will have a vhost_dev, and currently the only
thing they share is the chardev. This patch introduces a
vhost-user state structure which will be shared by all
virtio queue pairs of the same virtio device.
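
In condensed form, the ownership chain this introduces (a sketch of
the structures from the diff below; the comment about future fields
is my reading of the motivation, not something this patch adds):

    typedef struct VhostUser {
        CharBackend chr;      /* the chardev, shared as before */
        /* presumably the place for more per-device shared state
         * added by later patches in this series */
    } VhostUser;

    struct vhost_user {       /* per vhost_dev, i.e. per queue pair */
        VhostUser *shared;    /* was: CharBackend *chr */
        int slave_fd;
    };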

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 hw/scsi/vhost-user-scsi.c       |  6 +++---
 hw/virtio/vhost-user.c          |  9 +++++----
 include/hw/virtio/vhost-user.h  | 17 +++++++++++++++++
 include/hw/virtio/virtio-scsi.h |  6 +++++-
 net/vhost-user.c                | 30 ++++++++++++++++--------------
 5 files changed, 46 insertions(+), 22 deletions(-)
 create mode 100644 include/hw/virtio/vhost-user.h

diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index 9389ed48e0..64972bdd7d 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -72,7 +72,7 @@ static void vhost_user_scsi_realize(DeviceState *dev, Error **errp)
     Error *err = NULL;
     int ret;
 
-    if (!vs->conf.chardev.chr) {
+    if (!vs->conf.vhost_user.chr.chr) {
         error_setg(errp, "vhost-user-scsi: missing chardev");
         return;
     }
@@ -90,7 +90,7 @@ static void vhost_user_scsi_realize(DeviceState *dev, Error **errp)
     vsc->dev.vq_index = 0;
     vsc->dev.backend_features = 0;
 
-    ret = vhost_dev_init(&vsc->dev, (void *)&vs->conf.chardev,
+    ret = vhost_dev_init(&vsc->dev, (void *)&vs->conf.vhost_user,
                          VHOST_BACKEND_TYPE_USER, 0);
     if (ret < 0) {
         error_setg(errp, "vhost-user-scsi: vhost initialization failed: %s",
@@ -131,7 +131,7 @@ static uint64_t vhost_user_scsi_get_features(VirtIODevice *vdev,
 }
 
 static Property vhost_user_scsi_properties[] = {
-    DEFINE_PROP_CHR("chardev", VirtIOSCSICommon, conf.chardev),
+    DEFINE_PROP_CHR("chardev", VirtIOSCSICommon, conf.vhost_user.chr),
     DEFINE_PROP_UINT32("boot_tpgt", VirtIOSCSICommon, conf.boot_tpgt, 0),
     DEFINE_PROP_UINT32("num_queues", VirtIOSCSICommon, conf.num_queues, 1),
     DEFINE_PROP_UINT32("virtqueue_size", VirtIOSCSICommon, conf.virtqueue_size,
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 1ad6caa6a3..b228994ffd 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -12,6 +12,7 @@
 #include "qapi/error.h"
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-backend.h"
+#include "hw/virtio/vhost-user.h"
 #include "hw/virtio/virtio-net.h"
 #include "chardev/char-fe.h"
 #include "sysemu/kvm.h"
@@ -164,7 +165,7 @@ static VhostUserMsg m __attribute__ ((unused));
 #define VHOST_USER_VERSION    (0x1)
 
 struct vhost_user {
-    CharBackend *chr;
+    VhostUser *shared;
     int slave_fd;
 };
 
@@ -176,7 +177,7 @@ static bool ioeventfd_enabled(void)
 static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg)
 {
     struct vhost_user *u = dev->opaque;
-    CharBackend *chr = u->chr;
+    CharBackend *chr = &u->shared->chr;
     uint8_t *p = (uint8_t *) msg;
     int r, size = VHOST_USER_HDR_SIZE;
 
@@ -262,7 +263,7 @@ static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg,
                             int *fds, int fd_num)
 {
     struct vhost_user *u = dev->opaque;
-    CharBackend *chr = u->chr;
+    CharBackend *chr = &u->shared->chr;
     int ret, size = VHOST_USER_HDR_SIZE + msg->hdr.size;
 
     /*
@@ -839,7 +840,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
     assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
 
     u = g_new0(struct vhost_user, 1);
-    u->chr = opaque;
+    u->shared = opaque;
     u->slave_fd = -1;
     dev->opaque = u;
 
diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
new file mode 100644
index 0000000000..4f5a1477d1
--- /dev/null
+++ b/include/hw/virtio/vhost-user.h
@@ -0,0 +1,17 @@
+/*
+ * Copyright (c) 2017-2018 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef HW_VIRTIO_VHOST_USER_H
+#define HW_VIRTIO_VHOST_USER_H
+
+#include "chardev/char-fe.h"
+
+typedef struct VhostUser {
+    CharBackend chr;
+} VhostUser;
+
+#endif
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 4c0bcdb788..885c3e84b5 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -19,6 +19,7 @@
 #define VIRTIO_SCSI_SENSE_SIZE 0
 #include "standard-headers/linux/virtio_scsi.h"
 #include "hw/virtio/virtio.h"
+#include "hw/virtio/vhost-user.h"
 #include "hw/pci/pci.h"
 #include "hw/scsi/scsi.h"
 #include "chardev/char-fe.h"
@@ -54,7 +55,10 @@ struct VirtIOSCSIConf {
     char *vhostfd;
     char *wwpn;
 #endif
-    CharBackend chardev;
+    union {
+        VhostUser vhost_user;
+        CharBackend chardev;
+    };
     uint32_t boot_tpgt;
     IOThread *iothread;
 };
diff --git a/net/vhost-user.c b/net/vhost-user.c
index e0f16c895b..49ee72bd42 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -12,6 +12,7 @@
 #include "clients.h"
 #include "net/vhost_net.h"
 #include "net/vhost-user.h"
+#include "hw/virtio/vhost-user.h"
 #include "chardev/char-fe.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-net.h"
@@ -22,7 +23,7 @@
 
 typedef struct VhostUserState {
     NetClientState nc;
-    CharBackend chr; /* only queue index 0 */
+    VhostUser vhost_user; /* only queue index 0 */
     VHostNetState *vhost_net;
     guint watch;
     uint64_t acked_features;
@@ -64,7 +65,7 @@ static void vhost_user_stop(int queues, NetClientState *ncs[])
     }
 }
 
-static int vhost_user_start(int queues, NetClientState *ncs[], CharBackend *be)
+static int vhost_user_start(int queues, NetClientState *ncs[], void *be)
 {
     VhostNetOptions options;
     struct vhost_net *net = NULL;
@@ -158,7 +159,7 @@ static void vhost_user_cleanup(NetClientState *nc)
             g_source_remove(s->watch);
             s->watch = 0;
         }
-        qemu_chr_fe_deinit(&s->chr, true);
+        qemu_chr_fe_deinit(&s->vhost_user.chr, true);
     }
 
     qemu_purge_queued_packets(nc);
@@ -192,7 +193,7 @@ static gboolean net_vhost_user_watch(GIOChannel *chan, GIOCondition cond,
 {
     VhostUserState *s = opaque;
 
-    qemu_chr_fe_disconnect(&s->chr);
+    qemu_chr_fe_disconnect(&s->vhost_user.chr);
 
     return TRUE;
 }
@@ -217,7 +218,8 @@ static void chr_closed_bh(void *opaque)
     qmp_set_link(name, false, &err);
     vhost_user_stop(queues, ncs);
 
-    qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, net_vhost_user_event,
+    qemu_chr_fe_set_handlers(&s->vhost_user.chr, NULL, NULL,
+                             net_vhost_user_event,
                              NULL, opaque, NULL, true);
 
     if (err) {
@@ -240,15 +242,15 @@ static void net_vhost_user_event(void *opaque, int event)
     assert(queues < MAX_QUEUE_NUM);
 
     s = DO_UPCAST(VhostUserState, nc, ncs[0]);
-    chr = qemu_chr_fe_get_driver(&s->chr);
+    chr = qemu_chr_fe_get_driver(&s->vhost_user.chr);
     trace_vhost_user_event(chr->label, event);
     switch (event) {
     case CHR_EVENT_OPENED:
-        if (vhost_user_start(queues, ncs, &s->chr) < 0) {
-            qemu_chr_fe_disconnect(&s->chr);
+        if (vhost_user_start(queues, ncs, &s->vhost_user) < 0) {
+            qemu_chr_fe_disconnect(&s->vhost_user.chr);
             return;
         }
-        s->watch = qemu_chr_fe_add_watch(&s->chr, G_IO_HUP,
+        s->watch = qemu_chr_fe_add_watch(&s->vhost_user.chr, G_IO_HUP,
                                          net_vhost_user_watch, s);
         qmp_set_link(name, true, &err);
         s->started = true;
@@ -264,8 +266,8 @@ static void net_vhost_user_event(void *opaque, int event)
 
             g_source_remove(s->watch);
             s->watch = 0;
-            qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
-                                     NULL, NULL, false);
+            qemu_chr_fe_set_handlers(&s->vhost_user.chr, NULL, NULL, NULL,
+                                     NULL, NULL, NULL, false);
 
             aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
         }
@@ -297,7 +299,7 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
         if (!nc0) {
             nc0 = nc;
             s = DO_UPCAST(VhostUserState, nc, nc);
-            if (!qemu_chr_fe_init(&s->chr, chr, &err)) {
+            if (!qemu_chr_fe_init(&s->vhost_user.chr, chr, &err)) {
                 error_report_err(err);
                 return -1;
             }
@@ -307,11 +309,11 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
 
     s = DO_UPCAST(VhostUserState, nc, nc0);
     do {
-        if (qemu_chr_fe_wait_connected(&s->chr, &err) < 0) {
+        if (qemu_chr_fe_wait_connected(&s->vhost_user.chr, &err) < 0) {
             error_report_err(err);
             return -1;
         }
-        qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
+        qemu_chr_fe_set_handlers(&s->vhost_user.chr, NULL, NULL,
                                  net_vhost_user_event, NULL, nc0->name, NULL,
                                  true);
     } while (!s->started);
-- 
2.11.0

* [Qemu-devel] [PATCH v2 3/6] virtio: support adding sub-regions for notify region
From: Tiwei Bie @ 2018-03-19  7:15 UTC
  To: qemu-devel, virtio-dev, mst, alex.williamson, jasowang, pbonzini,
	stefanha
  Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang,
	tiwei.bie

Provide APIs to query whether page-per-vq is enabled and to
add sub-regions to the notify region.
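
A sketch of the intended use (my assumption of how a later patch in
this series consumes these APIs; memory_region_init_ram_device_ptr()
is the existing QEMU helper for wrapping host memory in a region):

    /* Overlay a device's mmap'd doorbell page onto queue qid's slot
     * in the virtio notify region, so that guest kicks reach the
     * hardware without trapping into QEMU. */
    static int map_doorbell(VirtIODevice *vdev, int qid,
                            void *host_addr, uint64_t size,
                            MemoryRegion *mr)
    {
        /* Without page-per-vq, queues may share a page and cannot
         * be individually overlaid with a page-granularity map. */
        if (!virtio_device_page_per_vq_enabled(vdev)) {
            return -1;
        }
        memory_region_init_ram_device_ptr(mr, OBJECT(vdev),
                                          "vhost-notify", size,
                                          host_addr);
        return virtio_device_notify_region_map(vdev, qid, mr);
    }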

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 Makefile.target            |  4 ++++
 hw/virtio/virtio-pci.c     | 48 ++++++++++++++++++++++++++++++++++++++++++++++
 hw/virtio/virtio-pci.h     |  5 +++++
 hw/virtio/virtio.c         | 39 +++++++++++++++++++++++++++++++++++++
 include/hw/virtio/virtio.h |  5 +++++
 include/qemu/osdep.h       |  1 +
 scripts/create_config      |  3 +++
 7 files changed, 105 insertions(+)

diff --git a/Makefile.target b/Makefile.target
index 6549481096..b2cf618dc9 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -39,6 +39,9 @@ STPFILES=
 config-target.h: config-target.h-timestamp
 config-target.h-timestamp: config-target.mak
 
+config-devices.h: config-devices.h-timestamp
+config-devices.h-timestamp: config-devices.mak
+
 ifdef CONFIG_TRACE_SYSTEMTAP
 stap: $(QEMU_PROG).stp-installed $(QEMU_PROG).stp $(QEMU_PROG)-simpletrace.stp
 
@@ -224,4 +227,5 @@ ifdef CONFIG_TRACE_SYSTEMTAP
 endif
 
 GENERATED_FILES += config-target.h
+GENERATED_FILES += config-devices.h
 Makefile: $(GENERATED_FILES)
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 1e8ab7bbc5..b17471092a 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1534,6 +1534,54 @@ static void virtio_pci_modern_io_region_unmap(VirtIOPCIProxy *proxy,
                                 &region->mr);
 }
 
+static VirtIOPCIProxy *virtio_device_to_virtio_pci_proxy(VirtIODevice *vdev)
+{
+    VirtIOPCIProxy *proxy = NULL;
+
+    if (vdev->device_id == VIRTIO_ID_NET) {
+        VirtIONetPCI *d = container_of(vdev, VirtIONetPCI, vdev.parent_obj);
+        proxy = &d->parent_obj;
+    }
+
+    return proxy;
+}
+
+bool virtio_pci_page_per_vq_enabled(VirtIODevice *vdev)
+{
+    VirtIOPCIProxy *proxy = virtio_device_to_virtio_pci_proxy(vdev);
+
+    if (proxy == NULL) {
+        return false;
+    }
+
+    return !!(proxy->flags & VIRTIO_PCI_FLAG_PAGE_PER_VQ);
+}
+
+int virtio_pci_notify_region_map(VirtIODevice *vdev, int queue_idx,
+                                 MemoryRegion *mr)
+{
+    VirtIOPCIProxy *proxy = virtio_device_to_virtio_pci_proxy(vdev);
+    int offset;
+
+    if (proxy == NULL || !virtio_pci_modern(proxy)) {
+        return -1;
+    }
+
+    offset = virtio_pci_queue_mem_mult(proxy) * queue_idx;
+    memory_region_add_subregion(&proxy->notify.mr, offset, mr);
+
+    return 0;
+}
+
+void virtio_pci_notify_region_unmap(VirtIODevice *vdev, MemoryRegion *mr)
+{
+    VirtIOPCIProxy *proxy = virtio_device_to_virtio_pci_proxy(vdev);
+
+    if (proxy != NULL) {
+        memory_region_del_subregion(&proxy->notify.mr, mr);
+    }
+}
+
 static void virtio_pci_pre_plugged(DeviceState *d, Error **errp)
 {
     VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index 813082b0d7..8061133741 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -213,6 +213,11 @@ static inline void virtio_pci_disable_modern(VirtIOPCIProxy *proxy)
     proxy->disable_modern = true;
 }
 
+bool virtio_pci_page_per_vq_enabled(VirtIODevice *vdev);
+int virtio_pci_notify_region_map(VirtIODevice *vdev, int queue_idx,
+                                 MemoryRegion *mr);
+void virtio_pci_notify_region_unmap(VirtIODevice *vdev, MemoryRegion *mr);
+
 /*
  * virtio-scsi-pci: This extends VirtioPCIProxy.
  */
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 006d3d1148..90ee72984c 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -22,6 +22,7 @@
 #include "qemu/atomic.h"
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
+#include "hw/virtio/virtio-pci.h"
 #include "sysemu/dma.h"
 
 /*
@@ -2681,6 +2682,44 @@ void virtio_device_release_ioeventfd(VirtIODevice *vdev)
     virtio_bus_release_ioeventfd(vbus);
 }
 
+bool virtio_device_parent_is_pci_device(VirtIODevice *vdev)
+{
+    BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
+    const char *typename = object_get_typename(OBJECT(qbus->parent));
+
+    return strstr(typename, "pci") != NULL;
+}
+
+bool virtio_device_page_per_vq_enabled(VirtIODevice *vdev)
+{
+#ifdef CONFIG_VIRTIO_PCI
+    if (virtio_device_parent_is_pci_device(vdev)) {
+        return virtio_pci_page_per_vq_enabled(vdev);
+    }
+#endif
+    return false;
+}
+
+int virtio_device_notify_region_map(VirtIODevice *vdev, int queue_idx,
+                                    MemoryRegion *mr)
+{
+#ifdef CONFIG_VIRTIO_PCI
+    if (virtio_device_parent_is_pci_device(vdev)) {
+        return virtio_pci_notify_region_map(vdev, queue_idx, mr);
+    }
+#endif
+    return -1;
+}
+
+void virtio_device_notify_region_unmap(VirtIODevice *vdev, MemoryRegion *mr)
+{
+#ifdef CONFIG_VIRTIO_PCI
+    if (virtio_device_parent_is_pci_device(vdev)) {
+        virtio_pci_notify_region_unmap(vdev, mr);
+    }
+#endif
+}
+
 static void virtio_device_class_init(ObjectClass *klass, void *data)
 {
     /* Set the default value here. */
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 098bdaaea3..b14accdb08 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -285,6 +285,11 @@ void virtio_device_stop_ioeventfd(VirtIODevice *vdev);
 int virtio_device_grab_ioeventfd(VirtIODevice *vdev);
 void virtio_device_release_ioeventfd(VirtIODevice *vdev);
 bool virtio_device_ioeventfd_enabled(VirtIODevice *vdev);
+bool virtio_device_parent_is_pci_device(VirtIODevice *vdev);
+bool virtio_device_page_per_vq_enabled(VirtIODevice *vdev);
+int virtio_device_notify_region_map(VirtIODevice *vdev, int queue_idx,
+                                    MemoryRegion *mr);
+void virtio_device_notify_region_unmap(VirtIODevice *vdev, MemoryRegion *mr);
 EventNotifier *virtio_queue_get_host_notifier(VirtQueue *vq);
 void virtio_queue_host_notifier_read(EventNotifier *n);
 void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq, AioContext *ctx,
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 41658060a7..2532c278ef 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -30,6 +30,7 @@
 #include "config-host.h"
 #ifdef NEED_CPU_H
 #include "config-target.h"
+#include "config-devices.h"
 #else
 #include "exec/poison.h"
 #endif
diff --git a/scripts/create_config b/scripts/create_config
index d727e5e36e..e4541a51ed 100755
--- a/scripts/create_config
+++ b/scripts/create_config
@@ -58,6 +58,9 @@ case $line in
     name=${line%=*}
     echo "#define $name 1"
     ;;
+ CONFIG_*='$(CONFIG_'*')') # configuration
+    continue
+    ;;
  CONFIG_*=*) # configuration
     name=${line%=*}
     value=${line#*=}
-- 
2.11.0

* [Qemu-devel] [PATCH v2 4/6] vfio: support getting VFIOGroup from groupfd
From: Tiwei Bie @ 2018-03-19  7:15 UTC
  To: qemu-devel, virtio-dev, mst, alex.williamson, jasowang, pbonzini,
	stefanha
  Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang,
	tiwei.bie

Add an API to support getting a VFIOGroup from a groupfd. When
the groupfd is shared by another process, the VFIOGroup may not
have its container and address space in QEMU.

Besides, add a reference counter to better support getting a
VFIOGroup multiple times.
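
The expected use (my assumption; the actual caller arrives in the
last patch of this series) is to adopt a group fd received over the
vhost-user slave channel, with as == NULL because the slave process
owns the DMA mappings:

    /* Adopt a group fd received from the vhost-user slave.  With a
     * NULL AddressSpace, vfio_connect_container() only registers
     * the group with the KVM device and skips container setup. */
    VFIOGroup *adopt_slave_group(int groupfd, Error **errp)
    {
        return vfio_get_group_from_fd(groupfd, NULL, errp);
    }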

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 hw/vfio/common.c              | 97 ++++++++++++++++++++++++++++++++++++++++++-
 include/hw/vfio/vfio-common.h |  2 +
 2 files changed, 98 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 5e84716218..24ec0f2c8d 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1038,6 +1038,11 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     int ret, fd;
     VFIOAddressSpace *space;
 
+    if (as == NULL) {
+        vfio_kvm_device_add_group(group);
+        return 0;
+    }
+
     space = vfio_get_address_space(as);
 
     QLIST_FOREACH(container, &space->containers, next) {
@@ -1237,6 +1242,10 @@ static void vfio_disconnect_container(VFIOGroup *group)
 {
     VFIOContainer *container = group->container;
 
+    if (container == NULL) {
+        return;
+    }
+
     QLIST_REMOVE(group, container_next);
     group->container = NULL;
 
@@ -1275,6 +1284,86 @@ static void vfio_disconnect_container(VFIOGroup *group)
     }
 }
 
+static int vfio_groupfd_to_groupid(int groupfd)
+{
+    char linkname[PATH_MAX];
+    char pathname[PATH_MAX];
+    char *filename;
+    int groupid, len;
+
+    snprintf(linkname, sizeof(linkname), "/proc/self/fd/%d", groupfd);
+
+    len = readlink(linkname, pathname, sizeof(pathname));
+    if (len <= 0 || len >= sizeof(pathname)) {
+        return -1;
+    }
+    pathname[len] = '\0';
+
+    filename = g_path_get_basename(pathname);
+    groupid = atoi(filename);
+    g_free(filename);
+
+    return groupid;
+}
+
+/*
+ * The @as param could be NULL. In this case, groupfd is shared by
+ * another process which will setup the DMA mapping for this group,
+ * and this group won't have container and address space in QEMU.
+ */
+VFIOGroup *vfio_get_group_from_fd(int groupfd, AddressSpace *as, Error **errp)
+{
+    VFIOGroup *group;
+    int groupid;
+
+    groupid = vfio_groupfd_to_groupid(groupfd);
+    if (groupid < 0) {
+        return NULL;
+    }
+
+    QLIST_FOREACH(group, &vfio_group_list, next) {
+        if (group->groupid == groupid) {
+            /* Found it.  Now is it already in the right context? */
+            if ((group->container == NULL && as == NULL) ||
+                (group->container && group->container->space->as == as)) {
+                group->refcnt++;
+                return group;
+            }
+            error_setg(errp, "group %d used in multiple address spaces",
+                       group->groupid);
+            return NULL;
+        }
+    }
+
+    group = g_malloc0(sizeof(*group));
+
+    group->fd = groupfd;
+    group->groupid = groupid;
+    group->refcnt = 1;
+
+    QLIST_INIT(&group->device_list);
+
+    if (vfio_connect_container(group, as, errp)) {
+        error_prepend(errp, "failed to setup container for group %d: ",
+                      groupid);
+        goto free_group_exit;
+    }
+
+    if (QLIST_EMPTY(&vfio_group_list)) {
+        qemu_register_reset(vfio_reset_handler, NULL);
+    }
+
+    QLIST_INSERT_HEAD(&vfio_group_list, group, next);
+
+    return group;
+
+free_group_exit:
+    g_free(group);
+
+    return NULL;
+}
+
+/* The @as param cannot be NULL. */
 VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
 {
     VFIOGroup *group;
@@ -1284,7 +1373,8 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
     QLIST_FOREACH(group, &vfio_group_list, next) {
         if (group->groupid == groupid) {
             /* Found it.  Now is it already in the right context? */
-            if (group->container->space->as == as) {
+            if (group->container && group->container->space->as == as) {
+                group->refcnt++;
                 return group;
             } else {
                 error_setg(errp, "group %d used in multiple address spaces",
@@ -1317,6 +1407,7 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
     }
 
     group->groupid = groupid;
+    group->refcnt = 1;
     QLIST_INIT(&group->device_list);
 
     if (vfio_connect_container(group, as, errp)) {
@@ -1348,6 +1439,10 @@ void vfio_put_group(VFIOGroup *group)
         return;
     }
 
+    if (--group->refcnt > 0) {
+        return;
+    }
+
     vfio_kvm_device_del_group(group);
     vfio_disconnect_container(group);
     QLIST_REMOVE(group, next);
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index d9360148e6..b820f7984c 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -137,6 +137,7 @@ struct VFIODeviceOps {
 typedef struct VFIOGroup {
     int fd;
     int groupid;
+    int refcnt;
     VFIOContainer *container;
     QLIST_HEAD(, VFIODevice) device_list;
     QLIST_ENTRY(VFIOGroup) next;
@@ -180,6 +181,7 @@ void vfio_region_exit(VFIORegion *region);
 void vfio_region_finalize(VFIORegion *region);
 void vfio_reset_handler(void *opaque);
 VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp);
+VFIOGroup *vfio_get_group_from_fd(int groupfd, AddressSpace *as, Error **errp);
 void vfio_put_group(VFIOGroup *group);
 int vfio_get_device(VFIOGroup *group, const char *name,
                     VFIODevice *vbasedev, Error **errp);
-- 
2.11.0


* [Qemu-devel] [PATCH v2 5/6] vfio: remove DPRINTF() definition from vfio-common.h
  2018-03-19  7:15 ` [virtio-dev] " Tiwei Bie
@ 2018-03-19  7:15   ` Tiwei Bie
  -1 siblings, 0 replies; 46+ messages in thread
From: Tiwei Bie @ 2018-03-19  7:15 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, alex.williamson, jasowang, pbonzini,
	stefanha
  Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang,
	tiwei.bie

This macro isn't used by any VFIO code, and its name is
too generic. Since vfio-common.h (in include/hw/vfio) can
be included by other modules in QEMU, keeping the definition
there can introduce name conflicts.
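
As a hypothetical illustration of the conflict, any user of this
header that wants its own debug macro, e.g.:

    #include "hw/vfio/vfio-common.h"

    /* Clashes with the DPRINTF that vfio-common.h defined
     * unconditionally, producing a macro redefinition error. */
    #define DPRINTF(fmt, ...) \
        do { fprintf(stderr, "mydev: " fmt, ## __VA_ARGS__); } while (0)

would run into the definition removed here.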

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 include/hw/vfio/vfio-common.h | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index b820f7984c..f6aa4ae959 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -34,15 +34,6 @@
 #define ERR_PREFIX "vfio error: %s: "
 #define WARN_PREFIX "vfio warning: %s: "
 
-/*#define DEBUG_VFIO*/
-#ifdef DEBUG_VFIO
-#define DPRINTF(fmt, ...) \
-    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
-#else
-#define DPRINTF(fmt, ...) \
-    do { } while (0)
-#endif
-
 enum {
     VFIO_DEVICE_TYPE_PCI = 0,
     VFIO_DEVICE_TYPE_PLATFORM = 1,
-- 
2.11.0


* [Qemu-devel] [PATCH v2 6/6] vhost-user: add VFIO based accelerators support
  2018-03-19  7:15 ` [virtio-dev] " Tiwei Bie
@ 2018-03-19  7:15   ` Tiwei Bie
  -1 siblings, 0 replies; 46+ messages in thread
From: Tiwei Bie @ 2018-03-19  7:15 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, alex.williamson, jasowang, pbonzini,
	stefanha
  Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang,
	tiwei.bie

This patch does some small extensions to the vhost-user protocol to
support VFIO based accelerators, and makes it possible to achieve
performance similar to VFIO based PCI passthru while keeping the
virtio device emulation in QEMU.

Any virtio ring compatible device can potentially be used as a
vhost data path accelerator. We can set up the accelerator based
on the information (e.g. memory table, features, ring info, etc.)
available on the vhost backend, and the accelerator will be able
to use the virtio ring provided by the virtio driver in the VM
directly. So the virtio driver in the VM can exchange e.g. network
packets with the accelerator directly via the virtio ring.

But for vhost-user, the critical issue in this case is that the
data path performance is relatively low and some host threads are
needed for the data path, because the mechanisms necessary to
support the following are missing:

1) guest driver notifies the device directly;
2) device interrupts the guest directly;

So this patch does some small extensions to the vhost-user protocol
to make both of them possible. It leverages the same mechanisms
as VFIO based PCI passthru.

A new protocol feature bit is added to negotiate the accelerator
feature support. Two new slave message types are added to control
the notify region and queue interrupt passthru for each queue.
From the view of vhost-user protocol design, it's very flexible.
The passthru can be enabled/disabled for each queue individually,
and it's possible to accelerate each queue by different devices.

The key difference from PCI passthru is that, in this case, only
the data path of the device (e.g. DMA ring, notify region and
queue interrupt) is passed through to the VM, while the device
control path (e.g. PCI configuration space and MMIO regions) is
still defined and emulated by QEMU.

The benefits of keeping virtio device emulation in QEMU compared
with virtio device PCI passthru include (but are not limited to):

- a consistent device interface for the guest OS in the VM;
- maximum flexibility in the hardware (i.e. accelerator) design;
- leveraging the existing virtio live-migration framework;

Normally, vhost-user is meant for connecting to e.g. a user-space
switch which is shared between multiple VMs. Typically, a vhost
accelerator isn't a simple NIC just for packet I/O, but e.g. a
switch accelerator which is also shared between multiple VMs.
This commit extends vhost-user to better support connecting to
e.g. a user-space switch that has an accelerator.
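
To make the slave-channel flow concrete, here is a rough
backend-side sketch of sending the notify area message with a
device fd attached as ancillary data. This is illustrative only:
the simplified header struct, constant values and function name
are assumptions, not the real vhost-user library API:

    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    #define VHOST_USER_SLAVE_VRING_NOTIFY_AREA_MSG 4
    #define VHOST_USER_VERSION 0x1

    typedef struct VhostUserVringArea {
        uint64_t u64;    /* queue index (NOFD flag set when deleting) */
        uint64_t size;   /* size of the notify area */
        uint64_t offset; /* offset of the area within the fd */
    } VhostUserVringArea;

    /* Simplified wire format of a slave-channel message. */
    typedef struct {
        uint32_t request;
        uint32_t flags;
        uint32_t size;
        VhostUserVringArea area;
    } __attribute__((packed)) SlaveVringAreaMsg;

    /* Ask QEMU to map the notify area of queue_idx; the VFIO device
     * fd is attached as ancillary data via SCM_RIGHTS. */
    static int send_notify_area(int slave_fd, unsigned queue_idx,
                                int device_fd, uint64_t offset,
                                uint64_t size)
    {
        SlaveVringAreaMsg msg = {
            .request = VHOST_USER_SLAVE_VRING_NOTIFY_AREA_MSG,
            .flags = VHOST_USER_VERSION,
            .size = sizeof(VhostUserVringArea),
            .area = { .u64 = queue_idx, .size = size, .offset = offset },
        };
        char ctl[CMSG_SPACE(sizeof(int))];
        struct iovec iov = { .iov_base = &msg, .iov_len = sizeof(msg) };
        struct msghdr mh = { 0 };
        struct cmsghdr *cmsg;

        memset(ctl, 0, sizeof(ctl));
        mh.msg_iov = &iov;
        mh.msg_iovlen = 1;
        mh.msg_control = ctl;
        mh.msg_controllen = sizeof(ctl);

        cmsg = CMSG_FIRSTHDR(&mh);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &device_fd, sizeof(int));

        return sendmsg(slave_fd, &mh, 0) < 0 ? -1 : 0;
    }

When deleting the mapping, a backend would instead set the NOFD
flag in area.u64 and send the message without attaching an fd.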

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 docs/interop/vhost-user.txt    |  57 ++++++++++++
 hw/virtio/vhost-user.c         | 198 +++++++++++++++++++++++++++++++++++++++++
 include/hw/virtio/vhost-user.h |  17 ++++
 3 files changed, 272 insertions(+)

diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index cb3a7595aa..264a58a800 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -132,6 +132,15 @@ Depending on the request type, payload can be:
    Payload: Size bytes array holding the contents of the virtio
        device's configuration space
 
+ * Vring area description
+   -----------------------
+   | u64 | size | offset |
+   -----------------------
+
+   u64: a 64-bit unsigned integer
+   Size: a 64-bit size
+   Offset: a 64-bit offset
+
 In QEMU the vhost-user message is implemented with the following struct:
 
 typedef struct VhostUserMsg {
@@ -146,6 +155,7 @@ typedef struct VhostUserMsg {
         VhostUserLog log;
         struct vhost_iotlb_msg iotlb;
         VhostUserConfig config;
+        VhostUserVringArea area;
     };
 } QEMU_PACKED VhostUserMsg;
 
@@ -358,6 +368,17 @@ The fd is provided via VHOST_USER_SET_SLAVE_REQ_FD ancillary data.
 A slave may then send VHOST_USER_SLAVE_* messages to the master
 using this fd communication channel.
 
+VFIO based accelerators
+-----------------------
+
+The VFIO based accelerators feature is a protocol extension. It is supported
+when the protocol feature VHOST_USER_PROTOCOL_F_VFIO (bit 8) is set.
+
+The vhost-user backend will set the accelerator context via the slave
+channel, and QEMU just needs to handle those messages passively. The
+accelerator context is set for each queue independently, so the
+page-per-vq property should also be enabled.
+
 Protocol features
 -----------------
 
@@ -369,6 +390,7 @@ Protocol features
 #define VHOST_USER_PROTOCOL_F_SLAVE_REQ      5
 #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN   6
 #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
+#define VHOST_USER_PROTOCOL_F_VFIO           8
 
 Master message types
 --------------------
@@ -722,6 +744,41 @@ Slave message types
      respond with zero when operation is successfully completed, or non-zero
      otherwise.
 
+ * VHOST_USER_SLAVE_VRING_VFIO_GROUP_MSG
+
+      Id: 3
+      Equivalent ioctl: N/A
+      Slave payload: u64
+      Master payload: N/A
+
+      Sets the VFIO group file descriptor, which is passed as ancillary
+      data, for a specified queue (the queue index is carried in the u64
+      payload). The slave sends this request to tell QEMU to add or delete
+      a VFIO group. QEMU will delete the current group, if any, for the
+      specified queue when the message is sent without a file descriptor.
+      A VFIO group is actually deleted only when its reference count
+      reaches zero. This request should be sent only when the
+      VHOST_USER_PROTOCOL_F_VFIO protocol feature has been negotiated.
+
+ * VHOST_USER_SLAVE_VRING_NOTIFY_AREA_MSG
+
+      Id: 4
+      Equivalent ioctl: N/A
+      Slave payload: vring area description
+      Master payload: N/A
+
+      Sets the notify area for a specified queue (the queue index is
+      carried in the u64 field of the vring area description). A file
+      descriptor is passed as ancillary data (typically a VFIO device fd).
+      QEMU can mmap the file descriptor based on the information carried
+      in the vring area description.
+      The slave sends this request to tell QEMU to add or delete a
+      MemoryRegion for a specified queue's notify MMIO region. QEMU will
+      delete the current MemoryRegion, if any, for the specified queue
+      when the message is sent without a file descriptor.
+      This request should be sent only when the VHOST_USER_PROTOCOL_F_VFIO
+      protocol feature and VIRTIO_F_VERSION_1 have been successfully negotiated.
+
 VHOST_USER_PROTOCOL_F_REPLY_ACK:
 -------------------------------
 The original vhost-user specification only demands replies for certain
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index b228994ffd..07fc63c6e8 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -42,6 +42,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_SLAVE_REQ = 5,
     VHOST_USER_PROTOCOL_F_CROSS_ENDIAN = 6,
     VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
+    VHOST_USER_PROTOCOL_F_VFIO = 8,
 
     VHOST_USER_PROTOCOL_F_MAX
 };
@@ -84,6 +85,8 @@ typedef enum VhostUserSlaveRequest {
     VHOST_USER_SLAVE_NONE = 0,
     VHOST_USER_SLAVE_IOTLB_MSG = 1,
     VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
+    VHOST_USER_SLAVE_VRING_VFIO_GROUP_MSG = 3,
+    VHOST_USER_SLAVE_VRING_NOTIFY_AREA_MSG = 4,
     VHOST_USER_SLAVE_MAX
 }  VhostUserSlaveRequest;
 
@@ -128,6 +131,12 @@ static VhostUserConfig c __attribute__ ((unused));
                                    + sizeof(c.size) \
                                    + sizeof(c.flags))
 
+typedef struct VhostUserVringArea {
+    uint64_t u64;
+    uint64_t size;
+    uint64_t offset;
+} VhostUserVringArea;
+
 typedef struct {
     VhostUserRequest request;
 
@@ -149,6 +158,7 @@ typedef union {
         struct vhost_iotlb_msg iotlb;
         VhostUserConfig config;
         VhostUserCryptoSession session;
+        VhostUserVringArea area;
 } VhostUserPayload;
 
 typedef struct VhostUserMsg {
@@ -459,9 +469,37 @@ static int vhost_user_set_vring_num(struct vhost_dev *dev,
     return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM, ring);
 }
 
+static void vhost_user_notify_region_remap(struct vhost_dev *dev, int queue_idx)
+{
+    struct vhost_user *u = dev->opaque;
+    VhostUserVFIOState *vfio = &u->shared->vfio;
+    VhostUserNotifyCtx *notify = &vfio->notify[queue_idx];
+    VirtIODevice *vdev = dev->vdev;
+
+    if (notify->addr && !notify->mapped) {
+        virtio_device_notify_region_map(vdev, queue_idx, &notify->mr);
+        notify->mapped = true;
+    }
+}
+
+static void vhost_user_notify_region_unmap(struct vhost_dev *dev, int queue_idx)
+{
+    struct vhost_user *u = dev->opaque;
+    VhostUserVFIOState *vfio = &u->shared->vfio;
+    VhostUserNotifyCtx *notify = &vfio->notify[queue_idx];
+    VirtIODevice *vdev = dev->vdev;
+
+    if (notify->addr && notify->mapped) {
+        virtio_device_notify_region_unmap(vdev, &notify->mr);
+        notify->mapped = false;
+    }
+}
+
 static int vhost_user_set_vring_base(struct vhost_dev *dev,
                                      struct vhost_vring_state *ring)
 {
+    vhost_user_notify_region_remap(dev, ring->index);
+
     return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE, ring);
 }
 
@@ -495,6 +533,8 @@ static int vhost_user_get_vring_base(struct vhost_dev *dev,
         .hdr.size = sizeof(msg.payload.state),
     };
 
+    vhost_user_notify_region_unmap(dev, ring->index);
+
     if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
         return -1;
     }
@@ -668,6 +708,133 @@ static int vhost_user_slave_handle_config_change(struct vhost_dev *dev)
     return ret;
 }
 
+static int vhost_user_handle_vring_vfio_group(struct vhost_dev *dev,
+                                              uint64_t u64,
+                                              int groupfd)
+{
+    struct vhost_user *u = dev->opaque;
+    VhostUserVFIOState *vfio = &u->shared->vfio;
+    int queue_idx = u64 & VHOST_USER_VRING_IDX_MASK;
+    VirtIODevice *vdev = dev->vdev;
+    VFIOGroup *group;
+    int ret = 0;
+
+    qemu_mutex_lock(&vfio->lock);
+
+    if (!virtio_has_feature(dev->protocol_features,
+                            VHOST_USER_PROTOCOL_F_VFIO) ||
+        vdev == NULL || queue_idx >= virtio_get_num_queues(vdev)) {
+        ret = -1;
+        goto out;
+    }
+
+    if (vfio->group[queue_idx]) {
+        vfio_put_group(vfio->group[queue_idx]);
+        vfio->group[queue_idx] = NULL;
+    }
+
+    if (u64 & VHOST_USER_VRING_NOFD_MASK) {
+        goto out;
+    }
+
+    group = vfio_get_group_from_fd(groupfd, NULL, NULL);
+    if (group == NULL) {
+        ret = -1;
+        goto out;
+    }
+
+    if (group->fd != groupfd) {
+        close(groupfd);
+    }
+
+    vfio->group[queue_idx] = group;
+
+out:
+    kvm_irqchip_commit_routes(kvm_state);
+    qemu_mutex_unlock(&vfio->lock);
+
+    if (ret != 0 && groupfd != -1) {
+        close(groupfd);
+    }
+
+    return ret;
+}
+
+#define NOTIFY_PAGE_SIZE 0x1000
+
+static int vhost_user_handle_vring_notify_area(struct vhost_dev *dev,
+                                               VhostUserVringArea *area,
+                                               int fd)
+{
+    struct vhost_user *u = dev->opaque;
+    VhostUserVFIOState *vfio = &u->shared->vfio;
+    int queue_idx = area->u64 & VHOST_USER_VRING_IDX_MASK;
+    VirtIODevice *vdev = dev->vdev;
+    VhostUserNotifyCtx *notify;
+    void *addr = NULL;
+    int ret = 0;
+    char *name;
+
+    qemu_mutex_lock(&vfio->lock);
+
+    if (!virtio_has_feature(dev->protocol_features,
+                            VHOST_USER_PROTOCOL_F_VFIO) ||
+        vdev == NULL || queue_idx >= virtio_get_num_queues(vdev) ||
+        !virtio_device_page_per_vq_enabled(vdev)) {
+        ret = -1;
+        goto out;
+    }
+
+    notify = &vfio->notify[queue_idx];
+
+    if (notify->addr) {
+        virtio_device_notify_region_unmap(vdev, &notify->mr);
+        munmap(notify->addr, NOTIFY_PAGE_SIZE);
+        object_unparent(OBJECT(&notify->mr));
+        notify->addr = NULL;
+    }
+
+    if (area->u64 & VHOST_USER_VRING_NOFD_MASK) {
+        goto out;
+    }
+
+    if (area->size < NOTIFY_PAGE_SIZE) {
+        ret = -1;
+        goto out;
+    }
+
+    addr = mmap(NULL, NOTIFY_PAGE_SIZE, PROT_READ | PROT_WRITE,
+                MAP_SHARED, fd, area->offset);
+    if (addr == MAP_FAILED) {
+        error_report("Can't map notify region");
+        ret = -1;
+        goto out;
+    }
+
+    name = g_strdup_printf("vhost-user/vfio@%p mmaps[%d]", vfio, queue_idx);
+    memory_region_init_ram_device_ptr(&notify->mr, OBJECT(vdev), name,
+                                      NOTIFY_PAGE_SIZE, addr);
+    g_free(name);
+
+    if (virtio_device_notify_region_map(vdev, queue_idx, &notify->mr)) {
+        ret = -1;
+        goto out;
+    }
+
+    notify->addr = addr;
+    notify->mapped = true;
+
+out:
+    if (ret < 0 && addr != NULL) {
+        munmap(addr, NOTIFY_PAGE_SIZE);
+    }
+    if (fd != -1) {
+        close(fd);
+    }
+    qemu_mutex_unlock(&vfio->lock);
+    return ret;
+}
+
 static void slave_read(void *opaque)
 {
     struct vhost_dev *dev = opaque;
@@ -734,6 +901,12 @@ static void slave_read(void *opaque)
     case VHOST_USER_SLAVE_CONFIG_CHANGE_MSG :
         ret = vhost_user_slave_handle_config_change(dev);
         break;
+    case VHOST_USER_SLAVE_VRING_VFIO_GROUP_MSG:
+        ret = vhost_user_handle_vring_vfio_group(dev, payload.u64, fd);
+        break;
+    case VHOST_USER_SLAVE_VRING_NOTIFY_AREA_MSG:
+        ret = vhost_user_handle_vring_notify_area(dev, &payload.area, fd);
+        break;
     default:
         error_report("Received unexpected msg type.");
         if (fd != -1) {
@@ -844,6 +1017,10 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
     u->slave_fd = -1;
     dev->opaque = u;
 
+    if (dev->vq_index == 0) {
+        qemu_mutex_init(&u->shared->vfio.lock);
+    }
+
     err = vhost_user_get_features(dev, &features);
     if (err < 0) {
         return err;
@@ -904,6 +1081,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
 static int vhost_user_cleanup(struct vhost_dev *dev)
 {
     struct vhost_user *u;
+    int i;
 
     assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
 
@@ -913,6 +1091,26 @@ static int vhost_user_cleanup(struct vhost_dev *dev)
         close(u->slave_fd);
         u->slave_fd = -1;
     }
+
+    if (dev->vq_index == 0) {
+        VhostUserVFIOState *vfio = &u->shared->vfio;
+
+        for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
+            if (vfio->notify[i].addr) {
+                munmap(vfio->notify[i].addr, NOTIFY_PAGE_SIZE);
+                object_unparent(OBJECT(&vfio->notify[i].mr));
+                vfio->notify[i].addr = NULL;
+            }
+
+            if (vfio->group[i]) {
+                vfio_put_group(vfio->group[i]);
+                vfio->group[i] = NULL;
+            }
+        }
+
+        qemu_mutex_destroy(&u->shared->vfio.lock);
+    }
+
     g_free(u);
     dev->opaque = 0;
 
diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
index 4f5a1477d1..de8c647962 100644
--- a/include/hw/virtio/vhost-user.h
+++ b/include/hw/virtio/vhost-user.h
@@ -9,9 +9,26 @@
 #define HW_VIRTIO_VHOST_USER_H
 
 #include "chardev/char-fe.h"
+#include "hw/virtio/virtio.h"
+#include "hw/vfio/vfio-common.h"
+
+typedef struct VhostUserNotifyCtx {
+    void *addr;
+    MemoryRegion mr;
+    bool mapped;
+} VhostUserNotifyCtx;
+
+typedef struct VhostUserVFIOState {
+    /* The VFIO group associated with each queue */
+    VFIOGroup *group[VIRTIO_QUEUE_MAX];
+    /* The notify context of each queue */
+    VhostUserNotifyCtx notify[VIRTIO_QUEUE_MAX];
+    QemuMutex lock;
+} VhostUserVFIOState;
 
 typedef struct VhostUser {
     CharBackend chr;
+    VhostUserVFIOState vfio;
 } VhostUser;
 
 #endif
-- 
2.11.0


* Re: [Qemu-devel] [PATCH v2 0/6] Extend vhost-user to support VFIO based accelerators
  2018-03-19  7:15 ` [virtio-dev] " Tiwei Bie
@ 2018-03-22 14:55   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 46+ messages in thread
From: Michael S. Tsirkin @ 2018-03-22 14:55 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: qemu-devel, virtio-dev, alex.williamson, jasowang, pbonzini,
	stefanha, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Mon, Mar 19, 2018 at 03:15:31PM +0800, Tiwei Bie wrote:
> This patch set does some small extensions to vhost-user protocol
> to support VFIO based accelerators, and makes it possible to get
> the similar performance of VFIO based PCI passthru while keeping
> the virtio device emulation in QEMU.

I love your patches!
Yet there are some things to improve.
Posting comments separately as individual messages.


> How does accelerator accelerate vhost (data path)
> =================================================
> 
> Any virtio ring compatible devices potentially can be used as the
> vhost data path accelerators. We can setup the accelerator based
> on the informations (e.g. memory table, features, ring info, etc)
> available on the vhost backend. And accelerator will be able to use
> the virtio ring provided by the virtio driver in the VM directly.
> So the virtio driver in the VM can exchange e.g. network packets
> with the accelerator directly via the virtio ring. That is to say,
> we will be able to use the accelerator to accelerate the vhost
> data path. We call it vDPA: vhost Data Path Acceleration.
> 
> Notice: Although the accelerator can talk with the virtio driver
> in the VM via the virtio ring directly. The control path events
> (e.g. device start/stop) in the VM will still be trapped and handled
> by QEMU, and QEMU will deliver such events to the vhost backend
> via standard vhost protocol.
> 
> Below link is an example showing how to setup a such environment
> via nested VM. In this case, the virtio device in the outer VM is
> the accelerator. It will be used to accelerate the virtio device
> in the inner VM. In reality, we could use virtio ring compatible
> hardware device as the accelerators.
> 
> http://dpdk.org/ml/archives/dev/2017-December/085044.html
> 
> In above example, it doesn't require any changes to QEMU, but
> it has lower performance compared with the traditional VFIO
> based PCI passthru. And that's the problem this patch set wants
> to solve.
> 
> The performance issue of vDPA/vhost-user and solutions
> ======================================================
> 
> For vhost-user backend, the critical issue in vDPA is that the
> data path performance is relatively low and some host threads are
> needed for the data path, because some necessary mechanisms are
> missing to support:
> 
> 1) guest driver notifies the device directly;
> 2) device interrupts the guest directly;
> 
> So this patch set does some small extensions to the vhost-user
> protocol to make both of them possible. It leverages the same
> mechanisms (e.g. EPT and Posted-Interrupt on Intel platform) as
> the PCI passthru.
> 
> A new protocol feature bit is added to negotiate the accelerator
> feature support. Two new slave message types are added to control
> the notify region and queue interrupt passthru for each queue.
> From the view of vhost-user protocol design, it's very flexible.
> The passthru can be enabled/disabled for each queue individually,
> and it's possible to accelerate each queue by different devices.
> More design and implementation details can be found from the last
> patch.
> 
> Difference between vDPA and PCI passthru
> ========================================
> 
> The key difference between PCI passthru and vDPA is that, in vDPA
> only the data path of the device (e.g. DMA ring, notify region and
> queue interrupt) is pass-throughed to the VM, the device control
> path (e.g. PCI configuration space and MMIO regions) is still
> defined and emulated by QEMU.
> 
> The benefits of keeping virtio device emulation in QEMU compared
> with virtio device PCI passthru include (but not limit to):
> 
> - consistent device interface for guest OS in the VM;
> - max flexibility on the hardware (i.e. the accelerators) design;
> - leveraging the existing virtio live-migration framework;
> 
> Why extend vhost-user for vDPA
> ==============================
> 
> We have already implemented various virtual switches (e.g. OVS-DPDK)
> based on vhost-user for VMs in the Cloud. They are purely software
> running on CPU cores. When we have accelerators for such NFVi applications,
> it's ideal if the applications could keep using the original interface
> (i.e. vhost-user netdev) with QEMU, and infrastructure is able to decide
> when and how to switch between CPU and accelerators within the interface.
> And the switching (i.e. switch between CPU and accelerators) can be done
> flexibly and quickly inside the applications.
> 
> More details about this can be found from the Cunming's discussions on
> the RFC patch set.
> 
> Update notes
> ============
> 
> IOMMU feature bit check is removed in this version, because:
> 
> The IOMMU feature is negotiable, when an accelerator is used and
> it doesn't support virtual IOMMU, its driver just won't provide
> this feature bit when vhost library querying its features. And if
> it supports the virtual IOMMU, its driver can provide this feature
> bit. It's not reasonable to add this limitation in this patch set.
> 
> The previous links:
> RFC: http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg04844.html
> v1:  http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg06028.html
> 
> v1 -> v2:
> - Add some explanations about why extend vhost-user in commit log (Paolo);
> - Bug fix in slave_read() according to Stefan's fix in DPDK;
> - Remove IOMMU feature check and related commit log;
> - Some minor refinements;
> - Rebase to the latest QEMU;
> 
> RFC -> v1:
> - Add some details about how vDPA works in cover letter (Alexey)
> - Add some details about the OVS offload use-case in cover letter (Jason)
> - Move PCI specific stuffs out of vhost-user (Jason)
> - Handle the virtual IOMMU case (Jason)
> - Move VFIO group management code into vfio/common.c (Alex)
> - Various refinements;
> (approximately sorted by comment posting time)
> 
> Tiwei Bie (6):
>   vhost-user: support receiving file descriptors in slave_read
>   vhost-user: introduce shared vhost-user state
>   virtio: support adding sub-regions for notify region
>   vfio: support getting VFIOGroup from groupfd
>   vfio: remove DPRINTF() definition from vfio-common.h
>   vhost-user: add VFIO based accelerators support
> 
>  Makefile.target                 |   4 +
>  docs/interop/vhost-user.txt     |  57 +++++++++
>  hw/scsi/vhost-user-scsi.c       |   6 +-
>  hw/vfio/common.c                |  97 +++++++++++++++-
>  hw/virtio/vhost-user.c          | 248 +++++++++++++++++++++++++++++++++++++++-
>  hw/virtio/virtio-pci.c          |  48 ++++++++
>  hw/virtio/virtio-pci.h          |   5 +
>  hw/virtio/virtio.c              |  39 +++++++
>  include/hw/vfio/vfio-common.h   |  11 +-
>  include/hw/virtio/vhost-user.h  |  34 ++++++
>  include/hw/virtio/virtio-scsi.h |   6 +-
>  include/hw/virtio/virtio.h      |   5 +
>  include/qemu/osdep.h            |   1 +
>  net/vhost-user.c                |  30 ++---
>  scripts/create_config           |   3 +
>  15 files changed, 561 insertions(+), 33 deletions(-)
>  create mode 100644 include/hw/virtio/vhost-user.h
> 
> -- 
> 2.11.0

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [virtio-dev] Re: [PATCH v2 0/6] Extend vhost-user to support VFIO based accelerators
@ 2018-03-22 14:55   ` Michael S. Tsirkin
  0 siblings, 0 replies; 46+ messages in thread
From: Michael S. Tsirkin @ 2018-03-22 14:55 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: qemu-devel, virtio-dev, alex.williamson, jasowang, pbonzini,
	stefanha, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Mon, Mar 19, 2018 at 03:15:31PM +0800, Tiwei Bie wrote:
> This patch set does some small extensions to vhost-user protocol
> to support VFIO based accelerators, and makes it possible to get
> the similar performance of VFIO based PCI passthru while keeping
> the virtio device emulation in QEMU.

I love your patches!
Yet there are some things to improve.
Posting comments separately as individual messages.


> How does accelerator accelerate vhost (data path)
> =================================================
> 
> Any virtio ring compatible devices potentially can be used as the
> vhost data path accelerators. We can setup the accelerator based
> on the informations (e.g. memory table, features, ring info, etc)
> available on the vhost backend. And accelerator will be able to use
> the virtio ring provided by the virtio driver in the VM directly.
> So the virtio driver in the VM can exchange e.g. network packets
> with the accelerator directly via the virtio ring. That is to say,
> we will be able to use the accelerator to accelerate the vhost
> data path. We call it vDPA: vhost Data Path Acceleration.
> 
> Notice: Although the accelerator can talk with the virtio driver
> in the VM via the virtio ring directly. The control path events
> (e.g. device start/stop) in the VM will still be trapped and handled
> by QEMU, and QEMU will deliver such events to the vhost backend
> via standard vhost protocol.
> 
> Below link is an example showing how to setup a such environment
> via nested VM. In this case, the virtio device in the outer VM is
> the accelerator. It will be used to accelerate the virtio device
> in the inner VM. In reality, we could use virtio ring compatible
> hardware device as the accelerators.
> 
> http://dpdk.org/ml/archives/dev/2017-December/085044.html
> 
> In above example, it doesn't require any changes to QEMU, but
> it has lower performance compared with the traditional VFIO
> based PCI passthru. And that's the problem this patch set wants
> to solve.
> 
> The performance issue of vDPA/vhost-user and solutions
> ======================================================
> 
> For vhost-user backend, the critical issue in vDPA is that the
> data path performance is relatively low and some host threads are
> needed for the data path, because some necessary mechanisms are
> missing to support:
> 
> 1) guest driver notifies the device directly;
> 2) device interrupts the guest directly;
> 
> So this patch set does some small extensions to the vhost-user
> protocol to make both of them possible. It leverages the same
> mechanisms (e.g. EPT and Posted-Interrupt on Intel platform) as
> the PCI passthru.
> 
> A new protocol feature bit is added to negotiate the accelerator
> feature support. Two new slave message types are added to control
> the notify region and queue interrupt passthru for each queue.
> >From the view of vhost-user protocol design, it's very flexible.
> The passthru can be enabled/disabled for each queue individually,
> and it's possible to accelerate each queue by different devices.
> More design and implementation details can be found from the last
> patch.
> 
> Difference between vDPA and PCI passthru
> ========================================
> 
> The key difference between PCI passthru and vDPA is that, in vDPA
> only the data path of the device (e.g. DMA ring, notify region and
> queue interrupt) is pass-throughed to the VM, the device control
> path (e.g. PCI configuration space and MMIO regions) is still
> defined and emulated by QEMU.
> 
> The benefits of keeping virtio device emulation in QEMU compared
> with virtio device PCI passthru include (but not limit to):
> 
> - consistent device interface for guest OS in the VM;
> - max flexibility on the hardware (i.e. the accelerators) design;
> - leveraging the existing virtio live-migration framework;
> 
> Why extend vhost-user for vDPA
> ==============================
> 
> We have already implemented various virtual switches (e.g. OVS-DPDK)
> based on vhost-user for VMs in the Cloud. They run purely in software
> on CPU cores. When we have accelerators for such NFVi applications,
> it's ideal if the applications could keep using the original interface
> (i.e. the vhost-user netdev) with QEMU, with the infrastructure able
> to decide when and how to switch between CPU and accelerators behind
> that interface. The switching (i.e. between CPU and accelerators) can
> then be done flexibly and quickly inside the applications.
> 
> More details about this can be found in Cunming's discussion of
> the RFC patch set.
> 
> Update notes
> ============
> 
> The IOMMU feature bit check is removed in this version, because:
> 
> The IOMMU feature is negotiable: when an accelerator that doesn't
> support a virtual IOMMU is used, its driver simply won't offer this
> feature bit when the vhost library queries its features. And if it
> does support a virtual IOMMU, its driver can offer this feature
> bit. It's not reasonable to add this limitation in this patch set.
> 
> The previous links:
> RFC: http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg04844.html
> v1:  http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg06028.html
> 
> v1 -> v2:
> - Add some explanations about why extend vhost-user in commit log (Paolo);
> - Bug fix in slave_read() according to Stefan's fix in DPDK;
> - Remove IOMMU feature check and related commit log;
> - Some minor refinements;
> - Rebase to the latest QEMU;
> 
> RFC -> v1:
> - Add some details about how vDPA works in cover letter (Alexey)
> - Add some details about the OVS offload use-case in cover letter (Jason)
> - Move PCI specific stuffs out of vhost-user (Jason)
> - Handle the virtual IOMMU case (Jason)
> - Move VFIO group management code into vfio/common.c (Alex)
> - Various refinements;
> (approximately sorted by comment posting time)
> 
> Tiwei Bie (6):
>   vhost-user: support receiving file descriptors in slave_read
>   vhost-user: introduce shared vhost-user state
>   virtio: support adding sub-regions for notify region
>   vfio: support getting VFIOGroup from groupfd
>   vfio: remove DPRINTF() definition from vfio-common.h
>   vhost-user: add VFIO based accelerators support
> 
>  Makefile.target                 |   4 +
>  docs/interop/vhost-user.txt     |  57 +++++++++
>  hw/scsi/vhost-user-scsi.c       |   6 +-
>  hw/vfio/common.c                |  97 +++++++++++++++-
>  hw/virtio/vhost-user.c          | 248 +++++++++++++++++++++++++++++++++++++++-
>  hw/virtio/virtio-pci.c          |  48 ++++++++
>  hw/virtio/virtio-pci.h          |   5 +
>  hw/virtio/virtio.c              |  39 +++++++
>  include/hw/vfio/vfio-common.h   |  11 +-
>  include/hw/virtio/vhost-user.h  |  34 ++++++
>  include/hw/virtio/virtio-scsi.h |   6 +-
>  include/hw/virtio/virtio.h      |   5 +
>  include/qemu/osdep.h            |   1 +
>  net/vhost-user.c                |  30 ++---
>  scripts/create_config           |   3 +
>  15 files changed, 561 insertions(+), 33 deletions(-)
>  create mode 100644 include/hw/virtio/vhost-user.h
> 
> -- 
> 2.11.0

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/6] virtio: support adding sub-regions for notify region
  2018-03-19  7:15   ` [virtio-dev] " Tiwei Bie
@ 2018-03-22 14:57     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 46+ messages in thread
From: Michael S. Tsirkin @ 2018-03-22 14:57 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: qemu-devel, virtio-dev, alex.williamson, jasowang, pbonzini,
	stefanha, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Mon, Mar 19, 2018 at 03:15:34PM +0800, Tiwei Bie wrote:
> Provide APIs for querying whether page-per-vq is enabled
> and for adding sub-regions to the notify region.
> 
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>  Makefile.target            |  4 ++++
>  hw/virtio/virtio-pci.c     | 48 ++++++++++++++++++++++++++++++++++++++++++++++
>  hw/virtio/virtio-pci.h     |  5 +++++
>  hw/virtio/virtio.c         | 39 +++++++++++++++++++++++++++++++++++++
>  include/hw/virtio/virtio.h |  5 +++++
>  include/qemu/osdep.h       |  1 +
>  scripts/create_config      |  3 +++
>  7 files changed, 105 insertions(+)
> 
> diff --git a/Makefile.target b/Makefile.target
> index 6549481096..b2cf618dc9 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -39,6 +39,9 @@ STPFILES=
>  config-target.h: config-target.h-timestamp
>  config-target.h-timestamp: config-target.mak
>  
> +config-devices.h: config-devices.h-timestamp
> +config-devices.h-timestamp: config-devices.mak
> +
>  ifdef CONFIG_TRACE_SYSTEMTAP
>  stap: $(QEMU_PROG).stp-installed $(QEMU_PROG).stp $(QEMU_PROG)-simpletrace.stp
>

So +config-devices.h is made from config-devices.h-timestamp
and config-devices.h-timestamp from config-devices.mak

What is config-devices.mak made from?

  
> @@ -224,4 +227,5 @@ ifdef CONFIG_TRACE_SYSTEMTAP
>  endif
>  
>  GENERATED_FILES += config-target.h
> +GENERATED_FILES += config-devices.h
>  Makefile: $(GENERATED_FILES)
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 1e8ab7bbc5..b17471092a 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -1534,6 +1534,54 @@ static void virtio_pci_modern_io_region_unmap(VirtIOPCIProxy *proxy,
>                                  &region->mr);
>  }
>  
> +static VirtIOPCIProxy *virtio_device_to_virtio_pci_proxy(VirtIODevice *vdev)
> +{
> +    VirtIOPCIProxy *proxy = NULL;
> +
> +    if (vdev->device_id == VIRTIO_ID_NET) {
> +        VirtIONetPCI *d = container_of(vdev, VirtIONetPCI, vdev.parent_obj);
> +        proxy = &d->parent_obj;
> +    }
> +
> +    return proxy;
> +}
> +
> +bool virtio_pci_page_per_vq_enabled(VirtIODevice *vdev)
> +{
> +    VirtIOPCIProxy *proxy = virtio_device_to_virtio_pci_proxy(vdev);
> +
> +    if (proxy == NULL) {
> +        return false;
> +    }
> +
> +    return !!(proxy->flags & VIRTIO_PCI_FLAG_PAGE_PER_VQ);
> +}
> +

VIRTIO_PCI_FLAG_PAGE_PER_VQ is not something external users
should care about. Need to find some other way to express the
specific requirements.

In particular do you want to use a host page per VQ?

This isn't what VIRTIO_PCI_FLAG_PAGE_PER_VQ does - it uses a fixed 4K
offset, which does not match the memory page size on all platforms.
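
For reference, the multiplier in question looks roughly like this in
hw/virtio/virtio-pci.c (quoted from memory, exact macro names may
differ):

    /* With VIRTIO_PCI_FLAG_PAGE_PER_VQ set, each queue's notify
     * address is spaced by a fixed 4K stride -- not by the host's
     * actual page size. */
    static inline unsigned virtio_pci_queue_mem_mult(struct VirtIOPCIProxy *proxy)
    {
        return (proxy->flags & VIRTIO_PCI_FLAG_PAGE_PER_VQ) ?
            QEMU_VIRTIO_PCI_QUEUE_MEM_MULT :  /* 0x1000 */
            VIRTIO_PCI_QUEUE_MEM_MULT;        /* 4 */
    }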


> +int virtio_pci_notify_region_map(VirtIODevice *vdev, int queue_idx,
> +                                 MemoryRegion *mr)
> +{
> +    VirtIOPCIProxy *proxy = virtio_device_to_virtio_pci_proxy(vdev);
> +    int offset;
> +
> +    if (proxy == NULL || !virtio_pci_modern(proxy)) {
> +        return -1;
> +    }
> +
> +    offset = virtio_pci_queue_mem_mult(proxy) * queue_idx;
> +    memory_region_add_subregion(&proxy->notify.mr, offset, mr);
> +
> +    return 0;
> +}
> +
> +void virtio_pci_notify_region_unmap(VirtIODevice *vdev, MemoryRegion *mr)
> +{
> +    VirtIOPCIProxy *proxy = virtio_device_to_virtio_pci_proxy(vdev);
> +
> +    if (proxy != NULL) {
> +        memory_region_del_subregion(&proxy->notify.mr, mr);
> +    }
> +}
> +
>  static void virtio_pci_pre_plugged(DeviceState *d, Error **errp)
>  {
>      VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
> diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
> index 813082b0d7..8061133741 100644
> --- a/hw/virtio/virtio-pci.h
> +++ b/hw/virtio/virtio-pci.h
> @@ -213,6 +213,11 @@ static inline void virtio_pci_disable_modern(VirtIOPCIProxy *proxy)
>      proxy->disable_modern = true;
>  }
>  
> +bool virtio_pci_page_per_vq_enabled(VirtIODevice *vdev);
> +int virtio_pci_notify_region_map(VirtIODevice *vdev, int queue_idx,
> +                                 MemoryRegion *mr);
> +void virtio_pci_notify_region_unmap(VirtIODevice *vdev, MemoryRegion *mr);
> +
>  /*
>   * virtio-scsi-pci: This extends VirtioPCIProxy.
>   */

These are not great APIs, unfortunately. We need to come up with
generic names. E.g. maybe we register and de-register host notifiers?
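
For instance, a minimal sketch of what a more generic entry point could
look like (the name and signature below are made up for illustration):

    /* attach (assign=true) or detach (assign=false) a memory region
     * backing the host notifier of the given queue */
    int virtio_queue_set_host_notifier_mr(VirtIODevice *vdev, int queue_idx,
                                          MemoryRegion *mr, bool assign);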


> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 006d3d1148..90ee72984c 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -22,6 +22,7 @@
>  #include "qemu/atomic.h"
>  #include "hw/virtio/virtio-bus.h"
>  #include "hw/virtio/virtio-access.h"
> +#include "hw/virtio/virtio-pci.h"
>  #include "sysemu/dma.h"
>  
>  /*
> @@ -2681,6 +2682,44 @@ void virtio_device_release_ioeventfd(VirtIODevice *vdev)
>      virtio_bus_release_ioeventfd(vbus);
>  }
>  
> +bool virtio_device_parent_is_pci_device(VirtIODevice *vdev)
> +{
> +    BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
> +    const char *typename = object_get_typename(OBJECT(qbus->parent));
> +
> +    return strstr(typename, "pci") != NULL;
> +}
> +
> +bool virtio_device_page_per_vq_enabled(VirtIODevice *vdev)
> +{
> +#ifdef CONFIG_VIRTIO_PCI
> +    if (virtio_device_parent_is_pci_device(vdev)) {
> +        return virtio_pci_page_per_vq_enabled(vdev);
> +    }
> +#endif
> +    return false;
> +}
> +

A better way to do this is to pass a callback to the bus where each bus can
implement its own.
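
I.e. something along these lines -- an untested sketch, with the hook
name made up:

    /* hypothetical addition to VirtioBusClass: each transport (PCI,
     * MMIO, ccw) answers for itself instead of strstr() on the
     * type name */
    bool (*page_per_vq_enabled)(DeviceState *d);

    bool virtio_device_page_per_vq_enabled(VirtIODevice *vdev)
    {
        BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
        VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);

        return k->page_per_vq_enabled &&
               k->page_per_vq_enabled(qbus->parent);
    }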


> +int virtio_device_notify_region_map(VirtIODevice *vdev, int queue_idx,
> +                                    MemoryRegion *mr)
> +{
> +#ifdef CONFIG_VIRTIO_PCI
> +    if (virtio_device_parent_is_pci_device(vdev)) {
> +        return virtio_pci_notify_region_map(vdev, queue_idx, mr);
> +    }
> +#endif
> +    return -1;
> +}
> +
> +void virtio_device_notify_region_unmap(VirtIODevice *vdev, MemoryRegion *mr)
> +{
> +#ifdef CONFIG_VIRTIO_PCI
> +    if (virtio_device_parent_is_pci_device(vdev)) {
> +        virtio_pci_notify_region_unmap(vdev, mr);
> +    }
> +#endif
> +}
> +
>  static void virtio_device_class_init(ObjectClass *klass, void *data)
>  {
>      /* Set the default value here. */
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 098bdaaea3..b14accdb08 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -285,6 +285,11 @@ void virtio_device_stop_ioeventfd(VirtIODevice *vdev);
>  int virtio_device_grab_ioeventfd(VirtIODevice *vdev);
>  void virtio_device_release_ioeventfd(VirtIODevice *vdev);
>  bool virtio_device_ioeventfd_enabled(VirtIODevice *vdev);
> +bool virtio_device_parent_is_pci_device(VirtIODevice *vdev);
> +bool virtio_device_page_per_vq_enabled(VirtIODevice *vdev);
> +int virtio_device_notify_region_map(VirtIODevice *vdev, int queue_idx,
> +                                    MemoryRegion *mr);
> +void virtio_device_notify_region_unmap(VirtIODevice *vdev, MemoryRegion *mr);
>  EventNotifier *virtio_queue_get_host_notifier(VirtQueue *vq);
>  void virtio_queue_host_notifier_read(EventNotifier *n);
>  void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq, AioContext *ctx,
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index 41658060a7..2532c278ef 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -30,6 +30,7 @@
>  #include "config-host.h"
>  #ifdef NEED_CPU_H
>  #include "config-target.h"
> +#include "config-devices.h"
>  #else
>  #include "exec/poison.h"
>  #endif

This confuses me.
What does config-devices.h have to do with cpu.h?
And why do we want every single file in qemu to pull this in?

> diff --git a/scripts/create_config b/scripts/create_config
> index d727e5e36e..e4541a51ed 100755
> --- a/scripts/create_config
> +++ b/scripts/create_config
> @@ -58,6 +58,9 @@ case $line in
>      name=${line%=*}
>      echo "#define $name 1"
>      ;;
> + CONFIG_*='$(CONFIG_'*')') # configuration
> +    continue
> +    ;;

Can't figure out what this does. The comment does not help unfortunately.


>   CONFIG_*=*) # configuration
>      name=${line%=*}
>      value=${line#*=}
> -- 
> 2.11.0

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/6] vhost-user: introduce shared vhost-user state
  2018-03-19  7:15   ` [virtio-dev] " Tiwei Bie
@ 2018-03-22 15:13     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 46+ messages in thread
From: Michael S. Tsirkin @ 2018-03-22 15:13 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: qemu-devel, virtio-dev, alex.williamson, jasowang, pbonzini,
	stefanha, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Mon, Mar 19, 2018 at 03:15:33PM +0800, Tiwei Bie wrote:
> @@ -22,7 +23,7 @@
>  
>  typedef struct VhostUserState {
>      NetClientState nc;
> -    CharBackend chr; /* only queue index 0 */
> +    VhostUser vhost_user; /* only queue index 0 */
>      VHostNetState *vhost_net;
>      guint watch;
>      uint64_t acked_features;

Is the comment still valid?

> @@ -64,7 +65,7 @@ static void vhost_user_stop(int queues, NetClientState *ncs[])
>      }
>  }
>  
> -static int vhost_user_start(int queues, NetClientState *ncs[], CharBackend *be)
> +static int vhost_user_start(int queues, NetClientState *ncs[], void *be)
>  {
>      VhostNetOptions options;
>      struct vhost_net *net = NULL;

Type safety is going away here. This is actually pretty scary:
are we sure no users cast this pointer to CharBackend?

For example, it seems that vhost_user_init does exactly that.

We need to find a way to retain type safety before making
such a change.
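
One option might be to keep the new struct in the prototype instead of
dropping to void *, e.g. (just a sketch, not a tested patch):

    /* keeping the pointer typed lets the compiler catch any leftover
     * casts back to CharBackend */
    static int vhost_user_start(int queues, NetClientState *ncs[],
                                VhostUser *be);

and have vhost_user_init() take a VhostUser * as well, instead of
recovering it from an opaque pointer.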


> @@ -158,7 +159,7 @@ static void vhost_user_cleanup(NetClientState *nc)
>              g_source_remove(s->watch);
>              s->watch = 0;
>          }
> -        qemu_chr_fe_deinit(&s->chr, true);
> +        qemu_chr_fe_deinit(&s->vhost_user.chr, true);
>      }
>  
>      qemu_purge_queued_packets(nc);
> @@ -192,7 +193,7 @@ static gboolean net_vhost_user_watch(GIOChannel *chan, GIOCondition cond,
>  {
>      VhostUserState *s = opaque;
>  
> -    qemu_chr_fe_disconnect(&s->chr);
> +    qemu_chr_fe_disconnect(&s->vhost_user.chr);
>  
>      return TRUE;
>  }
> @@ -217,7 +218,8 @@ static void chr_closed_bh(void *opaque)
>      qmp_set_link(name, false, &err);
>      vhost_user_stop(queues, ncs);
>  
> -    qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, net_vhost_user_event,
> +    qemu_chr_fe_set_handlers(&s->vhost_user.chr, NULL, NULL,
> +                             net_vhost_user_event,
>                               NULL, opaque, NULL, true);
>  
>      if (err) {
> @@ -240,15 +242,15 @@ static void net_vhost_user_event(void *opaque, int event)
>      assert(queues < MAX_QUEUE_NUM);
>  
>      s = DO_UPCAST(VhostUserState, nc, ncs[0]);
> -    chr = qemu_chr_fe_get_driver(&s->chr);
> +    chr = qemu_chr_fe_get_driver(&s->vhost_user.chr);
>      trace_vhost_user_event(chr->label, event);
>      switch (event) {
>      case CHR_EVENT_OPENED:
> -        if (vhost_user_start(queues, ncs, &s->chr) < 0) {
> -            qemu_chr_fe_disconnect(&s->chr);
> +        if (vhost_user_start(queues, ncs, &s->vhost_user) < 0) {
> +            qemu_chr_fe_disconnect(&s->vhost_user.chr);
>              return;
>          }
> -        s->watch = qemu_chr_fe_add_watch(&s->chr, G_IO_HUP,
> +        s->watch = qemu_chr_fe_add_watch(&s->vhost_user.chr, G_IO_HUP,
>                                           net_vhost_user_watch, s);
>          qmp_set_link(name, true, &err);
>          s->started = true;
> @@ -264,8 +266,8 @@ static void net_vhost_user_event(void *opaque, int event)
>  
>              g_source_remove(s->watch);
>              s->watch = 0;
> -            qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
> -                                     NULL, NULL, false);
> +            qemu_chr_fe_set_handlers(&s->vhost_user.chr, NULL, NULL, NULL,
> +                                     NULL, NULL, NULL, false);
>  
>              aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
>          }
> @@ -297,7 +299,7 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
>          if (!nc0) {
>              nc0 = nc;
>              s = DO_UPCAST(VhostUserState, nc, nc);
> -            if (!qemu_chr_fe_init(&s->chr, chr, &err)) {
> +            if (!qemu_chr_fe_init(&s->vhost_user.chr, chr, &err)) {
>                  error_report_err(err);
>                  return -1;
>              }
> @@ -307,11 +309,11 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
>  
>      s = DO_UPCAST(VhostUserState, nc, nc0);
>      do {
> -        if (qemu_chr_fe_wait_connected(&s->chr, &err) < 0) {
> +        if (qemu_chr_fe_wait_connected(&s->vhost_user.chr, &err) < 0) {
>              error_report_err(err);
>              return -1;
>          }
> -        qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
> +        qemu_chr_fe_set_handlers(&s->vhost_user.chr, NULL, NULL,
>                                   net_vhost_user_event, NULL, nc0->name, NULL,
>                                   true);
>      } while (!s->started);
> -- 
> 2.11.0

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 5/6] vfio: remove DPRINTF() definition from vfio-common.h
  2018-03-19  7:15   ` [virtio-dev] " Tiwei Bie
@ 2018-03-22 15:15     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 46+ messages in thread
From: Michael S. Tsirkin @ 2018-03-22 15:15 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: qemu-devel, virtio-dev, alex.williamson, jasowang, pbonzini,
	stefanha, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Mon, Mar 19, 2018 at 03:15:36PM +0800, Tiwei Bie wrote:
> This macro isn't used by any VFIO code, and its name is
> too generic. Since vfio-common.h (in include/hw/vfio) can
> be included by other modules in QEMU, the macro can
> introduce conflicts.
> 
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>

This one can go ahead immediately.
Try posting as a separate patch.

> ---
>  include/hw/vfio/vfio-common.h | 9 ---------
>  1 file changed, 9 deletions(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index b820f7984c..f6aa4ae959 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -34,15 +34,6 @@
>  #define ERR_PREFIX "vfio error: %s: "
>  #define WARN_PREFIX "vfio warning: %s: "
>  
> -/*#define DEBUG_VFIO*/
> -#ifdef DEBUG_VFIO
> -#define DPRINTF(fmt, ...) \
> -    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
> -#else
> -#define DPRINTF(fmt, ...) \
> -    do { } while (0)
> -#endif
> -
>  enum {
>      VFIO_DEVICE_TYPE_PCI = 0,
>      VFIO_DEVICE_TYPE_PLATFORM = 1,
> -- 
> 2.11.0

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 6/6] vhost-user: add VFIO based accelerators support
  2018-03-19  7:15   ` [virtio-dev] " Tiwei Bie
@ 2018-03-22 16:19     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 46+ messages in thread
From: Michael S. Tsirkin @ 2018-03-22 16:19 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: qemu-devel, virtio-dev, alex.williamson, jasowang, pbonzini,
	stefanha, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Mon, Mar 19, 2018 at 03:15:37PM +0800, Tiwei Bie wrote:
> This patch makes some small extensions to the vhost-user protocol to
> support VFIO based accelerators, and makes it possible to get
> performance similar to that of VFIO based PCI passthru while keeping
> the virtio device emulation in QEMU.
> 
> Any virtio ring compatible device can potentially be used as a
> vhost data path accelerator. We can set up the accelerator based
> on the information (e.g. memory table, features, ring info, etc.)
> available on the vhost backend, and the accelerator will then be
> able to use the virtio ring provided by the virtio driver in the
> VM directly. So the virtio driver in the VM can exchange e.g.
> network packets with the accelerator directly via the virtio ring.
> 
> But for vhost-user, the critical issue in this case is that the
> data path performance is relatively low and some host threads are
> needed for the data path, because the mechanisms necessary to
> support the following are missing:
> 
> 1) the guest driver notifying the device directly;
> 2) the device interrupting the guest directly;
> 
> So this patch makes some small extensions to the vhost-user protocol
> to make both of them possible. It leverages the same mechanisms
> as VFIO based PCI passthru.
> 
> A new protocol feature bit is added to negotiate the accelerator
> feature support. Two new slave message types are added to control
> the notify region and queue interrupt passthru for each queue.
> From the point of view of the vhost-user protocol design, it's very
> flexible. The passthru can be enabled/disabled for each queue
> individually, and it's possible to accelerate each queue with a
> different device.
> 
> The key difference from PCI passthru is that, in this case, only
> the data path of the device (e.g. DMA ring, notify region and
> queue interrupt) is passed through to the VM, while the device
> control path (e.g. PCI configuration space and MMIO regions) is
> still defined and emulated by QEMU.
> 
> The benefits of keeping virtio device emulation in QEMU compared
> with virtio device PCI passthru include (but are not limited to):
> 
> - consistent device interface for guest OS in the VM;
> - max flexibility on the hardware (i.e. the accelerators) design;
> - leveraging the existing virtio live-migration framework;
> 
> Normally, vhost-user is meant for connecting to e.g. a user-space
> switch which is shared between multiple VMs. Typically, a vhost
> accelerator isn't a simple NIC just for packet I/O, but e.g. a
> switch accelerator which is also shared between multiple VMs. This
> commit extends vhost-user to better support connecting to e.g. a
> user-space switch that has an accelerator.
> 
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>  docs/interop/vhost-user.txt    |  57 ++++++++++++
>  hw/virtio/vhost-user.c         | 198 +++++++++++++++++++++++++++++++++++++++++
>  include/hw/virtio/vhost-user.h |  17 ++++
>  3 files changed, 272 insertions(+)
> 
> diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> index cb3a7595aa..264a58a800 100644
> --- a/docs/interop/vhost-user.txt
> +++ b/docs/interop/vhost-user.txt
> @@ -132,6 +132,15 @@ Depending on the request type, payload can be:
>     Payload: Size bytes array holding the contents of the virtio
>         device's configuration space
>  
> + * Vring area description
> +   -----------------------
> +   | u64 | size | offset |
> +   -----------------------
> +
> +   u64: a 64-bit unsigned integer
> +   Size: a 64-bit size
> +   Offset: a 64-bit offset
> +
>  In QEMU the vhost-user message is implemented with the following struct:
>  
>  typedef struct VhostUserMsg {

I see you modeled this after a generic message such as the vring state
description, but that one is used in many messages, which is why it is
not documented in a single place.

The vring address description is a better model for how to document
this message.

> @@ -146,6 +155,7 @@ typedef struct VhostUserMsg {
>          VhostUserLog log;
>          struct vhost_iotlb_msg iotlb;
>          VhostUserConfig config;
> +        VhostUserVringArea area;
>      };
>  } QEMU_PACKED VhostUserMsg;
>  
> @@ -358,6 +368,17 @@ The fd is provided via VHOST_USER_SET_SLAVE_REQ_FD ancillary data.
>  A slave may then send VHOST_USER_SLAVE_* messages to the master
>  using this fd communication channel.
>  
> +VFIO based accelerators
> +-----------------------
> +
> +The VFIO based accelerators feature is a protocol extension. It is supported
> +when the protocol feature VHOST_USER_PROTOCOL_F_VFIO (bit 7) is set.
> +
> +The vhost-user backend will set the accelerator context via slave channel,
> +and QEMU just needs to handle those messages passively.

I didn't understand the above, unfortunately. "Accelerator context" and
"passively" do not seem to be defined anywhere. What do these terms
mean here?

How is the backend supposed to use this? Could you describe it in a
way that makes it possible for backend writers to use?


> The accelerator
> +context will be set for each queue independently. So the page-per-vq property
> +should also be enabled.

A backend author is unlikely to know what the page-per-vq property means.

Is this intended for users maybe? docs/interop is not the best place
for user-facing documentation.

I also wonder:

	commit d9997d89a4a09a330a056929d06d4b7b0b7a8239
	Author: Marcel Apfelbaum <marcel@redhat.com>
	Date:   Wed Sep 7 18:02:25 2016 +0300

	    virtio-pci: reduce modern_mem_bar size
	    
	    Currently each VQ Notification Virtio Capability is allocated
	    on a different page. The idea is to enable split drivers within
	    guests, however there are no known plans to do that.
	    The allocation will result in a 8MB BAR, more than various
	    guest firmwares pre-allocates for PCI Bridges hotplug process.
    
Looks like enabling page-per-vq will break PCI Express hotplug. I
suspect more work is needed to down-size the BAR to the number of VQs
actually supported.



> +
>  Protocol features
>  -----------------
>  
> @@ -369,6 +390,7 @@ Protocol features
>  #define VHOST_USER_PROTOCOL_F_SLAVE_REQ      5
>  #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN   6
>  #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
> +#define VHOST_USER_PROTOCOL_F_VFIO           8
>  
>  Master message types
>  --------------------
> @@ -722,6 +744,41 @@ Slave message types
>       respond with zero when operation is successfully completed, or non-zero
>       otherwise.
>  
> + * VHOST_USER_SLAVE_VRING_VFIO_GROUP_MSG
> +
> +      Id: 3
> +      Equivalent ioctl: N/A
> +      Slave payload: u64
> +      Master payload: N/A
> +
> +      Sets the VFIO group file descriptor which is passed as ancillary data
> +      for a specified queue (queue index is carried in the u64 payload).
> +      Slave sends this request to tell QEMU to add or delete a VFIO group.

add or delete it where?

> +      QEMU will delete the current group if any for the specified queue when
> +      the message is sent without a file descriptor. A VFIO group will be
> +      actually deleted when its reference count reaches zero.
> +      This request should be sent only when VHOST_USER_PROTOCOL_F_VFIO protocol
> +      feature has been successfully negotiated.

I think this text should refer the reader to Documentation/vfio.txt in
Linux (you can add a link), and explain how to use it in terms
consistent with that document.

I also wonder how this interacts with the vIOMMU.


To put it another way, I think what is really going on here is that
the slave has configured a VFIO device and is asking the master to
enable that device to interrupt the guest directly (without using an
eventfd)?

So QEMU offered support for the VFIO extension and the slave is using
it. What assumptions can the slave make then?
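
For what it's worth, my reading of the intended slave-side flow is
roughly the sketch below (message layout as defined by this patch; the
fd-passing helper is hypothetical):

    /* slave: ask the master to wire queue 0's interrupt through the
     * VFIO group, passing the group fd as ancillary data */
    VhostUserMsg msg = {
        .hdr.request = VHOST_USER_SLAVE_VRING_VFIO_GROUP_MSG,
        .hdr.flags   = VHOST_USER_VERSION,
        .hdr.size    = sizeof(msg.payload.u64),
        .payload.u64 = 0,            /* queue index; NOFD flag clear */
    };
    send_msg_with_fd(slave_fd, &msg, groupfd);  /* hypothetical helper */

If that is the idea, the document should say so explicitly.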


> +
> + * VHOST_USER_SLAVE_VRING_NOTIFY_AREA_MSG
> +
> +      Id: 4
> +      Equivalent ioctl: N/A
> +      Slave payload: vring area description
> +      Master payload: N/A
> +
> +      Sets the notify area for a specified queue (queue index is carried
> +      in the u64 field of the vring area description). A file descriptor is
> +      passed as ancillary data (typically it's a VFIO device fd). QEMU can
> +      mmap the file descriptor based on the information carried in the vring
> +      area description.

Based on it how? What do all fields mean?
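
Judging from the handler code further down in this patch, the semantics
appear to be the following (my reading, not normative -- this is what
the text should spell out):

    u64:    vring index in the low bits (VHOST_USER_VRING_IDX_MASK),
            plus flags such as VHOST_USER_VRING_NOFD_MASK
    Size:   size in bytes of the notify area to mmap
    Offset: offset in bytes into the fd at which the area starts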

> +      Slave sends this request to tell QEMU to add or delete a MemoryRegion
> +      for a specified queue's notify MMIO region. QEMU will delete the current
> +      MemoryRegion if any for the specified queue when the message is sent
> +      without a file descriptor.
> +      This request should be sent only when VHOST_USER_PROTOCOL_F_VFIO protocol
> +      feature and VIRTIO_F_VERSION_1 feature have been successfully negotiated.
> +

This message is a bit easier to understand. So this allows the backend
to replace the notification eventfd with a memory area.

But readers of this document do not know what a MemoryRegion is.


>  VHOST_USER_PROTOCOL_F_REPLY_ACK:
>  -------------------------------
>  The original vhost-user specification only demands replies for certain
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index b228994ffd..07fc63c6e8 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -42,6 +42,7 @@ enum VhostUserProtocolFeature {
>      VHOST_USER_PROTOCOL_F_SLAVE_REQ = 5,
>      VHOST_USER_PROTOCOL_F_CROSS_ENDIAN = 6,
>      VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
> +    VHOST_USER_PROTOCOL_F_VFIO = 8,
>  
>      VHOST_USER_PROTOCOL_F_MAX
>  };
> @@ -84,6 +85,8 @@ typedef enum VhostUserSlaveRequest {
>      VHOST_USER_SLAVE_NONE = 0,
>      VHOST_USER_SLAVE_IOTLB_MSG = 1,
>      VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
> +    VHOST_USER_SLAVE_VRING_VFIO_GROUP_MSG = 3,
> +    VHOST_USER_SLAVE_VRING_NOTIFY_AREA_MSG = 4,
>      VHOST_USER_SLAVE_MAX
>  }  VhostUserSlaveRequest;
>  
> @@ -128,6 +131,12 @@ static VhostUserConfig c __attribute__ ((unused));
>                                     + sizeof(c.size) \
>                                     + sizeof(c.flags))
>  
> +typedef struct VhostUserVringArea {
> +    uint64_t u64;
> +    uint64_t size;
> +    uint64_t offset;
> +} VhostUserVringArea;
> +
>  typedef struct {
>      VhostUserRequest request;
>  
> @@ -149,6 +158,7 @@ typedef union {
>          struct vhost_iotlb_msg iotlb;
>          VhostUserConfig config;
>          VhostUserCryptoSession session;
> +        VhostUserVringArea area;
>  } VhostUserPayload;
>  
>  typedef struct VhostUserMsg {
> @@ -459,9 +469,37 @@ static int vhost_user_set_vring_num(struct vhost_dev *dev,
>      return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM, ring);
>  }
>  
> +static void vhost_user_notify_region_remap(struct vhost_dev *dev, int queue_idx)
> +{
> +    struct vhost_user *u = dev->opaque;
> +    VhostUserVFIOState *vfio = &u->shared->vfio;
> +    VhostUserNotifyCtx *notify = &vfio->notify[queue_idx];
> +    VirtIODevice *vdev = dev->vdev;
> +
> +    if (notify->addr && !notify->mapped) {
> +        virtio_device_notify_region_map(vdev, queue_idx, &notify->mr);
> +        notify->mapped = true;
> +    }
> +}
> +
> +static void vhost_user_notify_region_unmap(struct vhost_dev *dev, int queue_idx)
> +{
> +    struct vhost_user *u = dev->opaque;
> +    VhostUserVFIOState *vfio = &u->shared->vfio;
> +    VhostUserNotifyCtx *notify = &vfio->notify[queue_idx];
> +    VirtIODevice *vdev = dev->vdev;
> +
> +    if (notify->addr && notify->mapped) {
> +        virtio_device_notify_region_unmap(vdev, &notify->mr);
> +        notify->mapped = false;
> +    }
> +}
> +
>  static int vhost_user_set_vring_base(struct vhost_dev *dev,
>                                       struct vhost_vring_state *ring)
>  {
> +    vhost_user_notify_region_remap(dev, ring->index);
> +
>      return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE, ring);
>  }
>  
> @@ -495,6 +533,8 @@ static int vhost_user_get_vring_base(struct vhost_dev *dev,
>          .hdr.size = sizeof(msg.payload.state),
>      };
>  
> +    vhost_user_notify_region_unmap(dev, ring->index);
> +
>      if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
>          return -1;
>      }
> @@ -668,6 +708,133 @@ static int vhost_user_slave_handle_config_change(struct vhost_dev *dev)
>      return ret;
>  }
>  
> +static int vhost_user_handle_vring_vfio_group(struct vhost_dev *dev,
> +                                              uint64_t u64,

That's not a good variable name.

> +                                              int groupfd)
> +{
> +    struct vhost_user *u = dev->opaque;
> +    VhostUserVFIOState *vfio = &u->shared->vfio;
> +    int queue_idx = u64 & VHOST_USER_VRING_IDX_MASK;
> +    VirtIODevice *vdev = dev->vdev;
> +    VFIOGroup *group;
> +    int ret = 0;
> +
> +    qemu_mutex_lock(&vfio->lock);
> +
> +    if (!virtio_has_feature(dev->protocol_features,
> +                            VHOST_USER_PROTOCOL_F_VFIO) ||
> +        vdev == NULL || queue_idx >= virtio_get_num_queues(vdev)) {
> +        ret = -1;
> +        goto out;
> +    }
> +
> +    if (vfio->group[queue_idx]) {
> +        vfio_put_group(vfio->group[queue_idx]);
> +        vfio->group[queue_idx] = NULL;
> +    }
> +
> +    if (u64 & VHOST_USER_VRING_NOFD_MASK) {
> +        goto out;
> +    }
> +
> +    group = vfio_get_group_from_fd(groupfd, NULL, NULL);
> +    if (group == NULL) {
> +        ret = -1;
> +        goto out;
> +    }
> +
> +    if (group->fd != groupfd) {
> +        close(groupfd);
> +    }
> +
> +    vfio->group[queue_idx] = group;
> +
> +out:
> +    kvm_irqchip_commit_routes(kvm_state);

The fact that we poke at kvm_eventfds_enabled is already kind of ugly.
It would be better to just process the eventfds in QEMU when KVM
eventfds are not available, and make it transparent to the backend.

I don't think vhost should touch more KVM state directly like that.



> +    qemu_mutex_unlock(&vfio->lock);
> +
> +    if (ret != 0 && groupfd != -1) {
> +        close(groupfd);
> +    }
> +
> +    return ret;
> +}
> +
> +#define NOTIFY_PAGE_SIZE 0x1000

why is this correct for all systems?
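
E.g. if the intent is "one host page", something like the following
would at least be explicit about it (a sketch; qemu_real_host_page_size
is the host page size global, from memory):

    /* use the real host page size instead of assuming 4K */
    size_t notify_size = qemu_real_host_page_size;

    addr = mmap(NULL, notify_size, PROT_READ | PROT_WRITE,
                MAP_SHARED, fd, area->offset);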

> +
> +static int vhost_user_handle_vring_notify_area(struct vhost_dev *dev,
> +                                               VhostUserVringArea *area,
> +                                               int fd)
> +{
> +    struct vhost_user *u = dev->opaque;
> +    VhostUserVFIOState *vfio = &u->shared->vfio;
> +    int queue_idx = area->u64 & VHOST_USER_VRING_IDX_MASK;
> +    VirtIODevice *vdev = dev->vdev;
> +    VhostUserNotifyCtx *notify;
> +    void *addr = NULL;
> +    int ret = 0;
> +    char *name;
> +
> +    qemu_mutex_lock(&vfio->lock);
> +
> +    if (!virtio_has_feature(dev->protocol_features,
> +                            VHOST_USER_PROTOCOL_F_VFIO) ||
> +        vdev == NULL || queue_idx >= virtio_get_num_queues(vdev) ||
> +        !virtio_device_page_per_vq_enabled(vdev)) {
> +        ret = -1;
> +        goto out;
> +    }
> +
> +    notify = &vfio->notify[queue_idx];
> +
> +    if (notify->addr) {
> +        virtio_device_notify_region_unmap(vdev, &notify->mr);
> +        munmap(notify->addr, NOTIFY_PAGE_SIZE);
> +        object_unparent(OBJECT(&notify->mr));
> +        notify->addr = NULL;
> +    }
> +
> +    if (area->u64 & VHOST_USER_VRING_NOFD_MASK) {
> +        goto out;
> +    }
> +
> +    if (area->size < NOTIFY_PAGE_SIZE) {
> +        ret = -1;
> +        goto out;
> +    }

So that's the only use of size. Why have it at all then?

> +
> +    addr = mmap(NULL, NOTIFY_PAGE_SIZE, PROT_READ | PROT_WRITE,
> +                MAP_SHARED, fd, area->offset);

Can't we use memory_region_init_ram_from_fd?

Also, must validate the message before doing things like that.


> +    if (addr == MAP_FAILED) {
> +        error_report("Can't map notify region.");
> +        ret = -1;
> +        goto out;
> +    }
> +
> +    name = g_strdup_printf("vhost-user/vfio@%p mmaps[%d]", vfio, queue_idx);
> +    memory_region_init_ram_device_ptr(&notify->mr, OBJECT(vdev), name,
> +                                      NOTIFY_PAGE_SIZE, addr);

This will register RAM for migration which probably isn't what you want.

> +    g_free(name);
> +
> +    if (virtio_device_notify_region_map(vdev, queue_idx, &notify->mr)) {
> +        ret = -1;
> +        goto out;
> +    }
> +
> +    notify->addr = addr;
> +    notify->mapped = true;
> +
> +out:
> +    if (ret < 0 && addr != NULL) {
> +        munmap(addr, NOTIFY_PAGE_SIZE);

Does this actually do the right thing?
Don't we need to finalize the mr we created?


> +    }
> +    if (fd != -1) {
> +        close(fd);
> +    }

Who will close it if there's no error?
Looks like this leaks fds on success.

> +    qemu_mutex_unlock(&vfio->lock);
> +    return ret;
> +}
> +
>  static void slave_read(void *opaque)
>  {
>      struct vhost_dev *dev = opaque;
> @@ -734,6 +901,12 @@ static void slave_read(void *opaque)
>      case VHOST_USER_SLAVE_CONFIG_CHANGE_MSG :
>          ret = vhost_user_slave_handle_config_change(dev);
>          break;
> +    case VHOST_USER_SLAVE_VRING_VFIO_GROUP_MSG:
> +        ret = vhost_user_handle_vring_vfio_group(dev, payload.u64, fd);
> +        break;
> +    case VHOST_USER_SLAVE_VRING_NOTIFY_AREA_MSG:
> +        ret = vhost_user_handle_vring_notify_area(dev, &payload.area, fd);
> +        break;
>      default:
>          error_report("Received unexpected msg type.");
>          if (fd != -1) {
> @@ -844,6 +1017,10 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
>      u->slave_fd = -1;
>      dev->opaque = u;
>  
> +    if (dev->vq_index == 0) {
> +        qemu_mutex_init(&u->shared->vfio.lock);
> +    }
> +
>      err = vhost_user_get_features(dev, &features);
>      if (err < 0) {
>          return err;

That seems inelegant.
Now that we have a shared vhost user state, I'd expect a
clean way to initialize it.
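
E.g. explicit helpers owned by the shared state itself, instead of
keying on vq_index == 0 (names made up):

    void vhost_user_shared_init(VhostUser *u)
    {
        qemu_mutex_init(&u->vfio.lock);
    }

    void vhost_user_shared_cleanup(VhostUser *u)
    {
        qemu_mutex_destroy(&u->vfio.lock);
    }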

> @@ -904,6 +1081,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
>  static int vhost_user_cleanup(struct vhost_dev *dev)
>  {
>      struct vhost_user *u;
> +    int i;
>  
>      assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
>  
> @@ -913,6 +1091,26 @@ static int vhost_user_cleanup(struct vhost_dev *dev)
>          close(u->slave_fd);
>          u->slave_fd = -1;
>      }
> +
> +    if (dev->vq_index == 0) {
> +        VhostUserVFIOState *vfio = &u->shared->vfio;
> +
> +        for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> +            if (vfio->notify[i].addr) {
> +                munmap(vfio->notify[i].addr, NOTIFY_PAGE_SIZE);
> +                object_unparent(OBJECT(&vfio->notify[i].mr));
> +                vfio->notify[i].addr = NULL;
> +            }
> +
> +            if (vfio->group[i]) {
> +                vfio_put_group(vfio->group[i]);
> +                vfio->group[i] = NULL;
> +            }
> +        }
> +
> +        qemu_mutex_destroy(&u->shared->vfio.lock);
> +    }
> +
>      g_free(u);
>      dev->opaque = 0;
>  

Same here.

> diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
> index 4f5a1477d1..de8c647962 100644
> --- a/include/hw/virtio/vhost-user.h
> +++ b/include/hw/virtio/vhost-user.h
> @@ -9,9 +9,26 @@
>  #define HW_VIRTIO_VHOST_USER_H
>  
>  #include "chardev/char-fe.h"
> +#include "hw/virtio/virtio.h"
> +#include "hw/vfio/vfio-common.h"
> +
> +typedef struct VhostUserNotifyCtx {
> +    void *addr;
> +    MemoryRegion mr;
> +    bool mapped;
> +} VhostUserNotifyCtx;
> +
> +typedef struct VhostUserVFIOState {
> +    /* The VFIO group associated with each queue */
> +    VFIOGroup *group[VIRTIO_QUEUE_MAX];
> +    /* The notify context of each queue */
> +    VhostUserNotifyCtx notify[VIRTIO_QUEUE_MAX];
> +    QemuMutex lock;
> +} VhostUserVFIOState;
>  
>  typedef struct VhostUser {
>      CharBackend chr;
> +    VhostUserVFIOState vfio;
>  } VhostUser;
>  
>  #endif
> -- 
> 2.11.0

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/6] Extend vhost-user to support VFIO based accelerators
  2018-03-19  7:15 ` [virtio-dev] " Tiwei Bie
@ 2018-03-22 16:40   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 46+ messages in thread
From: Michael S. Tsirkin @ 2018-03-22 16:40 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: qemu-devel, virtio-dev, alex.williamson, jasowang, pbonzini,
	stefanha, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Mon, Mar 19, 2018 at 03:15:31PM +0800, Tiwei Bie wrote:
> This patch set does some small extensions to vhost-user protocol
> to support VFIO based accelerators, and makes it possible to get
> the similar performance of VFIO based PCI passthru while keeping
> the virtio device emulation in QEMU.
> 
> How does accelerator accelerate vhost (data path)
> =================================================
> 
> Any virtio ring compatible devices potentially can be used as the
> vhost data path accelerators. We can setup the accelerator based
> on the informations (e.g. memory table, features, ring info, etc)
> available on the vhost backend. And accelerator will be able to use
> the virtio ring provided by the virtio driver in the VM directly.
> So the virtio driver in the VM can exchange e.g. network packets
> with the accelerator directly via the virtio ring. That is to say,
> we will be able to use the accelerator to accelerate the vhost
> data path. We call it vDPA: vhost Data Path Acceleration.
> 
> Notice: Although the accelerator can talk with the virtio driver
> in the VM via the virtio ring directly. The control path events
> (e.g. device start/stop) in the VM will still be trapped and handled
> by QEMU, and QEMU will deliver such events to the vhost backend
> via standard vhost protocol.
> 
> Below link is an example showing how to setup a such environment
> via nested VM. In this case, the virtio device in the outer VM is
> the accelerator. It will be used to accelerate the virtio device
> in the inner VM. In reality, we could use virtio ring compatible
> hardware device as the accelerators.
> 
> http://dpdk.org/ml/archives/dev/2017-December/085044.html

I understand that it might be challenging due to
the tight coupling with VFIO. Still - isn't there
a way to make it easier to set up a testing rig?

In particular, can we avoid the DPDK requirement for testing?



> In above example, it doesn't require any changes to QEMU, but
> it has lower performance compared with the traditional VFIO
> based PCI passthru. And that's the problem this patch set wants
> to solve.
> 
> The performance issue of vDPA/vhost-user and solutions
> ======================================================
> 
> For vhost-user backend, the critical issue in vDPA is that the
> data path performance is relatively low and some host threads are
> needed for the data path, because some necessary mechanisms are
> missing to support:
> 
> 1) guest driver notifies the device directly;
> 2) device interrupts the guest directly;
> 
> So this patch set does some small extensions to the vhost-user
> protocol to make both of them possible. It leverages the same
> mechanisms (e.g. EPT and Posted-Interrupt on Intel platform) as
> the PCI passthru.

Not all platforms support posted interrupts, and EPT isn't
required for MMIO to be mapped to devices.

It probably makes sense to separate the more portable
host notification offload from the less portable
guest notification offload.
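
For instance (illustrative feature names only, not something this
series defines), the two offloads could be negotiated separately:

    /* Hypothetical split of the single F_VFIO bit: */
    #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER 8 /* guest->device notify  */
    #define VHOST_USER_PROTOCOL_F_GUEST_INTR    9 /* device->guest interrupt */

so that a backend on a platform without posted interrupts could still
use the notify-area passthru on its own.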



> A new protocol feature bit is added to negotiate the accelerator
> feature support. Two new slave message types are added to control
> the notify region and queue interrupt passthru for each queue.
> From the view of vhost-user protocol design, it's very flexible.
> The passthru can be enabled/disabled for each queue individually,
> and it's possible to accelerate each queue by different devices.
> More design and implementation details can be found from the last
> patch.
> 
> Difference between vDPA and PCI passthru
> ========================================
> 
> The key difference between PCI passthru and vDPA is that, in vDPA
> only the data path of the device (e.g. DMA ring, notify region and
> queue interrupt) is pass-throughed to the VM, the device control
> path (e.g. PCI configuration space and MMIO regions) is still
> defined and emulated by QEMU.
> 
> The benefits of keeping virtio device emulation in QEMU compared
> with virtio device PCI passthru include (but not limit to):
> 
> - consistent device interface for guest OS in the VM;
> - max flexibility on the hardware (i.e. the accelerators) design;
> - leveraging the existing virtio live-migration framework;
> 
> Why extend vhost-user for vDPA
> ==============================
> 
> We have already implemented various virtual switches (e.g. OVS-DPDK)
> based on vhost-user for VMs in the Cloud. They are purely software
> running on CPU cores. When we have accelerators for such NFVi applications,
> it's ideal if the applications could keep using the original interface
> (i.e. vhost-user netdev) with QEMU, and infrastructure is able to decide
> when and how to switch between CPU and accelerators within the interface.
> And the switching (i.e. switch between CPU and accelerators) can be done
> flexibly and quickly inside the applications.
> 
> More details about this can be found from the Cunming's discussions on
> the RFC patch set.
> 
> Update notes
> ============
> 
> IOMMU feature bit check is removed in this version, because:
> 
> The IOMMU feature is negotiable, when an accelerator is used and
> it doesn't support virtual IOMMU, its driver just won't provide
> this feature bit when vhost library querying its features. And if
> it supports the virtual IOMMU, its driver can provide this feature
> bit. It's not reasonable to add this limitation in this patch set.

Fair enough. Still:
Can hardware on Intel platforms actually support IOTLB requests?
Don't you need to add support for vIOMMU shadowing instead?


> The previous links:
> RFC: http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg04844.html
> v1:  http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg06028.html
> 
> v1 -> v2:
> - Add some explanations about why extend vhost-user in commit log (Paolo);
> - Bug fix in slave_read() according to Stefan's fix in DPDK;
> - Remove IOMMU feature check and related commit log;
> - Some minor refinements;
> - Rebase to the latest QEMU;
> 
> RFC -> v1:
> - Add some details about how vDPA works in cover letter (Alexey)
> - Add some details about the OVS offload use-case in cover letter (Jason)
> - Move PCI specific stuffs out of vhost-user (Jason)
> - Handle the virtual IOMMU case (Jason)
> - Move VFIO group management code into vfio/common.c (Alex)
> - Various refinements;
> (approximately sorted by comment posting time)
> 
> Tiwei Bie (6):
>   vhost-user: support receiving file descriptors in slave_read
>   vhost-user: introduce shared vhost-user state
>   virtio: support adding sub-regions for notify region
>   vfio: support getting VFIOGroup from groupfd
>   vfio: remove DPRINTF() definition from vfio-common.h
>   vhost-user: add VFIO based accelerators support
> 
>  Makefile.target                 |   4 +
>  docs/interop/vhost-user.txt     |  57 +++++++++
>  hw/scsi/vhost-user-scsi.c       |   6 +-
>  hw/vfio/common.c                |  97 +++++++++++++++-
>  hw/virtio/vhost-user.c          | 248 +++++++++++++++++++++++++++++++++++++++-
>  hw/virtio/virtio-pci.c          |  48 ++++++++
>  hw/virtio/virtio-pci.h          |   5 +
>  hw/virtio/virtio.c              |  39 +++++++
>  include/hw/vfio/vfio-common.h   |  11 +-
>  include/hw/virtio/vhost-user.h  |  34 ++++++
>  include/hw/virtio/virtio-scsi.h |   6 +-
>  include/hw/virtio/virtio.h      |   5 +
>  include/qemu/osdep.h            |   1 +
>  net/vhost-user.c                |  30 ++---
>  scripts/create_config           |   3 +
>  15 files changed, 561 insertions(+), 33 deletions(-)
>  create mode 100644 include/hw/virtio/vhost-user.h
> 
> -- 
> 2.11.0

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/6] Extend vhost-user to support VFIO based accelerators
  2018-03-22 14:55   ` [virtio-dev] " Michael S. Tsirkin
@ 2018-03-23  8:54     ` Tiwei Bie
  -1 siblings, 0 replies; 46+ messages in thread
From: Tiwei Bie @ 2018-03-23  8:54 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtio-dev, alex.williamson, jasowang, pbonzini,
	stefanha, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, Mar 22, 2018 at 04:55:39PM +0200, Michael S. Tsirkin wrote:
> On Mon, Mar 19, 2018 at 03:15:31PM +0800, Tiwei Bie wrote:
> > This patch set does some small extensions to vhost-user protocol
> > to support VFIO based accelerators, and makes it possible to get
> > the similar performance of VFIO based PCI passthru while keeping
> > the virtio device emulation in QEMU.
> 
> I love your patches!
> Yet there are some things to improve.
> Posting comments separately as individual messages.
> 

Thank you so much! :-)

It may take me some time to address all your comments.
They're really helpful! I'll try to address them and
reply in the next few days. Thanks again!
I do appreciate it!

Best regards,
Tiwei Bie

> 
> > How does accelerator accelerate vhost (data path)
> > =================================================
> > 
> > Any virtio ring compatible devices potentially can be used as the
> > vhost data path accelerators. We can setup the accelerator based
> > on the informations (e.g. memory table, features, ring info, etc)
> > available on the vhost backend. And accelerator will be able to use
> > the virtio ring provided by the virtio driver in the VM directly.
> > So the virtio driver in the VM can exchange e.g. network packets
> > with the accelerator directly via the virtio ring. That is to say,
> > we will be able to use the accelerator to accelerate the vhost
> > data path. We call it vDPA: vhost Data Path Acceleration.
> > 
> > Notice: Although the accelerator can talk with the virtio driver
> > in the VM via the virtio ring directly. The control path events
> > (e.g. device start/stop) in the VM will still be trapped and handled
> > by QEMU, and QEMU will deliver such events to the vhost backend
> > via standard vhost protocol.
> > 
> > Below link is an example showing how to setup a such environment
> > via nested VM. In this case, the virtio device in the outer VM is
> > the accelerator. It will be used to accelerate the virtio device
> > in the inner VM. In reality, we could use virtio ring compatible
> > hardware device as the accelerators.
> > 
> > http://dpdk.org/ml/archives/dev/2017-December/085044.html
> > 
> > In above example, it doesn't require any changes to QEMU, but
> > it has lower performance compared with the traditional VFIO
> > based PCI passthru. And that's the problem this patch set wants
> > to solve.
> > 
> > The performance issue of vDPA/vhost-user and solutions
> > ======================================================
> > 
> > For vhost-user backend, the critical issue in vDPA is that the
> > data path performance is relatively low and some host threads are
> > needed for the data path, because some necessary mechanisms are
> > missing to support:
> > 
> > 1) guest driver notifies the device directly;
> > 2) device interrupts the guest directly;
> > 
> > So this patch set does some small extensions to the vhost-user
> > protocol to make both of them possible. It leverages the same
> > mechanisms (e.g. EPT and Posted-Interrupt on Intel platform) as
> > the PCI passthru.
> > 
> > A new protocol feature bit is added to negotiate the accelerator
> > feature support. Two new slave message types are added to control
> > the notify region and queue interrupt passthru for each queue.
> > From the view of vhost-user protocol design, it's very flexible.
> > The passthru can be enabled/disabled for each queue individually,
> > and it's possible to accelerate each queue by different devices.
> > More design and implementation details can be found from the last
> > patch.
> > 
> > Difference between vDPA and PCI passthru
> > ========================================
> > 
> > The key difference between PCI passthru and vDPA is that, in vDPA
> > only the data path of the device (e.g. DMA ring, notify region and
> > queue interrupt) is pass-throughed to the VM, the device control
> > path (e.g. PCI configuration space and MMIO regions) is still
> > defined and emulated by QEMU.
> > 
> > The benefits of keeping virtio device emulation in QEMU compared
> > with virtio device PCI passthru include (but not limit to):
> > 
> > - consistent device interface for guest OS in the VM;
> > - max flexibility on the hardware (i.e. the accelerators) design;
> > - leveraging the existing virtio live-migration framework;
> > 
> > Why extend vhost-user for vDPA
> > ==============================
> > 
> > We have already implemented various virtual switches (e.g. OVS-DPDK)
> > based on vhost-user for VMs in the Cloud. They are purely software
> > running on CPU cores. When we have accelerators for such NFVi applications,
> > it's ideal if the applications could keep using the original interface
> > (i.e. vhost-user netdev) with QEMU, and infrastructure is able to decide
> > when and how to switch between CPU and accelerators within the interface.
> > And the switching (i.e. switch between CPU and accelerators) can be done
> > flexibly and quickly inside the applications.
> > 
> > More details about this can be found from the Cunming's discussions on
> > the RFC patch set.
> > 
> > Update notes
> > ============
> > 
> > IOMMU feature bit check is removed in this version, because:
> > 
> > The IOMMU feature is negotiable, when an accelerator is used and
> > it doesn't support virtual IOMMU, its driver just won't provide
> > this feature bit when vhost library querying its features. And if
> > it supports the virtual IOMMU, its driver can provide this feature
> > bit. It's not reasonable to add this limitation in this patch set.
> > 
> > The previous links:
> > RFC: http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg04844.html
> > v1:  http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg06028.html
> > 
> > v1 -> v2:
> > - Add some explanations about why extend vhost-user in commit log (Paolo);
> > - Bug fix in slave_read() according to Stefan's fix in DPDK;
> > - Remove IOMMU feature check and related commit log;
> > - Some minor refinements;
> > - Rebase to the latest QEMU;
> > 
> > RFC -> v1:
> > - Add some details about how vDPA works in cover letter (Alexey)
> > - Add some details about the OVS offload use-case in cover letter (Jason)
> > - Move PCI specific stuffs out of vhost-user (Jason)
> > - Handle the virtual IOMMU case (Jason)
> > - Move VFIO group management code into vfio/common.c (Alex)
> > - Various refinements;
> > (approximately sorted by comment posting time)
> > 
> > Tiwei Bie (6):
> >   vhost-user: support receiving file descriptors in slave_read
> >   vhost-user: introduce shared vhost-user state
> >   virtio: support adding sub-regions for notify region
> >   vfio: support getting VFIOGroup from groupfd
> >   vfio: remove DPRINTF() definition from vfio-common.h
> >   vhost-user: add VFIO based accelerators support
> > 
> >  Makefile.target                 |   4 +
> >  docs/interop/vhost-user.txt     |  57 +++++++++
> >  hw/scsi/vhost-user-scsi.c       |   6 +-
> >  hw/vfio/common.c                |  97 +++++++++++++++-
> >  hw/virtio/vhost-user.c          | 248 +++++++++++++++++++++++++++++++++++++++-
> >  hw/virtio/virtio-pci.c          |  48 ++++++++
> >  hw/virtio/virtio-pci.h          |   5 +
> >  hw/virtio/virtio.c              |  39 +++++++
> >  include/hw/vfio/vfio-common.h   |  11 +-
> >  include/hw/virtio/vhost-user.h  |  34 ++++++
> >  include/hw/virtio/virtio-scsi.h |   6 +-
> >  include/hw/virtio/virtio.h      |   5 +
> >  include/qemu/osdep.h            |   1 +
> >  net/vhost-user.c                |  30 ++---
> >  scripts/create_config           |   3 +
> >  15 files changed, 561 insertions(+), 33 deletions(-)
> >  create mode 100644 include/hw/virtio/vhost-user.h
> > 
> > -- 
> > 2.11.0

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 6/6] vhost-user: add VFIO based accelerators support
  2018-03-22 16:19     ` [virtio-dev] " Michael S. Tsirkin
@ 2018-03-27 11:06       ` Tiwei Bie
  -1 siblings, 0 replies; 46+ messages in thread
From: Tiwei Bie @ 2018-03-27 11:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtio-dev, alex.williamson, jasowang, pbonzini,
	stefanha, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, Mar 22, 2018 at 06:19:44PM +0200, Michael S. Tsirkin wrote:
> On Mon, Mar 19, 2018 at 03:15:37PM +0800, Tiwei Bie wrote:
[...]
> > diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> > index cb3a7595aa..264a58a800 100644
> > --- a/docs/interop/vhost-user.txt
> > +++ b/docs/interop/vhost-user.txt
> > @@ -132,6 +132,15 @@ Depending on the request type, payload can be:
[...]
> 
> but readers of this
> document do not know what MemoryRegion is.
> 
> 
> >  VHOST_USER_PROTOCOL_F_REPLY_ACK:
> >  -------------------------------
> >  The original vhost-user specification only demands replies for certain

All the comments above about the doc are very helpful,
I'll improve the doc accordingly! Thanks a lot!
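
For example, the vring area description entry could spell out all of
its fields, e.g. (a rough draft for v3):

    u64: vring index in bits 0..7 (VHOST_USER_VRING_IDX_MASK); bit 8
         (VHOST_USER_VRING_NOFD_MASK) is set when the message carries
         no file descriptor
    Size: size in bytes of the area to be mapped
    Offset: offset from the start of the passed file descriptor at
         which the notify area begins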

> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index b228994ffd..07fc63c6e8 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
[...]
> >  
> > +static int vhost_user_handle_vring_vfio_group(struct vhost_dev *dev,
> > +                                              uint64_t u64,
> 
> That's not a good variable name.
> 
> > +                                              int groupfd)
> > +{
[...]
> > +
> > +    vfio->group[queue_idx] = group;
> > +
> > +out:
> > +    kvm_irqchip_commit_routes(kvm_state);
> 
> The fact we poke at kvm_eventfds_enabled is already kind of ugly.
> It would be better to just process eventfds in QEMU when we do not
> and make it transparent to the backend.
> 
> I don't think vhost should touch more kvm state directly like that.
> 

I'll think about whether there's a better way to do this.

> 
> 
> > +    qemu_mutex_unlock(&vfio->lock);
> > +
> > +    if (ret != 0 && groupfd != -1) {
> > +        close(groupfd);
> > +    }
> > +
> > +    return ret;
> > +}
> > +
> > +#define NOTIFY_PAGE_SIZE 0x1000
> 
> why is this correct for all systems?

Will fix this.
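
(Probably by using the host page size instead of a hard-coded 4K,
something along the lines of this sketch:)

    size_t page_size = qemu_real_host_page_size;

    addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                MAP_SHARED, fd, area->offset);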

> 
> > +
> > +static int vhost_user_handle_vring_notify_area(struct vhost_dev *dev,
> > +                                               VhostUserVringArea *area,
> > +                                               int fd)
> > +{
[...]
> > +    if (area->size < NOTIFY_PAGE_SIZE) {
> > +        ret = -1;
> > +        goto out;
> > +    }
> 
> So that's the only use of size. Why have it at all then?
> 
> > +
> > +    addr = mmap(NULL, NOTIFY_PAGE_SIZE, PROT_READ | PROT_WRITE,
> > +                MAP_SHARED, fd, area->offset);
> 
> Can't we use memory_region_init_ram_from_fd?

Because we need to map the file at a specified offset, and
memory_region_init_ram_from_fd() currently always maps the
fd from offset 0.
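
I.e. the notify area lives at a region-specific offset inside the
VFIO device fd, so the mapping is created by hand and then wrapped
in a pointer-backed region, which is what the patch does:

    addr = mmap(NULL, NOTIFY_PAGE_SIZE, PROT_READ | PROT_WRITE,
                MAP_SHARED, fd, area->offset);
    memory_region_init_ram_device_ptr(&notify->mr, OBJECT(vdev), name,
                                      NOTIFY_PAGE_SIZE, addr);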

> 
> Also, must validate the message before doing things like that.

I'm not quite sure I've got your point. Do you mean we
also need to validate e.g. area->offset?
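
In case it helps the discussion, a minimal sketch of such validation
(illustrative only) could be:

    /* Reject the message before mmap()ing anything: the offset must
     * be page aligned and the advertised size large enough for the
     * area we are about to map. */
    if ((area->offset & (qemu_real_host_page_size - 1)) ||
        area->size < NOTIFY_PAGE_SIZE) {
        ret = -1;
        goto out;
    }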

> 
> 
> > +    if (addr == MAP_FAILED) {
> > +        error_report("Can't map notify region.");
> > +        ret = -1;
> > +        goto out;
> > +    }
> > +
> > +    name = g_strdup_printf("vhost-user/vfio@%p mmaps[%d]", vfio, queue_idx);
> > +    memory_region_init_ram_device_ptr(&notify->mr, OBJECT(vdev), name,
> > +                                      NOTIFY_PAGE_SIZE, addr);
> 
> This will register RAM for migration which probably isn't what you want.

It's definitely not what I want. But I don't know how
to avoid it. Could you please provide more details about
how to avoid this? Thanks a lot!

> 
> > +    g_free(name);
> > +
> > +    if (virtio_device_notify_region_map(vdev, queue_idx, &notify->mr)) {
> > +        ret = -1;
> > +        goto out;
> > +    }
> > +
> > +    notify->addr = addr;
> > +    notify->mapped = true;
> > +
> > +out:
> > +    if (ret < 0 && addr != NULL) {
> > +        munmap(addr, NOTIFY_PAGE_SIZE);
> 
> Does this actually do the right thing?
> Don't we need to finalize the mr we created?

ret < 0 means this function failed, so we unmap the memory
if it has been mapped. I just noticed a bug: addr may also
be MAP_FAILED, in which case it must not be passed to
munmap(). Will fix this!
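
Roughly, the fixed error path would become (sketch):

    out:
        if (ret < 0 && addr != NULL && addr != MAP_FAILED) {
            /* Only unmap when mmap() actually succeeded. */
            munmap(addr, NOTIFY_PAGE_SIZE);
        }

plus finalizing the MemoryRegion with object_unparent() when it has
already been initialized at that point, as you suggested.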

> 
> 
> > +    }
> > +    if (fd != -1) {
> > +        close(fd);
> > +    }
> 
> Who will close it if there's no error?
> Looks like this leaks fds on success.

fd != -1 means we have received an fd. The logic of the
above code is to close the fd before returning from this
function, in both the error case and the success case,
because even on success we don't need to keep the fd once
the region has been mmapped (closing an fd doesn't tear
down a mapping created from it).

I will try to make the above code more readable.

> 
> > +    qemu_mutex_unlock(&vfio->lock);
> > +    return ret;
> > +}
> > +
> >  static void slave_read(void *opaque)
> >  {
> >      struct vhost_dev *dev = opaque;
> > @@ -734,6 +901,12 @@ static void slave_read(void *opaque)
> >      case VHOST_USER_SLAVE_CONFIG_CHANGE_MSG :
> >          ret = vhost_user_slave_handle_config_change(dev);
> >          break;
> > +    case VHOST_USER_SLAVE_VRING_VFIO_GROUP_MSG:
> > +        ret = vhost_user_handle_vring_vfio_group(dev, payload.u64, fd);
> > +        break;
> > +    case VHOST_USER_SLAVE_VRING_NOTIFY_AREA_MSG:
> > +        ret = vhost_user_handle_vring_notify_area(dev, &payload.area, fd);
> > +        break;
> >      default:
> >          error_report("Received unexpected msg type.");
> >          if (fd != -1) {
> > @@ -844,6 +1017,10 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
> >      u->slave_fd = -1;
> >      dev->opaque = u;
> >  
> > +    if (dev->vq_index == 0) {
> > +        qemu_mutex_init(&u->shared->vfio.lock);
> > +    }
> > +
> >      err = vhost_user_get_features(dev, &features);
> >      if (err < 0) {
> >          return err;
> 
> That seems inelegant.
> Now that we have a shared vhost user state, I'd expect a
> clean way to initialize it.

I'll fix this.
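
One possible shape is an explicit init/cleanup pair for the shared
state, called exactly once when the VhostUser object is created and
destroyed (names below are illustrative, not final):

static void vhost_user_shared_init(VhostUser *u)
{
    qemu_mutex_init(&u->shared->vfio.lock);
}

static void vhost_user_shared_cleanup(VhostUser *u)
{
    qemu_mutex_destroy(&u->shared->vfio.lock);
}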

Best regards,
Tiwei Bie

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v2 2/6] vhost-user: introduce shared vhost-user state
  2018-03-22 15:13     ` [virtio-dev] " Michael S. Tsirkin
@ 2018-03-27 13:32       ` Tiwei Bie
  -1 siblings, 0 replies; 46+ messages in thread
From: Tiwei Bie @ 2018-03-27 13:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtio-dev, alex.williamson, jasowang, pbonzini,
	stefanha, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, Mar 22, 2018 at 05:13:41PM +0200, Michael S. Tsirkin wrote:
> On Mon, Mar 19, 2018 at 03:15:33PM +0800, Tiwei Bie wrote:
> > @@ -22,7 +23,7 @@
> >  
> >  typedef struct VhostUserState {
> >      NetClientState nc;
> > -    CharBackend chr; /* only queue index 0 */
> > +    VhostUser vhost_user; /* only queue index 0 */
> >      VHostNetState *vhost_net;
> >      guint watch;
> >      uint64_t acked_features;
> 
> Is the comment still valid?

The comment is still valid in this patch, but the
implementation is inelegant. I plan to rewrite this
patch.

> 
> > @@ -64,7 +65,7 @@ static void vhost_user_stop(int queues, NetClientState *ncs[])
> >      }
> >  }
> >  
> > -static int vhost_user_start(int queues, NetClientState *ncs[], CharBackend *be)
> > +static int vhost_user_start(int queues, NetClientState *ncs[], void *be)
> >  {
> >      VhostNetOptions options;
> >      struct vhost_net *net = NULL;
> 
> Type safety going away here. This is actually pretty scary:
> are we sure no users cast this pointer to CharBackend?
> 
> For example it seems that vhost_user_init does exactly that.
> 
> Need to find a way to add type safety before making
> such a change.

I have changed vhost_user_init() to cast this pointer
to the new type (VhostUser) in this patch. But my bad,
I shouldn't have changed the type to 'void *'. Will fix
this.
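
Concretely, keeping the parameter typed lets the compiler catch any
stray CharBackend cast; a sketch of the intended signature:

static int vhost_user_start(int queues, NetClientState *ncs[],
                            VhostUser *be);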

Best regards,
Tiwei Bie

> 
> 
> > @@ -158,7 +159,7 @@ static void vhost_user_cleanup(NetClientState *nc)
> >              g_source_remove(s->watch);
> >              s->watch = 0;
> >          }
> > -        qemu_chr_fe_deinit(&s->chr, true);
> > +        qemu_chr_fe_deinit(&s->vhost_user.chr, true);
> >      }
> >  
> >      qemu_purge_queued_packets(nc);
> > @@ -192,7 +193,7 @@ static gboolean net_vhost_user_watch(GIOChannel *chan, GIOCondition cond,
> >  {
> >      VhostUserState *s = opaque;
> >  
> > -    qemu_chr_fe_disconnect(&s->chr);
> > +    qemu_chr_fe_disconnect(&s->vhost_user.chr);
> >  
> >      return TRUE;
> >  }
> > @@ -217,7 +218,8 @@ static void chr_closed_bh(void *opaque)
> >      qmp_set_link(name, false, &err);
> >      vhost_user_stop(queues, ncs);
> >  
> > -    qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, net_vhost_user_event,
> > +    qemu_chr_fe_set_handlers(&s->vhost_user.chr, NULL, NULL,
> > +                             net_vhost_user_event,
> >                               NULL, opaque, NULL, true);
> >  
> >      if (err) {
> > @@ -240,15 +242,15 @@ static void net_vhost_user_event(void *opaque, int event)
> >      assert(queues < MAX_QUEUE_NUM);
> >  
> >      s = DO_UPCAST(VhostUserState, nc, ncs[0]);
> > -    chr = qemu_chr_fe_get_driver(&s->chr);
> > +    chr = qemu_chr_fe_get_driver(&s->vhost_user.chr);
> >      trace_vhost_user_event(chr->label, event);
> >      switch (event) {
> >      case CHR_EVENT_OPENED:
> > -        if (vhost_user_start(queues, ncs, &s->chr) < 0) {
> > -            qemu_chr_fe_disconnect(&s->chr);
> > +        if (vhost_user_start(queues, ncs, &s->vhost_user) < 0) {
> > +            qemu_chr_fe_disconnect(&s->vhost_user.chr);
> >              return;
> >          }
> > -        s->watch = qemu_chr_fe_add_watch(&s->chr, G_IO_HUP,
> > +        s->watch = qemu_chr_fe_add_watch(&s->vhost_user.chr, G_IO_HUP,
> >                                           net_vhost_user_watch, s);
> >          qmp_set_link(name, true, &err);
> >          s->started = true;
> > @@ -264,8 +266,8 @@ static void net_vhost_user_event(void *opaque, int event)
> >  
> >              g_source_remove(s->watch);
> >              s->watch = 0;
> > -            qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
> > -                                     NULL, NULL, false);
> > +            qemu_chr_fe_set_handlers(&s->vhost_user.chr, NULL, NULL, NULL,
> > +                                     NULL, NULL, NULL, false);
> >  
> >              aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
> >          }
> > @@ -297,7 +299,7 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> >          if (!nc0) {
> >              nc0 = nc;
> >              s = DO_UPCAST(VhostUserState, nc, nc);
> > -            if (!qemu_chr_fe_init(&s->chr, chr, &err)) {
> > +            if (!qemu_chr_fe_init(&s->vhost_user.chr, chr, &err)) {
> >                  error_report_err(err);
> >                  return -1;
> >              }
> > @@ -307,11 +309,11 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> >  
> >      s = DO_UPCAST(VhostUserState, nc, nc0);
> >      do {
> > -        if (qemu_chr_fe_wait_connected(&s->chr, &err) < 0) {
> > +        if (qemu_chr_fe_wait_connected(&s->vhost_user.chr, &err) < 0) {
> >              error_report_err(err);
> >              return -1;
> >          }
> > -        qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
> > +        qemu_chr_fe_set_handlers(&s->vhost_user.chr, NULL, NULL,
> >                                   net_vhost_user_event, NULL, nc0->name, NULL,
> >                                   true);
> >      } while (!s->started);
> > -- 
> > 2.11.0
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v2 5/6] vfio: remove DPRINTF() definition from vfio-common.h
  2018-03-22 15:15     ` [virtio-dev] " Michael S. Tsirkin
@ 2018-03-27 13:33       ` Tiwei Bie
  -1 siblings, 0 replies; 46+ messages in thread
From: Tiwei Bie @ 2018-03-27 13:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtio-dev, alex.williamson, jasowang, pbonzini,
	stefanha, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, Mar 22, 2018 at 05:15:30PM +0200, Michael S. Tsirkin wrote:
> On Mon, Mar 19, 2018 at 03:15:36PM +0800, Tiwei Bie wrote:
> > This macro isn't used by any VFIO code. And its name is
> > too generic. The vfio-common.h (in include/hw/vfio) can
> > be included by other modules in QEMU. It can introduce
> > conflicts.
> > 
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> 
> This one can go ahead immediately.
> Try posting as a separate patch.

Got it. Thanks!

Best regards,
Tiwei Bie

> 
> > ---
> >  include/hw/vfio/vfio-common.h | 9 ---------
> >  1 file changed, 9 deletions(-)
> > 
> > diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> > index b820f7984c..f6aa4ae959 100644
> > --- a/include/hw/vfio/vfio-common.h
> > +++ b/include/hw/vfio/vfio-common.h
> > @@ -34,15 +34,6 @@
> >  #define ERR_PREFIX "vfio error: %s: "
> >  #define WARN_PREFIX "vfio warning: %s: "
> >  
> > -/*#define DEBUG_VFIO*/
> > -#ifdef DEBUG_VFIO
> > -#define DPRINTF(fmt, ...) \
> > -    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
> > -#else
> > -#define DPRINTF(fmt, ...) \
> > -    do { } while (0)
> > -#endif
> > -
> >  enum {
> >      VFIO_DEVICE_TYPE_PCI = 0,
> >      VFIO_DEVICE_TYPE_PLATFORM = 1,
> > -- 
> > 2.11.0
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/6] virtio: support adding sub-regions for notify region
  2018-03-22 14:57     ` [virtio-dev] " Michael S. Tsirkin
@ 2018-03-27 13:47       ` Tiwei Bie
  -1 siblings, 0 replies; 46+ messages in thread
From: Tiwei Bie @ 2018-03-27 13:47 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtio-dev, alex.williamson, jasowang, pbonzini,
	stefanha, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, Mar 22, 2018 at 04:57:23PM +0200, Michael S. Tsirkin wrote:
> On Mon, Mar 19, 2018 at 03:15:34PM +0800, Tiwei Bie wrote:
[...]
> > +
> > +bool virtio_pci_page_per_vq_enabled(VirtIODevice *vdev)
> > +{
> > +    VirtIOPCIProxy *proxy = virtio_device_to_virtio_pci_proxy(vdev);
> > +
> > +    if (proxy == NULL) {
> > +        return false;
> > +    }
> > +
> > +    return !!(proxy->flags & VIRTIO_PCI_FLAG_PAGE_PER_VQ);
> > +}
> > +
> 
> VIRTIO_PCI_FLAG_PAGE_PER_VQ is not something external users
> should care about. Need to find some other way to express the
> specific requirements.
> 
> In particular do you want to use a host page per VQ?
> 
> This isn't what VIRTIO_PCI_FLAG_PAGE_PER_VQ does - it uses a 4K offset
> which does not match a memory page size on all platforms.

Yeah, right. I'll think about how to deal with this.

> 
> 
> > +int virtio_pci_notify_region_map(VirtIODevice *vdev, int queue_idx,
> > +                                 MemoryRegion *mr)
> > +{
> > +    VirtIOPCIProxy *proxy = virtio_device_to_virtio_pci_proxy(vdev);
> > +    int offset;
> > +
> > +    if (proxy == NULL || !virtio_pci_modern(proxy)) {
> > +        return -1;
> > +    }
> > +
> > +    offset = virtio_pci_queue_mem_mult(proxy) * queue_idx;
> > +    memory_region_add_subregion(&proxy->notify.mr, offset, mr);
> > +
> > +    return 0;
> > +}
> > +
> > +void virtio_pci_notify_region_unmap(VirtIODevice *vdev, MemoryRegion *mr)
> > +{
> > +    VirtIOPCIProxy *proxy = virtio_device_to_virtio_pci_proxy(vdev);
> > +
> > +    if (proxy != NULL) {
> > +        memory_region_del_subregion(&proxy->notify.mr, mr);
> > +    }
> > +}
> > +
> >  static void virtio_pci_pre_plugged(DeviceState *d, Error **errp)
> >  {
> >      VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
> > diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
> > index 813082b0d7..8061133741 100644
> > --- a/hw/virtio/virtio-pci.h
> > +++ b/hw/virtio/virtio-pci.h
> > @@ -213,6 +213,11 @@ static inline void virtio_pci_disable_modern(VirtIOPCIProxy *proxy)
> >      proxy->disable_modern = true;
> >  }
> >  
> > +bool virtio_pci_page_per_vq_enabled(VirtIODevice *vdev);
> > +int virtio_pci_notify_region_map(VirtIODevice *vdev, int queue_idx,
> > +                                 MemoryRegion *mr);
> > +void virtio_pci_notify_region_unmap(VirtIODevice *vdev, MemoryRegion *mr);
> > +
> >  /*
> >   * virtio-scsi-pci: This extends VirtioPCIProxy.
> >   */
> 
> These are not great APIs unfortunately. Need to come up with generic names.
> E.g. do we register and de-register host notifiers maybe?
> 

I like the name "host notifier". I'll try to use it. Thanks a lot!

> 
> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > index 006d3d1148..90ee72984c 100644
> > --- a/hw/virtio/virtio.c
> > +++ b/hw/virtio/virtio.c
> > @@ -22,6 +22,7 @@
> >  #include "qemu/atomic.h"
> >  #include "hw/virtio/virtio-bus.h"
> >  #include "hw/virtio/virtio-access.h"
> > +#include "hw/virtio/virtio-pci.h"
> >  #include "sysemu/dma.h"
> >  
> >  /*
> > @@ -2681,6 +2682,44 @@ void virtio_device_release_ioeventfd(VirtIODevice *vdev)
> >      virtio_bus_release_ioeventfd(vbus);
> >  }
> >  
> > +bool virtio_device_parent_is_pci_device(VirtIODevice *vdev)
> > +{
> > +    BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
> > +    const char *typename = object_get_typename(OBJECT(qbus->parent));
> > +
> > +    return strstr(typename, "pci") != NULL;
> > +}
> > +
> > +bool virtio_device_page_per_vq_enabled(VirtIODevice *vdev)
> > +{
> > +#ifdef CONFIG_VIRTIO_PCI
> > +    if (virtio_device_parent_is_pci_device(vdev)) {
> > +        return virtio_pci_page_per_vq_enabled(vdev);
> > +    }
> > +#endif
> > +    return false;
> > +}
> > +
> 
> A better way to do this is to pass a callback to the bus where each bus can
> implement its own.
> 

It's pretty neat! It helped me get rid of all the
changes to the build scripts. Thanks a lot!
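
Roughly, it ends up as a pair of hooks on VirtioBusClass (member
names below are mine, not final); virtio-pci implements them with
the functions above, and virtio.c calls them through the bus
without knowing anything about PCI:

/* Sketch only: possible new VirtioBusClass members. */
int (*notify_region_map)(DeviceState *d, int queue_idx,
                         MemoryRegion *mr);
void (*notify_region_unmap)(DeviceState *d, MemoryRegion *mr);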

Best regards,
Tiwei Bie

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 6/6] vhost-user: add VFIO based accelerators support
  2018-03-22 16:19     ` [virtio-dev] " Michael S. Tsirkin
@ 2018-03-27 13:59       ` Tiwei Bie
  -1 siblings, 0 replies; 46+ messages in thread
From: Tiwei Bie @ 2018-03-27 13:59 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtio-dev, alex.williamson, jasowang, pbonzini,
	stefanha, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, Mar 22, 2018 at 06:19:44PM +0200, Michael S. Tsirkin wrote:
> On Mon, Mar 19, 2018 at 03:15:37PM +0800, Tiwei Bie wrote:
[...]
> > 
> > diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> > index cb3a7595aa..264a58a800 100644
> > --- a/docs/interop/vhost-user.txt
> > +++ b/docs/interop/vhost-user.txt
[...]
> 
> > The accelerator
> > +context will be set for each queue independently. So the page-per-vq property
> > +should also be enabled.
> 
> A backend author is unlikely to know what the page-per-vq property means.
> 
> Is this intended for users maybe? docs/interop is not the best place
> for user-facing documentation.
> 
> I also wonder:
> 
> 	commit d9997d89a4a09a330a056929d06d4b7b0b7a8239
> 	Author: Marcel Apfelbaum <marcel@redhat.com>
> 	Date:   Wed Sep 7 18:02:25 2016 +0300
> 
> 	    virtio-pci: reduce modern_mem_bar size
> 	    
> 	    Currently each VQ Notification Virtio Capability is allocated
> 	    on a different page. The idea is to enable split drivers within
> 	    guests, however there are no known plans to do that.
> 	    The allocation will result in a 8MB BAR, more than various
> 	    guest firmwares pre-allocates for PCI Bridges hotplug process.
>     
> looks like enabling page per vq will break pci express hotplug.
> I suspect more work is needed to down-size the BAR to # of VQs
> actually supported.
> 

I agree. Maybe we can fix the potential PCI Express hotplug
issue caused by enabling page-per-vq in another patch set.
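
For reference, the rough arithmetic behind the 8MB figure (assuming
VIRTIO_QUEUE_MAX is 1024): 1024 queues * 4KB notify stride = 4MB
for the notify region alone, and together with the other modern
capabilities the BAR then rounds up to the next power of two, 8MB.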

Best regards,
Tiwei Bie

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/6] Extend vhost-user to support VFIO based accelerators
  2018-03-22 16:40   ` [virtio-dev] " Michael S. Tsirkin
@ 2018-03-28 12:24     ` Tiwei Bie
  -1 siblings, 0 replies; 46+ messages in thread
From: Tiwei Bie @ 2018-03-28 12:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtio-dev, alex.williamson, jasowang, pbonzini,
	stefanha, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, Mar 22, 2018 at 06:40:18PM +0200, Michael S. Tsirkin wrote:
> On Mon, Mar 19, 2018 at 03:15:31PM +0800, Tiwei Bie wrote:
[...]
> > 
> > Below link is an example showing how to setup a such environment
> > via nested VM. In this case, the virtio device in the outer VM is
> > the accelerator. It will be used to accelerate the virtio device
> > in the inner VM. In reality, we could use virtio ring compatible
> > hardware device as the accelerators.
> > 
> > http://dpdk.org/ml/archives/dev/2017-December/085044.html
> 
> I understand that it might be challenging due to
> the tight coupling with VFIO. Still - isn't there
> a way to make it easier to set a testing rig up?
> 
> In particular can we avoid the dpdk requirement for testing?
> 

If we want to try vDPA (e.g. use one virtio device to accelerate
another virtio device of a VM), I think we need DPDK. Otherwise
we would need to write a VFIO based userspace virtio driver and
find another vhost-user backend.

> 
> 
> > In above example, it doesn't require any changes to QEMU, but
> > it has lower performance compared with the traditional VFIO
> > based PCI passthru. And that's the problem this patch set wants
> > to solve.
> > 
> > The performance issue of vDPA/vhost-user and solutions
> > ======================================================
> > 
> > For vhost-user backend, the critical issue in vDPA is that the
> > data path performance is relatively low and some host threads are
> > needed for the data path, because some necessary mechanisms are
> > missing to support:
> > 
> > 1) guest driver notifies the device directly;
> > 2) device interrupts the guest directly;
> > 
> > So this patch set does some small extensions to the vhost-user
> > protocol to make both of them possible. It leverages the same
> > mechanisms (e.g. EPT and Posted-Interrupt on Intel platform) as
> > the PCI passthru.
> 
> Not all platforms support posted interrupts, and EPT isn't
> required for MMIO to be mapped to devices.
> 
> It probably makes sense to separate the more portable
> host notification offload from the less portable
> guest notification offload.
> 

Makes sense. I'll split the two types of offloads. Thanks for
the suggestion!
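
Roughly, I'm thinking of two independent protocol feature bits
along these lines (names and bit numbers below are placeholders,
for discussion only):

#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER   10 /* guest -> device */
#define VHOST_USER_PROTOCOL_F_GUEST_NOTIFIER  11 /* device -> guest */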

> 
> 
> > A new protocol feature bit is added to negotiate the accelerator
> > feature support. Two new slave message types are added to control
> > the notify region and queue interrupt passthru for each queue.
> > >From the view of vhost-user protocol design, it's very flexible.
> > The passthru can be enabled/disabled for each queue individually,
> > and it's possible to accelerate each queue by different devices.
> > More design and implementation details can be found from the last
> > patch.
> > 
> > Difference between vDPA and PCI passthru
> > ========================================
> > 
> > The key difference between PCI passthru and vDPA is that, in vDPA
> > only the data path of the device (e.g. DMA ring, notify region and
> > queue interrupt) is pass-throughed to the VM, the device control
> > path (e.g. PCI configuration space and MMIO regions) is still
> > defined and emulated by QEMU.
> > 
> > The benefits of keeping virtio device emulation in QEMU compared
> > with virtio device PCI passthru include (but not limit to):
> > 
> > - consistent device interface for guest OS in the VM;
> > - max flexibility on the hardware (i.e. the accelerators) design;
> > - leveraging the existing virtio live-migration framework;
> > 
> > Why extend vhost-user for vDPA
> > ==============================
> > 
> > We have already implemented various virtual switches (e.g. OVS-DPDK)
> > based on vhost-user for VMs in the Cloud. They are purely software
> > running on CPU cores. When we have accelerators for such NFVi applications,
> > it's ideal if the applications could keep using the original interface
> > (i.e. vhost-user netdev) with QEMU, and infrastructure is able to decide
> > when and how to switch between CPU and accelerators within the interface.
> > And the switching (i.e. switch between CPU and accelerators) can be done
> > flexibly and quickly inside the applications.
> > 
> > More details about this can be found from the Cunming's discussions on
> > the RFC patch set.
> > 
> > Update notes
> > ============
> > 
> > IOMMU feature bit check is removed in this version, because:
> > 
> > The IOMMU feature is negotiable, when an accelerator is used and
> > it doesn't support virtual IOMMU, its driver just won't provide
> > this feature bit when vhost library querying its features. And if
> > it supports the virtual IOMMU, its driver can provide this feature
> > bit. It's not reasonable to add this limitation in this patch set.
> 
> Fair enough. Still:
> Can hardware on intel platforms actually support IOTLB requests?
> Don't you need to add support for vIOMMU shadowing instead?
> 

For the hardware I have, I guess they can't for now.

Best regards,
Tiwei Bie

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/6] Extend vhost-user to support VFIO based accelerators
  2018-03-28 12:24     ` [virtio-dev] " Tiwei Bie
@ 2018-03-28 15:33       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 46+ messages in thread
From: Michael S. Tsirkin @ 2018-03-28 15:33 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: qemu-devel, virtio-dev, alex.williamson, jasowang, pbonzini,
	stefanha, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Wed, Mar 28, 2018 at 08:24:07PM +0800, Tiwei Bie wrote:
> > > Update notes
> > > ============
> > > 
> > > IOMMU feature bit check is removed in this version, because:
> > > 
> > > The IOMMU feature is negotiable, when an accelerator is used and
> > > it doesn't support virtual IOMMU, its driver just won't provide
> > > this feature bit when vhost library querying its features. And if
> > > it supports the virtual IOMMU, its driver can provide this feature
> > > bit. It's not reasonable to add this limitation in this patch set.
> > 
> > Fair enough. Still:
> > Can hardware on intel platforms actually support IOTLB requests?
> > Don't you need to add support for vIOMMU shadowing instead?
> > 
> 
> For the hardware I have, I guess they can't for now.

So VFIO in QEMU has support for vIOMMU shadowing.
Can you use that somehow?

Ability to run DPDK within the guest seems important.

-- 
MST

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v2 0/6] Extend vhost-user to support VFIO based accelerators
  2018-03-28 15:33       ` [virtio-dev] " Michael S. Tsirkin
@ 2018-03-29  3:33         ` Tiwei Bie
  -1 siblings, 0 replies; 46+ messages in thread
From: Tiwei Bie @ 2018-03-29  3:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtio-dev, alex.williamson, jasowang, pbonzini,
	stefanha, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Wed, Mar 28, 2018 at 06:33:01PM +0300, Michael S. Tsirkin wrote:
> On Wed, Mar 28, 2018 at 08:24:07PM +0800, Tiwei Bie wrote:
> > > > Update notes
> > > > ============
> > > > 
> > > > IOMMU feature bit check is removed in this version, because:
> > > > 
> > > > The IOMMU feature is negotiable, when an accelerator is used and
> > > > it doesn't support virtual IOMMU, its driver just won't provide
> > > > this feature bit when vhost library querying its features. And if
> > > > it supports the virtual IOMMU, its driver can provide this feature
> > > > bit. It's not reasonable to add this limitation in this patch set.
> > > 
> > > Fair enough. Still:
> > > Can hardware on intel platforms actually support IOTLB requests?
> > > Don't you need to add support for vIOMMU shadowing instead?
> > > 
> > 
> > For the hardware I have, I guess they can't for now.
> 
> So VFIO in QEMU has support for vIOMMU shadowing.
> Can you use that somehow?

Yeah, I guess we can use it in some way. Actually, supporting
vIOMMU is quite an interesting feature. It would provide
better security, and for the hardware backend case there
would be no performance penalty with static mapping once
the backend has got all the mappings. I think it could be
done as a separate work. Based on your previous suggestion in
this thread, I have split the guest notification offload and
host notification offload (I'll send the new version very soon).
And I plan to let this patch set just focus on fixing the
most critical performance issue - the host notification offload.
With this fix, using a hardware backend in vhost-user could get
a very big performance boost and become much more practical.
So maybe we can focus on fixing this critical performance issue
first. What do you think?

> 
> Ability to run dpdk within guest seems important.

I think vIOMMU isn't a must to run DPDK in a guest. For Linux
guests we also have igb_uio and uio_pci_generic to run DPDK,
and for FreeBSD guests we have nic_uio. They don't need vIOMMU,
and they can offer the best performance.

Best regards,
Tiwei Bie

> 
> -- 
> MST
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v2 0/6] Extend vhost-user to support VFIO based accelerators
  2018-03-29  3:33         ` Tiwei Bie
@ 2018-03-29  4:16           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 46+ messages in thread
From: Michael S. Tsirkin @ 2018-03-29  4:16 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: qemu-devel, virtio-dev, alex.williamson, jasowang, pbonzini,
	stefanha, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, Mar 29, 2018 at 11:33:29AM +0800, Tiwei Bie wrote:
> On Wed, Mar 28, 2018 at 06:33:01PM +0300, Michael S. Tsirkin wrote:
> > On Wed, Mar 28, 2018 at 08:24:07PM +0800, Tiwei Bie wrote:
> > > > > Update notes
> > > > > ============
> > > > > 
> > > > > IOMMU feature bit check is removed in this version, because:
> > > > > 
> > > > > The IOMMU feature is negotiable, when an accelerator is used and
> > > > > it doesn't support virtual IOMMU, its driver just won't provide
> > > > > this feature bit when vhost library querying its features. And if
> > > > > it supports the virtual IOMMU, its driver can provide this feature
> > > > > bit. It's not reasonable to add this limitation in this patch set.
> > > > 
> > > > Fair enough. Still:
> > > > Can hardware on Intel platforms actually support IOTLB requests?
> > > > Don't you need to add support for vIOMMU shadowing instead?
> > > > 
> > > 
> > > For the hardware I have, I guess it can't for now.
> > 
> > So VFIO in QEMU has support for vIOMMU shadowing.
> > Can you use that somehow?
> 
> Yeah, I guess we can use it in some way. Actually, supporting
> vIOMMU is quite an interesting feature. It would provide
> better security, and in the hardware backend case there
> would be no performance penalty with static mappings once
> the backend has received all the mappings. I think it could
> be done as separate work. Based on your previous suggestion
> in this thread, I have split the guest notification offload
> from the host notification offload (I'll send the new version
> very soon), and I plan to have this patch set focus only on
> fixing the most critical performance issue: the host
> notification offload. With this fix, using a hardware backend
> with vhost-user could get a very big performance boost and
> become much more practical. So maybe we can focus on fixing
> this critical performance issue first. What do you think?

I think correctness and security come before performance, and
vIOMMU falls under security.

> > 
> > Ability to run DPDK within a guest seems important.
> 
> I think a vIOMMU isn't a must for running DPDK in a guest.

Oh yes it is.

> For Linux
> guests we also have igb_uio and uio_pci_generic to run DPDK,
> and for FreeBSD guests we have nic_uio.

These hacks offer no protection against a buggy userspace
corrupting guest kernel memory. Given that DPDK is routinely
linked into closed-source applications, this is not a
configuration anyone can support.


> They don't need a vIOMMU,
> and they offer the best performance.
> 
> Best regards,
> Tiwei Bie
> 
> > 
> > -- 
> > MST
> > 

^ permalink raw reply	[flat|nested] 46+ messages in thread


end of thread, other threads:[~2018-03-29  4:16 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-19  7:15 [Qemu-devel] [PATCH v2 0/6] Extend vhost-user to support VFIO based accelerators Tiwei Bie
2018-03-19  7:15 ` [virtio-dev] " Tiwei Bie
2018-03-19  7:15 ` [Qemu-devel] [PATCH v2 1/6] vhost-user: support receiving file descriptors in slave_read Tiwei Bie
2018-03-19  7:15   ` [virtio-dev] " Tiwei Bie
2018-03-19  7:15 ` [Qemu-devel] [PATCH v2 2/6] vhost-user: introduce shared vhost-user state Tiwei Bie
2018-03-19  7:15   ` [virtio-dev] " Tiwei Bie
2018-03-22 15:13   ` [Qemu-devel] " Michael S. Tsirkin
2018-03-22 15:13     ` [virtio-dev] " Michael S. Tsirkin
2018-03-27 13:32     ` [Qemu-devel] " Tiwei Bie
2018-03-27 13:32       ` Tiwei Bie
2018-03-19  7:15 ` [Qemu-devel] [PATCH v2 3/6] virtio: support adding sub-regions for notify region Tiwei Bie
2018-03-19  7:15   ` [virtio-dev] " Tiwei Bie
2018-03-22 14:57   ` [Qemu-devel] " Michael S. Tsirkin
2018-03-22 14:57     ` [virtio-dev] " Michael S. Tsirkin
2018-03-27 13:47     ` [Qemu-devel] " Tiwei Bie
2018-03-27 13:47       ` [virtio-dev] " Tiwei Bie
2018-03-19  7:15 ` [Qemu-devel] [PATCH v2 4/6] vfio: support getting VFIOGroup from groupfd Tiwei Bie
2018-03-19  7:15   ` [virtio-dev] " Tiwei Bie
2018-03-19  7:15 ` [Qemu-devel] [PATCH v2 5/6] vfio: remove DPRINTF() definition from vfio-common.h Tiwei Bie
2018-03-19  7:15   ` [virtio-dev] " Tiwei Bie
2018-03-22 15:15   ` [Qemu-devel] " Michael S. Tsirkin
2018-03-22 15:15     ` [virtio-dev] " Michael S. Tsirkin
2018-03-27 13:33     ` [Qemu-devel] " Tiwei Bie
2018-03-27 13:33       ` Tiwei Bie
2018-03-19  7:15 ` [Qemu-devel] [PATCH v2 6/6] vhost-user: add VFIO based accelerators support Tiwei Bie
2018-03-19  7:15   ` [virtio-dev] " Tiwei Bie
2018-03-22 16:19   ` [Qemu-devel] " Michael S. Tsirkin
2018-03-22 16:19     ` [virtio-dev] " Michael S. Tsirkin
2018-03-27 11:06     ` [Qemu-devel] " Tiwei Bie
2018-03-27 11:06       ` [virtio-dev] " Tiwei Bie
2018-03-27 13:59     ` [Qemu-devel] " Tiwei Bie
2018-03-27 13:59       ` [virtio-dev] " Tiwei Bie
2018-03-22 14:55 ` [Qemu-devel] [PATCH v2 0/6] Extend vhost-user to support VFIO based accelerators Michael S. Tsirkin
2018-03-22 14:55   ` [virtio-dev] " Michael S. Tsirkin
2018-03-23  8:54   ` [Qemu-devel] " Tiwei Bie
2018-03-23  8:54     ` [virtio-dev] " Tiwei Bie
2018-03-22 16:40 ` [Qemu-devel] " Michael S. Tsirkin
2018-03-22 16:40   ` [virtio-dev] " Michael S. Tsirkin
2018-03-28 12:24   ` [Qemu-devel] " Tiwei Bie
2018-03-28 12:24     ` [virtio-dev] " Tiwei Bie
2018-03-28 15:33     ` [Qemu-devel] " Michael S. Tsirkin
2018-03-28 15:33       ` [virtio-dev] " Michael S. Tsirkin
2018-03-29  3:33       ` [Qemu-devel] " Tiwei Bie
2018-03-29  3:33         ` Tiwei Bie
2018-03-29  4:16         ` [Qemu-devel] " Michael S. Tsirkin
2018-03-29  4:16           ` Michael S. Tsirkin
