[Qemu-devel] [PATCH v2 0/6] vhost-user: Specify and implement device IOTLB support

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH v2 0/6] vhost-user: Specify and implement device IOTLB support
@ 2017-05-26 14:28 Maxime Coquelin
  2017-05-26 14:28 ` [Qemu-devel] [PATCH v2 1/6] vhost: propagate errors in vhost_device_iotlb_miss() Maxime Coquelin
                   ` (7 more replies)
  0 siblings, 8 replies; 28+ messages in thread
From: Maxime Coquelin @ 2017-05-26 14:28 UTC (permalink / raw)
  To: mst, peterx, marcandre.lureau, vkaplans, jasowang, wexu,
	yuanhan.liu, qemu-devel, jfreiman
  Cc: Maxime Coquelin

This series aims at specifying ans implementing the protocol update
required to support device IOTLB with user backends.

In this second non-RFC version, main changes are:
 - spec fixes and clarification
 - rings information update has been restored back to ring enablement time
 - Work around GCC 4.4.7 limitation wrt assignment in unnamed union at
declaration time.

The series can be tested with vhost_iotlb_proto_v2 branch on my gitlab
account[0].

The slave requests channel part is re-used from Marc-André's series submitted
last year[1], with main changes from original version being request/feature
names renaming and addition of the REPLY_ACK feature support.

Regarding IOTLB protocol, one noticeable change is the IOTLB miss request
reply made optionnal (i.e. only if slave requests it by setting the
VHOST_USER_NEED_REPLY flag in the message header). This change provides
more flexibility in the backend implementation of the feature.

The protocol is very close to kernel backends, except that a new
communication channel is introduced to enable the slave to send
requests to the master.

[0]: https://gitlab.com/mcoquelin/dpdk-next-virtio/commits/vhost_iotlb_proto_v2
[1]: https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg00095.html

Marc-André Lureau (2):
  vhost-user: add vhost_user to hold the chr
  vhost-user: add slave-req-fd support

Maxime Coquelin (4):
  vhost: propagate errors in vhost_device_iotlb_miss()
  vhost: rework IOTLB messaging
  vhost: extend ring information update for IOTLB to all rings
  spec/vhost-user spec: Add IOMMU support

 docs/specs/vhost-user.txt         | 118 ++++++++++++++++++++++++-
 hw/virtio/vhost-backend.c         | 130 ++++++++++++++++------------
 hw/virtio/vhost-user.c            | 177 +++++++++++++++++++++++++++++++++++++-
 hw/virtio/vhost.c                 |  27 ++++--
 include/hw/virtio/vhost-backend.h |  23 +++--
 include/hw/virtio/vhost.h         |   2 +-
 6 files changed, 397 insertions(+), 80 deletions(-)

-- 
2.9.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Qemu-devel] [PATCH v2 1/6] vhost: propagate errors in vhost_device_iotlb_miss()
  2017-05-26 14:28 [Qemu-devel] [PATCH v2 0/6] vhost-user: Specify and implement device IOTLB support Maxime Coquelin
@ 2017-05-26 14:28 ` Maxime Coquelin
  2017-05-26 14:28 ` [Qemu-devel] [PATCH v2 2/6] vhost: rework IOTLB messaging Maxime Coquelin
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 28+ messages in thread
From: Maxime Coquelin @ 2017-05-26 14:28 UTC (permalink / raw)
  To: mst, peterx, marcandre.lureau, vkaplans, jasowang, wexu,
	yuanhan.liu, qemu-devel, jfreiman
  Cc: Maxime Coquelin

Some backends might want to know when things went wrong.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 hw/virtio/vhost.c         | 15 ++++++++++-----
 include/hw/virtio/vhost.h |  2 +-
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 03a46a7..8fab12d 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -971,18 +971,20 @@ static int vhost_memory_region_lookup(struct vhost_dev *hdev,
     return -EFAULT;
 }
 
-void vhost_device_iotlb_miss(struct vhost_dev *dev, uint64_t iova, int write)
+int vhost_device_iotlb_miss(struct vhost_dev *dev, uint64_t iova, int write)
 {
     IOMMUTLBEntry iotlb;
     uint64_t uaddr, len;
+    int ret = -EFAULT;
 
     rcu_read_lock();
 
     iotlb = address_space_get_iotlb_entry(dev->vdev->dma_as,
                                           iova, write);
     if (iotlb.target_as != NULL) {
-        if (vhost_memory_region_lookup(dev, iotlb.translated_addr,
-                                       &uaddr, &len)) {
+        ret = vhost_memory_region_lookup(dev, iotlb.translated_addr,
+                                         &uaddr, &len);
+        if (ret) {
             error_report("Fail to lookup the translated address "
                          "%"PRIx64, iotlb.translated_addr);
             goto out;
@@ -991,14 +993,17 @@ void vhost_device_iotlb_miss(struct vhost_dev *dev, uint64_t iova, int write)
         len = MIN(iotlb.addr_mask + 1, len);
         iova = iova & ~iotlb.addr_mask;
 
-        if (dev->vhost_ops->vhost_update_device_iotlb(dev, iova, uaddr,
-                                                      len, iotlb.perm)) {
+        ret = dev->vhost_ops->vhost_update_device_iotlb(dev, iova, uaddr,
+                                                      len, iotlb.perm);
+        if (ret) {
             error_report("Fail to update device iotlb");
             goto out;
         }
     }
 out:
     rcu_read_unlock();
+
+    return ret;
 }
 
 static int vhost_virtqueue_start(struct vhost_dev *dev,
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index a450321..467dc77 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -105,5 +105,5 @@ bool vhost_has_free_slot(void);
 int vhost_net_set_backend(struct vhost_dev *hdev,
                           struct vhost_vring_file *file);
 
-void vhost_device_iotlb_miss(struct vhost_dev *dev, uint64_t iova, int write);
+int vhost_device_iotlb_miss(struct vhost_dev *dev, uint64_t iova, int write);
 #endif
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Qemu-devel] [PATCH v2 2/6] vhost: rework IOTLB messaging
  2017-05-26 14:28 [Qemu-devel] [PATCH v2 0/6] vhost-user: Specify and implement device IOTLB support Maxime Coquelin
  2017-05-26 14:28 ` [Qemu-devel] [PATCH v2 1/6] vhost: propagate errors in vhost_device_iotlb_miss() Maxime Coquelin
@ 2017-05-26 14:28 ` Maxime Coquelin
  2017-05-26 14:28 ` [Qemu-devel] [PATCH v2 3/6] vhost: extend ring information update for IOTLB to all rings Maxime Coquelin
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 28+ messages in thread
From: Maxime Coquelin @ 2017-05-26 14:28 UTC (permalink / raw)
  To: mst, peterx, marcandre.lureau, vkaplans, jasowang, wexu,
	yuanhan.liu, qemu-devel, jfreiman
  Cc: Maxime Coquelin

This patch reworks IOTLB messaging to prepare for vhost-user
device IOTLB support.

IOTLB messages handling is extracted from vhost-kernel backend,
so that only the messages transport remains backend specifics.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
v2: Work around GCC 4.4.7 limitation wrt to assignment at declaration time
with unnamed unions:
https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg02704.html

 hw/virtio/vhost-backend.c         | 130 +++++++++++++++++++++-----------------
 hw/virtio/vhost.c                 |   8 +--
 include/hw/virtio/vhost-backend.h |  23 ++++---
 3 files changed, 92 insertions(+), 69 deletions(-)

diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
index be927b8..4e31de1 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -192,7 +192,6 @@ static void vhost_kernel_iotlb_read(void *opaque)
     ssize_t len;
 
     while ((len = read((uintptr_t)dev->opaque, &msg, sizeof msg)) > 0) {
-        struct vhost_iotlb_msg *imsg = &msg.iotlb;
         if (len < sizeof msg) {
             error_report("Wrong vhost message len: %d", (int)len);
             break;
@@ -201,70 +200,21 @@ static void vhost_kernel_iotlb_read(void *opaque)
             error_report("Unknown vhost iotlb message type");
             break;
         }
-        switch (imsg->type) {
-        case VHOST_IOTLB_MISS:
-            vhost_device_iotlb_miss(dev, imsg->iova,
-                                    imsg->perm != VHOST_ACCESS_RO);
-            break;
-        case VHOST_IOTLB_UPDATE:
-        case VHOST_IOTLB_INVALIDATE:
-            error_report("Unexpected IOTLB message type");
-            break;
-        case VHOST_IOTLB_ACCESS_FAIL:
-            /* FIXME: report device iotlb error */
-            break;
-        default:
-            break;
-        }
-    }
-}
 
-static int vhost_kernel_update_device_iotlb(struct vhost_dev *dev,
-                                            uint64_t iova, uint64_t uaddr,
-                                            uint64_t len,
-                                            IOMMUAccessFlags perm)
-{
-    struct vhost_msg msg;
-    msg.type = VHOST_IOTLB_MSG;
-    msg.iotlb.iova =  iova;
-    msg.iotlb.uaddr = uaddr;
-    msg.iotlb.size = len;
-    msg.iotlb.type = VHOST_IOTLB_UPDATE;
-
-    switch (perm) {
-    case IOMMU_RO:
-        msg.iotlb.perm = VHOST_ACCESS_RO;
-        break;
-    case IOMMU_WO:
-        msg.iotlb.perm = VHOST_ACCESS_WO;
-        break;
-    case IOMMU_RW:
-        msg.iotlb.perm = VHOST_ACCESS_RW;
-        break;
-    default:
-        g_assert_not_reached();
-    }
-
-    if (write((uintptr_t)dev->opaque, &msg, sizeof msg) != sizeof msg) {
-        error_report("Fail to update device iotlb");
-        return -EFAULT;
+        vhost_backend_handle_iotlb_msg(dev, &msg.iotlb);
     }
-
-    return 0;
 }
 
-static int vhost_kernel_invalidate_device_iotlb(struct vhost_dev *dev,
-                                                uint64_t iova, uint64_t len)
+static int vhost_kernel_send_device_iotlb_msg(struct vhost_dev *dev,
+                                              struct vhost_iotlb_msg *imsg)
 {
     struct vhost_msg msg;
 
     msg.type = VHOST_IOTLB_MSG;
-    msg.iotlb.iova = iova;
-    msg.iotlb.size = len;
-    msg.iotlb.type = VHOST_IOTLB_INVALIDATE;
+    msg.iotlb = *imsg;
 
     if (write((uintptr_t)dev->opaque, &msg, sizeof msg) != sizeof msg) {
-        error_report("Fail to invalidate device iotlb");
+        error_report("Fail to update device iotlb");
         return -EFAULT;
     }
 
@@ -311,8 +261,7 @@ static const VhostOps kernel_ops = {
         .vhost_vsock_set_running = vhost_kernel_vsock_set_running,
 #endif /* CONFIG_VHOST_VSOCK */
         .vhost_set_iotlb_callback = vhost_kernel_set_iotlb_callback,
-        .vhost_update_device_iotlb = vhost_kernel_update_device_iotlb,
-        .vhost_invalidate_device_iotlb = vhost_kernel_invalidate_device_iotlb,
+        .vhost_send_device_iotlb_msg = vhost_kernel_send_device_iotlb_msg,
 };
 
 int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type)
@@ -333,3 +282,70 @@ int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type)
 
     return r;
 }
+
+int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
+                                             uint64_t iova, uint64_t uaddr,
+                                             uint64_t len,
+                                             IOMMUAccessFlags perm)
+{
+    struct vhost_iotlb_msg imsg;
+
+    imsg.iova =  iova;
+    imsg.uaddr = uaddr;
+    imsg.size = len;
+    imsg.type = VHOST_IOTLB_UPDATE;
+
+    switch (perm) {
+    case IOMMU_RO:
+        imsg.perm = VHOST_ACCESS_RO;
+        break;
+    case IOMMU_WO:
+        imsg.perm = VHOST_ACCESS_WO;
+        break;
+    case IOMMU_RW:
+        imsg.perm = VHOST_ACCESS_RW;
+        break;
+    default:
+        return -EINVAL;
+    }
+
+    return dev->vhost_ops->vhost_send_device_iotlb_msg(dev, &imsg);
+}
+
+int vhost_backend_invalidate_device_iotlb(struct vhost_dev *dev,
+                                                 uint64_t iova, uint64_t len)
+{
+    struct vhost_iotlb_msg imsg;
+
+    imsg.iova = iova;
+    imsg.size = len;
+    imsg.type = VHOST_IOTLB_INVALIDATE;
+
+    return dev->vhost_ops->vhost_send_device_iotlb_msg(dev, &imsg);
+}
+
+int vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
+                                          struct vhost_iotlb_msg *imsg)
+{
+    int ret = 0;
+
+    switch (imsg->type) {
+    case VHOST_IOTLB_MISS:
+        ret = vhost_device_iotlb_miss(dev, imsg->iova,
+                                      imsg->perm != VHOST_ACCESS_RO);
+        break;
+    case VHOST_IOTLB_ACCESS_FAIL:
+        /* FIXME: report device iotlb error */
+        error_report("Access failure IOTLB message type not supported");
+        ret = -ENOTSUP;
+        break;
+    case VHOST_IOTLB_UPDATE:
+    case VHOST_IOTLB_INVALIDATE:
+    default:
+        error_report("Unexpected IOTLB message type");
+        ret = -EINVAL;
+        break;
+    }
+
+    return ret;
+}
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 8fab12d..6eddb09 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -724,8 +724,8 @@ static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     struct vhost_dev *hdev = iommu->hdev;
     hwaddr iova = iotlb->iova + iommu->iommu_offset;
 
-    if (hdev->vhost_ops->vhost_invalidate_device_iotlb(hdev, iova,
-                                                       iotlb->addr_mask + 1)) {
+    if (vhost_backend_invalidate_device_iotlb(hdev, iova,
+                                              iotlb->addr_mask + 1)) {
         error_report("Fail to invalidate device iotlb");
     }
 }
@@ -993,8 +993,8 @@ int vhost_device_iotlb_miss(struct vhost_dev *dev, uint64_t iova, int write)
         len = MIN(iotlb.addr_mask + 1, len);
         iova = iova & ~iotlb.addr_mask;
 
-        ret = dev->vhost_ops->vhost_update_device_iotlb(dev, iova, uaddr,
-                                                      len, iotlb.perm);
+        ret = vhost_backend_update_device_iotlb(dev, iova, uaddr,
+                                                len, iotlb.perm);
         if (ret) {
             error_report("Fail to update device iotlb");
             goto out;
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index c3cf4a7..a7a5f22 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -27,6 +27,7 @@ struct vhost_vring_file;
 struct vhost_vring_state;
 struct vhost_vring_addr;
 struct vhost_scsi_target;
+struct vhost_iotlb_msg;
 
 typedef int (*vhost_backend_init)(struct vhost_dev *dev, void *opaque);
 typedef int (*vhost_backend_cleanup)(struct vhost_dev *dev);
@@ -81,12 +82,8 @@ typedef int (*vhost_vsock_set_guest_cid_op)(struct vhost_dev *dev,
 typedef int (*vhost_vsock_set_running_op)(struct vhost_dev *dev, int start);
 typedef void (*vhost_set_iotlb_callback_op)(struct vhost_dev *dev,
                                            int enabled);
-typedef int (*vhost_update_device_iotlb_op)(struct vhost_dev *dev,
-                                            uint64_t iova, uint64_t uaddr,
-                                            uint64_t len,
-                                            IOMMUAccessFlags perm);
-typedef int (*vhost_invalidate_device_iotlb_op)(struct vhost_dev *dev,
-                                                uint64_t iova, uint64_t len);
+typedef int (*vhost_send_device_iotlb_msg_op)(struct vhost_dev *dev,
+                                              struct vhost_iotlb_msg *imsg);
 
 typedef struct VhostOps {
     VhostBackendType backend_type;
@@ -120,8 +117,7 @@ typedef struct VhostOps {
     vhost_vsock_set_guest_cid_op vhost_vsock_set_guest_cid;
     vhost_vsock_set_running_op vhost_vsock_set_running;
     vhost_set_iotlb_callback_op vhost_set_iotlb_callback;
-    vhost_update_device_iotlb_op vhost_update_device_iotlb;
-    vhost_invalidate_device_iotlb_op vhost_invalidate_device_iotlb;
+    vhost_send_device_iotlb_msg_op vhost_send_device_iotlb_msg;
 } VhostOps;
 
 extern const VhostOps user_ops;
@@ -129,4 +125,15 @@ extern const VhostOps user_ops;
 int vhost_set_backend_type(struct vhost_dev *dev,
                            VhostBackendType backend_type);
 
+int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
+                                             uint64_t iova, uint64_t uaddr,
+                                             uint64_t len,
+                                             IOMMUAccessFlags perm);
+
+int vhost_backend_invalidate_device_iotlb(struct vhost_dev *dev,
+                                                 uint64_t iova, uint64_t len);
+
+int vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
+                                          struct vhost_iotlb_msg *imsg);
+
 #endif /* VHOST_BACKEND_H */
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Qemu-devel] [PATCH v2 3/6] vhost: extend ring information update for IOTLB to all rings
  2017-05-26 14:28 [Qemu-devel] [PATCH v2 0/6] vhost-user: Specify and implement device IOTLB support Maxime Coquelin
  2017-05-26 14:28 ` [Qemu-devel] [PATCH v2 1/6] vhost: propagate errors in vhost_device_iotlb_miss() Maxime Coquelin
  2017-05-26 14:28 ` [Qemu-devel] [PATCH v2 2/6] vhost: rework IOTLB messaging Maxime Coquelin
@ 2017-05-26 14:28 ` Maxime Coquelin
  2017-05-30 18:12   ` Michael S. Tsirkin
  2017-05-26 14:28 ` [Qemu-devel] [PATCH v2 4/6] vhost-user: add vhost_user to hold the chr Maxime Coquelin
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Maxime Coquelin @ 2017-05-26 14:28 UTC (permalink / raw)
  To: mst, peterx, marcandre.lureau, vkaplans, jasowang, wexu,
	yuanhan.liu, qemu-devel, jfreiman
  Cc: Maxime Coquelin

Vhost-kernel backend need to receive IOTLB entry for used ring
information early, which is done by triggering a miss event on
its address.

This patch extends this behaviour to all rings information, to be
compatible with vhost-user backend design.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
v2:
 - Revert back to existing behaviour, i.e. only send IOTLB updates
at ring enablement time, not at ring address setting time (mst).
 - Extend IOTLB misses to all ring addresses, not only used ring.

 hw/virtio/vhost.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 6eddb09..7867034 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1552,11 +1552,15 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
     if (vhost_dev_has_iommu(hdev)) {
         hdev->vhost_ops->vhost_set_iotlb_callback(hdev, true);
 
-        /* Update used ring information for IOTLB to work correctly,
-         * vhost-kernel code requires for this.*/
+        /*
+         * Update rings information for IOTLB to work correctly,
+         * vhost-kernel and vhost-user codes require for this.
+         */
         for (i = 0; i < hdev->nvqs; ++i) {
             struct vhost_virtqueue *vq = hdev->vqs + i;
             vhost_device_iotlb_miss(hdev, vq->used_phys, true);
+            vhost_device_iotlb_miss(hdev, vq->desc_phys, true);
+            vhost_device_iotlb_miss(hdev, vq->avail_phys, true);
         }
     }
     return 0;
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Qemu-devel] [PATCH v2 4/6] vhost-user: add vhost_user to hold the chr
  2017-05-26 14:28 [Qemu-devel] [PATCH v2 0/6] vhost-user: Specify and implement device IOTLB support Maxime Coquelin
                   ` (2 preceding siblings ...)
  2017-05-26 14:28 ` [Qemu-devel] [PATCH v2 3/6] vhost: extend ring information update for IOTLB to all rings Maxime Coquelin
@ 2017-05-26 14:28 ` Maxime Coquelin
  2017-05-26 14:28 ` [Qemu-devel] [PATCH v2 5/6] vhost-user: add slave-req-fd support Maxime Coquelin
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 28+ messages in thread
From: Maxime Coquelin @ 2017-05-26 14:28 UTC (permalink / raw)
  To: mst, peterx, marcandre.lureau, vkaplans, jasowang, wexu,
	yuanhan.liu, qemu-devel, jfreiman
  Cc: Marc-André Lureau, Maxime Coquelin

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Next patches will add more fields to the structure

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 hw/virtio/vhost-user.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index b87a176..8a602e0 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -110,6 +110,10 @@ static VhostUserMsg m __attribute__ ((unused));
 /* The version of the protocol we support */
 #define VHOST_USER_VERSION    (0x1)
 
+struct vhost_user {
+    CharBackend *chr;
+};
+
 static bool ioeventfd_enabled(void)
 {
     return kvm_enabled() && kvm_eventfds_enabled();
@@ -117,7 +121,8 @@ static bool ioeventfd_enabled(void)
 
 static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg)
 {
-    CharBackend *chr = dev->opaque;
+    struct vhost_user *u = dev->opaque;
+    CharBackend *chr = u->chr;
     uint8_t *p = (uint8_t *) msg;
     int r, size = VHOST_USER_HDR_SIZE;
 
@@ -202,7 +207,8 @@ static bool vhost_user_one_time_request(VhostUserRequest request)
 static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg,
                             int *fds, int fd_num)
 {
-    CharBackend *chr = dev->opaque;
+    struct vhost_user *u = dev->opaque;
+    CharBackend *chr = u->chr;
     int ret, size = VHOST_USER_HDR_SIZE + msg->size;
 
     /*
@@ -575,11 +581,14 @@ static int vhost_user_reset_device(struct vhost_dev *dev)
 static int vhost_user_init(struct vhost_dev *dev, void *opaque)
 {
     uint64_t features;
+    struct vhost_user *u;
     int err;
 
     assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
 
-    dev->opaque = opaque;
+    u = g_new0(struct vhost_user, 1);
+    u->chr = opaque;
+    dev->opaque = u;
 
     err = vhost_user_get_features(dev, &features);
     if (err < 0) {
@@ -624,8 +633,12 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
 
 static int vhost_user_cleanup(struct vhost_dev *dev)
 {
+    struct vhost_user *u;
+
     assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
 
+    u = dev->opaque;
+    g_free(u);
     dev->opaque = 0;
 
     return 0;
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Qemu-devel] [PATCH v2 5/6] vhost-user: add slave-req-fd support
  2017-05-26 14:28 [Qemu-devel] [PATCH v2 0/6] vhost-user: Specify and implement device IOTLB support Maxime Coquelin
                   ` (3 preceding siblings ...)
  2017-05-26 14:28 ` [Qemu-devel] [PATCH v2 4/6] vhost-user: add vhost_user to hold the chr Maxime Coquelin
@ 2017-05-26 14:28 ` Maxime Coquelin
  2017-05-30 18:17   ` Michael S. Tsirkin
  2017-05-26 14:28 ` [Qemu-devel] [PATCH v2 6/6] spec/vhost-user spec: Add IOMMU support Maxime Coquelin
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Maxime Coquelin @ 2017-05-26 14:28 UTC (permalink / raw)
  To: mst, peterx, marcandre.lureau, vkaplans, jasowang, wexu,
	yuanhan.liu, qemu-devel, jfreiman
  Cc: Marc-André Lureau, Maxime Coquelin

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Learn to give a socket to the slave to let him make requests to the
master.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 docs/specs/vhost-user.txt |  32 +++++++++++-
 hw/virtio/vhost-user.c    | 127 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 157 insertions(+), 2 deletions(-)

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index 036890f..5fa7016 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -139,6 +139,7 @@ in the ancillary data:
  * VHOST_USER_SET_VRING_KICK
  * VHOST_USER_SET_VRING_CALL
  * VHOST_USER_SET_VRING_ERR
+ * VHOST_USER_SET_SLAVE_REQ_FD
 
 If Master is unable to send the full message or receives a wrong reply it will
 close the connection. An optional reconnection mechanism can be implemented.
@@ -252,6 +253,18 @@ Once the source has finished migration, rings will be stopped by
 the source. No further update must be done before rings are
 restarted.
 
+Slave communication
+-------------------
+
+An optional communication channel is provided if the slave declares
+VHOST_USER_PROTOCOL_F_SLAVE_REQ protocol feature, to allow the slave to make
+requests to the master.
+
+The fd is provided via VHOST_USER_SET_SLAVE_REQ_FD ancillary data.
+
+A slave may then send VHOST_USER_SLAVE_* messages to the master
+using this fd communication channel.
+
 Protocol features
 -----------------
 
@@ -260,9 +273,10 @@ Protocol features
 #define VHOST_USER_PROTOCOL_F_RARP           2
 #define VHOST_USER_PROTOCOL_F_REPLY_ACK      3
 #define VHOST_USER_PROTOCOL_F_MTU            4
+#define VHOST_USER_PROTOCOL_F_SLAVE_REQ      5
 
-Message types
--------------
+Master message types
+--------------------
 
  * VHOST_USER_GET_FEATURES
 
@@ -486,6 +500,20 @@ Message types
       If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, slave must respond
       with zero in case the specified MTU is valid, or non-zero otherwise.
 
+ * VHOST_USER_SET_SLAVE_REQ_FD
+
+      Id: 21
+      Equivalent ioctl: N/A
+      Master payload: N/A
+
+      Set the socket file descriptor for slave initiated requests. It is passed
+      in the ancillary data.
+      This request should be sent only when VHOST_USER_F_PROTOCOL_FEATURES
+      has been negotiated, and protocol feature bit VHOST_USER_PROTOCOL_F_SLAVE_REQ
+      bit is present in VHOST_USER_GET_PROTOCOL_FEATURES.
+      If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, slave must respond
+      with zero for success, non-zero otherwise.
+
 VHOST_USER_PROTOCOL_F_REPLY_ACK:
 -------------------------------
 The original vhost-user specification only demands replies for certain
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 8a602e0..ea988fe 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -32,6 +32,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_RARP = 2,
     VHOST_USER_PROTOCOL_F_REPLY_ACK = 3,
     VHOST_USER_PROTOCOL_F_NET_MTU = 4,
+    VHOST_USER_PROTOCOL_F_SLAVE_REQ = 5,
 
     VHOST_USER_PROTOCOL_F_MAX
 };
@@ -60,9 +61,15 @@ typedef enum VhostUserRequest {
     VHOST_USER_SET_VRING_ENABLE = 18,
     VHOST_USER_SEND_RARP = 19,
     VHOST_USER_NET_SET_MTU = 20,
+    VHOST_USER_SET_SLAVE_REQ_FD = 21,
     VHOST_USER_MAX
 } VhostUserRequest;
 
+typedef enum VhostUserSlaveRequest {
+    VHOST_USER_SLAVE_NONE = 0,
+    VHOST_USER_SLAVE_MAX
+}  VhostUserSlaveRequest;
+
 typedef struct VhostUserMemoryRegion {
     uint64_t guest_phys_addr;
     uint64_t memory_size;
@@ -112,6 +119,7 @@ static VhostUserMsg m __attribute__ ((unused));
 
 struct vhost_user {
     CharBackend *chr;
+    int slave_fd;
 };
 
 static bool ioeventfd_enabled(void)
@@ -578,6 +586,115 @@ static int vhost_user_reset_device(struct vhost_dev *dev)
     return 0;
 }
 
+static void slave_read(void *opaque)
+{
+    struct vhost_dev *dev = opaque;
+    struct vhost_user *u = dev->opaque;
+    VhostUserMsg msg = { 0, };
+    int size, ret = 0;
+
+    /* Read header */
+    size = read(u->slave_fd, &msg, VHOST_USER_HDR_SIZE);
+    if (size != VHOST_USER_HDR_SIZE) {
+        error_report("Failed to read from slave.");
+        goto err;
+    }
+
+    if (msg.size > VHOST_USER_PAYLOAD_SIZE) {
+        error_report("Failed to read msg header."
+                " Size %d exceeds the maximum %zu.", msg.size,
+                VHOST_USER_PAYLOAD_SIZE);
+        goto err;
+    }
+
+    /* Read payload */
+    size = read(u->slave_fd, &msg.payload, msg.size);
+    if (size != msg.size) {
+        error_report("Failed to read payload from slave.");
+        goto err;
+    }
+
+    switch (msg.request) {
+    default:
+        error_report("Received unexpected msg type.");
+        ret = -EINVAL;
+    }
+
+    /*
+     * REPLY_ACK feature handling. Other reply types has to be managed
+     * directly in their request handlers.
+     */
+    if (msg.flags & VHOST_USER_NEED_REPLY_MASK) {
+        msg.flags &= ~VHOST_USER_NEED_REPLY_MASK;
+        msg.flags |= VHOST_USER_REPLY_MASK;
+
+        msg.payload.u64 = !!ret;
+        msg.size = sizeof(msg.payload.u64);
+
+        size = write(u->slave_fd, &msg, VHOST_USER_HDR_SIZE + msg.size);
+        if (size != VHOST_USER_HDR_SIZE + msg.size) {
+            error_report("Failed to send msg reply to slave.");
+            goto err;
+        }
+    }
+
+    return;
+
+err:
+    qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
+    close(u->slave_fd);
+    u->slave_fd = -1;
+    return;
+}
+
+static int vhost_setup_slave_channel(struct vhost_dev *dev)
+{
+    VhostUserMsg msg = {
+        .request = VHOST_USER_SET_SLAVE_REQ_FD,
+        .flags = VHOST_USER_VERSION,
+    };
+    struct vhost_user *u = dev->opaque;
+    int sv[2], ret = 0;
+    bool reply_supported = virtio_has_feature(dev->protocol_features,
+                                              VHOST_USER_PROTOCOL_F_REPLY_ACK);
+
+    if (!virtio_has_feature(dev->protocol_features,
+                            VHOST_USER_PROTOCOL_F_SLAVE_REQ)) {
+        return 0;
+    }
+
+    if (socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) {
+        error_report("socketpair() failed");
+        return -1;
+    }
+
+    u->slave_fd = sv[0];
+    qemu_set_fd_handler(u->slave_fd, slave_read, NULL, dev);
+
+    if (reply_supported) {
+        msg.flags |= VHOST_USER_NEED_REPLY_MASK;
+    }
+
+    ret = vhost_user_write(dev, &msg, &sv[1], 1);
+    if (ret) {
+        goto out;
+    }
+
+    if (reply_supported) {
+        ret = process_message_reply(dev, msg);
+    }
+
+out:
+    close(sv[1]);
+    if (ret) {
+        qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
+        close(u->slave_fd);
+        u->slave_fd = -1;
+    }
+
+    return ret;
+}
+
 static int vhost_user_init(struct vhost_dev *dev, void *opaque)
 {
     uint64_t features;
@@ -588,6 +705,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
 
     u = g_new0(struct vhost_user, 1);
     u->chr = opaque;
+    u->slave_fd = -1;
     dev->opaque = u;
 
     err = vhost_user_get_features(dev, &features);
@@ -628,6 +746,11 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
                    "VHOST_USER_PROTOCOL_F_LOG_SHMFD feature.");
     }
 
+    err = vhost_setup_slave_channel(dev);
+    if (err < 0) {
+        return err;
+    }
+
     return 0;
 }
 
@@ -638,6 +761,10 @@ static int vhost_user_cleanup(struct vhost_dev *dev)
     assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
 
     u = dev->opaque;
+    if (u->slave_fd >= 0) {
+        close(u->slave_fd);
+        u->slave_fd = -1;
+    }
     g_free(u);
     dev->opaque = 0;
 
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Qemu-devel] [PATCH v2 6/6] spec/vhost-user spec: Add IOMMU support
  2017-05-26 14:28 [Qemu-devel] [PATCH v2 0/6] vhost-user: Specify and implement device IOTLB support Maxime Coquelin
                   ` (4 preceding siblings ...)
  2017-05-26 14:28 ` [Qemu-devel] [PATCH v2 5/6] vhost-user: add slave-req-fd support Maxime Coquelin
@ 2017-05-26 14:28 ` Maxime Coquelin
  2017-05-30 18:08   ` Michael S. Tsirkin
  2017-05-30 16:15 ` [Qemu-devel] [PATCH v2 0/6] vhost-user: Specify and implement device IOTLB support Maxime Coquelin
  2017-05-30 18:20 ` Michael S. Tsirkin
  7 siblings, 1 reply; 28+ messages in thread
From: Maxime Coquelin @ 2017-05-26 14:28 UTC (permalink / raw)
  To: mst, peterx, marcandre.lureau, vkaplans, jasowang, wexu,
	yuanhan.liu, qemu-devel, jfreiman
  Cc: Maxime Coquelin

This patch specifies and implements the master/slave communication
to support device IOTLB in slave.

The vhost_iotlb_msg structure introduced for kernel backends is
re-used, making the design close between the two backends.

An exception is the use of the secondary channel to enable the
slave to send IOTLB miss requests to the master.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---

v2:
 - spec: fixed possible permission field values
 - spec: rewrote "IOMMU support" section
 docs/specs/vhost-user.txt | 86 +++++++++++++++++++++++++++++++++++++++++++++++
 hw/virtio/vhost-user.c    | 31 +++++++++++++++++
 2 files changed, 117 insertions(+)

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index 5fa7016..0799ef1 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -97,6 +97,25 @@ Depending on the request type, payload can be:
    log offset: offset from start of supplied file descriptor
        where logging starts (i.e. where guest address 0 would be logged)
 
+ * An IOTLB message
+   ---------------------------------------------------------
+   | iova | size | user address | permissions flags | type |
+   ---------------------------------------------------------
+
+   IOVA: a 64-bit I/O virtual address programmed by the guest
+   Size: a 64-bit size
+   User address: a 64-bit user address
+   Permissions: a 8-bit value:
+    - 0: No access
+    - 1: Read access
+    - 2: Write access
+    - 3: Read/Write access
+   Type: a 8-bit IOTLB message type:
+    - 1: IOTLB miss
+    - 2: IOTLB update
+    - 3: IOTLB invalidate
+    - 4: IOTLB access fail
+
 In QEMU the vhost-user message is implemented with the following struct:
 
 typedef struct VhostUserMsg {
@@ -109,6 +128,7 @@ typedef struct VhostUserMsg {
         struct vhost_vring_addr addr;
         VhostUserMemory memory;
         VhostUserLog log;
+        struct vhost_iotlb_msg iotlb;
     };
 } QEMU_PACKED VhostUserMsg;
 
@@ -253,6 +273,40 @@ Once the source has finished migration, rings will be stopped by
 the source. No further update must be done before rings are
 restarted.
 
+IOMMU support
+-------------
+
+When the VIRTIO_F_IOMMU_PLATFORM feature has been negotiated, the master
+sends IOTLB entries update & invalidation by sending VHOST_USER_IOTLB_MSG
+requests to the slave with a struct vhost_iotlb_msg as payload. For update
+events, the iotlb payload has to be filled with the update message type (2),
+the I/O virtual address, the size, the user virtual address, and the
+permissions flags. Addresses and size must be within vhost memory regions set
+via the VHOST_USER_SET_MEM_TABLE request. For invalidation events, the iotlb
+payload has to be filled with the invalidation message type (3), the I/O virtual
+address and the size. On success, the slave is expected to reply with a zero
+payload, non-zero otherwise.
+
+The slave relies on the slave communcation channel (see "Slave communication"
+section below) to send IOTLB miss and access failure events, by sending
+VHOST_USER_SLAVE_IOTLB_MSG requests to the master with a struct vhost_iotlb_msg
+as payload. For miss events, the iotlb payload has to be filled with the miss
+message type (1), the I/O virtual address and the permissions flags. For access
+failure event, the iotlb payload has to be filled with the access failure
+message type (4), the I/O virtual address and the permissions flags.
+For synchronization purpose, the slave may rely on the reply-ack feature,
+so the master may send a reply when operation is completed if the reply-ack
+feature is negotiated and slaves requests a reply. For miss events, completed
+operation means either master sent an update message containing the IOTLB entry
+containing requested address and permission, or master sent nothing if the IOTLB
+miss message is invalid (invalid IOVA or permission).
+
+The master isn't generally expected to take the initiative to send IOTLB update
+messages, as the slave sends IOTLB miss messages for the guest virtual memory
+areas it needs to access. The exception is the rings information addresses
+shared with the VHOST_USER_SET_VRING_ADDR request, for which the master must
+send corresponding IOTLB updates before the rings are enabled.
+
 Slave communication
 -------------------
 
@@ -514,6 +568,38 @@ Master message types
       If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, slave must respond
       with zero for success, non-zero otherwise.
 
+ * VHOST_USER_IOTLB_MSG
+
+      Id: 22
+      Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type)
+      Master payload: struct vhost_iotlb_msg
+      Slave payload: u64
+
+      Send IOTLB messages with struct vhost_iotlb_msg as payload.
+      Master sends such requests to update and invalidate entries in the device
+      IOTLB. The slave has to acknowledge the request with sending zero as u64
+      payload for success, non-zero otherwise.
+      This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature
+      has been successfully negotiated.
+
+Slave message types
+-------------------
+
+ * VHOST_USER_SLAVE_IOTLB_MSG
+
+      Id: 1
+      Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type)
+      Slave payload: struct vhost_iotlb_msg
+      Master payload: N/A
+
+      Send IOTLB messages with struct vhost_iotlb_msg as payload.
+      Slave sends such requests to notify of an IOTLB miss, or an IOTLB
+      access failure. If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated,
+      and slave set the VHOST_USER_NEED_REPLY flag, master must respond with
+      zero when operation is successfully completed, or non-zero otherwise.
+      This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature
+      has been successfully negotiated.
+
 VHOST_USER_PROTOCOL_F_REPLY_ACK:
 -------------------------------
 The original vhost-user specification only demands replies for certain
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index ea988fe..170fa68 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -62,11 +62,13 @@ typedef enum VhostUserRequest {
     VHOST_USER_SEND_RARP = 19,
     VHOST_USER_NET_SET_MTU = 20,
     VHOST_USER_SET_SLAVE_REQ_FD = 21,
+    VHOST_USER_IOTLB_MSG = 22,
     VHOST_USER_MAX
 } VhostUserRequest;
 
 typedef enum VhostUserSlaveRequest {
     VHOST_USER_SLAVE_NONE = 0,
+    VHOST_USER_SLAVE_IOTLB_MSG = 1,
     VHOST_USER_SLAVE_MAX
 }  VhostUserSlaveRequest;
 
@@ -104,6 +106,7 @@ typedef struct VhostUserMsg {
         struct vhost_vring_addr addr;
         VhostUserMemory memory;
         VhostUserLog log;
+        struct vhost_iotlb_msg iotlb;
     } payload;
 } QEMU_PACKED VhostUserMsg;
 
@@ -615,6 +618,9 @@ static void slave_read(void *opaque)
     }
 
     switch (msg.request) {
+    case VHOST_USER_SLAVE_IOTLB_MSG:
+        ret = vhost_backend_handle_iotlb_msg(dev, &msg.payload.iotlb);
+        break;
     default:
         error_report("Received unexpected msg type.");
         ret = -EINVAL;
@@ -862,6 +868,29 @@ static int vhost_user_net_set_mtu(struct vhost_dev *dev, uint16_t mtu)
     return 0;
 }
 
+static int vhost_user_send_device_iotlb_msg(struct vhost_dev *dev,
+                                            struct vhost_iotlb_msg *imsg)
+{
+    VhostUserMsg msg = {
+        .request = VHOST_USER_IOTLB_MSG,
+        .size = sizeof(msg.payload.iotlb),
+        .flags = VHOST_USER_VERSION | VHOST_USER_NEED_REPLY_MASK,
+        .payload.iotlb = *imsg,
+    };
+
+    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
+        return -EFAULT;
+    }
+
+    return process_message_reply(dev, msg);
+}
+
+
+static void vhost_user_set_iotlb_callback(struct vhost_dev *dev, int enabled)
+{
+    /* No-op as the receive channel is not dedicated to IOTLB messages. */
+}
+
 const VhostOps user_ops = {
         .backend_type = VHOST_BACKEND_TYPE_USER,
         .vhost_backend_init = vhost_user_init,
@@ -886,4 +915,6 @@ const VhostOps user_ops = {
         .vhost_migration_done = vhost_user_migration_done,
         .vhost_backend_can_merge = vhost_user_can_merge,
         .vhost_net_set_mtu = vhost_user_net_set_mtu,
+        .vhost_set_iotlb_callback = vhost_user_set_iotlb_callback,
+        .vhost_send_device_iotlb_msg = vhost_user_send_device_iotlb_msg,
 };
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/6] vhost-user: Specify and implement device IOTLB support
  2017-05-26 14:28 [Qemu-devel] [PATCH v2 0/6] vhost-user: Specify and implement device IOTLB support Maxime Coquelin
                   ` (5 preceding siblings ...)
  2017-05-26 14:28 ` [Qemu-devel] [PATCH v2 6/6] spec/vhost-user spec: Add IOMMU support Maxime Coquelin
@ 2017-05-30 16:15 ` Maxime Coquelin
  2017-05-30 18:20 ` Michael S. Tsirkin
  7 siblings, 0 replies; 28+ messages in thread
From: Maxime Coquelin @ 2017-05-30 16:15 UTC (permalink / raw)
  To: mst, peterx, marcandre.lureau, vkaplans, jasowang, wexu,
	yuanhan.liu, qemu-devel, jfreiman

Hi,

On 05/26/2017 04:28 PM, Maxime Coquelin wrote:
> This series aims at specifying ans implementing the protocol update
> required to support device IOTLB with user backends.
> 
> In this second non-RFC version, main changes are:
>   - spec fixes and clarification
>   - rings information update has been restored back to ring enablement time
>   - Work around GCC 4.4.7 limitation wrt assignment in unnamed union at
> declaration time.
> 
> The series can be tested with vhost_iotlb_proto_v2 branch on my gitlab
> account[0].
> 
> The slave requests channel part is re-used from Marc-André's series submitted
> last year[1], with main changes from original version being request/feature
> names renaming and addition of the REPLY_ACK feature support.
> 
> Regarding IOTLB protocol, one noticeable change is the IOTLB miss request
> reply made optionnal (i.e. only if slave requests it by setting the
> VHOST_USER_NEED_REPLY flag in the message header). This change provides
> more flexibility in the backend implementation of the feature.
> 
> The protocol is very close to kernel backends, except that a new
> communication channel is introduced to enable the slave to send
> requests to the master.
> 
> [0]: https://gitlab.com/mcoquelin/dpdk-next-virtio/commits/vhost_iotlb_proto_v2
> [1]: https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg00095.html
> 
> Marc-André Lureau (2):
>    vhost-user: add vhost_user to hold the chr
>    vhost-user: add slave-req-fd support
> 
> Maxime Coquelin (4):
>    vhost: propagate errors in vhost_device_iotlb_miss()
>    vhost: rework IOTLB messaging
>    vhost: extend ring information update for IOTLB to all rings
>    spec/vhost-user spec: Add IOMMU support
> 
>   docs/specs/vhost-user.txt         | 118 ++++++++++++++++++++++++-
>   hw/virtio/vhost-backend.c         | 130 ++++++++++++++++------------
>   hw/virtio/vhost-user.c            | 177 +++++++++++++++++++++++++++++++++++++-
>   hw/virtio/vhost.c                 |  27 ++++--
>   include/hw/virtio/vhost-backend.h |  23 +++--
>   include/hw/virtio/vhost.h         |   2 +-
>   6 files changed, 397 insertions(+), 80 deletions(-)
> 

I just noticed I missed the below change in the series.
I'll wait v2 is reviewed before posting the v3, or I can post v3 now if
you prefer.

Regards,
Maxime

------------------------------------------------------------------

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 22874a9..e037db6 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -77,6 +77,7 @@ static const int user_feature_bits[] = {
      VIRTIO_NET_F_HOST_UFO,
      VIRTIO_NET_F_MRG_RXBUF,
      VIRTIO_NET_F_MTU,
+    VIRTIO_F_IOMMU_PLATFORM,

      /* This bit implies RARP isn't sent by QEMU out of band */
      VIRTIO_NET_F_GUEST_ANNOUNCE,

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 6/6] spec/vhost-user spec: Add IOMMU support
  2017-05-26 14:28 ` [Qemu-devel] [PATCH v2 6/6] spec/vhost-user spec: Add IOMMU support Maxime Coquelin
@ 2017-05-30 18:08   ` Michael S. Tsirkin
  0 siblings, 0 replies; 28+ messages in thread
From: Michael S. Tsirkin @ 2017-05-30 18:08 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: peterx, marcandre.lureau, vkaplans, jasowang, wexu, yuanhan.liu,
	qemu-devel, jfreiman

On Fri, May 26, 2017 at 04:28:58PM +0200, Maxime Coquelin wrote:
> This patch specifies and implements the master/slave communication
> to support device IOTLB in slave.
> 
> The vhost_iotlb_msg structure introduced for kernel backends is
> re-used, making the design close between the two backends.
> 
> An exception is the use of the secondary channel to enable the
> slave to send IOTLB miss requests to the master.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> 
> v2:
>  - spec: fixed possible permission field values
>  - spec: rewrote "IOMMU support" section
>  docs/specs/vhost-user.txt | 86 +++++++++++++++++++++++++++++++++++++++++++++++
>  hw/virtio/vhost-user.c    | 31 +++++++++++++++++
>  2 files changed, 117 insertions(+)
> 
> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
> index 5fa7016..0799ef1 100644
> --- a/docs/specs/vhost-user.txt
> +++ b/docs/specs/vhost-user.txt
> @@ -97,6 +97,25 @@ Depending on the request type, payload can be:
>     log offset: offset from start of supplied file descriptor
>         where logging starts (i.e. where guest address 0 would be logged)
>  
> + * An IOTLB message
> +   ---------------------------------------------------------
> +   | iova | size | user address | permissions flags | type |
> +   ---------------------------------------------------------
> +
> +   IOVA: a 64-bit I/O virtual address programmed by the guest
> +   Size: a 64-bit size
> +   User address: a 64-bit user address
> +   Permissions: a 8-bit value:
> +    - 0: No access
> +    - 1: Read access
> +    - 2: Write access
> +    - 3: Read/Write access
> +   Type: a 8-bit IOTLB message type:
> +    - 1: IOTLB miss
> +    - 2: IOTLB update
> +    - 3: IOTLB invalidate
> +    - 4: IOTLB access fail
> +
>  In QEMU the vhost-user message is implemented with the following struct:
>  
>  typedef struct VhostUserMsg {
> @@ -109,6 +128,7 @@ typedef struct VhostUserMsg {
>          struct vhost_vring_addr addr;
>          VhostUserMemory memory;
>          VhostUserLog log;
> +        struct vhost_iotlb_msg iotlb;
>      };
>  } QEMU_PACKED VhostUserMsg;
>  
> @@ -253,6 +273,40 @@ Once the source has finished migration, rings will be stopped by
>  the source. No further update must be done before rings are
>  restarted.
>  
> +IOMMU support
> +-------------
> +
> +When the VIRTIO_F_IOMMU_PLATFORM feature has been negotiated, the master
> +sends IOTLB entries update & invalidation by sending VHOST_USER_IOTLB_MSG
> +requests to the slave with a struct vhost_iotlb_msg as payload. For update
> +events, the iotlb payload has to be filled with the update message type (2),
> +the I/O virtual address, the size, the user virtual address, and the
> +permissions flags. Addresses and size must be within vhost memory regions set
> +via the VHOST_USER_SET_MEM_TABLE request. For invalidation events, the iotlb
> +payload has to be filled with the invalidation message type (3), the I/O virtual
> +address and the size. On success, the slave is expected to reply with a zero
> +payload, non-zero otherwise.
> +
> +The slave relies on the slave communcation channel (see "Slave communication"
> +section below) to send IOTLB miss and access failure events, by sending
> +VHOST_USER_SLAVE_IOTLB_MSG requests to the master with a struct vhost_iotlb_msg
> +as payload. For miss events, the iotlb payload has to be filled with the miss
> +message type (1), the I/O virtual address and the permissions flags. For access
> +failure event, the iotlb payload has to be filled with the access failure
> +message type (4), the I/O virtual address and the permissions flags.
> +For synchronization purpose, the slave may rely on the reply-ack feature,
> +so the master may send a reply when operation is completed if the reply-ack
> +feature is negotiated and slaves requests a reply. For miss events, completed
> +operation means either master sent an update message containing the IOTLB entry
> +containing requested address and permission, or master sent nothing if the IOTLB
> +miss message is invalid (invalid IOVA or permission).
> +
> +The master isn't generally expected to take the initiative to send IOTLB update
> +messages, as the slave sends IOTLB miss messages for the guest virtual memory
> +areas it needs to access. The exception is the rings information addresses
> +shared with the VHOST_USER_SET_VRING_ADDR request, for which the master must
> +send corresponding IOTLB updates before the rings are enabled.
> +

Is the implication that invalidates do not affect ring translations
true? I find this problematic.

If not then can we replace above with "may send"?

>  Slave communication
>  -------------------
>  
> @@ -514,6 +568,38 @@ Master message types
>        If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, slave must respond
>        with zero for success, non-zero otherwise.
>  
> + * VHOST_USER_IOTLB_MSG
> +
> +      Id: 22
> +      Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type)
> +      Master payload: struct vhost_iotlb_msg
> +      Slave payload: u64
> +
> +      Send IOTLB messages with struct vhost_iotlb_msg as payload.
> +      Master sends such requests to update and invalidate entries in the device
> +      IOTLB. The slave has to acknowledge the request with sending zero as u64
> +      payload for success, non-zero otherwise.
> +      This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature
> +      has been successfully negotiated.
> +
> +Slave message types
> +-------------------
> +
> + * VHOST_USER_SLAVE_IOTLB_MSG
> +
> +      Id: 1
> +      Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type)
> +      Slave payload: struct vhost_iotlb_msg
> +      Master payload: N/A
> +
> +      Send IOTLB messages with struct vhost_iotlb_msg as payload.
> +      Slave sends such requests to notify of an IOTLB miss, or an IOTLB
> +      access failure. If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated,
> +      and slave set the VHOST_USER_NEED_REPLY flag, master must respond with
> +      zero when operation is successfully completed, or non-zero otherwise.
> +      This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature
> +      has been successfully negotiated.
> +
>  VHOST_USER_PROTOCOL_F_REPLY_ACK:
>  -------------------------------
>  The original vhost-user specification only demands replies for certain
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index ea988fe..170fa68 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -62,11 +62,13 @@ typedef enum VhostUserRequest {
>      VHOST_USER_SEND_RARP = 19,
>      VHOST_USER_NET_SET_MTU = 20,
>      VHOST_USER_SET_SLAVE_REQ_FD = 21,
> +    VHOST_USER_IOTLB_MSG = 22,
>      VHOST_USER_MAX
>  } VhostUserRequest;
>  
>  typedef enum VhostUserSlaveRequest {
>      VHOST_USER_SLAVE_NONE = 0,
> +    VHOST_USER_SLAVE_IOTLB_MSG = 1,
>      VHOST_USER_SLAVE_MAX
>  }  VhostUserSlaveRequest;
>  
> @@ -104,6 +106,7 @@ typedef struct VhostUserMsg {
>          struct vhost_vring_addr addr;
>          VhostUserMemory memory;
>          VhostUserLog log;
> +        struct vhost_iotlb_msg iotlb;
>      } payload;
>  } QEMU_PACKED VhostUserMsg;
>  
> @@ -615,6 +618,9 @@ static void slave_read(void *opaque)
>      }
>  
>      switch (msg.request) {
> +    case VHOST_USER_SLAVE_IOTLB_MSG:
> +        ret = vhost_backend_handle_iotlb_msg(dev, &msg.payload.iotlb);
> +        break;
>      default:
>          error_report("Received unexpected msg type.");
>          ret = -EINVAL;
> @@ -862,6 +868,29 @@ static int vhost_user_net_set_mtu(struct vhost_dev *dev, uint16_t mtu)
>      return 0;
>  }
>  
> +static int vhost_user_send_device_iotlb_msg(struct vhost_dev *dev,
> +                                            struct vhost_iotlb_msg *imsg)
> +{
> +    VhostUserMsg msg = {
> +        .request = VHOST_USER_IOTLB_MSG,
> +        .size = sizeof(msg.payload.iotlb),
> +        .flags = VHOST_USER_VERSION | VHOST_USER_NEED_REPLY_MASK,
> +        .payload.iotlb = *imsg,
> +    };
> +
> +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> +        return -EFAULT;
> +    }
> +
> +    return process_message_reply(dev, msg);
> +}
> +
> +
> +static void vhost_user_set_iotlb_callback(struct vhost_dev *dev, int enabled)
> +{
> +    /* No-op as the receive channel is not dedicated to IOTLB messages. */
> +}
> +
>  const VhostOps user_ops = {
>          .backend_type = VHOST_BACKEND_TYPE_USER,
>          .vhost_backend_init = vhost_user_init,
> @@ -886,4 +915,6 @@ const VhostOps user_ops = {
>          .vhost_migration_done = vhost_user_migration_done,
>          .vhost_backend_can_merge = vhost_user_can_merge,
>          .vhost_net_set_mtu = vhost_user_net_set_mtu,
> +        .vhost_set_iotlb_callback = vhost_user_set_iotlb_callback,
> +        .vhost_send_device_iotlb_msg = vhost_user_send_device_iotlb_msg,
>  };
> -- 
> 2.9.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/6] vhost: extend ring information update for IOTLB to all rings
  2017-05-26 14:28 ` [Qemu-devel] [PATCH v2 3/6] vhost: extend ring information update for IOTLB to all rings Maxime Coquelin
@ 2017-05-30 18:12   ` Michael S. Tsirkin
  2017-05-30 21:06     ` Maxime Coquelin
  2017-05-31  8:48     ` Jason Wang
  0 siblings, 2 replies; 28+ messages in thread
From: Michael S. Tsirkin @ 2017-05-30 18:12 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: peterx, marcandre.lureau, vkaplans, jasowang, wexu, yuanhan.liu,
	qemu-devel, jfreiman

On Fri, May 26, 2017 at 04:28:55PM +0200, Maxime Coquelin wrote:
> Vhost-kernel backend need

needs

> to receive IOTLB entry for used ring
> information early, which is done by triggering a miss event on
> its address.
> 
> This patch extends this behaviour to all rings information, to be
> compatible with vhost-user backend design.

Why does vhost-user need it though?

> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> v2:
>  - Revert back to existing behaviour, i.e. only send IOTLB updates
> at ring enablement time, not at ring address setting time (mst).
>  - Extend IOTLB misses to all ring addresses, not only used ring.
> 
>  hw/virtio/vhost.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 6eddb09..7867034 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -1552,11 +1552,15 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
>      if (vhost_dev_has_iommu(hdev)) {
>          hdev->vhost_ops->vhost_set_iotlb_callback(hdev, true);
>  
> -        /* Update used ring information for IOTLB to work correctly,
> -         * vhost-kernel code requires for this.*/
> +        /*
> +         * Update rings information for IOTLB to work correctly,
> +         * vhost-kernel and vhost-user codes require for this.

Better just say "Update ring info for vhost iotlb."

The rest isn't really informative.



> +         */
>          for (i = 0; i < hdev->nvqs; ++i) {
>              struct vhost_virtqueue *vq = hdev->vqs + i;
>              vhost_device_iotlb_miss(hdev, vq->used_phys, true);
> +            vhost_device_iotlb_miss(hdev, vq->desc_phys, true);
> +            vhost_device_iotlb_miss(hdev, vq->avail_phys, true);

So I don't remember why does vhost in kernel want miss on used
at start time.

Jason, could you comment on this please?



>          }
>      }
>      return 0;
> -- 
> 2.9.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 5/6] vhost-user: add slave-req-fd support
  2017-05-26 14:28 ` [Qemu-devel] [PATCH v2 5/6] vhost-user: add slave-req-fd support Maxime Coquelin
@ 2017-05-30 18:17   ` Michael S. Tsirkin
  2017-05-30 21:26     ` Maxime Coquelin
  0 siblings, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2017-05-30 18:17 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: peterx, marcandre.lureau, vkaplans, jasowang, wexu, yuanhan.liu,
	qemu-devel, jfreiman, Marc-André Lureau

On Fri, May 26, 2017 at 04:28:57PM +0200, Maxime Coquelin wrote:
> From: Marc-André Lureau <marcandre.lureau@redhat.com>
> 
> Learn to give a socket to the slave to let him make requests to the
> master.
> 
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  docs/specs/vhost-user.txt |  32 +++++++++++-
>  hw/virtio/vhost-user.c    | 127 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 157 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
> index 036890f..5fa7016 100644
> --- a/docs/specs/vhost-user.txt
> +++ b/docs/specs/vhost-user.txt
> @@ -139,6 +139,7 @@ in the ancillary data:
>   * VHOST_USER_SET_VRING_KICK
>   * VHOST_USER_SET_VRING_CALL
>   * VHOST_USER_SET_VRING_ERR
> + * VHOST_USER_SET_SLAVE_REQ_FD
>  
>  If Master is unable to send the full message or receives a wrong reply it will
>  close the connection. An optional reconnection mechanism can be implemented.
> @@ -252,6 +253,18 @@ Once the source has finished migration, rings will be stopped by
>  the source. No further update must be done before rings are
>  restarted.
>  
> +Slave communication
> +-------------------
> +
> +An optional communication channel is provided if the slave declares
> +VHOST_USER_PROTOCOL_F_SLAVE_REQ protocol feature, to allow the slave to make
> +requests to the master.
> +
> +The fd is provided via VHOST_USER_SET_SLAVE_REQ_FD ancillary data.
> +
> +A slave may then send VHOST_USER_SLAVE_* messages to the master
> +using this fd communication channel.
> +
>  Protocol features
>  -----------------
>  
> @@ -260,9 +273,10 @@ Protocol features
>  #define VHOST_USER_PROTOCOL_F_RARP           2
>  #define VHOST_USER_PROTOCOL_F_REPLY_ACK      3
>  #define VHOST_USER_PROTOCOL_F_MTU            4
> +#define VHOST_USER_PROTOCOL_F_SLAVE_REQ      5
>  
> -Message types
> --------------
> +Master message types
> +--------------------
>  
>   * VHOST_USER_GET_FEATURES
>  


So down the road, we should make sure a device without
VHOST_USER_PROTOCOL_F_SLAVE_REQ does not advertise IOMMU
since you don't handle these messages otherwise.


> @@ -486,6 +500,20 @@ Message types
>        If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, slave must respond
>        with zero in case the specified MTU is valid, or non-zero otherwise.
>  
> + * VHOST_USER_SET_SLAVE_REQ_FD
> +
> +      Id: 21
> +      Equivalent ioctl: N/A
> +      Master payload: N/A
> +
> +      Set the socket file descriptor for slave initiated requests. It is passed
> +      in the ancillary data.
> +      This request should be sent only when VHOST_USER_F_PROTOCOL_FEATURES
> +      has been negotiated, and protocol feature bit VHOST_USER_PROTOCOL_F_SLAVE_REQ
> +      bit is present in VHOST_USER_GET_PROTOCOL_FEATURES.
> +      If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, slave must respond
> +      with zero for success, non-zero otherwise.
> +
>  VHOST_USER_PROTOCOL_F_REPLY_ACK:
>  -------------------------------
>  The original vhost-user specification only demands replies for certain
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 8a602e0..ea988fe 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -32,6 +32,7 @@ enum VhostUserProtocolFeature {
>      VHOST_USER_PROTOCOL_F_RARP = 2,
>      VHOST_USER_PROTOCOL_F_REPLY_ACK = 3,
>      VHOST_USER_PROTOCOL_F_NET_MTU = 4,
> +    VHOST_USER_PROTOCOL_F_SLAVE_REQ = 5,
>  
>      VHOST_USER_PROTOCOL_F_MAX
>  };
> @@ -60,9 +61,15 @@ typedef enum VhostUserRequest {
>      VHOST_USER_SET_VRING_ENABLE = 18,
>      VHOST_USER_SEND_RARP = 19,
>      VHOST_USER_NET_SET_MTU = 20,
> +    VHOST_USER_SET_SLAVE_REQ_FD = 21,
>      VHOST_USER_MAX
>  } VhostUserRequest;
>  
> +typedef enum VhostUserSlaveRequest {
> +    VHOST_USER_SLAVE_NONE = 0,
> +    VHOST_USER_SLAVE_MAX
> +}  VhostUserSlaveRequest;
> +
>  typedef struct VhostUserMemoryRegion {
>      uint64_t guest_phys_addr;
>      uint64_t memory_size;
> @@ -112,6 +119,7 @@ static VhostUserMsg m __attribute__ ((unused));
>  
>  struct vhost_user {
>      CharBackend *chr;
> +    int slave_fd;
>  };
>  
>  static bool ioeventfd_enabled(void)
> @@ -578,6 +586,115 @@ static int vhost_user_reset_device(struct vhost_dev *dev)
>      return 0;
>  }
>  
> +static void slave_read(void *opaque)
> +{
> +    struct vhost_dev *dev = opaque;
> +    struct vhost_user *u = dev->opaque;
> +    VhostUserMsg msg = { 0, };
> +    int size, ret = 0;
> +
> +    /* Read header */
> +    size = read(u->slave_fd, &msg, VHOST_USER_HDR_SIZE);
> +    if (size != VHOST_USER_HDR_SIZE) {
> +        error_report("Failed to read from slave.");
> +        goto err;
> +    }
> +
> +    if (msg.size > VHOST_USER_PAYLOAD_SIZE) {
> +        error_report("Failed to read msg header."
> +                " Size %d exceeds the maximum %zu.", msg.size,
> +                VHOST_USER_PAYLOAD_SIZE);
> +        goto err;
> +    }
> +
> +    /* Read payload */
> +    size = read(u->slave_fd, &msg.payload, msg.size);
> +    if (size != msg.size) {
> +        error_report("Failed to read payload from slave.");
> +        goto err;
> +    }
> +
> +    switch (msg.request) {
> +    default:
> +        error_report("Received unexpected msg type.");
> +        ret = -EINVAL;
> +    }
> +
> +    /*
> +     * REPLY_ACK feature handling. Other reply types has to be managed
> +     * directly in their request handlers.
> +     */
> +    if (msg.flags & VHOST_USER_NEED_REPLY_MASK) {
> +        msg.flags &= ~VHOST_USER_NEED_REPLY_MASK;
> +        msg.flags |= VHOST_USER_REPLY_MASK;
> +
> +        msg.payload.u64 = !!ret;
> +        msg.size = sizeof(msg.payload.u64);
> +
> +        size = write(u->slave_fd, &msg, VHOST_USER_HDR_SIZE + msg.size);
> +        if (size != VHOST_USER_HDR_SIZE + msg.size) {
> +            error_report("Failed to send msg reply to slave.");
> +            goto err;
> +        }
> +    }
> +
> +    return;
> +
> +err:
> +    qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
> +    close(u->slave_fd);
> +    u->slave_fd = -1;
> +    return;
> +}
> +
> +static int vhost_setup_slave_channel(struct vhost_dev *dev)
> +{
> +    VhostUserMsg msg = {
> +        .request = VHOST_USER_SET_SLAVE_REQ_FD,
> +        .flags = VHOST_USER_VERSION,
> +    };
> +    struct vhost_user *u = dev->opaque;
> +    int sv[2], ret = 0;
> +    bool reply_supported = virtio_has_feature(dev->protocol_features,
> +                                              VHOST_USER_PROTOCOL_F_REPLY_ACK);
> +
> +    if (!virtio_has_feature(dev->protocol_features,
> +                            VHOST_USER_PROTOCOL_F_SLAVE_REQ)) {
> +        return 0;
> +    }
> +
> +    if (socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) {
> +        error_report("socketpair() failed");
> +        return -1;
> +    }
> +
> +    u->slave_fd = sv[0];
> +    qemu_set_fd_handler(u->slave_fd, slave_read, NULL, dev);
> +
> +    if (reply_supported) {
> +        msg.flags |= VHOST_USER_NEED_REPLY_MASK;
> +    }
> +
> +    ret = vhost_user_write(dev, &msg, &sv[1], 1);
> +    if (ret) {
> +        goto out;
> +    }
> +
> +    if (reply_supported) {
> +        ret = process_message_reply(dev, msg);
> +    }
> +
> +out:
> +    close(sv[1]);
> +    if (ret) {
> +        qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
> +        close(u->slave_fd);
> +        u->slave_fd = -1;
> +    }
> +
> +    return ret;
> +}
> +
>  static int vhost_user_init(struct vhost_dev *dev, void *opaque)
>  {
>      uint64_t features;
> @@ -588,6 +705,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
>  
>      u = g_new0(struct vhost_user, 1);
>      u->chr = opaque;
> +    u->slave_fd = -1;
>      dev->opaque = u;
>  
>      err = vhost_user_get_features(dev, &features);
> @@ -628,6 +746,11 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
>                     "VHOST_USER_PROTOCOL_F_LOG_SHMFD feature.");
>      }
>  
> +    err = vhost_setup_slave_channel(dev);
> +    if (err < 0) {
> +        return err;
> +    }
> +
>      return 0;
>  }
>  
> @@ -638,6 +761,10 @@ static int vhost_user_cleanup(struct vhost_dev *dev)
>      assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
>  
>      u = dev->opaque;
> +    if (u->slave_fd >= 0) {
> +        close(u->slave_fd);
> +        u->slave_fd = -1;
> +    }
>      g_free(u);
>      dev->opaque = 0;
>  
> -- 
> 2.9.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/6] vhost-user: Specify and implement device IOTLB support
  2017-05-26 14:28 [Qemu-devel] [PATCH v2 0/6] vhost-user: Specify and implement device IOTLB support Maxime Coquelin
                   ` (6 preceding siblings ...)
  2017-05-30 16:15 ` [Qemu-devel] [PATCH v2 0/6] vhost-user: Specify and implement device IOTLB support Maxime Coquelin
@ 2017-05-30 18:20 ` Michael S. Tsirkin
  2017-05-31  8:33   ` Jason Wang
  7 siblings, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2017-05-30 18:20 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: peterx, marcandre.lureau, vkaplans, jasowang, wexu, yuanhan.liu,
	qemu-devel, jfreiman

On Fri, May 26, 2017 at 04:28:52PM +0200, Maxime Coquelin wrote:
> This series aims at specifying ans implementing the protocol update
> required to support device IOTLB with user backends.
> 
> In this second non-RFC version, main changes are:
>  - spec fixes and clarification
>  - rings information update has been restored back to ring enablement time
>  - Work around GCC 4.4.7 limitation wrt assignment in unnamed union at
> declaration time.
> 
> The series can be tested with vhost_iotlb_proto_v2 branch on my gitlab
> account[0].
> 
> The slave requests channel part is re-used from Marc-André's series submitted
> last year[1], with main changes from original version being request/feature
> names renaming and addition of the REPLY_ACK feature support.
> 
> Regarding IOTLB protocol, one noticeable change is the IOTLB miss request
> reply made optionnal (i.e. only if slave requests it by setting the
> VHOST_USER_NEED_REPLY flag in the message header). This change provides
> more flexibility in the backend implementation of the feature.
> 
> The protocol is very close to kernel backends, except that a new
> communication channel is introduced to enable the slave to send
> requests to the master.
> 
> [0]: https://gitlab.com/mcoquelin/dpdk-next-virtio/commits/vhost_iotlb_proto_v2
> [1]: https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg00095.html

Overall, this looks good to me. I do think patch 3 isn't a good idea
though, if slave wants something let it request it.

Need to find out why does vhost in kernel want the used ring iotlb at
start time - especially considering we aren't even guaranteed one entry
covers the whole ring, and invalidates should affect all addresses at
least in theory.


> Marc-André Lureau (2):
>   vhost-user: add vhost_user to hold the chr
>   vhost-user: add slave-req-fd support
> 
> Maxime Coquelin (4):
>   vhost: propagate errors in vhost_device_iotlb_miss()
>   vhost: rework IOTLB messaging
>   vhost: extend ring information update for IOTLB to all rings
>   spec/vhost-user spec: Add IOMMU support
> 
>  docs/specs/vhost-user.txt         | 118 ++++++++++++++++++++++++-
>  hw/virtio/vhost-backend.c         | 130 ++++++++++++++++------------
>  hw/virtio/vhost-user.c            | 177 +++++++++++++++++++++++++++++++++++++-
>  hw/virtio/vhost.c                 |  27 ++++--
>  include/hw/virtio/vhost-backend.h |  23 +++--
>  include/hw/virtio/vhost.h         |   2 +-
>  6 files changed, 397 insertions(+), 80 deletions(-)
> 
> -- 
> 2.9.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/6] vhost: extend ring information update for IOTLB to all rings
  2017-05-30 18:12   ` Michael S. Tsirkin
@ 2017-05-30 21:06     ` Maxime Coquelin
  2017-05-30 21:11       ` Maxime Coquelin
  2017-06-01 13:54       ` Michael S. Tsirkin
  2017-05-31  8:48     ` Jason Wang
  1 sibling, 2 replies; 28+ messages in thread
From: Maxime Coquelin @ 2017-05-30 21:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: peterx, marcandre.lureau, vkaplans, jasowang, wexu, yuanhan.liu,
	qemu-devel, jfreiman

Hi Michael,

On 05/30/2017 08:12 PM, Michael S. Tsirkin wrote:
> On Fri, May 26, 2017 at 04:28:55PM +0200, Maxime Coquelin wrote:
>> Vhost-kernel backend need
> 
> needs
> 
>> to receive IOTLB entry for used ring
>> information early, which is done by triggering a miss event on
>> its address.
>>
>> This patch extends this behaviour to all rings information, to be
>> compatible with vhost-user backend design.
> 
> Why does vhost-user need it though?

For vhost-user, this simplifies the backend design because generally,
the backend sends IOTLB miss requests from processing threads through
the slave channel, and receives resulting IOTLB updates in vhost-user
protocol thread.

The only exception is for these rings info, where IOTLB miss requests
are sent from vhost-user protocol thread (in the SET_VRING_ENABLE
request handler), so the resulting IOTLB update is only handled by
the backend when the rings are enabled, which is too late.

It could be possible to overcome this issue, but I think it would
make the design more complex or less efficient. I see several options:

1. Change the IOTLB miss request so that the master sends the IOTLB
update as reply, instead of the reply-ack. It would mean that IOTLB
updates/invalidations would be sent either via the master channel or
the slave channel. On QEMU side, it means diverging from kernel backend
implementation. On backend side, it means having possibly multiple
threads writing to the IOTLB cache.

2. In vhost-user protocol thread, when handling SET_VRING_ENABLE, send
IOTLB miss request without setting the reply-ack flag, and poll the
vhost-user socket to read the resulting IOTLB update. The problem is
that other requests could be pending in the socket's buffer, and so it
would end-up nesting multiple requests handling.

3. Don't interpret rings info in the vhost-user protocol thread, but
only in the processing threads. The advantage is that it would address
the remark you made on patch 6 that invalidates are not affecting ring
info. The downside being the overhead induced by checking whether the
ring info are valid every time it is being used. I haven't prototyped
this solution, but I expected the performance regression to be a
blocker.

4. In SET_VRING_ENABLE, don't enable the ring if needed entries are not 
in IOTLB cache. Just send the IOTLB misses without reply-ack flag and 
postpone enable when handling IOTLB updates. It will be a little more 
complex solution than current one, but it may be the less impacting 
compared to the other 3 options.

Thinking again, maybe trying solution would be worth the effort, and 
could be extended also to disable the rings when receiving invalidates
that affect rings info.

What do you think?

> 
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>> v2:
>>   - Revert back to existing behaviour, i.e. only send IOTLB updates
>> at ring enablement time, not at ring address setting time (mst).
>>   - Extend IOTLB misses to all ring addresses, not only used ring.
>>
>>   hw/virtio/vhost.c | 8 ++++++--
>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index 6eddb09..7867034 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -1552,11 +1552,15 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
>>       if (vhost_dev_has_iommu(hdev)) {
>>           hdev->vhost_ops->vhost_set_iotlb_callback(hdev, true);
>>   
>> -        /* Update used ring information for IOTLB to work correctly,
>> -         * vhost-kernel code requires for this.*/
>> +        /*
>> +         * Update rings information for IOTLB to work correctly,
>> +         * vhost-kernel and vhost-user codes require for this.
> 
> Better just say "Update ring info for vhost iotlb."
> 
> The rest isn't really informative.

Ok.

> 
> 
> 
>> +         */
>>           for (i = 0; i < hdev->nvqs; ++i) {
>>               struct vhost_virtqueue *vq = hdev->vqs + i;
>>               vhost_device_iotlb_miss(hdev, vq->used_phys, true);
>> +            vhost_device_iotlb_miss(hdev, vq->desc_phys, true);
>> +            vhost_device_iotlb_miss(hdev, vq->avail_phys, true);
> 
> So I don't remember why does vhost in kernel want miss on used
> at start time.
> 
> Jason, could you comment on this please?
> 
> 
> 
>>           }
>>       }
>>       return 0;
>> -- 
>> 2.9.4

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/6] vhost: extend ring information update for IOTLB to all rings
  2017-05-30 21:06     ` Maxime Coquelin
@ 2017-05-30 21:11       ` Maxime Coquelin
  2017-05-31 15:20         ` Maxime Coquelin
  2017-06-01 13:54       ` Michael S. Tsirkin
  1 sibling, 1 reply; 28+ messages in thread
From: Maxime Coquelin @ 2017-05-30 21:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: peterx, marcandre.lureau, vkaplans, jasowang, wexu, yuanhan.liu,
	qemu-devel, jfreiman



On 05/30/2017 11:06 PM, Maxime Coquelin wrote:
> Hi Michael,
> 
> On 05/30/2017 08:12 PM, Michael S. Tsirkin wrote:
>> On Fri, May 26, 2017 at 04:28:55PM +0200, Maxime Coquelin wrote:
>>> Vhost-kernel backend need
>>
>> needs
>>
>>> to receive IOTLB entry for used ring
>>> information early, which is done by triggering a miss event on
>>> its address.
>>>
>>> This patch extends this behaviour to all rings information, to be
>>> compatible with vhost-user backend design.
>>
>> Why does vhost-user need it though?
> 
> For vhost-user, this simplifies the backend design because generally,
> the backend sends IOTLB miss requests from processing threads through
> the slave channel, and receives resulting IOTLB updates in vhost-user
> protocol thread.
> 
> The only exception is for these rings info, where IOTLB miss requests
> are sent from vhost-user protocol thread (in the SET_VRING_ENABLE
> request handler), so the resulting IOTLB update is only handled by
> the backend when the rings are enabled, which is too late.
> 
> It could be possible to overcome this issue, but I think it would
> make the design more complex or less efficient. I see several options:
> 
> 1. Change the IOTLB miss request so that the master sends the IOTLB
> update as reply, instead of the reply-ack. It would mean that IOTLB
> updates/invalidations would be sent either via the master channel or
> the slave channel. On QEMU side, it means diverging from kernel backend
> implementation. On backend side, it means having possibly multiple
> threads writing to the IOTLB cache.
> 
> 2. In vhost-user protocol thread, when handling SET_VRING_ENABLE, send
> IOTLB miss request without setting the reply-ack flag, and poll the
> vhost-user socket to read the resulting IOTLB update. The problem is
> that other requests could be pending in the socket's buffer, and so it
> would end-up nesting multiple requests handling.
> 
> 3. Don't interpret rings info in the vhost-user protocol thread, but
> only in the processing threads. The advantage is that it would address
> the remark you made on patch 6 that invalidates are not affecting ring
> info. The downside being the overhead induced by checking whether the
> ring info are valid every time it is being used. I haven't prototyped
> this solution, but I expected the performance regression to be a
> blocker.
> 
> 4. In SET_VRING_ENABLE, don't enable the ring if needed entries are not 
> in IOTLB cache. Just send the IOTLB misses without reply-ack flag and 
> postpone enable when handling IOTLB updates. It will be a little more 
> complex solution than current one, but it may be the less impacting 
> compared to the other 3 options.
> 
> 
> Thinking again, maybe trying solution would be worth the effort, and 

s/solution/solution 4/

> could be extended also to disable the rings when receiving invalidates
> that affect rings info.
> 
> What do you think?
> 
>>
>>>
>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>> ---
>>> v2:
>>>   - Revert back to existing behaviour, i.e. only send IOTLB updates
>>> at ring enablement time, not at ring address setting time (mst).
>>>   - Extend IOTLB misses to all ring addresses, not only used ring.
>>>
>>>   hw/virtio/vhost.c | 8 ++++++--
>>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>> index 6eddb09..7867034 100644
>>> --- a/hw/virtio/vhost.c
>>> +++ b/hw/virtio/vhost.c
>>> @@ -1552,11 +1552,15 @@ int vhost_dev_start(struct vhost_dev *hdev, 
>>> VirtIODevice *vdev)
>>>       if (vhost_dev_has_iommu(hdev)) {
>>>           hdev->vhost_ops->vhost_set_iotlb_callback(hdev, true);
>>> -        /* Update used ring information for IOTLB to work correctly,
>>> -         * vhost-kernel code requires for this.*/
>>> +        /*
>>> +         * Update rings information for IOTLB to work correctly,
>>> +         * vhost-kernel and vhost-user codes require for this.
>>
>> Better just say "Update ring info for vhost iotlb."
>>
>> The rest isn't really informative.
> 
> Ok.
> 
>>
>>
>>
>>> +         */
>>>           for (i = 0; i < hdev->nvqs; ++i) {
>>>               struct vhost_virtqueue *vq = hdev->vqs + i;
>>>               vhost_device_iotlb_miss(hdev, vq->used_phys, true);
>>> +            vhost_device_iotlb_miss(hdev, vq->desc_phys, true);
>>> +            vhost_device_iotlb_miss(hdev, vq->avail_phys, true);
>>
>> So I don't remember why does vhost in kernel want miss on used
>> at start time.
>>
>> Jason, could you comment on this please?
>>
>>
>>
>>>           }
>>>       }
>>>       return 0;
>>> -- 
>>> 2.9.4
> 
> Thanks,
> Maxime

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 5/6] vhost-user: add slave-req-fd support
  2017-05-30 18:17   ` Michael S. Tsirkin
@ 2017-05-30 21:26     ` Maxime Coquelin
  0 siblings, 0 replies; 28+ messages in thread
From: Maxime Coquelin @ 2017-05-30 21:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: peterx, marcandre.lureau, vkaplans, jasowang, wexu, yuanhan.liu,
	qemu-devel, jfreiman, Marc-André Lureau



On 05/30/2017 08:17 PM, Michael S. Tsirkin wrote:
> On Fri, May 26, 2017 at 04:28:57PM +0200, Maxime Coquelin wrote:
>> From: Marc-André Lureau<marcandre.lureau@redhat.com>
>>
>> Learn to give a socket to the slave to let him make requests to the
>> master.
>>
>> Signed-off-by: Marc-André Lureau<marcandre.lureau@redhat.com>
>> Signed-off-by: Maxime Coquelin<maxime.coquelin@redhat.com>
>> ---
>>   docs/specs/vhost-user.txt |  32 +++++++++++-
>>   hw/virtio/vhost-user.c    | 127 ++++++++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 157 insertions(+), 2 deletions(-)
>>
>> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
>> index 036890f..5fa7016 100644
>> --- a/docs/specs/vhost-user.txt
>> +++ b/docs/specs/vhost-user.txt
>> @@ -139,6 +139,7 @@ in the ancillary data:
>>    * VHOST_USER_SET_VRING_KICK
>>    * VHOST_USER_SET_VRING_CALL
>>    * VHOST_USER_SET_VRING_ERR
>> + * VHOST_USER_SET_SLAVE_REQ_FD
>>   
>>   If Master is unable to send the full message or receives a wrong reply it will
>>   close the connection. An optional reconnection mechanism can be implemented.
>> @@ -252,6 +253,18 @@ Once the source has finished migration, rings will be stopped by
>>   the source. No further update must be done before rings are
>>   restarted.
>>   
>> +Slave communication
>> +-------------------
>> +
>> +An optional communication channel is provided if the slave declares
>> +VHOST_USER_PROTOCOL_F_SLAVE_REQ protocol feature, to allow the slave to make
>> +requests to the master.
>> +
>> +The fd is provided via VHOST_USER_SET_SLAVE_REQ_FD ancillary data.
>> +
>> +A slave may then send VHOST_USER_SLAVE_* messages to the master
>> +using this fd communication channel.
>> +
>>   Protocol features
>>   -----------------
>>   
>> @@ -260,9 +273,10 @@ Protocol features
>>   #define VHOST_USER_PROTOCOL_F_RARP           2
>>   #define VHOST_USER_PROTOCOL_F_REPLY_ACK      3
>>   #define VHOST_USER_PROTOCOL_F_MTU            4
>> +#define VHOST_USER_PROTOCOL_F_SLAVE_REQ      5
>>   
>> -Message types
>> --------------
>> +Master message types
>> +--------------------
>>   
>>    * VHOST_USER_GET_FEATURES
>>   
> 
> So down the road, we should make sure a device without
> VHOST_USER_PROTOCOL_F_SLAVE_REQ does not advertise IOMMU
> since you don't handle these messages otherwise.

Right, I will add a check to ensure this, and make this clearer
in the IOMMU spec part.

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/6] vhost-user: Specify and implement device IOTLB support
  2017-05-30 18:20 ` Michael S. Tsirkin
@ 2017-05-31  8:33   ` Jason Wang
  2017-05-31 15:32     ` Maxime Coquelin
  2017-06-01 13:59     ` Michael S. Tsirkin
  0 siblings, 2 replies; 28+ messages in thread
From: Jason Wang @ 2017-05-31  8:33 UTC (permalink / raw)
  To: Michael S. Tsirkin, Maxime Coquelin
  Cc: peterx, marcandre.lureau, vkaplans, wexu, yuanhan.liu,
	qemu-devel, jfreiman



On 2017年05月31日 02:20, Michael S. Tsirkin wrote:
> On Fri, May 26, 2017 at 04:28:52PM +0200, Maxime Coquelin wrote:
>> This series aims at specifying ans implementing the protocol update
>> required to support device IOTLB with user backends.
>>
>> In this second non-RFC version, main changes are:
>>   - spec fixes and clarification
>>   - rings information update has been restored back to ring enablement time
>>   - Work around GCC 4.4.7 limitation wrt assignment in unnamed union at
>> declaration time.
>>
>> The series can be tested with vhost_iotlb_proto_v2 branch on my gitlab
>> account[0].
>>
>> The slave requests channel part is re-used from Marc-André's series submitted
>> last year[1], with main changes from original version being request/feature
>> names renaming and addition of the REPLY_ACK feature support.
>>
>> Regarding IOTLB protocol, one noticeable change is the IOTLB miss request
>> reply made optionnal (i.e. only if slave requests it by setting the
>> VHOST_USER_NEED_REPLY flag in the message header). This change provides
>> more flexibility in the backend implementation of the feature.
>>
>> The protocol is very close to kernel backends, except that a new
>> communication channel is introduced to enable the slave to send
>> requests to the master.
>>
>> [0]:https://gitlab.com/mcoquelin/dpdk-next-virtio/commits/vhost_iotlb_proto_v2
>> [1]:https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg00095.html
> Overall, this looks good to me. I do think patch 3 isn't a good idea
> though, if slave wants something let it request it.
>
> Need to find out why does vhost in kernel want the used ring iotlb at
> start time - especially considering we aren't even guaranteed one entry
> covers the whole ring, and invalidates should affect all addresses at
> least in theory.
>
>

The reason is probably we want to verify whether or not we could 
correctly access used ring in vhost_vq_init_access(). It was there since 
vhost_net is introduced. We can think to remove this limitation maybe.

Thanks

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/6] vhost: extend ring information update for IOTLB to all rings
  2017-05-30 18:12   ` Michael S. Tsirkin
  2017-05-30 21:06     ` Maxime Coquelin
@ 2017-05-31  8:48     ` Jason Wang
  1 sibling, 0 replies; 28+ messages in thread
From: Jason Wang @ 2017-05-31  8:48 UTC (permalink / raw)
  To: Michael S. Tsirkin, Maxime Coquelin
  Cc: yuanhan.liu, qemu-devel, peterx, marcandre.lureau, wexu,
	vkaplans, jfreiman



On 2017年05月31日 02:12, Michael S. Tsirkin wrote:
> On Fri, May 26, 2017 at 04:28:55PM +0200, Maxime Coquelin wrote:
>> Vhost-kernel backend need
> needs
>
>> to receive IOTLB entry for used ring
>> information early, which is done by triggering a miss event on
>> its address.
>>
>> This patch extends this behaviour to all rings information, to be
>> compatible with vhost-user backend design.
> Why does vhost-user need it though?
>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>> v2:
>>   - Revert back to existing behaviour, i.e. only send IOTLB updates
>> at ring enablement time, not at ring address setting time (mst).
>>   - Extend IOTLB misses to all ring addresses, not only used ring.
>>
>>   hw/virtio/vhost.c | 8 ++++++--
>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index 6eddb09..7867034 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -1552,11 +1552,15 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
>>       if (vhost_dev_has_iommu(hdev)) {
>>           hdev->vhost_ops->vhost_set_iotlb_callback(hdev, true);
>>   
>> -        /* Update used ring information for IOTLB to work correctly,
>> -         * vhost-kernel code requires for this.*/
>> +        /*
>> +         * Update rings information for IOTLB to work correctly,
>> +         * vhost-kernel and vhost-user codes require for this.
> Better just say "Update ring info for vhost iotlb."
>
> The rest isn't really informative.
>
>
>
>> +         */
>>           for (i = 0; i < hdev->nvqs; ++i) {
>>               struct vhost_virtqueue *vq = hdev->vqs + i;
>>               vhost_device_iotlb_miss(hdev, vq->used_phys, true);
>> +            vhost_device_iotlb_miss(hdev, vq->desc_phys, true);
>> +            vhost_device_iotlb_miss(hdev, vq->avail_phys, true);
> So I don't remember why does vhost in kernel want miss on used
> at start time.
>
> Jason, could you comment on this please?

In vhost_vq_init_access() we try to update used flags and fetch 
last_used_idx, this requires we cache its translation in advance.

We don't support IOTLB miss on control path (ioctl) now, so I choose to 
update IOTLB.

Thanks

>
>
>
>>           }
>>       }
>>       return 0;
>> -- 
>> 2.9.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/6] vhost: extend ring information update for IOTLB to all rings
  2017-05-30 21:11       ` Maxime Coquelin
@ 2017-05-31 15:20         ` Maxime Coquelin
  2017-06-01 13:55           ` Michael S. Tsirkin
  0 siblings, 1 reply; 28+ messages in thread
From: Maxime Coquelin @ 2017-05-31 15:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: peterx, marcandre.lureau, vkaplans, jasowang, wexu, yuanhan.liu,
	qemu-devel, jfreiman

Hi Michael,

On 05/30/2017 11:11 PM, Maxime Coquelin wrote:
> 
> 
> On 05/30/2017 11:06 PM, Maxime Coquelin wrote:
>> Hi Michael,
>>
>> On 05/30/2017 08:12 PM, Michael S. Tsirkin wrote:
>>> On Fri, May 26, 2017 at 04:28:55PM +0200, Maxime Coquelin wrote:
>>>> Vhost-kernel backend need
>>>
>>> needs
>>>
>>>> to receive IOTLB entry for used ring
>>>> information early, which is done by triggering a miss event on
>>>> its address.
>>>>
>>>> This patch extends this behaviour to all rings information, to be
>>>> compatible with vhost-user backend design.
>>>
>>> Why does vhost-user need it though?
>>
>> For vhost-user, this simplifies the backend design because generally,
>> the backend sends IOTLB miss requests from processing threads through
>> the slave channel, and receives resulting IOTLB updates in vhost-user
>> protocol thread.
>>
>> The only exception is for these rings info, where IOTLB miss requests
>> are sent from vhost-user protocol thread (in the SET_VRING_ENABLE
>> request handler), so the resulting IOTLB update is only handled by
>> the backend when the rings are enabled, which is too late.
>>
>> It could be possible to overcome this issue, but I think it would
>> make the design more complex or less efficient. I see several options:
>>
>> 1. Change the IOTLB miss request so that the master sends the IOTLB
>> update as reply, instead of the reply-ack. It would mean that IOTLB
>> updates/invalidations would be sent either via the master channel or
>> the slave channel. On QEMU side, it means diverging from kernel backend
>> implementation. On backend side, it means having possibly multiple
>> threads writing to the IOTLB cache.
>>
>> 2. In vhost-user protocol thread, when handling SET_VRING_ENABLE, send
>> IOTLB miss request without setting the reply-ack flag, and poll the
>> vhost-user socket to read the resulting IOTLB update. The problem is
>> that other requests could be pending in the socket's buffer, and so it
>> would end-up nesting multiple requests handling.
>>
>> 3. Don't interpret rings info in the vhost-user protocol thread, but
>> only in the processing threads. The advantage is that it would address
>> the remark you made on patch 6 that invalidates are not affecting ring
>> info. The downside being the overhead induced by checking whether the
>> ring info are valid every time it is being used. I haven't prototyped
>> this solution, but I expected the performance regression to be a
>> blocker.
>>
>> 4. In SET_VRING_ENABLE, don't enable the ring if needed entries are 
>> not in IOTLB cache. Just send the IOTLB misses without reply-ack flag 
>> and postpone enable when handling IOTLB updates. It will be a little 
>> more complex solution than current one, but it may be the less 
>> impacting compared to the other 3 options.
>>
>>
>> Thinking again, maybe trying solution would be worth the effort, and 
> 
> s/solution/solution 4/
> 
>> could be extended also to disable the rings when receiving invalidates
>> that affect rings info.
>>
>> What do you think?

I have made some tests to see if solution 4 would work, and while it
could work most of the times, it is really fragile as deadlock would
happen if either slave or master sockets buffers are full. This is issue
also impact solution 1 above.

Regards,
Maxime

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/6] vhost-user: Specify and implement device IOTLB support
  2017-05-31  8:33   ` Jason Wang
@ 2017-05-31 15:32     ` Maxime Coquelin
  2017-06-01  7:04       ` Jason Wang
  2017-06-01 13:59     ` Michael S. Tsirkin
  1 sibling, 1 reply; 28+ messages in thread
From: Maxime Coquelin @ 2017-05-31 15:32 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: peterx, marcandre.lureau, vkaplans, wexu, yuanhan.liu,
	qemu-devel, jfreiman

Hi,

On 05/31/2017 10:33 AM, Jason Wang wrote:
> 
> 
> On 2017年05月31日 02:20, Michael S. Tsirkin wrote:
>> On Fri, May 26, 2017 at 04:28:52PM +0200, Maxime Coquelin wrote:
>>> This series aims at specifying ans implementing the protocol update
>>> required to support device IOTLB with user backends.
>>>
>>> In this second non-RFC version, main changes are:
>>>   - spec fixes and clarification
>>>   - rings information update has been restored back to ring 
>>> enablement time
>>>   - Work around GCC 4.4.7 limitation wrt assignment in unnamed union at
>>> declaration time.
>>>
>>> The series can be tested with vhost_iotlb_proto_v2 branch on my gitlab
>>> account[0].
>>>
>>> The slave requests channel part is re-used from Marc-André's series 
>>> submitted
>>> last year[1], with main changes from original version being 
>>> request/feature
>>> names renaming and addition of the REPLY_ACK feature support.
>>>
>>> Regarding IOTLB protocol, one noticeable change is the IOTLB miss 
>>> request
>>> reply made optionnal (i.e. only if slave requests it by setting the
>>> VHOST_USER_NEED_REPLY flag in the message header). This change provides
>>> more flexibility in the backend implementation of the feature.
>>>
>>> The protocol is very close to kernel backends, except that a new
>>> communication channel is introduced to enable the slave to send
>>> requests to the master.
>>>
>>> [0]:https://gitlab.com/mcoquelin/dpdk-next-virtio/commits/vhost_iotlb_proto_v2 
>>>
>>> [1]:https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg00095.html
>> Overall, this looks good to me. I do think patch 3 isn't a good idea
>> though, if slave wants something let it request it.
>>
>> Need to find out why does vhost in kernel want the used ring iotlb at
>> start time - especially considering we aren't even guaranteed one entry
>> covers the whole ring, and invalidates should affect all addresses at
>> least in theory.
>>
>>
> 
> The reason is probably we want to verify whether or not we could 
> correctly access used ring in vhost_vq_init_access(). It was there since 
> vhost_net is introduced. We can think to remove this limitation maybe.

Even if we remove the limitation on Kernel side, we will still have to
keep this workaround for compatibility with older kernels. Having done 
the test, I can confirm it is currently necessary.

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/6] vhost-user: Specify and implement device IOTLB support
  2017-05-31 15:32     ` Maxime Coquelin
@ 2017-06-01  7:04       ` Jason Wang
  2017-06-01  8:39         ` Maxime Coquelin
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Wang @ 2017-06-01  7:04 UTC (permalink / raw)
  To: Maxime Coquelin, Michael S. Tsirkin
  Cc: yuanhan.liu, qemu-devel, peterx, marcandre.lureau, wexu,
	vkaplans, jfreiman



On 2017年05月31日 23:32, Maxime Coquelin wrote:
>>>> [0]:https://gitlab.com/mcoquelin/dpdk-next-virtio/commits/vhost_iotlb_proto_v2 
>>>>
>>>> [1]:https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg00095.html 
>>>>
>>> Overall, this looks good to me. I do think patch 3 isn't a good idea
>>> though, if slave wants something let it request it.
>>>
>>> Need to find out why does vhost in kernel want the used ring iotlb at
>>> start time - especially considering we aren't even guaranteed one entry
>>> covers the whole ring, and invalidates should affect all addresses at
>>> least in theory.
>>>
>>>
>>
>> The reason is probably we want to verify whether or not we could 
>> correctly access used ring in vhost_vq_init_access(). It was there 
>> since vhost_net is introduced. We can think to remove this limitation 
>> maybe.
>
> Even if we remove the limitation on Kernel side, we will still have to
> keep this workaround for compatibility with older kernels. Having done 
> the test, I can confirm it is currently necessary.
>
> Thanks,
> Maxime 

Right, it was probably too late for the change.

Thanks

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/6] vhost-user: Specify and implement device IOTLB support
  2017-06-01  7:04       ` Jason Wang
@ 2017-06-01  8:39         ` Maxime Coquelin
  0 siblings, 0 replies; 28+ messages in thread
From: Maxime Coquelin @ 2017-06-01  8:39 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: yuanhan.liu, qemu-devel, peterx, marcandre.lureau, wexu,
	vkaplans, jfreiman



On 06/01/2017 09:04 AM, Jason Wang wrote:
> 
> 
> On 2017年05月31日 23:32, Maxime Coquelin wrote:
>>>>> [0]:https://gitlab.com/mcoquelin/dpdk-next-virtio/commits/vhost_iotlb_proto_v2 
>>>>>
>>>>> [1]:https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg00095.html 
>>>>>
>>>> Overall, this looks good to me. I do think patch 3 isn't a good idea
>>>> though, if slave wants something let it request it.
>>>>
>>>> Need to find out why does vhost in kernel want the used ring iotlb at
>>>> start time - especially considering we aren't even guaranteed one entry
>>>> covers the whole ring, and invalidates should affect all addresses at
>>>> least in theory.
>>>>
>>>>
>>>
>>> The reason is probably we want to verify whether or not we could 
>>> correctly access used ring in vhost_vq_init_access(). It was there 
>>> since vhost_net is introduced. We can think to remove this limitation 
>>> maybe.
>>
>> Even if we remove the limitation on Kernel side, we will still have to
>> keep this workaround for compatibility with older kernels. Having done 
>> the test, I can confirm it is currently necessary.
>>
>> Thanks,
>> Maxime 
> 
> Right, it was probably too late for the change.

So, I wonder if we could keep patch 3, that just extends this
workaround for vhost-user. Else, we would need to do the ring-related 
translations in the data path, which will be costly.

Michael, what do you think?

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/6] vhost: extend ring information update for IOTLB to all rings
  2017-05-30 21:06     ` Maxime Coquelin
  2017-05-30 21:11       ` Maxime Coquelin
@ 2017-06-01 13:54       ` Michael S. Tsirkin
  1 sibling, 0 replies; 28+ messages in thread
From: Michael S. Tsirkin @ 2017-06-01 13:54 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: peterx, marcandre.lureau, vkaplans, jasowang, wexu, yuanhan.liu,
	qemu-devel, jfreiman

On Tue, May 30, 2017 at 11:06:54PM +0200, Maxime Coquelin wrote:
> Hi Michael,
> 
> On 05/30/2017 08:12 PM, Michael S. Tsirkin wrote:
> > On Fri, May 26, 2017 at 04:28:55PM +0200, Maxime Coquelin wrote:
> > > Vhost-kernel backend need
> > 
> > needs
> > 
> > > to receive IOTLB entry for used ring
> > > information early, which is done by triggering a miss event on
> > > its address.
> > > 
> > > This patch extends this behaviour to all rings information, to be
> > > compatible with vhost-user backend design.
> > 
> > Why does vhost-user need it though?
> 
> For vhost-user, this simplifies the backend design because generally,
> the backend sends IOTLB miss requests from processing threads through
> the slave channel, and receives resulting IOTLB updates in vhost-user
> protocol thread.
> 
> The only exception is for these rings info, where IOTLB miss requests
> are sent from vhost-user protocol thread (in the SET_VRING_ENABLE
> request handler), so the resulting IOTLB update is only handled by
> the backend when the rings are enabled, which is too late.
> 
> It could be possible to overcome this issue, but I think it would
> make the design more complex or less efficient. I see several options:
> 
> 1. Change the IOTLB miss request so that the master sends the IOTLB
> update as reply, instead of the reply-ack. It would mean that IOTLB
> updates/invalidations would be sent either via the master channel or
> the slave channel. On QEMU side, it means diverging from kernel backend
> implementation. On backend side, it means having possibly multiple
> threads writing to the IOTLB cache.
> 
> 2. In vhost-user protocol thread, when handling SET_VRING_ENABLE, send
> IOTLB miss request without setting the reply-ack flag, and poll the
> vhost-user socket to read the resulting IOTLB update. The problem is
> that other requests could be pending in the socket's buffer, and so it
> would end-up nesting multiple requests handling.
> 
> 3. Don't interpret rings info in the vhost-user protocol thread, but
> only in the processing threads. The advantage is that it would address
> the remark you made on patch 6 that invalidates are not affecting ring
> info. The downside being the overhead induced by checking whether the
> ring info are valid every time it is being used. I haven't prototyped
> this solution, but I expected the performance regression to be a
> blocker.
> 
> 4. In SET_VRING_ENABLE, don't enable the ring if needed entries are not in
> IOTLB cache. Just send the IOTLB misses without reply-ack flag and postpone
> enable when handling IOTLB updates. It will be a little more complex
> solution than current one, but it may be the less impacting compared to the
> other 3 options.
> 
> 
> Thinking again, maybe trying solution would be worth the effort, and could
> be extended also to disable the rings when receiving invalidates
> that affect rings info.
> 
> What do you think?

I'm fine with 3 or 4 generally. But pls note that if the ring crosses a
page boundary (e.g. ring size > page size) and when not using
hugetlbfs, there is no guarantee a single DMA address covers the
whole ring.

> > 
> > > 
> > > Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> > > ---
> > > v2:
> > >   - Revert back to existing behaviour, i.e. only send IOTLB updates
> > > at ring enablement time, not at ring address setting time (mst).
> > >   - Extend IOTLB misses to all ring addresses, not only used ring.
> > > 
> > >   hw/virtio/vhost.c | 8 ++++++--
> > >   1 file changed, 6 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > > index 6eddb09..7867034 100644
> > > --- a/hw/virtio/vhost.c
> > > +++ b/hw/virtio/vhost.c
> > > @@ -1552,11 +1552,15 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
> > >       if (vhost_dev_has_iommu(hdev)) {
> > >           hdev->vhost_ops->vhost_set_iotlb_callback(hdev, true);
> > > -        /* Update used ring information for IOTLB to work correctly,
> > > -         * vhost-kernel code requires for this.*/
> > > +        /*
> > > +         * Update rings information for IOTLB to work correctly,
> > > +         * vhost-kernel and vhost-user codes require for this.
> > 
> > Better just say "Update ring info for vhost iotlb."
> > 
> > The rest isn't really informative.
> 
> Ok.
> 
> > 
> > 
> > 
> > > +         */
> > >           for (i = 0; i < hdev->nvqs; ++i) {
> > >               struct vhost_virtqueue *vq = hdev->vqs + i;
> > >               vhost_device_iotlb_miss(hdev, vq->used_phys, true);
> > > +            vhost_device_iotlb_miss(hdev, vq->desc_phys, true);
> > > +            vhost_device_iotlb_miss(hdev, vq->avail_phys, true);
> > 
> > So I don't remember why does vhost in kernel want miss on used
> > at start time.
> > 
> > Jason, could you comment on this please?
> > 
> > 
> > 
> > >           }
> > >       }
> > >       return 0;
> > > -- 
> > > 2.9.4
> 
> Thanks,
> Maxime

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/6] vhost: extend ring information update for IOTLB to all rings
  2017-05-31 15:20         ` Maxime Coquelin
@ 2017-06-01 13:55           ` Michael S. Tsirkin
  0 siblings, 0 replies; 28+ messages in thread
From: Michael S. Tsirkin @ 2017-06-01 13:55 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: peterx, marcandre.lureau, vkaplans, jasowang, wexu, yuanhan.liu,
	qemu-devel, jfreiman

On Wed, May 31, 2017 at 05:20:21PM +0200, Maxime Coquelin wrote:
> Hi Michael,
> 
> On 05/30/2017 11:11 PM, Maxime Coquelin wrote:
> > 
> > 
> > On 05/30/2017 11:06 PM, Maxime Coquelin wrote:
> > > Hi Michael,
> > > 
> > > On 05/30/2017 08:12 PM, Michael S. Tsirkin wrote:
> > > > On Fri, May 26, 2017 at 04:28:55PM +0200, Maxime Coquelin wrote:
> > > > > Vhost-kernel backend need
> > > > 
> > > > needs
> > > > 
> > > > > to receive IOTLB entry for used ring
> > > > > information early, which is done by triggering a miss event on
> > > > > its address.
> > > > > 
> > > > > This patch extends this behaviour to all rings information, to be
> > > > > compatible with vhost-user backend design.
> > > > 
> > > > Why does vhost-user need it though?
> > > 
> > > For vhost-user, this simplifies the backend design because generally,
> > > the backend sends IOTLB miss requests from processing threads through
> > > the slave channel, and receives resulting IOTLB updates in vhost-user
> > > protocol thread.
> > > 
> > > The only exception is for these rings info, where IOTLB miss requests
> > > are sent from vhost-user protocol thread (in the SET_VRING_ENABLE
> > > request handler), so the resulting IOTLB update is only handled by
> > > the backend when the rings are enabled, which is too late.
> > > 
> > > It could be possible to overcome this issue, but I think it would
> > > make the design more complex or less efficient. I see several options:
> > > 
> > > 1. Change the IOTLB miss request so that the master sends the IOTLB
> > > update as reply, instead of the reply-ack. It would mean that IOTLB
> > > updates/invalidations would be sent either via the master channel or
> > > the slave channel. On QEMU side, it means diverging from kernel backend
> > > implementation. On backend side, it means having possibly multiple
> > > threads writing to the IOTLB cache.
> > > 
> > > 2. In vhost-user protocol thread, when handling SET_VRING_ENABLE, send
> > > IOTLB miss request without setting the reply-ack flag, and poll the
> > > vhost-user socket to read the resulting IOTLB update. The problem is
> > > that other requests could be pending in the socket's buffer, and so it
> > > would end-up nesting multiple requests handling.
> > > 
> > > 3. Don't interpret rings info in the vhost-user protocol thread, but
> > > only in the processing threads. The advantage is that it would address
> > > the remark you made on patch 6 that invalidates are not affecting ring
> > > info. The downside being the overhead induced by checking whether the
> > > ring info are valid every time it is being used. I haven't prototyped
> > > this solution, but I expected the performance regression to be a
> > > blocker.
> > > 
> > > 4. In SET_VRING_ENABLE, don't enable the ring if needed entries are
> > > not in IOTLB cache. Just send the IOTLB misses without reply-ack
> > > flag and postpone enable when handling IOTLB updates. It will be a
> > > little more complex solution than current one, but it may be the
> > > less impacting compared to the other 3 options.
> > > 
> > > 
> > > Thinking again, maybe trying solution would be worth the effort, and
> > 
> > s/solution/solution 4/
> > 
> > > could be extended also to disable the rings when receiving invalidates
> > > that affect rings info.
> > > 
> > > What do you think?
> 
> I have made some tests to see if solution 4 would work, and while it
> could work most of the times, it is really fragile as deadlock would
> happen if either slave or master sockets buffers are full. This is issue
> also impact solution 1 above.
> 
> Regards,
> Maxime

Pls try 3 above. I don't see why would a single conditional
branch be so expensive.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/6] vhost-user: Specify and implement device IOTLB support
  2017-05-31  8:33   ` Jason Wang
  2017-05-31 15:32     ` Maxime Coquelin
@ 2017-06-01 13:59     ` Michael S. Tsirkin
  2017-06-02  5:53       ` Jason Wang
  1 sibling, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2017-06-01 13:59 UTC (permalink / raw)
  To: Jason Wang
  Cc: Maxime Coquelin, peterx, marcandre.lureau, vkaplans, wexu,
	yuanhan.liu, qemu-devel, jfreiman

On Wed, May 31, 2017 at 04:33:33PM +0800, Jason Wang wrote:
> 
> 
> On 2017年05月31日 02:20, Michael S. Tsirkin wrote:
> > On Fri, May 26, 2017 at 04:28:52PM +0200, Maxime Coquelin wrote:
> > > This series aims at specifying ans implementing the protocol update
> > > required to support device IOTLB with user backends.
> > > 
> > > In this second non-RFC version, main changes are:
> > >   - spec fixes and clarification
> > >   - rings information update has been restored back to ring enablement time
> > >   - Work around GCC 4.4.7 limitation wrt assignment in unnamed union at
> > > declaration time.
> > > 
> > > The series can be tested with vhost_iotlb_proto_v2 branch on my gitlab
> > > account[0].
> > > 
> > > The slave requests channel part is re-used from Marc-André's series submitted
> > > last year[1], with main changes from original version being request/feature
> > > names renaming and addition of the REPLY_ACK feature support.
> > > 
> > > Regarding IOTLB protocol, one noticeable change is the IOTLB miss request
> > > reply made optionnal (i.e. only if slave requests it by setting the
> > > VHOST_USER_NEED_REPLY flag in the message header). This change provides
> > > more flexibility in the backend implementation of the feature.
> > > 
> > > The protocol is very close to kernel backends, except that a new
> > > communication channel is introduced to enable the slave to send
> > > requests to the master.
> > > 
> > > [0]:https://gitlab.com/mcoquelin/dpdk-next-virtio/commits/vhost_iotlb_proto_v2
> > > [1]:https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg00095.html
> > Overall, this looks good to me. I do think patch 3 isn't a good idea
> > though, if slave wants something let it request it.
> > 
> > Need to find out why does vhost in kernel want the used ring iotlb at
> > start time - especially considering we aren't even guaranteed one entry
> > covers the whole ring, and invalidates should affect all addresses at
> > least in theory.
> > 
> > 
> 
> The reason is probably we want to verify whether or not we could correctly
> access used ring in vhost_vq_init_access(). It was there since vhost_net is
> introduced. We can think to remove this limitation maybe.
> 
> Thanks


Well that's only called if iotlb is disabled:

        if (!vq->iotlb &&
            !access_ok(VERIFY_READ, &vq->used->idx, sizeof vq->used->idx)) {
                r = -EFAULT;
                goto err;
        }

Could you try removing that and see what breaks?

-- 
MST

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/6] vhost-user: Specify and implement device IOTLB support
  2017-06-01 13:59     ` Michael S. Tsirkin
@ 2017-06-02  5:53       ` Jason Wang
  2017-06-02 15:24         ` Michael S. Tsirkin
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Wang @ 2017-06-02  5:53 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: yuanhan.liu, qemu-devel, peterx, Maxime Coquelin,
	marcandre.lureau, wexu, vkaplans, jfreiman



On 2017年06月01日 21:59, Michael S. Tsirkin wrote:
> On Wed, May 31, 2017 at 04:33:33PM +0800, Jason Wang wrote:
>>
>> On 2017年05月31日 02:20, Michael S. Tsirkin wrote:
>>> On Fri, May 26, 2017 at 04:28:52PM +0200, Maxime Coquelin wrote:
>>>> This series aims at specifying ans implementing the protocol update
>>>> required to support device IOTLB with user backends.
>>>>
>>>> In this second non-RFC version, main changes are:
>>>>    - spec fixes and clarification
>>>>    - rings information update has been restored back to ring enablement time
>>>>    - Work around GCC 4.4.7 limitation wrt assignment in unnamed union at
>>>> declaration time.
>>>>
>>>> The series can be tested with vhost_iotlb_proto_v2 branch on my gitlab
>>>> account[0].
>>>>
>>>> The slave requests channel part is re-used from Marc-André's series submitted
>>>> last year[1], with main changes from original version being request/feature
>>>> names renaming and addition of the REPLY_ACK feature support.
>>>>
>>>> Regarding IOTLB protocol, one noticeable change is the IOTLB miss request
>>>> reply made optionnal (i.e. only if slave requests it by setting the
>>>> VHOST_USER_NEED_REPLY flag in the message header). This change provides
>>>> more flexibility in the backend implementation of the feature.
>>>>
>>>> The protocol is very close to kernel backends, except that a new
>>>> communication channel is introduced to enable the slave to send
>>>> requests to the master.
>>>>
>>>> [0]:https://gitlab.com/mcoquelin/dpdk-next-virtio/commits/vhost_iotlb_proto_v2
>>>> [1]:https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg00095.html
>>> Overall, this looks good to me. I do think patch 3 isn't a good idea
>>> though, if slave wants something let it request it.
>>>
>>> Need to find out why does vhost in kernel want the used ring iotlb at
>>> start time - especially considering we aren't even guaranteed one entry
>>> covers the whole ring, and invalidates should affect all addresses at
>>> least in theory.
>>>
>>>
>> The reason is probably we want to verify whether or not we could correctly
>> access used ring in vhost_vq_init_access(). It was there since vhost_net is
>> introduced. We can think to remove this limitation maybe.
>>
>> Thanks
>
> Well that's only called if iotlb is disabled:
>
>          if (!vq->iotlb &&
>              !access_ok(VERIFY_READ, &vq->used->idx, sizeof vq->used->idx)) {
>                  r = -EFAULT;
>                  goto err;
>          }
>
> Could you try removing that and see what breaks?
>

Looks not, the issue is vhost_update_used_flags() which needs device 
IOTLB translation. If we don't fill IOTLB in advance, it will return 
-EFAULT.

For simplicity, I don't implement control path device IOTLB miss. If you 
care about the incomplete length, we can refine vhost_iotlb_miss() to 
make sure it covers all size.

Thanks

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/6] vhost-user: Specify and implement device IOTLB support
  2017-06-02  5:53       ` Jason Wang
@ 2017-06-02 15:24         ` Michael S. Tsirkin
  2017-06-05  8:51           ` Jason Wang
  0 siblings, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2017-06-02 15:24 UTC (permalink / raw)
  To: Jason Wang
  Cc: yuanhan.liu, qemu-devel, peterx, Maxime Coquelin,
	marcandre.lureau, wexu, vkaplans, jfreiman

On Fri, Jun 02, 2017 at 01:53:11PM +0800, Jason Wang wrote:
> 
> 
> On 2017年06月01日 21:59, Michael S. Tsirkin wrote:
> > On Wed, May 31, 2017 at 04:33:33PM +0800, Jason Wang wrote:
> > > 
> > > On 2017年05月31日 02:20, Michael S. Tsirkin wrote:
> > > > On Fri, May 26, 2017 at 04:28:52PM +0200, Maxime Coquelin wrote:
> > > > > This series aims at specifying ans implementing the protocol update
> > > > > required to support device IOTLB with user backends.
> > > > > 
> > > > > In this second non-RFC version, main changes are:
> > > > >    - spec fixes and clarification
> > > > >    - rings information update has been restored back to ring enablement time
> > > > >    - Work around GCC 4.4.7 limitation wrt assignment in unnamed union at
> > > > > declaration time.
> > > > > 
> > > > > The series can be tested with vhost_iotlb_proto_v2 branch on my gitlab
> > > > > account[0].
> > > > > 
> > > > > The slave requests channel part is re-used from Marc-André's series submitted
> > > > > last year[1], with main changes from original version being request/feature
> > > > > names renaming and addition of the REPLY_ACK feature support.
> > > > > 
> > > > > Regarding IOTLB protocol, one noticeable change is the IOTLB miss request
> > > > > reply made optionnal (i.e. only if slave requests it by setting the
> > > > > VHOST_USER_NEED_REPLY flag in the message header). This change provides
> > > > > more flexibility in the backend implementation of the feature.
> > > > > 
> > > > > The protocol is very close to kernel backends, except that a new
> > > > > communication channel is introduced to enable the slave to send
> > > > > requests to the master.
> > > > > 
> > > > > [0]:https://gitlab.com/mcoquelin/dpdk-next-virtio/commits/vhost_iotlb_proto_v2
> > > > > [1]:https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg00095.html
> > > > Overall, this looks good to me. I do think patch 3 isn't a good idea
> > > > though, if slave wants something let it request it.
> > > > 
> > > > Need to find out why does vhost in kernel want the used ring iotlb at
> > > > start time - especially considering we aren't even guaranteed one entry
> > > > covers the whole ring, and invalidates should affect all addresses at
> > > > least in theory.
> > > > 
> > > > 
> > > The reason is probably we want to verify whether or not we could correctly
> > > access used ring in vhost_vq_init_access(). It was there since vhost_net is
> > > introduced. We can think to remove this limitation maybe.
> > > 
> > > Thanks
> > 
> > Well that's only called if iotlb is disabled:
> > 
> >          if (!vq->iotlb &&
> >              !access_ok(VERIFY_READ, &vq->used->idx, sizeof vq->used->idx)) {
> >                  r = -EFAULT;
> >                  goto err;
> >          }
> > 
> > Could you try removing that and see what breaks?
> > 
> 
> Looks not, the issue is vhost_update_used_flags() which needs device IOTLB
> translation. If we don't fill IOTLB in advance, it will return -EFAULT.

Same for vhost_get_used, right?

> 
> For simplicity, I don't implement control path device IOTLB miss.


OK so this should be documented in vhost.h.  SET_BACKEND immediately
writes and reads used ring. User must know this and pre-fault used flags
and index before setting backend.

> If you
> care about the incomplete length, we can refine vhost_iotlb_miss() to make
> sure it covers all size.
> 
> Thanks

No need imho, it's only the used flags field that's written, and the
index that's read right?  BTW I don't really know why do we do the write
when event index is setup.  We probably shouldn't, should we?

It's worth considering whether we want this write into used ring at all.
I put it there originally to help make sure we don't miss the first exit, but
event index seems to get by fine without this. So why does non event
index code want it?

I think that if we could get rid of both accesses, it would be
nice. Would need a feature bit naturally and we'd need to
support old kernels but at least it will be contained and
well documented.

-- 
MST

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/6] vhost-user: Specify and implement device IOTLB support
  2017-06-02 15:24         ` Michael S. Tsirkin
@ 2017-06-05  8:51           ` Jason Wang
  2017-06-05 15:08             ` Michael S. Tsirkin
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Wang @ 2017-06-05  8:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: yuanhan.liu, qemu-devel, peterx, Maxime Coquelin,
	marcandre.lureau, wexu, vkaplans, jfreiman



On 2017年06月02日 23:24, Michael S. Tsirkin wrote:
> On Fri, Jun 02, 2017 at 01:53:11PM +0800, Jason Wang wrote:
>>
>> On 2017年06月01日 21:59, Michael S. Tsirkin wrote:
>>> On Wed, May 31, 2017 at 04:33:33PM +0800, Jason Wang wrote:
>>>> On 2017年05月31日 02:20, Michael S. Tsirkin wrote:
>>>>> On Fri, May 26, 2017 at 04:28:52PM +0200, Maxime Coquelin wrote:
>>>>>> This series aims at specifying ans implementing the protocol update
>>>>>> required to support device IOTLB with user backends.
>>>>>>
>>>>>> In this second non-RFC version, main changes are:
>>>>>>     - spec fixes and clarification
>>>>>>     - rings information update has been restored back to ring enablement time
>>>>>>     - Work around GCC 4.4.7 limitation wrt assignment in unnamed union at
>>>>>> declaration time.
>>>>>>
>>>>>> The series can be tested with vhost_iotlb_proto_v2 branch on my gitlab
>>>>>> account[0].
>>>>>>
>>>>>> The slave requests channel part is re-used from Marc-André's series submitted
>>>>>> last year[1], with main changes from original version being request/feature
>>>>>> names renaming and addition of the REPLY_ACK feature support.
>>>>>>
>>>>>> Regarding IOTLB protocol, one noticeable change is the IOTLB miss request
>>>>>> reply made optionnal (i.e. only if slave requests it by setting the
>>>>>> VHOST_USER_NEED_REPLY flag in the message header). This change provides
>>>>>> more flexibility in the backend implementation of the feature.
>>>>>>
>>>>>> The protocol is very close to kernel backends, except that a new
>>>>>> communication channel is introduced to enable the slave to send
>>>>>> requests to the master.
>>>>>>
>>>>>> [0]:https://gitlab.com/mcoquelin/dpdk-next-virtio/commits/vhost_iotlb_proto_v2
>>>>>> [1]:https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg00095.html
>>>>> Overall, this looks good to me. I do think patch 3 isn't a good idea
>>>>> though, if slave wants something let it request it.
>>>>>
>>>>> Need to find out why does vhost in kernel want the used ring iotlb at
>>>>> start time - especially considering we aren't even guaranteed one entry
>>>>> covers the whole ring, and invalidates should affect all addresses at
>>>>> least in theory.
>>>>>
>>>>>
>>>> The reason is probably we want to verify whether or not we could correctly
>>>> access used ring in vhost_vq_init_access(). It was there since vhost_net is
>>>> introduced. We can think to remove this limitation maybe.
>>>>
>>>> Thanks
>>> Well that's only called if iotlb is disabled:
>>>
>>>           if (!vq->iotlb &&
>>>               !access_ok(VERIFY_READ, &vq->used->idx, sizeof vq->used->idx)) {
>>>                   r = -EFAULT;
>>>                   goto err;
>>>           }
>>>
>>> Could you try removing that and see what breaks?
>>>
>> Looks not, the issue is vhost_update_used_flags() which needs device IOTLB
>> translation. If we don't fill IOTLB in advance, it will return -EFAULT.
> Same for vhost_get_used, right?

Yes.

>
>> For simplicity, I don't implement control path device IOTLB miss.
>
> OK so this should be documented in vhost.h.  SET_BACKEND immediately
> writes and reads used ring. User must know this and pre-fault used flags
> and index before setting backend.

Ok.

>> If you
>> care about the incomplete length, we can refine vhost_iotlb_miss() to make
>> sure it covers all size.
>>
>> Thanks
> No need imho, it's only the used flags field that's written, and the
> index that's read right?

Yes.

>    BTW I don't really know why do we do the write
> when event index is setup.  We probably shouldn't, should we?

Yes, looks like we shouldn't.

>
> It's worth considering whether we want this write into used ring at all.
> I put it there originally to help make sure we don't miss the first exit, but
> event index seems to get by fine without this. So why does non event
> index code want it?

Spec said driver must initialize it to zero, so unless we want to 
workaround a buggy driver we can remove this.

>
> I think that if we could get rid of both accesses, it would be
> nice. Would need a feature bit naturally and we'd need to
> support old kernels but at least it will be contained and
> well documented.
>

Technically we can, but we still need a workaround for old kernels. So 
I'm not sure it's worth to do.

Thanks

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/6] vhost-user: Specify and implement device IOTLB support
  2017-06-05  8:51           ` Jason Wang
@ 2017-06-05 15:08             ` Michael S. Tsirkin
  0 siblings, 0 replies; 28+ messages in thread
From: Michael S. Tsirkin @ 2017-06-05 15:08 UTC (permalink / raw)
  To: Jason Wang
  Cc: yuanhan.liu, qemu-devel, peterx, Maxime Coquelin,
	marcandre.lureau, wexu, vkaplans, jfreiman

On Mon, Jun 05, 2017 at 04:51:55PM +0800, Jason Wang wrote:
> > 
> > I think that if we could get rid of both accesses, it would be
> > nice. Would need a feature bit naturally and we'd need to
> > support old kernels but at least it will be contained and
> > well documented.
> > 
> 
> Technically we can, but we still need a workaround for old kernels. So I'm
> not sure it's worth to do.
> 
> Thanks

I see it as valuable for documentation purposes.

-- 
MST

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2017-06-05 15:08 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-26 14:28 [Qemu-devel] [PATCH v2 0/6] vhost-user: Specify and implement device IOTLB support Maxime Coquelin
2017-05-26 14:28 ` [Qemu-devel] [PATCH v2 1/6] vhost: propagate errors in vhost_device_iotlb_miss() Maxime Coquelin
2017-05-26 14:28 ` [Qemu-devel] [PATCH v2 2/6] vhost: rework IOTLB messaging Maxime Coquelin
2017-05-26 14:28 ` [Qemu-devel] [PATCH v2 3/6] vhost: extend ring information update for IOTLB to all rings Maxime Coquelin
2017-05-30 18:12   ` Michael S. Tsirkin
2017-05-30 21:06     ` Maxime Coquelin
2017-05-30 21:11       ` Maxime Coquelin
2017-05-31 15:20         ` Maxime Coquelin
2017-06-01 13:55           ` Michael S. Tsirkin
2017-06-01 13:54       ` Michael S. Tsirkin
2017-05-31  8:48     ` Jason Wang
2017-05-26 14:28 ` [Qemu-devel] [PATCH v2 4/6] vhost-user: add vhost_user to hold the chr Maxime Coquelin
2017-05-26 14:28 ` [Qemu-devel] [PATCH v2 5/6] vhost-user: add slave-req-fd support Maxime Coquelin
2017-05-30 18:17   ` Michael S. Tsirkin
2017-05-30 21:26     ` Maxime Coquelin
2017-05-26 14:28 ` [Qemu-devel] [PATCH v2 6/6] spec/vhost-user spec: Add IOMMU support Maxime Coquelin
2017-05-30 18:08   ` Michael S. Tsirkin
2017-05-30 16:15 ` [Qemu-devel] [PATCH v2 0/6] vhost-user: Specify and implement device IOTLB support Maxime Coquelin
2017-05-30 18:20 ` Michael S. Tsirkin
2017-05-31  8:33   ` Jason Wang
2017-05-31 15:32     ` Maxime Coquelin
2017-06-01  7:04       ` Jason Wang
2017-06-01  8:39         ` Maxime Coquelin
2017-06-01 13:59     ` Michael S. Tsirkin
2017-06-02  5:53       ` Jason Wang
2017-06-02 15:24         ` Michael S. Tsirkin
2017-06-05  8:51           ` Jason Wang
2017-06-05 15:08             ` Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.