* [Qemu-devel] [PATCH v3 0/6] Extend vhost-user to support registering external host notifiers
@ 2018-04-12 15:12 ` Tiwei Bie
  0 siblings, 0 replies; 98+ messages in thread
From: Tiwei Bie @ 2018-04-12 15:12 UTC (permalink / raw)
  To: mst, jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev
  Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang,
	tiwei.bie

The original subject is: Extend vhost-user to support VFIO based accelerators

Update notes
============

This patch set now focuses on adding support for registering
memory region based host notifiers. With this support, the
guest driver in the VM will be able to notify the hardware
device at the vhost backend directly.

Host notification offload is one of the most important parts
of vDPA, because the hardware device relies heavily on these
notifications. Without this support, notifications from the
guest driver cause a lot of VM exits (which hurts VM
performance), and a lot of CPU resources are wasted on
relaying notifications (which makes hardware offload less
attractive, since one important goal of hardware offload is
to free up CPU resources).
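
To illustrate the idea, here is a rough sketch (not code from this
series; it assumes patch 4 provides a helper along the lines of
virtio_queue_set_host_notifier_mr(), and the fd/offset parameters
stand in for whatever the backend actually provides):

    /* Sketch only. Assumes "qemu/osdep.h", "exec/memory.h" and
     * "hw/virtio/virtio.h"; error handling is minimal. */
    static int map_doorbell_as_host_notifier(VirtIODevice *vdev,
                                             int queue_idx, int fd,
                                             off_t offset, size_t size)
    {
        MemoryRegion *mr = g_new0(MemoryRegion, 1);
        void *addr;

        /* Map the doorbell page exported by the vhost backend. */
        addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, offset);
        if (addr == MAP_FAILED) {
            g_free(mr);
            return -1;
        }

        /* Back a memory region with the mapped doorbell page... */
        memory_region_init_ram_device_ptr(mr, OBJECT(vdev),
                                          "vhost-user-host-notifier",
                                          size, addr);

        /* ...and install it as the host notifier of the virtqueue,
         * so the guest's queue notification writes reach the
         * hardware directly instead of trapping into QEMU. */
        return virtio_queue_set_host_notifier_mr(vdev, queue_idx,
                                                 mr, true);
    }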

More background on this patch set can be found in the cover
letters of the previous versions:

RFC: http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg04844.html
v1:  http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg06028.html
v2:  http://lists.nongnu.org/archive/html/qemu-devel/2018-03/msg05009.html

v2 -> v3:
- A better implementation of the shared vhost-user state (MST);
- Use bus callback to add/delete subregions for notification (MST);
- Refine the names of the APIs that add/delete subregions for notification (MST);
- Refine the doc of the new vhost-user types and messages (MST);
- Separate host notification offload from the guest notification offload (MST);
- Drop the guest notification offload support from this patch;
- Add memory filter for vhost backend to filter the sections they can handle;

v1 -> v2:
- Add some explanation in the commit log of why vhost-user is extended (Paolo);
- Bug fix in slave_read() according to Stefan's fix in DPDK;
- Remove IOMMU feature check and related commit log;
- Some minor refinements;
- Rebase to the latest QEMU;

RFC -> v1:
- Add some details about how vDPA works in cover letter (Alexey)
- Add some details about the OVS offload use-case in cover letter (Jason)
- Move PCI-specific stuff out of vhost-user (Jason)
- Handle the virtual IOMMU case (Jason)
- Move VFIO group management code into vfio/common.c (Alex)
- Various refinements;
(approximately sorted by comment posting time)

Tiwei Bie (6):
  vhost-user: add Net prefix to internal state structure
  vhost-user: introduce shared vhost-user state
  vhost-user: support receiving file descriptors in slave_read
  virtio: support setting memory region based host notifier
  vhost: allow backends to filter memory sections
  vhost-user: support registering external host notifiers

 backends/cryptodev-vhost-user.c     |  20 +++-
 docs/interop/vhost-user.txt         |  33 ++++++
 hw/block/vhost-user-blk.c           |  22 +++-
 hw/scsi/vhost-user-scsi.c           |  20 +++-
 hw/virtio/Makefile.objs             |   2 +-
 hw/virtio/vhost-stub.c              |  10 ++
 hw/virtio/vhost-user.c              | 206 ++++++++++++++++++++++++++++++++++--
 hw/virtio/vhost.c                   |   9 +-
 hw/virtio/virtio-pci.c              |  22 ++++
 hw/virtio/virtio.c                  |  13 +++
 include/hw/virtio/vhost-backend.h   |   4 +
 include/hw/virtio/vhost-user-blk.h  |   2 +
 include/hw/virtio/vhost-user-scsi.h |   2 +
 include/hw/virtio/vhost-user.h      |  28 +++++
 include/hw/virtio/virtio-bus.h      |   2 +
 include/hw/virtio/virtio.h          |   2 +
 net/vhost-user.c                    |  78 +++++++++-----
 17 files changed, 433 insertions(+), 42 deletions(-)
 create mode 100644 include/hw/virtio/vhost-user.h

-- 
2.11.0

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [Qemu-devel] [PATCH v3 1/6] vhost-user: add Net prefix to internal state structure
  2018-04-12 15:12 ` [virtio-dev] " Tiwei Bie
@ 2018-04-12 15:12   ` Tiwei Bie
  -1 siblings, 0 replies; 98+ messages in thread
From: Tiwei Bie @ 2018-04-12 15:12 UTC (permalink / raw)
  To: mst, jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev
  Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang,
	tiwei.bie

We are going to introduce a shared vhost-user state which
will be named 'VhostUserState'. So add a 'Net' prefix to
the existing internal state structure in the vhost-user
netdev to avoid a name conflict.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 net/vhost-user.c | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/net/vhost-user.c b/net/vhost-user.c
index e0f16c895b..fa28aad12d 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -20,38 +20,38 @@
 #include "qemu/option.h"
 #include "trace.h"
 
-typedef struct VhostUserState {
+typedef struct NetVhostUserState {
     NetClientState nc;
     CharBackend chr; /* only queue index 0 */
     VHostNetState *vhost_net;
     guint watch;
     uint64_t acked_features;
     bool started;
-} VhostUserState;
+} NetVhostUserState;
 
 VHostNetState *vhost_user_get_vhost_net(NetClientState *nc)
 {
-    VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
+    NetVhostUserState *s = DO_UPCAST(NetVhostUserState, nc, nc);
     assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_USER);
     return s->vhost_net;
 }
 
 uint64_t vhost_user_get_acked_features(NetClientState *nc)
 {
-    VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
+    NetVhostUserState *s = DO_UPCAST(NetVhostUserState, nc, nc);
     assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_USER);
     return s->acked_features;
 }
 
 static void vhost_user_stop(int queues, NetClientState *ncs[])
 {
-    VhostUserState *s;
+    NetVhostUserState *s;
     int i;
 
     for (i = 0; i < queues; i++) {
         assert(ncs[i]->info->type == NET_CLIENT_DRIVER_VHOST_USER);
 
-        s = DO_UPCAST(VhostUserState, nc, ncs[i]);
+        s = DO_UPCAST(NetVhostUserState, nc, ncs[i]);
 
         if (s->vhost_net) {
             /* save acked features */
@@ -68,7 +68,7 @@ static int vhost_user_start(int queues, NetClientState *ncs[], CharBackend *be)
 {
     VhostNetOptions options;
     struct vhost_net *net = NULL;
-    VhostUserState *s;
+    NetVhostUserState *s;
     int max_queues;
     int i;
 
@@ -77,7 +77,7 @@ static int vhost_user_start(int queues, NetClientState *ncs[], CharBackend *be)
     for (i = 0; i < queues; i++) {
         assert(ncs[i]->info->type == NET_CLIENT_DRIVER_VHOST_USER);
 
-        s = DO_UPCAST(VhostUserState, nc, ncs[i]);
+        s = DO_UPCAST(NetVhostUserState, nc, ncs[i]);
 
         options.net_backend = ncs[i];
         options.opaque      = be;
@@ -123,7 +123,7 @@ static ssize_t vhost_user_receive(NetClientState *nc, const uint8_t *buf,
        without GUEST_ANNOUNCE capability.
      */
     if (size == 60) {
-        VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
+        NetVhostUserState *s = DO_UPCAST(NetVhostUserState, nc, nc);
         int r;
         static int display_rarp_failure = 1;
         char mac_addr[6];
@@ -146,7 +146,7 @@ static ssize_t vhost_user_receive(NetClientState *nc, const uint8_t *buf,
 
 static void vhost_user_cleanup(NetClientState *nc)
 {
-    VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
+    NetVhostUserState *s = DO_UPCAST(NetVhostUserState, nc, nc);
 
     if (s->vhost_net) {
         vhost_net_cleanup(s->vhost_net);
@@ -180,7 +180,7 @@ static bool vhost_user_has_ufo(NetClientState *nc)
 
 static NetClientInfo net_vhost_user_info = {
         .type = NET_CLIENT_DRIVER_VHOST_USER,
-        .size = sizeof(VhostUserState),
+        .size = sizeof(NetVhostUserState),
         .receive = vhost_user_receive,
         .cleanup = vhost_user_cleanup,
         .has_vnet_hdr = vhost_user_has_vnet_hdr,
@@ -190,7 +190,7 @@ static NetClientInfo net_vhost_user_info = {
 static gboolean net_vhost_user_watch(GIOChannel *chan, GIOCondition cond,
                                            void *opaque)
 {
-    VhostUserState *s = opaque;
+    NetVhostUserState *s = opaque;
 
     qemu_chr_fe_disconnect(&s->chr);
 
@@ -203,7 +203,7 @@ static void chr_closed_bh(void *opaque)
 {
     const char *name = opaque;
     NetClientState *ncs[MAX_QUEUE_NUM];
-    VhostUserState *s;
+    NetVhostUserState *s;
     Error *err = NULL;
     int queues;
 
@@ -212,7 +212,7 @@ static void chr_closed_bh(void *opaque)
                                           MAX_QUEUE_NUM);
     assert(queues < MAX_QUEUE_NUM);
 
-    s = DO_UPCAST(VhostUserState, nc, ncs[0]);
+    s = DO_UPCAST(NetVhostUserState, nc, ncs[0]);
 
     qmp_set_link(name, false, &err);
     vhost_user_stop(queues, ncs);
@@ -229,7 +229,7 @@ static void net_vhost_user_event(void *opaque, int event)
 {
     const char *name = opaque;
     NetClientState *ncs[MAX_QUEUE_NUM];
-    VhostUserState *s;
+    NetVhostUserState *s;
     Chardev *chr;
     Error *err = NULL;
     int queues;
@@ -239,7 +239,7 @@ static void net_vhost_user_event(void *opaque, int event)
                                           MAX_QUEUE_NUM);
     assert(queues < MAX_QUEUE_NUM);
 
-    s = DO_UPCAST(VhostUserState, nc, ncs[0]);
+    s = DO_UPCAST(NetVhostUserState, nc, ncs[0]);
     chr = qemu_chr_fe_get_driver(&s->chr);
     trace_vhost_user_event(chr->label, event);
     switch (event) {
@@ -283,7 +283,7 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
 {
     Error *err = NULL;
     NetClientState *nc, *nc0 = NULL;
-    VhostUserState *s;
+    NetVhostUserState *s;
     int i;
 
     assert(name);
@@ -296,7 +296,7 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
         nc->queue_index = i;
         if (!nc0) {
             nc0 = nc;
-            s = DO_UPCAST(VhostUserState, nc, nc);
+            s = DO_UPCAST(NetVhostUserState, nc, nc);
             if (!qemu_chr_fe_init(&s->chr, chr, &err)) {
                 error_report_err(err);
                 return -1;
@@ -305,7 +305,7 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
 
     }
 
-    s = DO_UPCAST(VhostUserState, nc, nc0);
+    s = DO_UPCAST(NetVhostUserState, nc, nc0);
     do {
         if (qemu_chr_fe_wait_connected(&s->chr, &err) < 0) {
             error_report_err(err);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [Qemu-devel] [PATCH v3 2/6] vhost-user: introduce shared vhost-user state
  2018-04-12 15:12 ` [virtio-dev] " Tiwei Bie
@ 2018-04-12 15:12   ` Tiwei Bie
  -1 siblings, 0 replies; 98+ messages in thread
From: Tiwei Bie @ 2018-04-12 15:12 UTC (permalink / raw)
  To: mst, jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev
  Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang,
	tiwei.bie

When multi-queue is enabled, e.g. for a virtio-net device,
each queue pair has its own vhost_dev, and the only thing
currently shared between the vhost devs is the chardev. This
patch introduces a vhost-user state structure which will be
shared by all vhost devs of the same virtio device.
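
The intended usage pattern for a device backend, condensed from the
hunks below (error handling trimmed):

    VhostUserState *user = vhost_user_init();
    if (!user) {
        error_setg(errp, "failed to init vhost_user");
        return;
    }
    user->chr = &s->chardev;    /* the chardev shared by all queues */
    s->vhost_user = user;

    /* Each vhost_dev of this virtio device now receives the same
     * VhostUserState instead of a bare CharBackend pointer. */
    ret = vhost_dev_init(&s->dev, s->vhost_user,
                         VHOST_BACKEND_TYPE_USER, 0);

    /* ... and on unrealize/cleanup: */
    vhost_user_cleanup(s->vhost_user);
    g_free(s->vhost_user);
    s->vhost_user = NULL;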

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 backends/cryptodev-vhost-user.c     | 20 ++++++++++++++++++-
 hw/block/vhost-user-blk.c           | 22 +++++++++++++++++++-
 hw/scsi/vhost-user-scsi.c           | 20 ++++++++++++++++++-
 hw/virtio/Makefile.objs             |  2 +-
 hw/virtio/vhost-stub.c              | 10 ++++++++++
 hw/virtio/vhost-user.c              | 31 +++++++++++++++++++---------
 include/hw/virtio/vhost-user-blk.h  |  2 ++
 include/hw/virtio/vhost-user-scsi.h |  2 ++
 include/hw/virtio/vhost-user.h      | 20 +++++++++++++++++++
 net/vhost-user.c                    | 40 ++++++++++++++++++++++++++++++-------
 10 files changed, 149 insertions(+), 20 deletions(-)
 create mode 100644 include/hw/virtio/vhost-user.h

diff --git a/backends/cryptodev-vhost-user.c b/backends/cryptodev-vhost-user.c
index 862d4f2580..d52daccfcd 100644
--- a/backends/cryptodev-vhost-user.c
+++ b/backends/cryptodev-vhost-user.c
@@ -26,6 +26,7 @@
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
 #include "qemu/error-report.h"
+#include "hw/virtio/vhost-user.h"
 #include "standard-headers/linux/virtio_crypto.h"
 #include "sysemu/cryptodev-vhost.h"
 #include "chardev/char-fe.h"
@@ -46,6 +47,7 @@
 typedef struct CryptoDevBackendVhostUser {
     CryptoDevBackend parent_obj;
 
+    VhostUserState *vhost_user;
     CharBackend chr;
     char *chr_name;
     bool opened;
@@ -102,7 +104,7 @@ cryptodev_vhost_user_start(int queues,
             continue;
         }
 
-        options.opaque = &s->chr;
+        options.opaque = s->vhost_user;
         options.backend_type = VHOST_BACKEND_TYPE_USER;
         options.cc = b->conf.peers.ccs[i];
         s->vhost_crypto[i] = cryptodev_vhost_init(&options);
@@ -185,6 +187,7 @@ static void cryptodev_vhost_user_init(
     size_t i;
     Error *local_err = NULL;
     Chardev *chr;
+    VhostUserState *user;
     CryptoDevBackendClient *cc;
     CryptoDevBackendVhostUser *s =
                       CRYPTODEV_BACKEND_VHOST_USER(backend);
@@ -215,6 +218,15 @@ static void cryptodev_vhost_user_init(
         }
     }
 
+    user = vhost_user_init();
+    if (!user) {
+        error_setg(errp, "Failed to init vhost_user");
+        return;
+    }
+
+    user->chr = &s->chr;
+    s->vhost_user = user;
+
     qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
                      cryptodev_vhost_user_event, NULL, s, NULL, true);
 
@@ -299,6 +311,12 @@ static void cryptodev_vhost_user_cleanup(
             backend->conf.peers.ccs[i] = NULL;
         }
     }
+
+    if (s->vhost_user) {
+        vhost_user_cleanup(s->vhost_user);
+        g_free(s->vhost_user);
+        s->vhost_user = NULL;
+    }
 }
 
 static void cryptodev_vhost_user_set_chardev(Object *obj,
diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 262baca432..4021d71c31 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -229,6 +229,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
+    VhostUserState *user;
     int i, ret;
 
     if (!s->chardev.chr) {
@@ -246,6 +247,15 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
         return;
     }
 
+    user = vhost_user_init();
+    if (!user) {
+        error_setg(errp, "vhost-user-blk: failed to init vhost_user");
+        return;
+    }
+
+    user->chr = &s->chardev;
+    s->vhost_user = user;
+
     virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK,
                 sizeof(struct virtio_blk_config));
 
@@ -261,7 +271,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
 
     vhost_dev_set_config_notifier(&s->dev, &blk_ops);
 
-    ret = vhost_dev_init(&s->dev, &s->chardev, VHOST_BACKEND_TYPE_USER, 0);
+    ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
     if (ret < 0) {
         error_setg(errp, "vhost-user-blk: vhost initialization failed: %s",
                    strerror(-ret));
@@ -286,6 +296,10 @@ vhost_err:
 virtio_err:
     g_free(s->dev.vqs);
     virtio_cleanup(vdev);
+
+    vhost_user_cleanup(user);
+    g_free(user);
+    s->vhost_user = NULL;
 }
 
 static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
@@ -297,6 +311,12 @@ static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
     vhost_dev_cleanup(&s->dev);
     g_free(s->dev.vqs);
     virtio_cleanup(vdev);
+
+    if (s->vhost_user) {
+        vhost_user_cleanup(s->vhost_user);
+        g_free(s->vhost_user);
+        s->vhost_user = NULL;
+    }
 }
 
 static void vhost_user_blk_instance_init(Object *obj)
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index 9389ed48e0..9355cfdf07 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -69,6 +69,7 @@ static void vhost_user_scsi_realize(DeviceState *dev, Error **errp)
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
     VHostUserSCSI *s = VHOST_USER_SCSI(dev);
     VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
+    VhostUserState *user;
     Error *err = NULL;
     int ret;
 
@@ -85,19 +86,30 @@ static void vhost_user_scsi_realize(DeviceState *dev, Error **errp)
         return;
     }
 
+    user = vhost_user_init();
+    if (!user) {
+        error_setg(errp, "vhost-user-scsi: failed to init vhost_user");
+        return;
+    }
+    user->chr = &vs->conf.chardev;
+
     vsc->dev.nvqs = 2 + vs->conf.num_queues;
     vsc->dev.vqs = g_new(struct vhost_virtqueue, vsc->dev.nvqs);
     vsc->dev.vq_index = 0;
     vsc->dev.backend_features = 0;
 
-    ret = vhost_dev_init(&vsc->dev, (void *)&vs->conf.chardev,
+    ret = vhost_dev_init(&vsc->dev, user,
                          VHOST_BACKEND_TYPE_USER, 0);
     if (ret < 0) {
         error_setg(errp, "vhost-user-scsi: vhost initialization failed: %s",
                    strerror(-ret));
+        vhost_user_cleanup(user);
+        g_free(user);
         return;
     }
 
+    s->vhost_user = user;
+
     /* Channel and lun both are 0 for bootable vhost-user-scsi disk */
     vsc->channel = 0;
     vsc->lun = 0;
@@ -117,6 +129,12 @@ static void vhost_user_scsi_unrealize(DeviceState *dev, Error **errp)
     g_free(vsc->dev.vqs);
 
     virtio_scsi_common_unrealize(dev, errp);
+
+    if (s->vhost_user) {
+        vhost_user_cleanup(s->vhost_user);
+        g_free(s->vhost_user);
+        s->vhost_user = NULL;
+    }
 }
 
 static uint64_t vhost_user_scsi_get_features(VirtIODevice *vdev,
diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
index 765d363c1f..030969e28c 100644
--- a/hw/virtio/Makefile.objs
+++ b/hw/virtio/Makefile.objs
@@ -11,5 +11,5 @@ obj-y += virtio-crypto.o
 obj-$(CONFIG_VIRTIO_PCI) += virtio-crypto-pci.o
 endif
 
-common-obj-$(call lnot,$(CONFIG_LINUX)) += vhost-stub.o
+common-obj-$(call lnot,$(call land,$(CONFIG_VIRTIO),$(CONFIG_LINUX))) += vhost-stub.o
 common-obj-$(CONFIG_ALL) += vhost-stub.o
diff --git a/hw/virtio/vhost-stub.c b/hw/virtio/vhost-stub.c
index 2d76cdebdc..049089b5e2 100644
--- a/hw/virtio/vhost-stub.c
+++ b/hw/virtio/vhost-stub.c
@@ -1,7 +1,17 @@
 #include "qemu/osdep.h"
 #include "hw/virtio/vhost.h"
+#include "hw/virtio/vhost-user.h"
 
 bool vhost_has_free_slot(void)
 {
     return true;
 }
+
+VhostUserState *vhost_user_init(void)
+{
+    return NULL;
+}
+
+void vhost_user_cleanup(VhostUserState *user)
+{
+}
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 38da8692bb..91edd95453 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -11,6 +11,7 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "hw/virtio/vhost.h"
+#include "hw/virtio/vhost-user.h"
 #include "hw/virtio/vhost-backend.h"
 #include "hw/virtio/virtio-net.h"
 #include "chardev/char-fe.h"
@@ -173,7 +174,8 @@ static VhostUserMsg m __attribute__ ((unused));
 
 struct vhost_user {
     struct vhost_dev *dev;
-    CharBackend *chr;
+    /* Shared between vhost devs of the same virtio device */
+    VhostUserState *user;
     int slave_fd;
     NotifierWithReturn postcopy_notifier;
     struct PostCopyFD  postcopy_fd;
@@ -199,7 +201,7 @@ static bool ioeventfd_enabled(void)
 static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg)
 {
     struct vhost_user *u = dev->opaque;
-    CharBackend *chr = u->chr;
+    CharBackend *chr = u->user->chr;
     uint8_t *p = (uint8_t *) msg;
     int r, size = VHOST_USER_HDR_SIZE;
 
@@ -285,7 +287,7 @@ static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg,
                             int *fds, int fd_num)
 {
     struct vhost_user *u = dev->opaque;
-    CharBackend *chr = u->chr;
+    CharBackend *chr = u->user->chr;
     int ret, size = VHOST_USER_HDR_SIZE + msg->hdr.size;
 
     /*
@@ -1044,7 +1046,7 @@ static int vhost_user_postcopy_waker(struct PostCopyFD *pcfd, RAMBlock *rb,
 static int vhost_user_postcopy_advise(struct vhost_dev *dev, Error **errp)
 {
     struct vhost_user *u = dev->opaque;
-    CharBackend *chr = u->chr;
+    CharBackend *chr = u->user->chr;
     int ufd;
     VhostUserMsg msg = {
         .hdr.request = VHOST_USER_POSTCOPY_ADVISE,
@@ -1182,7 +1184,7 @@ static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
     return 0;
 }
 
-static int vhost_user_init(struct vhost_dev *dev, void *opaque)
+static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque)
 {
     uint64_t features, protocol_features;
     struct vhost_user *u;
@@ -1191,7 +1193,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
     assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
 
     u = g_new0(struct vhost_user, 1);
-    u->chr = opaque;
+    u->user = opaque;
     u->slave_fd = -1;
     u->dev = dev;
     dev->opaque = u;
@@ -1267,7 +1269,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
     return 0;
 }
 
-static int vhost_user_cleanup(struct vhost_dev *dev)
+static int vhost_user_backend_cleanup(struct vhost_dev *dev)
 {
     struct vhost_user *u;
 
@@ -1581,10 +1583,21 @@ vhost_user_crypto_close_session(struct vhost_dev *dev, uint64_t session_id)
     return 0;
 }
 
+VhostUserState *vhost_user_init(void)
+{
+    VhostUserState *user = g_new0(struct VhostUserState, 1);
+
+    return user;
+}
+
+void vhost_user_cleanup(VhostUserState *user)
+{
+}
+
 const VhostOps user_ops = {
         .backend_type = VHOST_BACKEND_TYPE_USER,
-        .vhost_backend_init = vhost_user_init,
-        .vhost_backend_cleanup = vhost_user_cleanup,
+        .vhost_backend_init = vhost_user_backend_init,
+        .vhost_backend_cleanup = vhost_user_backend_cleanup,
         .vhost_backend_memslots_limit = vhost_user_memslots_limit,
         .vhost_set_log_base = vhost_user_set_log_base,
         .vhost_set_mem_table = vhost_user_set_mem_table,
diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h
index 5804cc904a..f1258ae545 100644
--- a/include/hw/virtio/vhost-user-blk.h
+++ b/include/hw/virtio/vhost-user-blk.h
@@ -21,6 +21,7 @@
 #include "hw/block/block.h"
 #include "chardev/char-fe.h"
 #include "hw/virtio/vhost.h"
+#include "hw/virtio/vhost-user.h"
 
 #define TYPE_VHOST_USER_BLK "vhost-user-blk"
 #define VHOST_USER_BLK(obj) \
@@ -36,6 +37,7 @@ typedef struct VHostUserBlk {
     uint32_t config_wce;
     uint32_t config_ro;
     struct vhost_dev dev;
+    VhostUserState *vhost_user;
 } VHostUserBlk;
 
 #endif
diff --git a/include/hw/virtio/vhost-user-scsi.h b/include/hw/virtio/vhost-user-scsi.h
index 01861f78d0..3ec34ae867 100644
--- a/include/hw/virtio/vhost-user-scsi.h
+++ b/include/hw/virtio/vhost-user-scsi.h
@@ -21,6 +21,7 @@
 #include "hw/qdev.h"
 #include "hw/virtio/virtio-scsi.h"
 #include "hw/virtio/vhost.h"
+#include "hw/virtio/vhost-user.h"
 #include "hw/virtio/vhost-scsi-common.h"
 
 #define TYPE_VHOST_USER_SCSI "vhost-user-scsi"
@@ -30,6 +31,7 @@
 typedef struct VHostUserSCSI {
     VHostSCSICommon parent_obj;
     uint64_t host_features;
+    VhostUserState *vhost_user;
 } VHostUserSCSI;
 
 #endif /* VHOST_USER_SCSI_H */
diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
new file mode 100644
index 0000000000..eb8bc0d90d
--- /dev/null
+++ b/include/hw/virtio/vhost-user.h
@@ -0,0 +1,20 @@
+/*
+ * Copyright (c) 2017-2018 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef HW_VIRTIO_VHOST_USER_H
+#define HW_VIRTIO_VHOST_USER_H
+
+#include "chardev/char-fe.h"
+
+typedef struct VhostUserState {
+    CharBackend *chr;
+} VhostUserState;
+
+VhostUserState *vhost_user_init(void);
+void vhost_user_cleanup(VhostUserState *user);
+
+#endif
diff --git a/net/vhost-user.c b/net/vhost-user.c
index fa28aad12d..525a061acf 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -12,6 +12,7 @@
 #include "clients.h"
 #include "net/vhost_net.h"
 #include "net/vhost-user.h"
+#include "hw/virtio/vhost-user.h"
 #include "chardev/char-fe.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-net.h"
@@ -23,6 +24,7 @@
 typedef struct NetVhostUserState {
     NetClientState nc;
     CharBackend chr; /* only queue index 0 */
+    VhostUserState *vhost_user;
     VHostNetState *vhost_net;
     guint watch;
     uint64_t acked_features;
@@ -64,7 +66,8 @@ static void vhost_user_stop(int queues, NetClientState *ncs[])
     }
 }
 
-static int vhost_user_start(int queues, NetClientState *ncs[], CharBackend *be)
+static int vhost_user_start(int queues, NetClientState *ncs[],
+                            VhostUserState *be)
 {
     VhostNetOptions options;
     struct vhost_net *net = NULL;
@@ -144,7 +147,7 @@ static ssize_t vhost_user_receive(NetClientState *nc, const uint8_t *buf,
     return size;
 }
 
-static void vhost_user_cleanup(NetClientState *nc)
+static void net_vhost_user_cleanup(NetClientState *nc)
 {
     NetVhostUserState *s = DO_UPCAST(NetVhostUserState, nc, nc);
 
@@ -153,6 +156,11 @@ static void vhost_user_cleanup(NetClientState *nc)
         g_free(s->vhost_net);
         s->vhost_net = NULL;
     }
+    if (s->vhost_user) {
+        vhost_user_cleanup(s->vhost_user);
+        g_free(s->vhost_user);
+        s->vhost_user = NULL;
+    }
     if (nc->queue_index == 0) {
         if (s->watch) {
             g_source_remove(s->watch);
@@ -182,7 +190,7 @@ static NetClientInfo net_vhost_user_info = {
         .type = NET_CLIENT_DRIVER_VHOST_USER,
         .size = sizeof(NetVhostUserState),
         .receive = vhost_user_receive,
-        .cleanup = vhost_user_cleanup,
+        .cleanup = net_vhost_user_cleanup,
         .has_vnet_hdr = vhost_user_has_vnet_hdr,
         .has_ufo = vhost_user_has_ufo,
 };
@@ -244,7 +252,7 @@ static void net_vhost_user_event(void *opaque, int event)
     trace_vhost_user_event(chr->label, event);
     switch (event) {
     case CHR_EVENT_OPENED:
-        if (vhost_user_start(queues, ncs, &s->chr) < 0) {
+        if (vhost_user_start(queues, ncs, s->vhost_user) < 0) {
             qemu_chr_fe_disconnect(&s->chr);
             return;
         }
@@ -283,12 +291,19 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
 {
     Error *err = NULL;
     NetClientState *nc, *nc0 = NULL;
+    VhostUserState *user = NULL;
     NetVhostUserState *s;
     int i;
 
     assert(name);
     assert(queues > 0);
 
+    user = vhost_user_init();
+    if (!user) {
+        error_report("failed to init vhost_user");
+        goto err;
+    }
+
     for (i = 0; i < queues; i++) {
         nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
         snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user%d to %s",
@@ -299,17 +314,19 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
             s = DO_UPCAST(NetVhostUserState, nc, nc);
             if (!qemu_chr_fe_init(&s->chr, chr, &err)) {
                 error_report_err(err);
-                return -1;
+                goto err;
             }
+            user->chr = &s->chr;
         }
-
+        s = DO_UPCAST(NetVhostUserState, nc, nc);
+        s->vhost_user = user;
     }
 
     s = DO_UPCAST(NetVhostUserState, nc, nc0);
     do {
         if (qemu_chr_fe_wait_connected(&s->chr, &err) < 0) {
             error_report_err(err);
-            return -1;
+            goto err;
         }
         qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
                                  net_vhost_user_event, NULL, nc0->name, NULL,
@@ -319,6 +336,15 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
     assert(s->vhost_net);
 
     return 0;
+
+err:
+    if (user) {
+        vhost_user_cleanup(user);
+        g_free(user);
+        s->vhost_user = NULL;
+    }
+
+    return -1;
 }
 
 static Chardev *net_vhost_claim_chardev(
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [Qemu-devel] [PATCH v3 3/6] vhost-user: support receiving file descriptors in slave_read
  2018-04-12 15:12 ` [virtio-dev] " Tiwei Bie
@ 2018-04-12 15:12   ` Tiwei Bie
  -1 siblings, 0 replies; 98+ messages in thread
From: Tiwei Bie @ 2018-04-12 15:12 UTC (permalink / raw)
  To: mst, jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev
  Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang,
	tiwei.bie

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 hw/virtio/vhost-user.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 91edd95453..9cea2c8c51 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -854,14 +854,44 @@ static void slave_read(void *opaque)
     VhostUserHeader hdr = { 0, };
     VhostUserPayload payload = { 0, };
     int size, ret = 0;
+    struct iovec iov;
+    struct msghdr msgh;
+    int fd = -1;
+    char control[CMSG_SPACE(sizeof(fd))];
+    struct cmsghdr *cmsg;
+    size_t fdsize;
+
+    memset(&msgh, 0, sizeof(msgh));
+    msgh.msg_iov = &iov;
+    msgh.msg_iovlen = 1;
+    msgh.msg_control = control;
+    msgh.msg_controllen = sizeof(control);
 
     /* Read header */
-    size = read(u->slave_fd, &hdr, VHOST_USER_HDR_SIZE);
+    iov.iov_base = &hdr;
+    iov.iov_len = VHOST_USER_HDR_SIZE;
+
+    size = recvmsg(u->slave_fd, &msgh, 0);
     if (size != VHOST_USER_HDR_SIZE) {
         error_report("Failed to read from slave.");
         goto err;
     }
 
+    if (msgh.msg_flags & MSG_CTRUNC) {
+        error_report("Truncated message.");
+        goto err;
+    }
+
+    for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL;
+         cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
+            if (cmsg->cmsg_level == SOL_SOCKET &&
+                cmsg->cmsg_type == SCM_RIGHTS) {
+                    fdsize = cmsg->cmsg_len - CMSG_LEN(0);
+                    memcpy(&fd, CMSG_DATA(cmsg), fdsize);
+                    break;
+            }
+    }
+
     if (hdr.size > VHOST_USER_PAYLOAD_SIZE) {
         error_report("Failed to read msg header."
                 " Size %d exceeds the maximum %zu.", hdr.size,
@@ -885,9 +915,15 @@ static void slave_read(void *opaque)
         break;
     default:
         error_report("Received unexpected msg type.");
+        if (fd != -1) {
+            close(fd);
+        }
         ret = -EINVAL;
     }
 
+    /* Message handlers need to make sure that fd will be consumed. */
+    fd = -1;
+
     /*
      * REPLY_ACK feature handling. Other reply types has to be managed
      * directly in their request handlers.
@@ -920,6 +956,9 @@ err:
     qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
     close(u->slave_fd);
     u->slave_fd = -1;
+    if (fd != -1) {
+        close(fd);
+    }
     return;
 }
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 98+ messages in thread
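
The ancillary-data handling above is the standard SCM_RIGHTS pattern for passing a file descriptor over a UNIX domain socket. As a standalone sketch (independent of QEMU's slave channel; recv_with_fd() and sock_fd are made-up names), receiving a payload plus at most one descriptor looks roughly like this:

#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Sketch: read up to buf_len bytes and, if one was passed, a single fd. */
static ssize_t recv_with_fd(int sock_fd, void *buf, size_t buf_len, int *out_fd)
{
    struct iovec iov = { .iov_base = buf, .iov_len = buf_len };
    char control[CMSG_SPACE(sizeof(int))];
    struct msghdr msgh = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = control, .msg_controllen = sizeof(control),
    };
    struct cmsghdr *cmsg;
    ssize_t n;

    *out_fd = -1;
    n = recvmsg(sock_fd, &msgh, 0);
    if (n < 0 || (msgh.msg_flags & MSG_CTRUNC)) {
        return -1;                /* error, or ancillary data truncated */
    }
    for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg; cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS) {
            memcpy(out_fd, CMSG_DATA(cmsg), sizeof(int));
            break;
        }
    }
    return n;                     /* caller owns *out_fd if it is >= 0 */
}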

* [Qemu-devel] [PATCH v3 4/6] virtio: support setting memory region based host notifier
  2018-04-12 15:12 ` [virtio-dev] " Tiwei Bie
@ 2018-04-12 15:12   ` Tiwei Bie
  -1 siblings, 0 replies; 98+ messages in thread
From: Tiwei Bie @ 2018-04-12 15:12 UTC (permalink / raw)
  To: mst, jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev
  Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang,
	tiwei.bie

This patch introduces support for setting memory region based
host notifiers for a virtio device. This is helpful when using
a hardware accelerator for a virtio device: the hardware relies
heavily on notifications, and this lets the guest driver in the
VM notify the hardware directly.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 hw/virtio/virtio-pci.c         | 22 ++++++++++++++++++++++
 hw/virtio/virtio.c             | 13 +++++++++++++
 include/hw/virtio/virtio-bus.h |  2 ++
 include/hw/virtio/virtio.h     |  2 ++
 4 files changed, 39 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 1e8ab7bbc5..5eb0c323ca 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1037,6 +1037,27 @@ assign_error:
     return r;
 }
 
+static int virtio_pci_set_host_notifier_mr(DeviceState *d, int n,
+                                           MemoryRegion *mr, bool assign)
+{
+    VirtIOPCIProxy *proxy = to_virtio_pci_proxy(d);
+    int offset;
+
+    if (n >= VIRTIO_QUEUE_MAX || !virtio_pci_modern(proxy) ||
+        virtio_pci_queue_mem_mult(proxy) != memory_region_size(mr)) {
+        return -1;
+    }
+
+    if (assign) {
+        offset = virtio_pci_queue_mem_mult(proxy) * n;
+        memory_region_add_subregion_overlap(&proxy->notify.mr, offset, mr, 1);
+    } else {
+        memory_region_del_subregion(&proxy->notify.mr, mr);
+    }
+
+    return 0;
+}
+
 static void virtio_pci_vmstate_change(DeviceState *d, bool running)
 {
     VirtIOPCIProxy *proxy = to_virtio_pci_proxy(d);
@@ -2652,6 +2673,7 @@ static void virtio_pci_bus_class_init(ObjectClass *klass, void *data)
     k->has_extra_state = virtio_pci_has_extra_state;
     k->query_guest_notifiers = virtio_pci_query_guest_notifiers;
     k->set_guest_notifiers = virtio_pci_set_guest_notifiers;
+    k->set_host_notifier_mr = virtio_pci_set_host_notifier_mr;
     k->vmstate_change = virtio_pci_vmstate_change;
     k->pre_plugged = virtio_pci_pre_plugged;
     k->device_plugged = virtio_pci_device_plugged;
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 006d3d1148..1debb0147b 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2454,6 +2454,19 @@ EventNotifier *virtio_queue_get_host_notifier(VirtQueue *vq)
     return &vq->host_notifier;
 }
 
+int virtio_queue_set_host_notifier_mr(VirtIODevice *vdev, int n,
+                                      MemoryRegion *mr, bool assign)
+{
+    BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
+    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+
+    if (k->set_host_notifier_mr) {
+        return k->set_host_notifier_mr(qbus->parent, n, mr, assign);
+    }
+
+    return -1;
+}
+
 void virtio_device_set_child_bus_name(VirtIODevice *vdev, char *bus_name)
 {
     g_free(vdev->bus_name);
diff --git a/include/hw/virtio/virtio-bus.h b/include/hw/virtio/virtio-bus.h
index ced3d2d2b0..7fec9dc929 100644
--- a/include/hw/virtio/virtio-bus.h
+++ b/include/hw/virtio/virtio-bus.h
@@ -52,6 +52,8 @@ typedef struct VirtioBusClass {
     bool (*has_extra_state)(DeviceState *d);
     bool (*query_guest_notifiers)(DeviceState *d);
     int (*set_guest_notifiers)(DeviceState *d, int nvqs, bool assign);
+    int (*set_host_notifier_mr)(DeviceState *d, int n,
+                                MemoryRegion *mr, bool assign);
     void (*vmstate_change)(DeviceState *d, bool running);
     /*
      * Expose the features the transport layer supports before
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 098bdaaea3..9c1fa07d6d 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -239,6 +239,8 @@ void virtio_queue_set_align(VirtIODevice *vdev, int n, int align);
 void virtio_queue_notify(VirtIODevice *vdev, int n);
 uint16_t virtio_queue_vector(VirtIODevice *vdev, int n);
 void virtio_queue_set_vector(VirtIODevice *vdev, int n, uint16_t vector);
+int virtio_queue_set_host_notifier_mr(VirtIODevice *vdev, int n,
+                                      MemoryRegion *mr, bool assign);
 int virtio_set_status(VirtIODevice *vdev, uint8_t val);
 void virtio_reset(void *opaque);
 void virtio_update_irq(VirtIODevice *vdev);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 98+ messages in thread
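
For context on how this hook is meant to be used: a caller (patch 6 in this series does this for vhost-user) wraps an already-mmapped doorbell page in a RAM-device MemoryRegion and hands it to virtio_queue_set_host_notifier_mr(); the region size has to match the transport's per-queue notify slot or virtio_pci_set_host_notifier_mr() rejects it. A condensed sketch, assuming QEMU context and omitting error handling and teardown:

/* Sketch only: expose an already-mmapped hardware doorbell page as the
 * notify region for queue n, so guest kicks go straight to the device. */
static int map_doorbell_as_host_notifier(VirtIODevice *vdev, int n,
                                         void *doorbell, size_t page_size,
                                         MemoryRegion *mr)
{
    memory_region_init_ram_device_ptr(mr, OBJECT(vdev), "doorbell-sketch",
                                      page_size, doorbell);
    /* Overlays the region on the transport's notify area; returns -1 if
     * the transport does not implement set_host_notifier_mr. */
    return virtio_queue_set_host_notifier_mr(vdev, n, mr, true);
}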

* [Qemu-devel] [PATCH v3 5/6] vhost: allow backends to filter memory sections
  2018-04-12 15:12 ` [virtio-dev] " Tiwei Bie
@ 2018-04-12 15:12   ` Tiwei Bie
  -1 siblings, 0 replies; 98+ messages in thread
From: Tiwei Bie @ 2018-04-12 15:12 UTC (permalink / raw)
  To: mst, jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev
  Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang,
	tiwei.bie

This patch introduces a vhost op for vhost backends to allow
them to filter the memory sections that they can handle.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 hw/virtio/vhost-user.c            | 11 +++++++++++
 hw/virtio/vhost.c                 |  9 +++++++--
 include/hw/virtio/vhost-backend.h |  4 ++++
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 9cea2c8c51..791e0a4763 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1622,6 +1622,16 @@ vhost_user_crypto_close_session(struct vhost_dev *dev, uint64_t session_id)
     return 0;
 }
 
+static bool vhost_user_mem_section_filter(struct vhost_dev *dev,
+                                          MemoryRegionSection *section)
+{
+    bool result;
+
+    result = memory_region_get_fd(section->mr) >= 0;
+
+    return result;
+}
+
 VhostUserState *vhost_user_init(void)
 {
     VhostUserState *user = g_new0(struct VhostUserState, 1);
@@ -1663,4 +1673,5 @@ const VhostOps user_ops = {
         .vhost_set_config = vhost_user_set_config,
         .vhost_crypto_create_session = vhost_user_crypto_create_session,
         .vhost_crypto_close_session = vhost_user_crypto_close_session,
+        .vhost_backend_mem_section_filter = vhost_user_mem_section_filter,
 };
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index f51bf573d5..a939a38d97 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -382,7 +382,7 @@ static int vhost_verify_ring_mappings(struct vhost_dev *dev,
     return r;
 }
 
-static bool vhost_section(MemoryRegionSection *section)
+static bool vhost_section(struct vhost_dev *dev, MemoryRegionSection *section)
 {
     bool result;
     bool log_dirty = memory_region_get_dirty_log_mask(section->mr) &
@@ -395,6 +395,11 @@ static bool vhost_section(MemoryRegionSection *section)
      */
     result &= !log_dirty;
 
+    if (result && dev->vhost_ops->vhost_backend_mem_section_filter) {
+        result &=
+            dev->vhost_ops->vhost_backend_mem_section_filter(dev, section);
+    }
+
     trace_vhost_section(section->mr->name, result);
     return result;
 }
@@ -628,7 +633,7 @@ static void vhost_region_addnop(MemoryListener *listener,
     struct vhost_dev *dev = container_of(listener, struct vhost_dev,
                                          memory_listener);
 
-    if (!vhost_section(section)) {
+    if (!vhost_section(dev, section)) {
         return;
     }
     vhost_region_add_section(dev, section);
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index 5dac61f9ea..81283ec50f 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -101,6 +101,9 @@ typedef int (*vhost_crypto_create_session_op)(struct vhost_dev *dev,
 typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev,
                                              uint64_t session_id);
 
+typedef bool (*vhost_backend_mem_section_filter_op)(struct vhost_dev *dev,
+                                                MemoryRegionSection *section);
+
 typedef struct VhostOps {
     VhostBackendType backend_type;
     vhost_backend_init vhost_backend_init;
@@ -138,6 +141,7 @@ typedef struct VhostOps {
     vhost_set_config_op vhost_set_config;
     vhost_crypto_create_session_op vhost_crypto_create_session;
     vhost_crypto_close_session_op vhost_crypto_close_session;
+    vhost_backend_mem_section_filter_op vhost_backend_mem_section_filter;
 } VhostOps;
 
 extern const VhostOps user_ops;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 98+ messages in thread
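
A backend opts in simply by filling the new VhostOps hook. As a sketch, a hypothetical backend that can only handle plain, fd-backed RAM (so that it can mmap the sections itself) might look like this:

/* Hypothetical backend policy: accept only plain, fd-backed RAM sections. */
static bool my_backend_mem_section_filter(struct vhost_dev *dev,
                                          MemoryRegionSection *section)
{
    return memory_region_is_ram(section->mr) &&
           !memory_region_is_rom(section->mr) &&
           memory_region_get_fd(section->mr) >= 0;
}

static const VhostOps my_backend_ops = {
    /* ... other callbacks elided ... */
    .vhost_backend_mem_section_filter = my_backend_mem_section_filter,
};

Sections rejected by the filter never reach the backend: vhost_section() drops them before vhost_region_add_section() is called.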

* [Qemu-devel] [PATCH v3 6/6] vhost-user: support registering external host notifiers
  2018-04-12 15:12 ` [virtio-dev] " Tiwei Bie
@ 2018-04-12 15:12   ` Tiwei Bie
  -1 siblings, 0 replies; 98+ messages in thread
From: Tiwei Bie @ 2018-04-12 15:12 UTC (permalink / raw)
  To: mst, jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev
  Cc: cunming.liang, dan.daly, jianfeng.tan, zhihong.wang, xiao.w.wang,
	tiwei.bie

This patch introduces VHOST_USER_PROTOCOL_F_HOST_NOTIFIER.
With this feature negotiated, vhost-user backend can register
memory region based host notifiers. And it will allow the guest
driver in the VM to notify the hardware accelerator at the
vhost-user backend directly.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 docs/interop/vhost-user.txt    |  33 +++++++++++
 hw/virtio/vhost-user.c         | 123 +++++++++++++++++++++++++++++++++++++++++
 include/hw/virtio/vhost-user.h |   8 +++
 3 files changed, 164 insertions(+)

diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index 534caab18a..9e57b36b20 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -132,6 +132,16 @@ Depending on the request type, payload can be:
    Payload: Size bytes array holding the contents of the virtio
        device's configuration space
 
+ * Vring area description
+   -----------------------
+   | u64 | size | offset |
+   -----------------------
+
+   u64: a 64-bit integer contains vring index and flags
+   Size: a 64-bit size of this area
+   Offset: a 64-bit offset of this area from the start of the
+       supplied file descriptor
+
 In QEMU the vhost-user message is implemented with the following struct:
 
 typedef struct VhostUserMsg {
@@ -146,6 +156,7 @@ typedef struct VhostUserMsg {
         VhostUserLog log;
         struct vhost_iotlb_msg iotlb;
         VhostUserConfig config;
+        VhostUserVringArea area;
     };
 } QEMU_PACKED VhostUserMsg;
 
@@ -380,6 +391,7 @@ Protocol features
 #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
 #define VHOST_USER_PROTOCOL_F_PAGEFAULT      8
 #define VHOST_USER_PROTOCOL_F_CONFIG         9
+#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  10
 
 Master message types
 --------------------
@@ -777,6 +789,27 @@ Slave message types
      the VHOST_USER_NEED_REPLY flag, master must respond with zero when
      operation is successfully completed, or non-zero otherwise.
 
+ * VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG
+
+      Id: 3
+      Equivalent ioctl: N/A
+      Slave payload: vring area description
+      Master payload: N/A
+
+      Sets host notifier for a specified queue. The queue index is contained
+      in the u64 field of the vring area description. The host notifier is
+      described by the file descriptor (typically it's a VFIO device fd) which
+      is passed as ancillary data and the size (which is mmap size and should
+      be the same as host page size) and offset (which is mmap offset) carried
+      in the vring area description. QEMU can mmap the file descriptor based
+      on the size and offset to get a memory range. Registering a host notifier
+      means mapping this memory range to the VM as the specified queue's notify
+      MMIO region. Slave sends this request to tell QEMU to de-register the
+      existing notifier if any and register the new notifier if the request is
+      sent with a file descriptor.
+      This request should be sent only when VHOST_USER_PROTOCOL_F_HOST_NOTIFIER
+      protocol feature has been successfully negotiated.
+
 VHOST_USER_PROTOCOL_F_REPLY_ACK:
 -------------------------------
 The original vhost-user specification only demands replies for certain
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 791e0a4763..1cd9c7276b 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -13,6 +13,7 @@
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-user.h"
 #include "hw/virtio/vhost-backend.h"
+#include "hw/virtio/virtio.h"
 #include "hw/virtio/virtio-net.h"
 #include "chardev/char-fe.h"
 #include "sysemu/kvm.h"
@@ -48,6 +49,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
     VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
     VHOST_USER_PROTOCOL_F_CONFIG = 9,
+    VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 10,
     VHOST_USER_PROTOCOL_F_MAX
 };
 
@@ -92,6 +94,7 @@ typedef enum VhostUserSlaveRequest {
     VHOST_USER_SLAVE_NONE = 0,
     VHOST_USER_SLAVE_IOTLB_MSG = 1,
     VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
+    VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
     VHOST_USER_SLAVE_MAX
 }  VhostUserSlaveRequest;
 
@@ -136,6 +139,12 @@ static VhostUserConfig c __attribute__ ((unused));
                                    + sizeof(c.size) \
                                    + sizeof(c.flags))
 
+typedef struct VhostUserVringArea {
+    uint64_t u64;
+    uint64_t size;
+    uint64_t offset;
+} VhostUserVringArea;
+
 typedef struct {
     VhostUserRequest request;
 
@@ -157,6 +166,7 @@ typedef union {
         struct vhost_iotlb_msg iotlb;
         VhostUserConfig config;
         VhostUserCryptoSession session;
+        VhostUserVringArea area;
 } VhostUserPayload;
 
 typedef struct VhostUserMsg {
@@ -638,9 +648,37 @@ static int vhost_user_set_vring_num(struct vhost_dev *dev,
     return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM, ring);
 }
 
+static void vhost_user_host_notifier_restore(struct vhost_dev *dev,
+                                             int queue_idx)
+{
+    struct vhost_user *u = dev->opaque;
+    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
+    VirtIODevice *vdev = dev->vdev;
+
+    if (n->addr && !n->set) {
+        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true);
+        n->set = true;
+    }
+}
+
+static void vhost_user_host_notifier_remove(struct vhost_dev *dev,
+                                            int queue_idx)
+{
+    struct vhost_user *u = dev->opaque;
+    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
+    VirtIODevice *vdev = dev->vdev;
+
+    if (n->addr && n->set) {
+        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
+        n->set = false;
+    }
+}
+
 static int vhost_user_set_vring_base(struct vhost_dev *dev,
                                      struct vhost_vring_state *ring)
 {
+    vhost_user_host_notifier_restore(dev, ring->index);
+
     return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE, ring);
 }
 
@@ -674,6 +712,8 @@ static int vhost_user_get_vring_base(struct vhost_dev *dev,
         .hdr.size = sizeof(msg.payload.state),
     };
 
+    vhost_user_host_notifier_remove(dev, ring->index);
+
     if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
         return -1;
     }
@@ -847,6 +887,76 @@ static int vhost_user_slave_handle_config_change(struct vhost_dev *dev)
     return ret;
 }
 
+static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
+                                                       VhostUserVringArea *area,
+                                                       int fd)
+{
+    int queue_idx = area->u64 & VHOST_USER_VRING_IDX_MASK;
+    size_t page_size = qemu_real_host_page_size;
+    struct vhost_user *u = dev->opaque;
+    VhostUserState *user = u->user;
+    VirtIODevice *vdev = dev->vdev;
+    VhostUserHostNotifier *n;
+    int ret = 0;
+    void *addr;
+    char *name;
+
+    if (!virtio_has_feature(dev->protocol_features,
+                            VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) ||
+        vdev == NULL || queue_idx >= virtio_get_num_queues(vdev)) {
+        ret = -1;
+        goto out;
+    }
+
+    n = &user->notifier[queue_idx];
+
+    if (n->addr) {
+        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
+        object_unparent(OBJECT(&n->mr));
+        munmap(n->addr, page_size);
+        n->addr = NULL;
+    }
+
+    if (area->u64 & VHOST_USER_VRING_NOFD_MASK) {
+        goto out;
+    }
+
+    /* Sanity check. */
+    if (area->size != page_size) {
+        ret = -1;
+        goto out;
+    }
+
+    addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
+                fd, area->offset);
+    if (addr == MAP_FAILED) {
+        ret = -1;
+        goto out;
+    }
+
+    name = g_strdup_printf("vhost-user/host-notifier@%p mmaps[%d]",
+                           user, queue_idx);
+    memory_region_init_ram_device_ptr(&n->mr, OBJECT(vdev), name,
+                                      page_size, addr);
+    g_free(name);
+
+    if (virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true)) {
+        munmap(addr, page_size);
+        ret = -1;
+        goto out;
+    }
+
+    n->addr = addr;
+    n->set = true;
+
+out:
+    /* Always close the fd. */
+    if (fd != -1) {
+        close(fd);
+    }
+    return ret;
+}
+
 static void slave_read(void *opaque)
 {
     struct vhost_dev *dev = opaque;
@@ -913,6 +1023,10 @@ static void slave_read(void *opaque)
     case VHOST_USER_SLAVE_CONFIG_CHANGE_MSG :
         ret = vhost_user_slave_handle_config_change(dev);
         break;
+    case VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG:
+        ret = vhost_user_slave_handle_vring_host_notifier(dev, &payload.area,
+                                                          fd);
+        break;
     default:
         error_report("Received unexpected msg type.");
         if (fd != -1) {
@@ -1641,6 +1755,15 @@ VhostUserState *vhost_user_init(void)
 
 void vhost_user_cleanup(VhostUserState *user)
 {
+    int i;
+
+    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
+        if (user->notifier[i].addr) {
+            object_unparent(OBJECT(&user->notifier[i].mr));
+            munmap(user->notifier[i].addr, qemu_real_host_page_size);
+            user->notifier[i].addr = NULL;
+        }
+    }
 }
 
 const VhostOps user_ops = {
diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
index eb8bc0d90d..fd660393a0 100644
--- a/include/hw/virtio/vhost-user.h
+++ b/include/hw/virtio/vhost-user.h
@@ -9,9 +9,17 @@
 #define HW_VIRTIO_VHOST_USER_H
 
 #include "chardev/char-fe.h"
+#include "hw/virtio/virtio.h"
+
+typedef struct VhostUserHostNotifier {
+    MemoryRegion mr;
+    void *addr;
+    bool set;
+} VhostUserHostNotifier;
 
 typedef struct VhostUserState {
     CharBackend *chr;
+    VhostUserHostNotifier notifier[VIRTIO_QUEUE_MAX];
 } VhostUserState;
 
 VhostUserState *vhost_user_init(void);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 98+ messages in thread
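
From the backend (slave) side, the request is just the vring area description above plus the doorbell fd as ancillary data. A rough sketch, where send_slave_msg() is a hypothetical helper that prepends the vhost-user message header and attaches the fd via SCM_RIGHTS:

#include <stdint.h>

#define VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG 3   /* Id: 3, as above */

/* Mirrors the "vring area description" layout documented above. */
struct vhost_user_vring_area {
    uint64_t u64;       /* vring index (plus NOFD flag to de-register) */
    uint64_t size;      /* mmap size, expected to be one host page */
    uint64_t offset;    /* mmap offset within the passed fd */
};

static int register_host_notifier(int slave_sock, uint64_t vring_idx,
                                  int doorbell_fd, uint64_t offset,
                                  uint64_t page_size)
{
    struct vhost_user_vring_area area = {
        .u64    = vring_idx,
        .size   = page_size,
        .offset = offset,
    };

    /* send_slave_msg() is assumed (hypothetical): it builds the header
     * and passes doorbell_fd as SCM_RIGHTS ancillary data. */
    return send_slave_msg(slave_sock,
                          VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG,
                          &area, sizeof(area), doorbell_fd);
}

QEMU then mmaps (size, offset) from the fd and installs that page as the queue's notify MMIO region, as vhost_user_slave_handle_vring_host_notifier() above shows.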

* Re: [Qemu-devel] [PATCH v3 6/6] vhost-user: support registering external host notifiers
  2018-04-12 15:12   ` [virtio-dev] " Tiwei Bie
@ 2018-04-18 16:34     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 98+ messages in thread
From: Michael S. Tsirkin @ 2018-04-18 16:34 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, Apr 12, 2018 at 11:12:32PM +0800, Tiwei Bie wrote:
> This patch introduces VHOST_USER_PROTOCOL_F_HOST_NOTIFIER.
> With this feature negotiated, vhost-user backend can register
> memory region based host notifiers. And it will allow the guest
> driver in the VM to notify the hardware accelerator at the
> vhost-user backend directly.
> 
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>

Overall I think we can merge this approach, but I have two main concerns
about this:

1. Testing. Most people do not have the virtio hardware,
   so how do we make sure this does not bit rot?

   I have an idea: add an option like this to libvhost-user.
   Naturally libvhost-user can not get notified about a write
   to an mmapped area, but it can write a special value there
   (e.g. all-ones?) and then poll it to detect VQ # writes.

   Then include a vhost user bridge test with an option like this.

   I'd like to see a patch doing this.

2. Memory barriers. Right now, after updating the avail idx,
   virtio does smp_wmb() and then the MMIO write.
   Normal hardware drivers do wmb(), which is an sfence.
   Can a PCI device read bypass the index write and see a stale
   index value?
   To make virtio pci do wmb() we would need a new feature bit.
   Alternatively I guess we could maybe look at subsystem vendor/device id.

   I'd like to see a patch doing one of these things.

Thanks!
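
For point 1, the polling scheme could look roughly like this in a software-only test backend (a sketch; it assumes the notify page is ordinary shared memory that the test has already mapped, and the names are made up):

#include <stdint.h>

#define FAKE_DOORBELL_SENTINEL 0xffff   /* never a valid VQ index */

/* Sketch: busy-poll a fake doorbell page for a queue-index write.
 * A modern virtio-pci driver writes the 16-bit queue index to the
 * notify address, so anything other than the sentinel is a kick. */
static uint16_t poll_fake_doorbell(volatile uint16_t *notify)
{
    uint16_t vq_idx;

    *notify = FAKE_DOORBELL_SENTINEL;            /* arm */
    while ((vq_idx = *notify) == FAKE_DOORBELL_SENTINEL) {
        /* a real test would bound this loop or sleep between reads */
    }
    *notify = FAKE_DOORBELL_SENTINEL;            /* re-arm */
    return vq_idx;
}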

> ---
>  docs/interop/vhost-user.txt    |  33 +++++++++++
>  hw/virtio/vhost-user.c         | 123 +++++++++++++++++++++++++++++++++++++++++
>  include/hw/virtio/vhost-user.h |   8 +++
>  3 files changed, 164 insertions(+)
> 
> diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> index 534caab18a..9e57b36b20 100644
> --- a/docs/interop/vhost-user.txt
> +++ b/docs/interop/vhost-user.txt
> @@ -132,6 +132,16 @@ Depending on the request type, payload can be:
>     Payload: Size bytes array holding the contents of the virtio
>         device's configuration space
>  
> + * Vring area description
> +   -----------------------
> +   | u64 | size | offset |
> +   -----------------------
> +
> +   u64: a 64-bit integer contains vring index and flags
> +   Size: a 64-bit size of this area
> +   Offset: a 64-bit offset of this area from the start of the
> +       supplied file descriptor
> +
>  In QEMU the vhost-user message is implemented with the following struct:
>  
>  typedef struct VhostUserMsg {
> @@ -146,6 +156,7 @@ typedef struct VhostUserMsg {
>          VhostUserLog log;
>          struct vhost_iotlb_msg iotlb;
>          VhostUserConfig config;
> +        VhostUserVringArea area;
>      };
>  } QEMU_PACKED VhostUserMsg;
>  
> @@ -380,6 +391,7 @@ Protocol features
>  #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
>  #define VHOST_USER_PROTOCOL_F_PAGEFAULT      8
>  #define VHOST_USER_PROTOCOL_F_CONFIG         9
> +#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  10
>  
>  Master message types
>  --------------------
> @@ -777,6 +789,27 @@ Slave message types
>       the VHOST_USER_NEED_REPLY flag, master must respond with zero when
>       operation is successfully completed, or non-zero otherwise.
>  
> + * VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG
> +
> +      Id: 3
> +      Equivalent ioctl: N/A
> +      Slave payload: vring area description
> +      Master payload: N/A
> +
> +      Sets host notifier for a specified queue. The queue index is contained
> +      in the u64 field of the vring area description. The host notifier is
> +      described by the file descriptor (typically it's a VFIO device fd) which
> +      is passed as ancillary data and the size (which is mmap size and should
> +      be the same as host page size) and offset (which is mmap offset) carried
> +      in the vring area description. QEMU can mmap the file descriptor based
> +      on the size and offset to get a memory range. Registering a host notifier
> +      means mapping this memory range to the VM as the specified queue's notify
> +      MMIO region. Slave sends this request to tell QEMU to de-register the
> +      existing notifier if any and register the new notifier if the request is
> +      sent with a file descriptor.
> +      This request should be sent only when VHOST_USER_PROTOCOL_F_HOST_NOTIFIER
> +      protocol feature has been successfully negotiated.
> +
>  VHOST_USER_PROTOCOL_F_REPLY_ACK:
>  -------------------------------
>  The original vhost-user specification only demands replies for certain
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 791e0a4763..1cd9c7276b 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -13,6 +13,7 @@
>  #include "hw/virtio/vhost.h"
>  #include "hw/virtio/vhost-user.h"
>  #include "hw/virtio/vhost-backend.h"
> +#include "hw/virtio/virtio.h"
>  #include "hw/virtio/virtio-net.h"
>  #include "chardev/char-fe.h"
>  #include "sysemu/kvm.h"
> @@ -48,6 +49,7 @@ enum VhostUserProtocolFeature {
>      VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
>      VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
>      VHOST_USER_PROTOCOL_F_CONFIG = 9,
> +    VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 10,
>      VHOST_USER_PROTOCOL_F_MAX
>  };
>  
> @@ -92,6 +94,7 @@ typedef enum VhostUserSlaveRequest {
>      VHOST_USER_SLAVE_NONE = 0,
>      VHOST_USER_SLAVE_IOTLB_MSG = 1,
>      VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
> +    VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
>      VHOST_USER_SLAVE_MAX
>  }  VhostUserSlaveRequest;
>  
> @@ -136,6 +139,12 @@ static VhostUserConfig c __attribute__ ((unused));
>                                     + sizeof(c.size) \
>                                     + sizeof(c.flags))
>  
> +typedef struct VhostUserVringArea {
> +    uint64_t u64;
> +    uint64_t size;
> +    uint64_t offset;
> +} VhostUserVringArea;
> +
>  typedef struct {
>      VhostUserRequest request;
>  
> @@ -157,6 +166,7 @@ typedef union {
>          struct vhost_iotlb_msg iotlb;
>          VhostUserConfig config;
>          VhostUserCryptoSession session;
> +        VhostUserVringArea area;
>  } VhostUserPayload;
>  
>  typedef struct VhostUserMsg {
> @@ -638,9 +648,37 @@ static int vhost_user_set_vring_num(struct vhost_dev *dev,
>      return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM, ring);
>  }
>  
> +static void vhost_user_host_notifier_restore(struct vhost_dev *dev,
> +                                             int queue_idx)
> +{
> +    struct vhost_user *u = dev->opaque;
> +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> +    VirtIODevice *vdev = dev->vdev;
> +
> +    if (n->addr && !n->set) {
> +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true);
> +        n->set = true;
> +    }
> +}
> +
> +static void vhost_user_host_notifier_remove(struct vhost_dev *dev,
> +                                            int queue_idx)
> +{
> +    struct vhost_user *u = dev->opaque;
> +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> +    VirtIODevice *vdev = dev->vdev;
> +
> +    if (n->addr && n->set) {
> +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
> +        n->set = false;
> +    }
> +}
> +
>  static int vhost_user_set_vring_base(struct vhost_dev *dev,
>                                       struct vhost_vring_state *ring)
>  {
> +    vhost_user_host_notifier_restore(dev, ring->index);
> +
>      return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE, ring);
>  }
>  
> @@ -674,6 +712,8 @@ static int vhost_user_get_vring_base(struct vhost_dev *dev,
>          .hdr.size = sizeof(msg.payload.state),
>      };
>  
> +    vhost_user_host_notifier_remove(dev, ring->index);
> +
>      if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
>          return -1;
>      }
> @@ -847,6 +887,76 @@ static int vhost_user_slave_handle_config_change(struct vhost_dev *dev)
>      return ret;
>  }
>  
> +static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
> +                                                       VhostUserVringArea *area,
> +                                                       int fd)
> +{
> +    int queue_idx = area->u64 & VHOST_USER_VRING_IDX_MASK;
> +    size_t page_size = qemu_real_host_page_size;
> +    struct vhost_user *u = dev->opaque;
> +    VhostUserState *user = u->user;
> +    VirtIODevice *vdev = dev->vdev;
> +    VhostUserHostNotifier *n;
> +    int ret = 0;
> +    void *addr;
> +    char *name;
> +
> +    if (!virtio_has_feature(dev->protocol_features,
> +                            VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) ||
> +        vdev == NULL || queue_idx >= virtio_get_num_queues(vdev)) {
> +        ret = -1;
> +        goto out;
> +    }
> +
> +    n = &user->notifier[queue_idx];
> +
> +    if (n->addr) {
> +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
> +        object_unparent(OBJECT(&n->mr));
> +        munmap(n->addr, page_size);
> +        n->addr = NULL;
> +    }
> +
> +    if (area->u64 & VHOST_USER_VRING_NOFD_MASK) {
> +        goto out;
> +    }
> +
> +    /* Sanity check. */
> +    if (area->size != page_size) {
> +        ret = -1;
> +        goto out;
> +    }
> +
> +    addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
> +                fd, area->offset);
> +    if (addr == MAP_FAILED) {
> +        ret = -1;
> +        goto out;
> +    }
> +
> +    name = g_strdup_printf("vhost-user/host-notifier@%p mmaps[%d]",
> +                           user, queue_idx);
> +    memory_region_init_ram_device_ptr(&n->mr, OBJECT(vdev), name,
> +                                      page_size, addr);
> +    g_free(name);
> +
> +    if (virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true)) {
> +        munmap(addr, page_size);
> +        ret = -1;
> +        goto out;
> +    }
> +
> +    n->addr = addr;
> +    n->set = true;
> +
> +out:
> +    /* Always close the fd. */
> +    if (fd != -1) {
> +        close(fd);
> +    }
> +    return ret;
> +}
> +
>  static void slave_read(void *opaque)
>  {
>      struct vhost_dev *dev = opaque;
> @@ -913,6 +1023,10 @@ static void slave_read(void *opaque)
>      case VHOST_USER_SLAVE_CONFIG_CHANGE_MSG :
>          ret = vhost_user_slave_handle_config_change(dev);
>          break;
> +    case VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG:
> +        ret = vhost_user_slave_handle_vring_host_notifier(dev, &payload.area,
> +                                                          fd);
> +        break;
>      default:
>          error_report("Received unexpected msg type.");
>          if (fd != -1) {
> @@ -1641,6 +1755,15 @@ VhostUserState *vhost_user_init(void)
>  
>  void vhost_user_cleanup(VhostUserState *user)
>  {
> +    int i;
> +
> +    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> +        if (user->notifier[i].addr) {
> +            object_unparent(OBJECT(&user->notifier[i].mr));
> +            munmap(user->notifier[i].addr, qemu_real_host_page_size);
> +            user->notifier[i].addr = NULL;
> +        }
> +    }
>  }
>  
>  const VhostOps user_ops = {
> diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
> index eb8bc0d90d..fd660393a0 100644
> --- a/include/hw/virtio/vhost-user.h
> +++ b/include/hw/virtio/vhost-user.h
> @@ -9,9 +9,17 @@
>  #define HW_VIRTIO_VHOST_USER_H
>  
>  #include "chardev/char-fe.h"
> +#include "hw/virtio/virtio.h"
> +
> +typedef struct VhostUserHostNotifier {
> +    MemoryRegion mr;
> +    void *addr;
> +    bool set;
> +} VhostUserHostNotifier;
>  
>  typedef struct VhostUserState {
>      CharBackend *chr;
> +    VhostUserHostNotifier notifier[VIRTIO_QUEUE_MAX];
>  } VhostUserState;
>  
>  VhostUserState *vhost_user_init(void);
> -- 
> 2.11.0

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [virtio-dev] Re: [PATCH v3 6/6] vhost-user: support registering external host notifiers
@ 2018-04-18 16:34     ` Michael S. Tsirkin
  0 siblings, 0 replies; 98+ messages in thread
From: Michael S. Tsirkin @ 2018-04-18 16:34 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, Apr 12, 2018 at 11:12:32PM +0800, Tiwei Bie wrote:
> This patch introduces VHOST_USER_PROTOCOL_F_HOST_NOTIFIER.
> With this feature negotiated, vhost-user backend can register
> memory region based host notifiers. And it will allow the guest
> driver in the VM to notify the hardware accelerator at the
> vhost-user backend directly.
> 
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>

Overall I think we can merge this approach, but I have two main concerns
about this:

1. Testing. Most people do not have the virtio hardware,
   so how do we make sure this does not bit rot?

   I have an idea: add an option like this to libvhost-user.
   Naturally libvhost-user cannot get notified about a write
   to an mmapped area, but it can write a special value there
   (e.g. all-ones?) and then poll it to detect VQ # writes (see
   the sketch below).

   Then include a vhost user bridge test with an option like this.

   I'd like to see a patch doing this.

2. Memory barriers. Right now after updating the avail idx,
   virtio does smp_wmb() and then the MMIO write.
   Normal hardware drivers do wmb(), which is an sfence.
   Can a PCI device read bypass the index write and see a stale
   index value?
   To make virtio pci do wmb() we would need a new feature bit.
   Alternatively I guess we could maybe look at subsystem vendor/device id.

   I'd like to see a patch doing one of these things.

Thanks!
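
A minimal sketch of the polling idea in point 1, assuming the backend has
already mmapped the notify page for the queue. The type and helper names are
illustrative only; they are not part of the libvhost-user API:

    /* Illustrative sketch: detect guest doorbell writes by polling a
     * sentinel value, since a plain mmap gives no write notification. */
    #include <stdbool.h>
    #include <stdint.h>

    #define NOTIFY_SENTINEL 0xffffu       /* "all-ones" re-arm value */

    typedef struct PollNotifier {
        volatile uint16_t *notify;        /* mmapped notify location */
    } PollNotifier;

    /* Re-arm before waiting: overwrite the location with the sentinel. */
    static inline void poll_notifier_arm(PollNotifier *n)
    {
        *n->notify = NOTIFY_SENTINEL;
    }

    /* A guest notification writes the VQ number there, which is assumed
     * never to equal the sentinel, so any other value means "kicked". */
    static inline bool poll_notifier_pending(const PollNotifier *n)
    {
        return *n->notify != NOTIFY_SENTINEL;
    }

The vhost user bridge test could then loop on poll_notifier_pending() and
call poll_notifier_arm() again after servicing the ring.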

> ---
>  docs/interop/vhost-user.txt    |  33 +++++++++++
>  hw/virtio/vhost-user.c         | 123 +++++++++++++++++++++++++++++++++++++++++
>  include/hw/virtio/vhost-user.h |   8 +++
>  3 files changed, 164 insertions(+)
> 
> diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> index 534caab18a..9e57b36b20 100644
> --- a/docs/interop/vhost-user.txt
> +++ b/docs/interop/vhost-user.txt
> @@ -132,6 +132,16 @@ Depending on the request type, payload can be:
>     Payload: Size bytes array holding the contents of the virtio
>         device's configuration space
>  
> + * Vring area description
> +   -----------------------
> +   | u64 | size | offset |
> +   -----------------------
> +
> +   u64: a 64-bit integer contains vring index and flags
> +   Size: a 64-bit size of this area
> +   Offset: a 64-bit offset of this area from the start of the
> +       supplied file descriptor
> +
>  In QEMU the vhost-user message is implemented with the following struct:
>  
>  typedef struct VhostUserMsg {
> @@ -146,6 +156,7 @@ typedef struct VhostUserMsg {
>          VhostUserLog log;
>          struct vhost_iotlb_msg iotlb;
>          VhostUserConfig config;
> +        VhostUserVringArea area;
>      };
>  } QEMU_PACKED VhostUserMsg;
>  
> @@ -380,6 +391,7 @@ Protocol features
>  #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
>  #define VHOST_USER_PROTOCOL_F_PAGEFAULT      8
>  #define VHOST_USER_PROTOCOL_F_CONFIG         9
> +#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  10
>  
>  Master message types
>  --------------------
> @@ -777,6 +789,27 @@ Slave message types
>       the VHOST_USER_NEED_REPLY flag, master must respond with zero when
>       operation is successfully completed, or non-zero otherwise.
>  
> + * VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG
> +
> +      Id: 3
> +      Equivalent ioctl: N/A
> +      Slave payload: vring area description
> +      Master payload: N/A
> +
> +      Sets host notifier for a specified queue. The queue index is contained
> +      in the u64 field of the vring area description. The host notifier is
> +      described by the file descriptor (typically it's a VFIO device fd) which
> +      is passed as ancillary data and the size (which is mmap size and should
> +      be the same as host page size) and offset (which is mmap offset) carried
> +      in the vring area description. QEMU can mmap the file descriptor based
> +      on the size and offset to get a memory range. Registering a host notifier
> +      means mapping this memory range to the VM as the specified queue's notify
> +      MMIO region. Slave sends this request to tell QEMU to de-register the
> +      existing notifier if any and register the new notifier if the request is
> +      sent with a file descriptor.
> +      This request should be sent only when VHOST_USER_PROTOCOL_F_HOST_NOTIFIER
> +      protocol feature has been successfully negotiated.
> +
>  VHOST_USER_PROTOCOL_F_REPLY_ACK:
>  -------------------------------
>  The original vhost-user specification only demands replies for certain
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 791e0a4763..1cd9c7276b 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -13,6 +13,7 @@
>  #include "hw/virtio/vhost.h"
>  #include "hw/virtio/vhost-user.h"
>  #include "hw/virtio/vhost-backend.h"
> +#include "hw/virtio/virtio.h"
>  #include "hw/virtio/virtio-net.h"
>  #include "chardev/char-fe.h"
>  #include "sysemu/kvm.h"
> @@ -48,6 +49,7 @@ enum VhostUserProtocolFeature {
>      VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
>      VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
>      VHOST_USER_PROTOCOL_F_CONFIG = 9,
> +    VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 10,
>      VHOST_USER_PROTOCOL_F_MAX
>  };
>  
> @@ -92,6 +94,7 @@ typedef enum VhostUserSlaveRequest {
>      VHOST_USER_SLAVE_NONE = 0,
>      VHOST_USER_SLAVE_IOTLB_MSG = 1,
>      VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
> +    VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
>      VHOST_USER_SLAVE_MAX
>  }  VhostUserSlaveRequest;
>  
> @@ -136,6 +139,12 @@ static VhostUserConfig c __attribute__ ((unused));
>                                     + sizeof(c.size) \
>                                     + sizeof(c.flags))
>  
> +typedef struct VhostUserVringArea {
> +    uint64_t u64;
> +    uint64_t size;
> +    uint64_t offset;
> +} VhostUserVringArea;
> +
>  typedef struct {
>      VhostUserRequest request;
>  
> @@ -157,6 +166,7 @@ typedef union {
>          struct vhost_iotlb_msg iotlb;
>          VhostUserConfig config;
>          VhostUserCryptoSession session;
> +        VhostUserVringArea area;
>  } VhostUserPayload;
>  
>  typedef struct VhostUserMsg {
> @@ -638,9 +648,37 @@ static int vhost_user_set_vring_num(struct vhost_dev *dev,
>      return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM, ring);
>  }
>  
> +static void vhost_user_host_notifier_restore(struct vhost_dev *dev,
> +                                             int queue_idx)
> +{
> +    struct vhost_user *u = dev->opaque;
> +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> +    VirtIODevice *vdev = dev->vdev;
> +
> +    if (n->addr && !n->set) {
> +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true);
> +        n->set = true;
> +    }
> +}
> +
> +static void vhost_user_host_notifier_remove(struct vhost_dev *dev,
> +                                            int queue_idx)
> +{
> +    struct vhost_user *u = dev->opaque;
> +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> +    VirtIODevice *vdev = dev->vdev;
> +
> +    if (n->addr && n->set) {
> +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
> +        n->set = false;
> +    }
> +}
> +
>  static int vhost_user_set_vring_base(struct vhost_dev *dev,
>                                       struct vhost_vring_state *ring)
>  {
> +    vhost_user_host_notifier_restore(dev, ring->index);
> +
>      return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE, ring);
>  }
>  
> @@ -674,6 +712,8 @@ static int vhost_user_get_vring_base(struct vhost_dev *dev,
>          .hdr.size = sizeof(msg.payload.state),
>      };
>  
> +    vhost_user_host_notifier_remove(dev, ring->index);
> +
>      if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
>          return -1;
>      }
> @@ -847,6 +887,76 @@ static int vhost_user_slave_handle_config_change(struct vhost_dev *dev)
>      return ret;
>  }
>  
> +static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
> +                                                       VhostUserVringArea *area,
> +                                                       int fd)
> +{
> +    int queue_idx = area->u64 & VHOST_USER_VRING_IDX_MASK;
> +    size_t page_size = qemu_real_host_page_size;
> +    struct vhost_user *u = dev->opaque;
> +    VhostUserState *user = u->user;
> +    VirtIODevice *vdev = dev->vdev;
> +    VhostUserHostNotifier *n;
> +    int ret = 0;
> +    void *addr;
> +    char *name;
> +
> +    if (!virtio_has_feature(dev->protocol_features,
> +                            VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) ||
> +        vdev == NULL || queue_idx >= virtio_get_num_queues(vdev)) {
> +        ret = -1;
> +        goto out;
> +    }
> +
> +    n = &user->notifier[queue_idx];
> +
> +    if (n->addr) {
> +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
> +        object_unparent(OBJECT(&n->mr));
> +        munmap(n->addr, page_size);
> +        n->addr = NULL;
> +    }
> +
> +    if (area->u64 & VHOST_USER_VRING_NOFD_MASK) {
> +        goto out;
> +    }
> +
> +    /* Sanity check. */
> +    if (area->size != page_size) {
> +        ret = -1;
> +        goto out;
> +    }
> +
> +    addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
> +                fd, area->offset);
> +    if (addr == MAP_FAILED) {
> +        ret = -1;
> +        goto out;
> +    }
> +
> +    name = g_strdup_printf("vhost-user/host-notifier@%p mmaps[%d]",
> +                           user, queue_idx);
> +    memory_region_init_ram_device_ptr(&n->mr, OBJECT(vdev), name,
> +                                      page_size, addr);
> +    g_free(name);
> +
> +    if (virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true)) {
> +        munmap(addr, page_size);
> +        ret = -1;
> +        goto out;
> +    }
> +
> +    n->addr = addr;
> +    n->set = true;
> +
> +out:
> +    /* Always close the fd. */
> +    if (fd != -1) {
> +        close(fd);
> +    }
> +    return ret;
> +}
> +
>  static void slave_read(void *opaque)
>  {
>      struct vhost_dev *dev = opaque;
> @@ -913,6 +1023,10 @@ static void slave_read(void *opaque)
>      case VHOST_USER_SLAVE_CONFIG_CHANGE_MSG :
>          ret = vhost_user_slave_handle_config_change(dev);
>          break;
> +    case VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG:
> +        ret = vhost_user_slave_handle_vring_host_notifier(dev, &payload.area,
> +                                                          fd);
> +        break;
>      default:
>          error_report("Received unexpected msg type.");
>          if (fd != -1) {
> @@ -1641,6 +1755,15 @@ VhostUserState *vhost_user_init(void)
>  
>  void vhost_user_cleanup(VhostUserState *user)
>  {
> +    int i;
> +
> +    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> +        if (user->notifier[i].addr) {
> +            object_unparent(OBJECT(&user->notifier[i].mr));
> +            munmap(user->notifier[i].addr, qemu_real_host_page_size);
> +            user->notifier[i].addr = NULL;
> +        }
> +    }
>  }
>  
>  const VhostOps user_ops = {
> diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
> index eb8bc0d90d..fd660393a0 100644
> --- a/include/hw/virtio/vhost-user.h
> +++ b/include/hw/virtio/vhost-user.h
> @@ -9,9 +9,17 @@
>  #define HW_VIRTIO_VHOST_USER_H
>  
>  #include "chardev/char-fe.h"
> +#include "hw/virtio/virtio.h"
> +
> +typedef struct VhostUserHostNotifier {
> +    MemoryRegion mr;
> +    void *addr;
> +    bool set;
> +} VhostUserHostNotifier;
>  
>  typedef struct VhostUserState {
>      CharBackend *chr;
> +    VhostUserHostNotifier notifier[VIRTIO_QUEUE_MAX];
>  } VhostUserState;
>  
>  VhostUserState *vhost_user_init(void);
> -- 
> 2.11.0



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [PATCH v3 6/6] vhost-user: support registering external host notifiers
  2018-04-18 16:34     ` [virtio-dev] " Michael S. Tsirkin
@ 2018-04-19 11:14       ` Tiwei Bie
  -1 siblings, 0 replies; 98+ messages in thread
From: Tiwei Bie @ 2018-04-19 11:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Wed, Apr 18, 2018 at 07:34:06PM +0300, Michael S. Tsirkin wrote:
> On Thu, Apr 12, 2018 at 11:12:32PM +0800, Tiwei Bie wrote:
> > This patch introduces VHOST_USER_PROTOCOL_F_HOST_NOTIFIER.
> > With this feature negotiated, vhost-user backend can register
> > memory region based host notifiers. And it will allow the guest
> > driver in the VM to notify the hardware accelerator at the
> > vhost-user backend directly.
> > 
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> 
> Overall I think we can merge this approach, but I have two main concerns
> about this:
> 
> 1. Testing. Most people do not have the virtio hardware
>    so how to make sure this does not bit rot?
> 
>    I have an idea: add an option like this to libvhost-user.
>    Naturally libvhost-user can not get notified about a write
>    to an mmapped area, but it can write a special value there
>    (e.g. all-ones?) and then poll it to detect VQ # writes.
> 
>    Then include a vhost user bridge test with an option like this.
> 
>    I'd like to see a patch doing this.

Sure, I'll do it. Thanks for the suggestion!

> 
> 2. Memory barriers. Right now after updating the avail idx,
>    virtio does smp_wmb() and then the MMIO write.
>    Normal hardware drivers do wmb() which is an sfence.
>    Can a PCI device read bypass index write and see a stale
>    index value?

It depends on the arch's memory model. Cunming will provide
more details about this later.

>    To make virtio pci do wmb() we would need a new feature bit.
>    Alternatively I guess we could maybe look at subsystem vendor/device id.
> 
>    I'd like to see a patch doing one of these things.

We prefer to add a new feature bit as it's a more robust
way to do this. I'll send out some patches soon.
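
Purely as an illustration of what such a driver-visible gate could look like
(the feature-bit name and number below are placeholders, not the bit that was
actually proposed), a hedged sketch:

    /* Hypothetical: pick the pre-notification barrier from a negotiated
     * feature bit.  VIRTIO_F_HW_NOTIFY_BARRIER and its number are made up
     * for illustration; this is not a real virtio feature bit. */
    #include <stdbool.h>
    #include <stdint.h>

    #define VIRTIO_F_HW_NOTIFY_BARRIER  56    /* placeholder bit number */

    static inline bool has_feature(uint64_t features, unsigned int bit)
    {
        return (features >> bit) & 1;
    }

    static inline void pre_notify_barrier(uint64_t negotiated_features)
    {
        if (has_feature(negotiated_features, VIRTIO_F_HW_NOTIFY_BARRIER)) {
            /* Device may be real hardware: use a full fence before the
             * doorbell write. */
            __atomic_thread_fence(__ATOMIC_SEQ_CST);
        } else {
            /* Software backend: release ordering (smp_wmb()-like) is
             * sufficient; this is a plain compiler barrier on x86. */
            __atomic_thread_fence(__ATOMIC_RELEASE);
        }
    }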

Thank you very much! :)

Best regards,
Tiwei Bie

> 
> Thanks!
> 
> > ---
> >  docs/interop/vhost-user.txt    |  33 +++++++++++
> >  hw/virtio/vhost-user.c         | 123 +++++++++++++++++++++++++++++++++++++++++
> >  include/hw/virtio/vhost-user.h |   8 +++
> >  3 files changed, 164 insertions(+)
> > 
> > diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> > index 534caab18a..9e57b36b20 100644
> > --- a/docs/interop/vhost-user.txt
> > +++ b/docs/interop/vhost-user.txt
> > @@ -132,6 +132,16 @@ Depending on the request type, payload can be:
> >     Payload: Size bytes array holding the contents of the virtio
> >         device's configuration space
> >  
> > + * Vring area description
> > +   -----------------------
> > +   | u64 | size | offset |
> > +   -----------------------
> > +
> > +   u64: a 64-bit integer contains vring index and flags
> > +   Size: a 64-bit size of this area
> > +   Offset: a 64-bit offset of this area from the start of the
> > +       supplied file descriptor
> > +
> >  In QEMU the vhost-user message is implemented with the following struct:
> >  
> >  typedef struct VhostUserMsg {
> > @@ -146,6 +156,7 @@ typedef struct VhostUserMsg {
> >          VhostUserLog log;
> >          struct vhost_iotlb_msg iotlb;
> >          VhostUserConfig config;
> > +        VhostUserVringArea area;
> >      };
> >  } QEMU_PACKED VhostUserMsg;
> >  
> > @@ -380,6 +391,7 @@ Protocol features
> >  #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
> >  #define VHOST_USER_PROTOCOL_F_PAGEFAULT      8
> >  #define VHOST_USER_PROTOCOL_F_CONFIG         9
> > +#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  10
> >  
> >  Master message types
> >  --------------------
> > @@ -777,6 +789,27 @@ Slave message types
> >       the VHOST_USER_NEED_REPLY flag, master must respond with zero when
> >       operation is successfully completed, or non-zero otherwise.
> >  
> > + * VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG
> > +
> > +      Id: 3
> > +      Equivalent ioctl: N/A
> > +      Slave payload: vring area description
> > +      Master payload: N/A
> > +
> > +      Sets host notifier for a specified queue. The queue index is contained
> > +      in the u64 field of the vring area description. The host notifier is
> > +      described by the file descriptor (typically it's a VFIO device fd) which
> > +      is passed as ancillary data and the size (which is mmap size and should
> > +      be the same as host page size) and offset (which is mmap offset) carried
> > +      in the vring area description. QEMU can mmap the file descriptor based
> > +      on the size and offset to get a memory range. Registering a host notifier
> > +      means mapping this memory range to the VM as the specified queue's notify
> > +      MMIO region. Slave sends this request to tell QEMU to de-register the
> > +      existing notifier if any and register the new notifier if the request is
> > +      sent with a file descriptor.
> > +      This request should be sent only when VHOST_USER_PROTOCOL_F_HOST_NOTIFIER
> > +      protocol feature has been successfully negotiated.
> > +
> >  VHOST_USER_PROTOCOL_F_REPLY_ACK:
> >  -------------------------------
> >  The original vhost-user specification only demands replies for certain
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index 791e0a4763..1cd9c7276b 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -13,6 +13,7 @@
> >  #include "hw/virtio/vhost.h"
> >  #include "hw/virtio/vhost-user.h"
> >  #include "hw/virtio/vhost-backend.h"
> > +#include "hw/virtio/virtio.h"
> >  #include "hw/virtio/virtio-net.h"
> >  #include "chardev/char-fe.h"
> >  #include "sysemu/kvm.h"
> > @@ -48,6 +49,7 @@ enum VhostUserProtocolFeature {
> >      VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
> >      VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
> >      VHOST_USER_PROTOCOL_F_CONFIG = 9,
> > +    VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 10,
> >      VHOST_USER_PROTOCOL_F_MAX
> >  };
> >  
> > @@ -92,6 +94,7 @@ typedef enum VhostUserSlaveRequest {
> >      VHOST_USER_SLAVE_NONE = 0,
> >      VHOST_USER_SLAVE_IOTLB_MSG = 1,
> >      VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
> > +    VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
> >      VHOST_USER_SLAVE_MAX
> >  }  VhostUserSlaveRequest;
> >  
> > @@ -136,6 +139,12 @@ static VhostUserConfig c __attribute__ ((unused));
> >                                     + sizeof(c.size) \
> >                                     + sizeof(c.flags))
> >  
> > +typedef struct VhostUserVringArea {
> > +    uint64_t u64;
> > +    uint64_t size;
> > +    uint64_t offset;
> > +} VhostUserVringArea;
> > +
> >  typedef struct {
> >      VhostUserRequest request;
> >  
> > @@ -157,6 +166,7 @@ typedef union {
> >          struct vhost_iotlb_msg iotlb;
> >          VhostUserConfig config;
> >          VhostUserCryptoSession session;
> > +        VhostUserVringArea area;
> >  } VhostUserPayload;
> >  
> >  typedef struct VhostUserMsg {
> > @@ -638,9 +648,37 @@ static int vhost_user_set_vring_num(struct vhost_dev *dev,
> >      return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM, ring);
> >  }
> >  
> > +static void vhost_user_host_notifier_restore(struct vhost_dev *dev,
> > +                                             int queue_idx)
> > +{
> > +    struct vhost_user *u = dev->opaque;
> > +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> > +    VirtIODevice *vdev = dev->vdev;
> > +
> > +    if (n->addr && !n->set) {
> > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true);
> > +        n->set = true;
> > +    }
> > +}
> > +
> > +static void vhost_user_host_notifier_remove(struct vhost_dev *dev,
> > +                                            int queue_idx)
> > +{
> > +    struct vhost_user *u = dev->opaque;
> > +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> > +    VirtIODevice *vdev = dev->vdev;
> > +
> > +    if (n->addr && n->set) {
> > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
> > +        n->set = false;
> > +    }
> > +}
> > +
> >  static int vhost_user_set_vring_base(struct vhost_dev *dev,
> >                                       struct vhost_vring_state *ring)
> >  {
> > +    vhost_user_host_notifier_restore(dev, ring->index);
> > +
> >      return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE, ring);
> >  }
> >  
> > @@ -674,6 +712,8 @@ static int vhost_user_get_vring_base(struct vhost_dev *dev,
> >          .hdr.size = sizeof(msg.payload.state),
> >      };
> >  
> > +    vhost_user_host_notifier_remove(dev, ring->index);
> > +
> >      if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> >          return -1;
> >      }
> > @@ -847,6 +887,76 @@ static int vhost_user_slave_handle_config_change(struct vhost_dev *dev)
> >      return ret;
> >  }
> >  
> > +static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
> > +                                                       VhostUserVringArea *area,
> > +                                                       int fd)
> > +{
> > +    int queue_idx = area->u64 & VHOST_USER_VRING_IDX_MASK;
> > +    size_t page_size = qemu_real_host_page_size;
> > +    struct vhost_user *u = dev->opaque;
> > +    VhostUserState *user = u->user;
> > +    VirtIODevice *vdev = dev->vdev;
> > +    VhostUserHostNotifier *n;
> > +    int ret = 0;
> > +    void *addr;
> > +    char *name;
> > +
> > +    if (!virtio_has_feature(dev->protocol_features,
> > +                            VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) ||
> > +        vdev == NULL || queue_idx >= virtio_get_num_queues(vdev)) {
> > +        ret = -1;
> > +        goto out;
> > +    }
> > +
> > +    n = &user->notifier[queue_idx];
> > +
> > +    if (n->addr) {
> > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
> > +        object_unparent(OBJECT(&n->mr));
> > +        munmap(n->addr, page_size);
> > +        n->addr = NULL;
> > +    }
> > +
> > +    if (area->u64 & VHOST_USER_VRING_NOFD_MASK) {
> > +        goto out;
> > +    }
> > +
> > +    /* Sanity check. */
> > +    if (area->size != page_size) {
> > +        ret = -1;
> > +        goto out;
> > +    }
> > +
> > +    addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
> > +                fd, area->offset);
> > +    if (addr == MAP_FAILED) {
> > +        ret = -1;
> > +        goto out;
> > +    }
> > +
> > +    name = g_strdup_printf("vhost-user/host-notifier@%p mmaps[%d]",
> > +                           user, queue_idx);
> > +    memory_region_init_ram_device_ptr(&n->mr, OBJECT(vdev), name,
> > +                                      page_size, addr);
> > +    g_free(name);
> > +
> > +    if (virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true)) {
> > +        munmap(addr, page_size);
> > +        ret = -1;
> > +        goto out;
> > +    }
> > +
> > +    n->addr = addr;
> > +    n->set = true;
> > +
> > +out:
> > +    /* Always close the fd. */
> > +    if (fd != -1) {
> > +        close(fd);
> > +    }
> > +    return ret;
> > +}
> > +
> >  static void slave_read(void *opaque)
> >  {
> >      struct vhost_dev *dev = opaque;
> > @@ -913,6 +1023,10 @@ static void slave_read(void *opaque)
> >      case VHOST_USER_SLAVE_CONFIG_CHANGE_MSG :
> >          ret = vhost_user_slave_handle_config_change(dev);
> >          break;
> > +    case VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG:
> > +        ret = vhost_user_slave_handle_vring_host_notifier(dev, &payload.area,
> > +                                                          fd);
> > +        break;
> >      default:
> >          error_report("Received unexpected msg type.");
> >          if (fd != -1) {
> > @@ -1641,6 +1755,15 @@ VhostUserState *vhost_user_init(void)
> >  
> >  void vhost_user_cleanup(VhostUserState *user)
> >  {
> > +    int i;
> > +
> > +    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> > +        if (user->notifier[i].addr) {
> > +            object_unparent(OBJECT(&user->notifier[i].mr));
> > +            munmap(user->notifier[i].addr, qemu_real_host_page_size);
> > +            user->notifier[i].addr = NULL;
> > +        }
> > +    }
> >  }
> >  
> >  const VhostOps user_ops = {
> > diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
> > index eb8bc0d90d..fd660393a0 100644
> > --- a/include/hw/virtio/vhost-user.h
> > +++ b/include/hw/virtio/vhost-user.h
> > @@ -9,9 +9,17 @@
> >  #define HW_VIRTIO_VHOST_USER_H
> >  
> >  #include "chardev/char-fe.h"
> > +#include "hw/virtio/virtio.h"
> > +
> > +typedef struct VhostUserHostNotifier {
> > +    MemoryRegion mr;
> > +    void *addr;
> > +    bool set;
> > +} VhostUserHostNotifier;
> >  
> >  typedef struct VhostUserState {
> >      CharBackend *chr;
> > +    VhostUserHostNotifier notifier[VIRTIO_QUEUE_MAX];
> >  } VhostUserState;
> >  
> >  VhostUserState *vhost_user_init(void);
> > -- 
> > 2.11.0

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [PATCH v3 6/6] vhost-user: support registering external host notifiers
  2018-04-19 11:14       ` [virtio-dev] " Tiwei Bie
@ 2018-04-19 12:43         ` Liang, Cunming
  -1 siblings, 0 replies; 98+ messages in thread
From: Liang, Cunming @ 2018-04-19 12:43 UTC (permalink / raw)
  To: Bie, Tiwei, Michael S. Tsirkin
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, Daly, Dan, Tan, Jianfeng, Wang, Zhihong, Wang,
	Xiao W



> -----Original Message-----
> From: Bie, Tiwei
> Sent: Thursday, April 19, 2018 7:15 PM
> To: Michael S. Tsirkin <mst@redhat.com>
> Cc: jasowang@redhat.com; alex.williamson@redhat.com; pbonzini@redhat.com;
> stefanha@redhat.com; qemu-devel@nongnu.org; virtio-dev@lists.oasis-
> open.org; Liang, Cunming <cunming.liang@intel.com>; Daly, Dan
> <dan.daly@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>; Wang, Zhihong
> <zhihong.wang@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>
> Subject: Re: [PATCH v3 6/6] vhost-user: support registering external host
> notifiers
> 
> On Wed, Apr 18, 2018 at 07:34:06PM +0300, Michael S. Tsirkin wrote:
> > On Thu, Apr 12, 2018 at 11:12:32PM +0800, Tiwei Bie wrote:
> > > This patch introduces VHOST_USER_PROTOCOL_F_HOST_NOTIFIER.
> > > With this feature negotiated, vhost-user backend can register memory
> > > region based host notifiers. And it will allow the guest driver in
> > > the VM to notify the hardware accelerator at the vhost-user backend
> > > directly.
> > >
> > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> >
> > Overall I think we can merge this approach, but I have two main
> > concerns about this:
> >
> > 1. Testing. Most people do not have the virtio hardware
> >    so how to make sure this does not bit rot?
> >
> >    I have an idea: add an option like this to libvhost-user.
> >    Naturally libvhost-user can not get notified about a write
> >    to an mmapped area, but it can write a special value there
> >    (e.g. all-ones?) and then poll it to detect VQ # writes.
> >
> >    Then include a vhost user bridge test with an option like this.
> >
> >    I'd like to see a patch doing this.
> 
> Sure, I'll do it. Thanks for the suggestion!
> 
> >
> > 2. Memory barriers. Right now after updating the avail idx,
> >    virtio does smp_wmb() and then the MMIO write.
> >    Normal hardware drivers do wmb() which is an sfence.
> >    Can a PCI device read bypass index write and see a stale
> >    index value?
A compiler barrier is enough on a strongly-ordered memory platform: since it
doesn't reorder stores, a PCI device won't see a stale index value. A
weakly-ordered memory model, however, does need an sfence.
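
As a hedged illustration of that point (not QEMU or DPDK code; the function
below is hypothetical), the barrier in front of the doorbell write could be
chosen like this:

    /* Illustrative only: publish the avail index, then order it against the
     * doorbell write.  On TSO x86 a compiler barrier is enough for a software
     * backend, a real PCI device needs sfence, and weakly-ordered CPUs need a
     * real barrier in both cases. */
    #include <stdint.h>

    void publish_and_notify(volatile uint16_t *avail_idx, uint16_t new_idx,
                            volatile uint16_t *doorbell, uint16_t vq,
                            int hw_device)
    {
        *avail_idx = new_idx;                            /* publish index */
    #if defined(__x86_64__) || defined(__i386__)
        if (hw_device) {
            __asm__ __volatile__("sfence" ::: "memory"); /* wmb()         */
        } else {
            __asm__ __volatile__("" ::: "memory");       /* smp_wmb()     */
        }
    #else
        __sync_synchronize();        /* weakly ordered: use a full barrier */
    #endif
        *doorbell = vq;                                  /* ring doorbell  */
    }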

> 
> It depends on arch's memory model. Cunming will provide more details about
> this later.
> 
> >    To make virtio pci do wmb() we would need a new feature bit.
> >    Alternatively I guess we could maybe look at subsystem vendor/device id.
> >
> >    I'd like to see a patch doing one of these things.
> 
> We prefer to add a new feature bit as it's a more robust way to do this. I'll send
> out some patches soon.
> 
> Thank you very much! :)
> 
> Best regards,
> Tiwei Bie
> 
> >
> > Thanks!
> >
> > > ---
> > >  docs/interop/vhost-user.txt    |  33 +++++++++++
> > >  hw/virtio/vhost-user.c         | 123
> +++++++++++++++++++++++++++++++++++++++++
> > >  include/hw/virtio/vhost-user.h |   8 +++
> > >  3 files changed, 164 insertions(+)
> > >
> > > diff --git a/docs/interop/vhost-user.txt
> > > b/docs/interop/vhost-user.txt index 534caab18a..9e57b36b20 100644
> > > --- a/docs/interop/vhost-user.txt
> > > +++ b/docs/interop/vhost-user.txt
> > > @@ -132,6 +132,16 @@ Depending on the request type, payload can be:
> > >     Payload: Size bytes array holding the contents of the virtio
> > >         device's configuration space
> > >
> > > + * Vring area description
> > > +   -----------------------
> > > +   | u64 | size | offset |
> > > +   -----------------------
> > > +
> > > +   u64: a 64-bit integer contains vring index and flags
> > > +   Size: a 64-bit size of this area
> > > +   Offset: a 64-bit offset of this area from the start of the
> > > +       supplied file descriptor
> > > +
> > >  In QEMU the vhost-user message is implemented with the following struct:
> > >
> > >  typedef struct VhostUserMsg {
> > > @@ -146,6 +156,7 @@ typedef struct VhostUserMsg {
> > >          VhostUserLog log;
> > >          struct vhost_iotlb_msg iotlb;
> > >          VhostUserConfig config;
> > > +        VhostUserVringArea area;
> > >      };
> > >  } QEMU_PACKED VhostUserMsg;
> > >
> > > @@ -380,6 +391,7 @@ Protocol features  #define
> > > VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
> > >  #define VHOST_USER_PROTOCOL_F_PAGEFAULT      8
> > >  #define VHOST_USER_PROTOCOL_F_CONFIG         9
> > > +#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  10
> > >
> > >  Master message types
> > >  --------------------
> > > @@ -777,6 +789,27 @@ Slave message types
> > >       the VHOST_USER_NEED_REPLY flag, master must respond with zero when
> > >       operation is successfully completed, or non-zero otherwise.
> > >
> > > + * VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG
> > > +
> > > +      Id: 3
> > > +      Equivalent ioctl: N/A
> > > +      Slave payload: vring area description
> > > +      Master payload: N/A
> > > +
> > > +      Sets host notifier for a specified queue. The queue index is contained
> > > +      in the u64 field of the vring area description. The host notifier is
> > > +      described by the file descriptor (typically it's a VFIO device fd) which
> > > +      is passed as ancillary data and the size (which is mmap size and should
> > > +      be the same as host page size) and offset (which is mmap offset) carried
> > > +      in the vring area description. QEMU can mmap the file descriptor based
> > > +      on the size and offset to get a memory range. Registering a host notifier
> > > +      means mapping this memory range to the VM as the specified queue's
> notify
> > > +      MMIO region. Slave sends this request to tell QEMU to de-register the
> > > +      existing notifier if any and register the new notifier if the request is
> > > +      sent with a file descriptor.
> > > +      This request should be sent only when
> VHOST_USER_PROTOCOL_F_HOST_NOTIFIER
> > > +      protocol feature has been successfully negotiated.
> > > +
> > >  VHOST_USER_PROTOCOL_F_REPLY_ACK:
> > >  -------------------------------
> > >  The original vhost-user specification only demands replies for
> > > certain diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > index 791e0a4763..1cd9c7276b 100644
> > > --- a/hw/virtio/vhost-user.c
> > > +++ b/hw/virtio/vhost-user.c
> > > @@ -13,6 +13,7 @@
> > >  #include "hw/virtio/vhost.h"
> > >  #include "hw/virtio/vhost-user.h"
> > >  #include "hw/virtio/vhost-backend.h"
> > > +#include "hw/virtio/virtio.h"
> > >  #include "hw/virtio/virtio-net.h"
> > >  #include "chardev/char-fe.h"
> > >  #include "sysemu/kvm.h"
> > > @@ -48,6 +49,7 @@ enum VhostUserProtocolFeature {
> > >      VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
> > >      VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
> > >      VHOST_USER_PROTOCOL_F_CONFIG = 9,
> > > +    VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 10,
> > >      VHOST_USER_PROTOCOL_F_MAX
> > >  };
> > >
> > > @@ -92,6 +94,7 @@ typedef enum VhostUserSlaveRequest {
> > >      VHOST_USER_SLAVE_NONE = 0,
> > >      VHOST_USER_SLAVE_IOTLB_MSG = 1,
> > >      VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
> > > +    VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
> > >      VHOST_USER_SLAVE_MAX
> > >  }  VhostUserSlaveRequest;
> > >
> > > @@ -136,6 +139,12 @@ static VhostUserConfig c __attribute__ ((unused));
> > >                                     + sizeof(c.size) \
> > >                                     + sizeof(c.flags))
> > >
> > > +typedef struct VhostUserVringArea {
> > > +    uint64_t u64;
> > > +    uint64_t size;
> > > +    uint64_t offset;
> > > +} VhostUserVringArea;
> > > +
> > >  typedef struct {
> > >      VhostUserRequest request;
> > >
> > > @@ -157,6 +166,7 @@ typedef union {
> > >          struct vhost_iotlb_msg iotlb;
> > >          VhostUserConfig config;
> > >          VhostUserCryptoSession session;
> > > +        VhostUserVringArea area;
> > >  } VhostUserPayload;
> > >
> > >  typedef struct VhostUserMsg {
> > > @@ -638,9 +648,37 @@ static int vhost_user_set_vring_num(struct
> vhost_dev *dev,
> > >      return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM, ring);  }
> > >
> > > +static void vhost_user_host_notifier_restore(struct vhost_dev *dev,
> > > +                                             int queue_idx) {
> > > +    struct vhost_user *u = dev->opaque;
> > > +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> > > +    VirtIODevice *vdev = dev->vdev;
> > > +
> > > +    if (n->addr && !n->set) {
> > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true);
> > > +        n->set = true;
> > > +    }
> > > +}
> > > +
> > > +static void vhost_user_host_notifier_remove(struct vhost_dev *dev,
> > > +                                            int queue_idx) {
> > > +    struct vhost_user *u = dev->opaque;
> > > +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> > > +    VirtIODevice *vdev = dev->vdev;
> > > +
> > > +    if (n->addr && n->set) {
> > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
> > > +        n->set = false;
> > > +    }
> > > +}
> > > +
> > >  static int vhost_user_set_vring_base(struct vhost_dev *dev,
> > >                                       struct vhost_vring_state
> > > *ring)  {
> > > +    vhost_user_host_notifier_restore(dev, ring->index);
> > > +
> > >      return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE, ring);
> > > }
> > >
> > > @@ -674,6 +712,8 @@ static int vhost_user_get_vring_base(struct
> vhost_dev *dev,
> > >          .hdr.size = sizeof(msg.payload.state),
> > >      };
> > >
> > > +    vhost_user_host_notifier_remove(dev, ring->index);
> > > +
> > >      if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> > >          return -1;
> > >      }
> > > @@ -847,6 +887,76 @@ static int
> vhost_user_slave_handle_config_change(struct vhost_dev *dev)
> > >      return ret;
> > >  }
> > >
> > > +static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev
> *dev,
> > > +                                                       VhostUserVringArea *area,
> > > +                                                       int fd) {
> > > +    int queue_idx = area->u64 & VHOST_USER_VRING_IDX_MASK;
> > > +    size_t page_size = qemu_real_host_page_size;
> > > +    struct vhost_user *u = dev->opaque;
> > > +    VhostUserState *user = u->user;
> > > +    VirtIODevice *vdev = dev->vdev;
> > > +    VhostUserHostNotifier *n;
> > > +    int ret = 0;
> > > +    void *addr;
> > > +    char *name;
> > > +
> > > +    if (!virtio_has_feature(dev->protocol_features,
> > > +                            VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) ||
> > > +        vdev == NULL || queue_idx >= virtio_get_num_queues(vdev)) {
> > > +        ret = -1;
> > > +        goto out;
> > > +    }
> > > +
> > > +    n = &user->notifier[queue_idx];
> > > +
> > > +    if (n->addr) {
> > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
> > > +        object_unparent(OBJECT(&n->mr));
> > > +        munmap(n->addr, page_size);
> > > +        n->addr = NULL;
> > > +    }
> > > +
> > > +    if (area->u64 & VHOST_USER_VRING_NOFD_MASK) {
> > > +        goto out;
> > > +    }
> > > +
> > > +    /* Sanity check. */
> > > +    if (area->size != page_size) {
> > > +        ret = -1;
> > > +        goto out;
> > > +    }
> > > +
> > > +    addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
> > > +                fd, area->offset);
> > > +    if (addr == MAP_FAILED) {
> > > +        ret = -1;
> > > +        goto out;
> > > +    }
> > > +
> > > +    name = g_strdup_printf("vhost-user/host-notifier@%p mmaps[%d]",
> > > +                           user, queue_idx);
> > > +    memory_region_init_ram_device_ptr(&n->mr, OBJECT(vdev), name,
> > > +                                      page_size, addr);
> > > +    g_free(name);
> > > +
> > > +    if (virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true)) {
> > > +        munmap(addr, page_size);
> > > +        ret = -1;
> > > +        goto out;
> > > +    }
> > > +
> > > +    n->addr = addr;
> > > +    n->set = true;
> > > +
> > > +out:
> > > +    /* Always close the fd. */
> > > +    if (fd != -1) {
> > > +        close(fd);
> > > +    }
> > > +    return ret;
> > > +}
> > > +
> > >  static void slave_read(void *opaque)  {
> > >      struct vhost_dev *dev = opaque; @@ -913,6 +1023,10 @@ static
> > > void slave_read(void *opaque)
> > >      case VHOST_USER_SLAVE_CONFIG_CHANGE_MSG :
> > >          ret = vhost_user_slave_handle_config_change(dev);
> > >          break;
> > > +    case VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG:
> > > +        ret = vhost_user_slave_handle_vring_host_notifier(dev, &payload.area,
> > > +                                                          fd);
> > > +        break;
> > >      default:
> > >          error_report("Received unexpected msg type.");
> > >          if (fd != -1) {
> > > @@ -1641,6 +1755,15 @@ VhostUserState *vhost_user_init(void)
> > >
> > >  void vhost_user_cleanup(VhostUserState *user)  {
> > > +    int i;
> > > +
> > > +    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> > > +        if (user->notifier[i].addr) {
> > > +            object_unparent(OBJECT(&user->notifier[i].mr));
> > > +            munmap(user->notifier[i].addr, qemu_real_host_page_size);
> > > +            user->notifier[i].addr = NULL;
> > > +        }
> > > +    }
> > >  }
> > >
> > >  const VhostOps user_ops = {
> > > diff --git a/include/hw/virtio/vhost-user.h
> > > b/include/hw/virtio/vhost-user.h index eb8bc0d90d..fd660393a0 100644
> > > --- a/include/hw/virtio/vhost-user.h
> > > +++ b/include/hw/virtio/vhost-user.h
> > > @@ -9,9 +9,17 @@
> > >  #define HW_VIRTIO_VHOST_USER_H
> > >
> > >  #include "chardev/char-fe.h"
> > > +#include "hw/virtio/virtio.h"
> > > +
> > > +typedef struct VhostUserHostNotifier {
> > > +    MemoryRegion mr;
> > > +    void *addr;
> > > +    bool set;
> > > +} VhostUserHostNotifier;
> > >
> > >  typedef struct VhostUserState {
> > >      CharBackend *chr;
> > > +    VhostUserHostNotifier notifier[VIRTIO_QUEUE_MAX];
> > >  } VhostUserState;
> > >
> > >  VhostUserState *vhost_user_init(void);
> > > --
> > > 2.11.0

^ permalink raw reply	[flat|nested] 98+ messages in thread

> > >  #include "hw/virtio/vhost.h"
> > >  #include "hw/virtio/vhost-user.h"
> > >  #include "hw/virtio/vhost-backend.h"
> > > +#include "hw/virtio/virtio.h"
> > >  #include "hw/virtio/virtio-net.h"
> > >  #include "chardev/char-fe.h"
> > >  #include "sysemu/kvm.h"
> > > @@ -48,6 +49,7 @@ enum VhostUserProtocolFeature {
> > >      VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
> > >      VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
> > >      VHOST_USER_PROTOCOL_F_CONFIG = 9,
> > > +    VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 10,
> > >      VHOST_USER_PROTOCOL_F_MAX
> > >  };
> > >
> > > @@ -92,6 +94,7 @@ typedef enum VhostUserSlaveRequest {
> > >      VHOST_USER_SLAVE_NONE = 0,
> > >      VHOST_USER_SLAVE_IOTLB_MSG = 1,
> > >      VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
> > > +    VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
> > >      VHOST_USER_SLAVE_MAX
> > >  }  VhostUserSlaveRequest;
> > >
> > > @@ -136,6 +139,12 @@ static VhostUserConfig c __attribute__ ((unused));
> > >                                     + sizeof(c.size) \
> > >                                     + sizeof(c.flags))
> > >
> > > +typedef struct VhostUserVringArea {
> > > +    uint64_t u64;
> > > +    uint64_t size;
> > > +    uint64_t offset;
> > > +} VhostUserVringArea;
> > > +
> > >  typedef struct {
> > >      VhostUserRequest request;
> > >
> > > @@ -157,6 +166,7 @@ typedef union {
> > >          struct vhost_iotlb_msg iotlb;
> > >          VhostUserConfig config;
> > >          VhostUserCryptoSession session;
> > > +        VhostUserVringArea area;
> > >  } VhostUserPayload;
> > >
> > >  typedef struct VhostUserMsg {
> > > @@ -638,9 +648,37 @@ static int vhost_user_set_vring_num(struct
> vhost_dev *dev,
> > >      return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM, ring);  }
> > >
> > > +static void vhost_user_host_notifier_restore(struct vhost_dev *dev,
> > > +                                             int queue_idx) {
> > > +    struct vhost_user *u = dev->opaque;
> > > +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> > > +    VirtIODevice *vdev = dev->vdev;
> > > +
> > > +    if (n->addr && !n->set) {
> > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true);
> > > +        n->set = true;
> > > +    }
> > > +}
> > > +
> > > +static void vhost_user_host_notifier_remove(struct vhost_dev *dev,
> > > +                                            int queue_idx) {
> > > +    struct vhost_user *u = dev->opaque;
> > > +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> > > +    VirtIODevice *vdev = dev->vdev;
> > > +
> > > +    if (n->addr && n->set) {
> > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
> > > +        n->set = false;
> > > +    }
> > > +}
> > > +
> > >  static int vhost_user_set_vring_base(struct vhost_dev *dev,
> > >                                       struct vhost_vring_state
> > > *ring)  {
> > > +    vhost_user_host_notifier_restore(dev, ring->index);
> > > +
> > >      return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE, ring);
> > > }
> > >
> > > @@ -674,6 +712,8 @@ static int vhost_user_get_vring_base(struct
> vhost_dev *dev,
> > >          .hdr.size = sizeof(msg.payload.state),
> > >      };
> > >
> > > +    vhost_user_host_notifier_remove(dev, ring->index);
> > > +
> > >      if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> > >          return -1;
> > >      }
> > > @@ -847,6 +887,76 @@ static int
> vhost_user_slave_handle_config_change(struct vhost_dev *dev)
> > >      return ret;
> > >  }
> > >
> > > +static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev
> *dev,
> > > +                                                       VhostUserVringArea *area,
> > > +                                                       int fd) {
> > > +    int queue_idx = area->u64 & VHOST_USER_VRING_IDX_MASK;
> > > +    size_t page_size = qemu_real_host_page_size;
> > > +    struct vhost_user *u = dev->opaque;
> > > +    VhostUserState *user = u->user;
> > > +    VirtIODevice *vdev = dev->vdev;
> > > +    VhostUserHostNotifier *n;
> > > +    int ret = 0;
> > > +    void *addr;
> > > +    char *name;
> > > +
> > > +    if (!virtio_has_feature(dev->protocol_features,
> > > +                            VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) ||
> > > +        vdev == NULL || queue_idx >= virtio_get_num_queues(vdev)) {
> > > +        ret = -1;
> > > +        goto out;
> > > +    }
> > > +
> > > +    n = &user->notifier[queue_idx];
> > > +
> > > +    if (n->addr) {
> > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
> > > +        object_unparent(OBJECT(&n->mr));
> > > +        munmap(n->addr, page_size);
> > > +        n->addr = NULL;
> > > +    }
> > > +
> > > +    if (area->u64 & VHOST_USER_VRING_NOFD_MASK) {
> > > +        goto out;
> > > +    }
> > > +
> > > +    /* Sanity check. */
> > > +    if (area->size != page_size) {
> > > +        ret = -1;
> > > +        goto out;
> > > +    }
> > > +
> > > +    addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
> > > +                fd, area->offset);
> > > +    if (addr == MAP_FAILED) {
> > > +        ret = -1;
> > > +        goto out;
> > > +    }
> > > +
> > > +    name = g_strdup_printf("vhost-user/host-notifier@%p mmaps[%d]",
> > > +                           user, queue_idx);
> > > +    memory_region_init_ram_device_ptr(&n->mr, OBJECT(vdev), name,
> > > +                                      page_size, addr);
> > > +    g_free(name);
> > > +
> > > +    if (virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true)) {
> > > +        munmap(addr, page_size);
> > > +        ret = -1;
> > > +        goto out;
> > > +    }
> > > +
> > > +    n->addr = addr;
> > > +    n->set = true;
> > > +
> > > +out:
> > > +    /* Always close the fd. */
> > > +    if (fd != -1) {
> > > +        close(fd);
> > > +    }
> > > +    return ret;
> > > +}
> > > +
> > >  static void slave_read(void *opaque)  {
> > >      struct vhost_dev *dev = opaque; @@ -913,6 +1023,10 @@ static
> > > void slave_read(void *opaque)
> > >      case VHOST_USER_SLAVE_CONFIG_CHANGE_MSG :
> > >          ret = vhost_user_slave_handle_config_change(dev);
> > >          break;
> > > +    case VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG:
> > > +        ret = vhost_user_slave_handle_vring_host_notifier(dev, &payload.area,
> > > +                                                          fd);
> > > +        break;
> > >      default:
> > >          error_report("Received unexpected msg type.");
> > >          if (fd != -1) {
> > > @@ -1641,6 +1755,15 @@ VhostUserState *vhost_user_init(void)
> > >
> > >  void vhost_user_cleanup(VhostUserState *user)  {
> > > +    int i;
> > > +
> > > +    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> > > +        if (user->notifier[i].addr) {
> > > +            object_unparent(OBJECT(&user->notifier[i].mr));
> > > +            munmap(user->notifier[i].addr, qemu_real_host_page_size);
> > > +            user->notifier[i].addr = NULL;
> > > +        }
> > > +    }
> > >  }
> > >
> > >  const VhostOps user_ops = {
> > > diff --git a/include/hw/virtio/vhost-user.h
> > > b/include/hw/virtio/vhost-user.h index eb8bc0d90d..fd660393a0 100644
> > > --- a/include/hw/virtio/vhost-user.h
> > > +++ b/include/hw/virtio/vhost-user.h
> > > @@ -9,9 +9,17 @@
> > >  #define HW_VIRTIO_VHOST_USER_H
> > >
> > >  #include "chardev/char-fe.h"
> > > +#include "hw/virtio/virtio.h"
> > > +
> > > +typedef struct VhostUserHostNotifier {
> > > +    MemoryRegion mr;
> > > +    void *addr;
> > > +    bool set;
> > > +} VhostUserHostNotifier;
> > >
> > >  typedef struct VhostUserState {
> > >      CharBackend *chr;
> > > +    VhostUserHostNotifier notifier[VIRTIO_QUEUE_MAX];
> > >  } VhostUserState;
> > >
> > >  VhostUserState *vhost_user_init(void);
> > > --
> > > 2.11.0
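
For the testing idea discussed above (libvhost-user cannot be notified of a
write to an mmapped notify page, so it would write a special value there and
poll it to detect VQ # writes), a minimal sketch of what such polling could
look like.  The helper names are hypothetical and this is not the actual
libvhost-user API:

#include <stdint.h>
#include <stdbool.h>

#define NOTIFY_SENTINEL 0xffff  /* all-ones: "nothing written since last check";
                                 * real VQ indexes are far below this value */

/* Arm (or re-arm) the 16-bit notify location inside the mmapped page. */
static void notify_poll_arm(volatile uint16_t *notify)
{
    *notify = NOTIFY_SENTINEL;
}

/*
 * Poll once.  A virtio-pci driver writes the virtqueue number to the notify
 * location, so any value other than the sentinel is taken as a queue index.
 * Returns true, reports that index and re-arms for the next notification.
 */
static bool notify_poll_check(volatile uint16_t *notify, uint16_t *vq_idx)
{
    uint16_t val = *notify;

    if (val == NOTIFY_SENTINEL) {
        return false;
    }
    *vq_idx = val;
    notify_poll_arm(notify);
    return true;
}

A vhost-user bridge test could call notify_poll_check() from its event loop
and treat a reported index the same way it treats an eventfd kick.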


* Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
  2018-04-19 12:43         ` [virtio-dev] " Liang, Cunming
@ 2018-04-19 13:02           ` Paolo Bonzini
  -1 siblings, 0 replies; 98+ messages in thread
From: Paolo Bonzini @ 2018-04-19 13:02 UTC (permalink / raw)
  To: Liang, Cunming, Bie, Tiwei, Michael S. Tsirkin
  Cc: jasowang, alex.williamson, stefanha, qemu-devel, virtio-dev,
	Daly, Dan, Tan, Jianfeng, Wang, Zhihong, Wang, Xiao W

On 19/04/2018 14:43, Liang, Cunming wrote:
>> 2. Memory barriers. Right now after updating the avail idx, 
>> virtio does smp_wmb() and then the MMIO write. Normal hardware
>> drivers do wmb() which is an sfence. Can a PCI device read bypass
>> index write and see a stale index value?
>
> A compiler barrier is enough on strongly-ordered memory platform. As
> it doesn't re-order store, PCI device won't see a stale index value.
> But a weakly-ordered memory needs sfence.

That is complicated then.  We need to define a feature bit and (in the
Linux driver) propagate it to vring_create_virtqueue's weak_barrier
argument.  However:

- if we make it 1 when weak barriers are needed, the device also needs
to nack feature negotiation (not allow setting the FEATURES_OK status
bit) if the bit is not set by the driver.  However, that is not enough.
Live migration assumes that it is okay to migrate a virtual machine
from a source that doesn't support a feature to a destination that
supports it.  In this case, it would assume that it is okay to migrate
from software virtio to hardware virtio.  This is wrong because the
destination would use weak barriers.

- if we make it 1 when strong barriers are enough, software virtio
devices need to be updated to expose the bit.  This works, including
live migration, but updated drivers will now go slower when run
against an old device that doesn't know the feature bit.

Maybe bump the PCI revision, so that only the new revision has the bit?

Thanks,

Paolo
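
To make the propagation concrete, a rough sketch of the kind of decision the
Linux driver would end up making when creating a vring.  The feature bit
number and all names below are made up for illustration only; this series
does not define such a bit, and the real vring_create_virtqueue() signature
is deliberately not reproduced here:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical transport feature bit: "the device needs real (fence)
 * barriers before the doorbell write", e.g. a vhost-user hardware
 * offload.  The bit number is chosen arbitrarily for this sketch. */
#define HYPOTHETICAL_F_NEEDS_REAL_BARRIERS  (1ULL << 40)

struct vring_setup {
    bool weak_barriers;   /* would be fed into vring_create_virtqueue() */
};

static void vring_setup_init(struct vring_setup *s, uint64_t negotiated)
{
    /*
     * Software backends are fine with weak (compiler/SMP) barriers.
     * If the hypothetical bit was negotiated, the consumer sits behind
     * real hardware, so fall back to strong barriers.
     */
    s->weak_barriers = !(negotiated & HYPOTHETICAL_F_NEEDS_REAL_BARRIERS);
}

Either polarity of the bit runs into the negotiation and migration issues
described above; the sketch only shows where the negotiated value would
eventually land.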

* Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
  2018-04-19 13:02           ` Paolo Bonzini
@ 2018-04-19 15:19             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 98+ messages in thread
From: Michael S. Tsirkin @ 2018-04-19 15:19 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Liang, Cunming, Bie, Tiwei, jasowang, alex.williamson, stefanha,
	qemu-devel, virtio-dev, Daly, Dan, Tan, Jianfeng, Wang, Zhihong,
	Wang, Xiao W

On Thu, Apr 19, 2018 at 03:02:40PM +0200, Paolo Bonzini wrote:
> On 19/04/2018 14:43, Liang, Cunming wrote:
> >> 2. Memory barriers. Right now after updating the avail idx, 
> >> virtio does smp_wmb() and then the MMIO write. Normal hardware
> >> drivers do wmb() which is an sfence. Can a PCI device read bypass
> >> index write and see a stale index value?
> >
> > A compiler barrier is enough on strongly-ordered memory platform. As
> > it doesn't re-order store, PCI device won't see a stale index value.
> > But a weakly-ordered memory needs sfence.
> 
> That is complicated then.  We need to define a feature bit and (in the
> Linux driver) propagate it to vring_create_virtqueue's weak_barrier
> argument.  However:
> 
> - if we make it 1 when weak barriers are needed, the device also needs
> to nack feature negotiation (not allow setting the FEATURES_OK) if the
> bit is not set by the driver.
>  However, that is not enough.  Live
> migration assumes that it is okay to migrate a virtual machine from a
> source that doesn't support a feature to a destination that supports it.
>  In this case, it would assume that it is okay to migrate from software
> virtio to hardware virtio.  This is wrong because the destination would
> use weak barriers

You can't migrate between systems with different sets of device features
right now.

> - if we make it 1 when strong barriers are enough, software virtio
> devices needs to be updated to expose the bit.  This works, including
> live migration, but updated drivers will now go slower when run against
> an old device that doesn't know the feature bit.
> 
> Maybe bump the PCI revision, so that only the new revision has the bit?
> 
> Thanks,
> 
> Paolo

As a first step, if you want to migrate to a HW offloaded solution
then you need to enable the feature. It does mean the guest will go
a bit slower when run against a software backend, so it's only
worthwhile if most systems in your cluster do have the HW offload.
I think we can start by getting that working and
think about ways to improve down the road.


That's the use case we designed FEATURES_OK for though, so I do
think/hope it's enough and we don't need to play with revisions.


-- 
MST

* Re: [Qemu-devel] [PATCH v3 6/6] vhost-user: support registering external host notifiers
  2018-04-19 12:43         ` [virtio-dev] " Liang, Cunming
@ 2018-04-19 15:42           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 98+ messages in thread
From: Michael S. Tsirkin @ 2018-04-19 15:42 UTC (permalink / raw)
  To: Liang, Cunming
  Cc: Bie, Tiwei, jasowang, alex.williamson, pbonzini, stefanha,
	qemu-devel, virtio-dev, Daly, Dan, Tan, Jianfeng, Wang, Zhihong,
	Wang, Xiao W

On Thu, Apr 19, 2018 at 12:43:42PM +0000, Liang, Cunming wrote:
> 
> 
> > -----Original Message-----
> > From: Bie, Tiwei
> > Sent: Thursday, April 19, 2018 7:15 PM
> > To: Michael S. Tsirkin <mst@redhat.com>
> > Cc: jasowang@redhat.com; alex.williamson@redhat.com; pbonzini@redhat.com;
> > stefanha@redhat.com; qemu-devel@nongnu.org; virtio-dev@lists.oasis-
> > open.org; Liang, Cunming <cunming.liang@intel.com>; Daly, Dan
> > <dan.daly@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>; Wang, Zhihong
> > <zhihong.wang@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>
> > Subject: Re: [PATCH v3 6/6] vhost-user: support registering external host
> > notifiers
> > 
> > On Wed, Apr 18, 2018 at 07:34:06PM +0300, Michael S. Tsirkin wrote:
> > > On Thu, Apr 12, 2018 at 11:12:32PM +0800, Tiwei Bie wrote:
> > > > This patch introduces VHOST_USER_PROTOCOL_F_HOST_NOTIFIER.
> > > > With this feature negotiated, vhost-user backend can register memory
> > > > region based host notifiers. And it will allow the guest driver in
> > > > the VM to notify the hardware accelerator at the vhost-user backend
> > > > directly.
> > > >
> > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > >
> > > Overall I think we can merge this approach, but I have two main
> > > concerns about this:
> > >
> > > 1. Testing. Most people do not have the virtio hardware
> > >    so how to make sure this does not bit rot?
> > >
> > >    I have an idea: add an option like this to libvhost-user.
> > >    Naturally libvhost-user can not get notified about a write
> > >    to an mmapped area, but it can write a special value there
> > >    (e.g. all-ones?) and then poll it to detect VQ # writes.
> > >
> > >    Then include a vhost user bridge test with an option like this.
> > >
> > >    I'd like to see a patch doing this.
> > 
> > Sure, I'll do it. Thanks for the suggestion!
> > 
> > >
> > > 2. Memory barriers. Right now after updating the avail idx,
> > >    virtio does smp_wmb() and then the MMIO write.
> > >    Normal hardware drivers do wmb() which is an sfence.
> > >    Can a PCI device read bypass index write and see a stale
> > >    index value?
> A compiler barrier is enough on strongly-ordered memory platform. As it doesn't re-order store, PCI device won't see a stale index value. But a weakly-ordered memory needs sfence.


Oh, you are right.

So the sfence is only needed on non-Intel platforms, or when packets
are in WC memory. And I don't know whether DPDK ever puts packets in
WC memory.

I guess we'll cross this bridge when we get to it.


> > 
> > It depends on arch's memory model. Cunming will provide more details about
> > this later.
> > 
> > >    To make virtio pci do wmb() we would need a new feature bit.
> > >    Alternatively I guess we could maybe look at subsystem vendor/device id.
> > >
> > >    I'd like to see a patch doing one of these things.
> > 
> > We prefer to add a new feature bit as it's a more robust way to do this. I'll send
> > out some patches soon.
> > 
> > Thank you very much! :)
> > 
> > Best regards,
> > Tiwei Bie
> > 
> > >
> > > Thanks!
> > >
> > > > ---
> > > >  docs/interop/vhost-user.txt    |  33 +++++++++++
> > > >  hw/virtio/vhost-user.c         | 123
> > +++++++++++++++++++++++++++++++++++++++++
> > > >  include/hw/virtio/vhost-user.h |   8 +++
> > > >  3 files changed, 164 insertions(+)
> > > >
> > > > diff --git a/docs/interop/vhost-user.txt
> > > > b/docs/interop/vhost-user.txt index 534caab18a..9e57b36b20 100644
> > > > --- a/docs/interop/vhost-user.txt
> > > > +++ b/docs/interop/vhost-user.txt
> > > > @@ -132,6 +132,16 @@ Depending on the request type, payload can be:
> > > >     Payload: Size bytes array holding the contents of the virtio
> > > >         device's configuration space
> > > >
> > > > + * Vring area description
> > > > +   -----------------------
> > > > +   | u64 | size | offset |
> > > > +   -----------------------
> > > > +
> > > > +   u64: a 64-bit integer contains vring index and flags
> > > > +   Size: a 64-bit size of this area
> > > > +   Offset: a 64-bit offset of this area from the start of the
> > > > +       supplied file descriptor
> > > > +
> > > >  In QEMU the vhost-user message is implemented with the following struct:
> > > >
> > > >  typedef struct VhostUserMsg {
> > > > @@ -146,6 +156,7 @@ typedef struct VhostUserMsg {
> > > >          VhostUserLog log;
> > > >          struct vhost_iotlb_msg iotlb;
> > > >          VhostUserConfig config;
> > > > +        VhostUserVringArea area;
> > > >      };
> > > >  } QEMU_PACKED VhostUserMsg;
> > > >
> > > > @@ -380,6 +391,7 @@ Protocol features  #define
> > > > VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
> > > >  #define VHOST_USER_PROTOCOL_F_PAGEFAULT      8
> > > >  #define VHOST_USER_PROTOCOL_F_CONFIG         9
> > > > +#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  10
> > > >
> > > >  Master message types
> > > >  --------------------
> > > > @@ -777,6 +789,27 @@ Slave message types
> > > >       the VHOST_USER_NEED_REPLY flag, master must respond with zero when
> > > >       operation is successfully completed, or non-zero otherwise.
> > > >
> > > > + * VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG
> > > > +
> > > > +      Id: 3
> > > > +      Equivalent ioctl: N/A
> > > > +      Slave payload: vring area description
> > > > +      Master payload: N/A
> > > > +
> > > > +      Sets host notifier for a specified queue. The queue index is contained
> > > > +      in the u64 field of the vring area description. The host notifier is
> > > > +      described by the file descriptor (typically it's a VFIO device fd) which
> > > > +      is passed as ancillary data and the size (which is mmap size and should
> > > > +      be the same as host page size) and offset (which is mmap offset) carried
> > > > +      in the vring area description. QEMU can mmap the file descriptor based
> > > > +      on the size and offset to get a memory range. Registering a host notifier
> > > > +      means mapping this memory range to the VM as the specified queue's
> > notify
> > > > +      MMIO region. Slave sends this request to tell QEMU to de-register the
> > > > +      existing notifier if any and register the new notifier if the request is
> > > > +      sent with a file descriptor.
> > > > +      This request should be sent only when
> > VHOST_USER_PROTOCOL_F_HOST_NOTIFIER
> > > > +      protocol feature has been successfully negotiated.
> > > > +
> > > >  VHOST_USER_PROTOCOL_F_REPLY_ACK:
> > > >  -------------------------------
> > > >  The original vhost-user specification only demands replies for
> > > > certain diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > > index 791e0a4763..1cd9c7276b 100644
> > > > --- a/hw/virtio/vhost-user.c
> > > > +++ b/hw/virtio/vhost-user.c
> > > > @@ -13,6 +13,7 @@
> > > >  #include "hw/virtio/vhost.h"
> > > >  #include "hw/virtio/vhost-user.h"
> > > >  #include "hw/virtio/vhost-backend.h"
> > > > +#include "hw/virtio/virtio.h"
> > > >  #include "hw/virtio/virtio-net.h"
> > > >  #include "chardev/char-fe.h"
> > > >  #include "sysemu/kvm.h"
> > > > @@ -48,6 +49,7 @@ enum VhostUserProtocolFeature {
> > > >      VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
> > > >      VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
> > > >      VHOST_USER_PROTOCOL_F_CONFIG = 9,
> > > > +    VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 10,
> > > >      VHOST_USER_PROTOCOL_F_MAX
> > > >  };
> > > >
> > > > @@ -92,6 +94,7 @@ typedef enum VhostUserSlaveRequest {
> > > >      VHOST_USER_SLAVE_NONE = 0,
> > > >      VHOST_USER_SLAVE_IOTLB_MSG = 1,
> > > >      VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
> > > > +    VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
> > > >      VHOST_USER_SLAVE_MAX
> > > >  }  VhostUserSlaveRequest;
> > > >
> > > > @@ -136,6 +139,12 @@ static VhostUserConfig c __attribute__ ((unused));
> > > >                                     + sizeof(c.size) \
> > > >                                     + sizeof(c.flags))
> > > >
> > > > +typedef struct VhostUserVringArea {
> > > > +    uint64_t u64;
> > > > +    uint64_t size;
> > > > +    uint64_t offset;
> > > > +} VhostUserVringArea;
> > > > +
> > > >  typedef struct {
> > > >      VhostUserRequest request;
> > > >
> > > > @@ -157,6 +166,7 @@ typedef union {
> > > >          struct vhost_iotlb_msg iotlb;
> > > >          VhostUserConfig config;
> > > >          VhostUserCryptoSession session;
> > > > +        VhostUserVringArea area;
> > > >  } VhostUserPayload;
> > > >
> > > >  typedef struct VhostUserMsg {
> > > > @@ -638,9 +648,37 @@ static int vhost_user_set_vring_num(struct
> > vhost_dev *dev,
> > > >      return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM, ring);  }
> > > >
> > > > +static void vhost_user_host_notifier_restore(struct vhost_dev *dev,
> > > > +                                             int queue_idx) {
> > > > +    struct vhost_user *u = dev->opaque;
> > > > +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> > > > +    VirtIODevice *vdev = dev->vdev;
> > > > +
> > > > +    if (n->addr && !n->set) {
> > > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true);
> > > > +        n->set = true;
> > > > +    }
> > > > +}
> > > > +
> > > > +static void vhost_user_host_notifier_remove(struct vhost_dev *dev,
> > > > +                                            int queue_idx) {
> > > > +    struct vhost_user *u = dev->opaque;
> > > > +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> > > > +    VirtIODevice *vdev = dev->vdev;
> > > > +
> > > > +    if (n->addr && n->set) {
> > > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
> > > > +        n->set = false;
> > > > +    }
> > > > +}
> > > > +
> > > >  static int vhost_user_set_vring_base(struct vhost_dev *dev,
> > > >                                       struct vhost_vring_state
> > > > *ring)  {
> > > > +    vhost_user_host_notifier_restore(dev, ring->index);
> > > > +
> > > >      return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE, ring);
> > > > }
> > > >
> > > > @@ -674,6 +712,8 @@ static int vhost_user_get_vring_base(struct
> > vhost_dev *dev,
> > > >          .hdr.size = sizeof(msg.payload.state),
> > > >      };
> > > >
> > > > +    vhost_user_host_notifier_remove(dev, ring->index);
> > > > +
> > > >      if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> > > >          return -1;
> > > >      }
> > > > @@ -847,6 +887,76 @@ static int
> > vhost_user_slave_handle_config_change(struct vhost_dev *dev)
> > > >      return ret;
> > > >  }
> > > >
> > > > +static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev
> > *dev,
> > > > +                                                       VhostUserVringArea *area,
> > > > +                                                       int fd) {
> > > > +    int queue_idx = area->u64 & VHOST_USER_VRING_IDX_MASK;
> > > > +    size_t page_size = qemu_real_host_page_size;
> > > > +    struct vhost_user *u = dev->opaque;
> > > > +    VhostUserState *user = u->user;
> > > > +    VirtIODevice *vdev = dev->vdev;
> > > > +    VhostUserHostNotifier *n;
> > > > +    int ret = 0;
> > > > +    void *addr;
> > > > +    char *name;
> > > > +
> > > > +    if (!virtio_has_feature(dev->protocol_features,
> > > > +                            VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) ||
> > > > +        vdev == NULL || queue_idx >= virtio_get_num_queues(vdev)) {
> > > > +        ret = -1;
> > > > +        goto out;
> > > > +    }
> > > > +
> > > > +    n = &user->notifier[queue_idx];
> > > > +
> > > > +    if (n->addr) {
> > > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
> > > > +        object_unparent(OBJECT(&n->mr));
> > > > +        munmap(n->addr, page_size);
> > > > +        n->addr = NULL;
> > > > +    }
> > > > +
> > > > +    if (area->u64 & VHOST_USER_VRING_NOFD_MASK) {
> > > > +        goto out;
> > > > +    }
> > > > +
> > > > +    /* Sanity check. */
> > > > +    if (area->size != page_size) {
> > > > +        ret = -1;
> > > > +        goto out;
> > > > +    }
> > > > +
> > > > +    addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
> > > > +                fd, area->offset);
> > > > +    if (addr == MAP_FAILED) {
> > > > +        ret = -1;
> > > > +        goto out;
> > > > +    }
> > > > +
> > > > +    name = g_strdup_printf("vhost-user/host-notifier@%p mmaps[%d]",
> > > > +                           user, queue_idx);
> > > > +    memory_region_init_ram_device_ptr(&n->mr, OBJECT(vdev), name,
> > > > +                                      page_size, addr);
> > > > +    g_free(name);
> > > > +
> > > > +    if (virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true)) {
> > > > +        munmap(addr, page_size);
> > > > +        ret = -1;
> > > > +        goto out;
> > > > +    }
> > > > +
> > > > +    n->addr = addr;
> > > > +    n->set = true;
> > > > +
> > > > +out:
> > > > +    /* Always close the fd. */
> > > > +    if (fd != -1) {
> > > > +        close(fd);
> > > > +    }
> > > > +    return ret;
> > > > +}
> > > > +
> > > >  static void slave_read(void *opaque)  {
> > > >      struct vhost_dev *dev = opaque; @@ -913,6 +1023,10 @@ static
> > > > void slave_read(void *opaque)
> > > >      case VHOST_USER_SLAVE_CONFIG_CHANGE_MSG :
> > > >          ret = vhost_user_slave_handle_config_change(dev);
> > > >          break;
> > > > +    case VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG:
> > > > +        ret = vhost_user_slave_handle_vring_host_notifier(dev, &payload.area,
> > > > +                                                          fd);
> > > > +        break;
> > > >      default:
> > > >          error_report("Received unexpected msg type.");
> > > >          if (fd != -1) {
> > > > @@ -1641,6 +1755,15 @@ VhostUserState *vhost_user_init(void)
> > > >
> > > >  void vhost_user_cleanup(VhostUserState *user)  {
> > > > +    int i;
> > > > +
> > > > +    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> > > > +        if (user->notifier[i].addr) {
> > > > +            object_unparent(OBJECT(&user->notifier[i].mr));
> > > > +            munmap(user->notifier[i].addr, qemu_real_host_page_size);
> > > > +            user->notifier[i].addr = NULL;
> > > > +        }
> > > > +    }
> > > >  }
> > > >
> > > >  const VhostOps user_ops = {
> > > > diff --git a/include/hw/virtio/vhost-user.h
> > > > b/include/hw/virtio/vhost-user.h index eb8bc0d90d..fd660393a0 100644
> > > > --- a/include/hw/virtio/vhost-user.h
> > > > +++ b/include/hw/virtio/vhost-user.h
> > > > @@ -9,9 +9,17 @@
> > > >  #define HW_VIRTIO_VHOST_USER_H
> > > >
> > > >  #include "chardev/char-fe.h"
> > > > +#include "hw/virtio/virtio.h"
> > > > +
> > > > +typedef struct VhostUserHostNotifier {
> > > > +    MemoryRegion mr;
> > > > +    void *addr;
> > > > +    bool set;
> > > > +} VhostUserHostNotifier;
> > > >
> > > >  typedef struct VhostUserState {
> > > >      CharBackend *chr;
> > > > +    VhostUserHostNotifier notifier[VIRTIO_QUEUE_MAX];
> > > >  } VhostUserState;
> > > >
> > > >  VhostUserState *vhost_user_init(void);
> > > > --
> > > > 2.11.0
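
For reference, a rough sketch of what the backend (slave) side of
VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG could look like, following the
vring area description quoted above.  The struct and constant names are
local to this sketch (they are not DPDK or libvhost-user API) and error
handling is kept minimal:

#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

struct vring_area_desc {         /* "vring area description" payload */
    uint64_t u64;                /* vring index (plus flags) */
    uint64_t size;               /* mmap size; must equal the host page size */
    uint64_t offset;             /* mmap offset into the passed fd */
};

struct vu_hdr {                  /* standard vhost-user message header */
    uint32_t request;
    uint32_t flags;
    uint32_t size;
};

#define SLAVE_VRING_HOST_NOTIFIER_MSG  3      /* slave request id from the patch */
#define VU_VERSION                     0x1

/*
 * Ask QEMU to map notify_fd at (offset, size) as queue qidx's notify MMIO
 * region.  slave_fd is the slave channel socket; the fd travels as
 * SCM_RIGHTS ancillary data, as the spec text above requires.
 */
static int send_host_notifier(int slave_fd, int notify_fd, uint16_t qidx,
                              uint64_t offset, uint64_t size)
{
    struct vu_hdr hdr = {
        .request = SLAVE_VRING_HOST_NOTIFIER_MSG,
        .flags   = VU_VERSION,
        .size    = sizeof(struct vring_area_desc),
    };
    struct vring_area_desc area = {
        .u64 = qidx, .size = size, .offset = offset,
    };
    struct iovec iov[2] = {
        { .iov_base = &hdr,  .iov_len = sizeof(hdr)  },
        { .iov_base = &area, .iov_len = sizeof(area) },
    };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl = { .buf = { 0 } };
    struct msghdr msg = {
        .msg_iov        = iov,
        .msg_iovlen     = 2,
        .msg_control    = ctrl.buf,
        .msg_controllen = sizeof(ctrl.buf),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type  = SCM_RIGHTS;
    cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &notify_fd, sizeof(int));

    return sendmsg(slave_fd, &msg, 0) < 0 ? -1 : 0;
}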

* Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
  2018-04-19 15:19             ` Michael S. Tsirkin
@ 2018-04-19 15:51               ` Paolo Bonzini
  -1 siblings, 0 replies; 98+ messages in thread
From: Paolo Bonzini @ 2018-04-19 15:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Liang, Cunming, Bie, Tiwei, jasowang, alex.williamson, stefanha,
	qemu-devel, virtio-dev, Daly, Dan, Tan, Jianfeng, Wang, Zhihong,
	Wang, Xiao W

On 19/04/2018 17:19, Michael S. Tsirkin wrote:
>> - if we make it 1 when weak barriers are needed, the device also needs
>> to nack feature negotiation (not allow setting the FEATURES_OK) if the
>> bit is not set by the driver.
>>  However, that is not enough.  Live
>> migration assumes that it is okay to migrate a virtual machine from a
>> source that doesn't support a feature to a destination that supports it.
>>  In this case, it would assume that it is okay to migrate from software
>> virtio to hardware virtio.  This is wrong because the destination would
>> use weak barriers
> 
> You can't migrate between systems with different sets of device features
> right now.

Yes, you can, exactly because some features are defined not by the
machine type but rather by the host kernel.  See virtio_net_get_features
in QEMU's hw/net/virtio-net.c, and virtio_set_features_nocheck in
QEMU's hw/virtio/virtio.c.

Thanks,

Paolo

* Re: [Qemu-devel] [PATCH v3 6/6] vhost-user: support registering external host notifiers
  2018-04-19 15:42           ` [virtio-dev] " Michael S. Tsirkin
@ 2018-04-19 15:52             ` Paolo Bonzini
  -1 siblings, 0 replies; 98+ messages in thread
From: Paolo Bonzini @ 2018-04-19 15:52 UTC (permalink / raw)
  To: Michael S. Tsirkin, Liang, Cunming
  Cc: Bie, Tiwei, jasowang, alex.williamson, stefanha, qemu-devel,
	virtio-dev, Daly, Dan, Tan, Jianfeng, Wang, Zhihong, Wang,
	Xiao W

On 19/04/2018 17:42, Michael S. Tsirkin wrote:
>> A compiler barrier is enough on strongly-ordered memory platform.
>> As it doesn't re-order store, PCI device won't see a stale index
>> value. But a weakly-ordered memory needs sfence.
> 
> Oh you are right.
> 
> So it's only needed for non-intel platforms or when packets are in
> WC memory then. And I don't know whether dpdk ever puts packets in WC
> memory.
> 
> I guess we'll cross this bridge when we get to it.

Non-TSO architectures seem important...

Paolo
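
As a concrete illustration of the ordering being discussed, a sketch of the
kick path with the two barrier strengths.  This is not driver code; it only
shows where a compiler barrier is sufficient (a TSO host with a software
consumer) and where a real store fence is assumed to be needed (weakly
ordered CPUs, WC memory, or a PCI device reading the ring):

#include <stdint.h>

static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" ::: "memory");     /* no instruction emitted */
}

static inline void store_fence(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("sfence" ::: "memory");
#else
    __sync_synchronize();                      /* full barrier on weakly ordered CPUs */
#endif
}

/* Publish the new avail->idx, order it, then ring the doorbell with the VQ #. */
static void kick(volatile uint16_t *avail_idx, uint16_t new_idx,
                 volatile uint16_t *notify, uint16_t vq,
                 int device_needs_real_barrier)
{
    *avail_idx = new_idx;

    if (device_needs_real_barrier) {
        store_fence();          /* e.g. hardware offload or non-TSO host */
    } else {
        compiler_barrier();     /* enough for a software backend on x86/TSO */
    }

    *notify = vq;               /* doorbell write (proper MMIO accessor elided) */
}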

* Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
  2018-04-19 15:51               ` Paolo Bonzini
@ 2018-04-19 15:59                 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 98+ messages in thread
From: Michael S. Tsirkin @ 2018-04-19 15:59 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Liang, Cunming, Bie, Tiwei, jasowang, alex.williamson, stefanha,
	qemu-devel, virtio-dev, Daly, Dan, Tan, Jianfeng, Wang, Zhihong,
	Wang, Xiao W

On Thu, Apr 19, 2018 at 05:51:51PM +0200, Paolo Bonzini wrote:
> On 19/04/2018 17:19, Michael S. Tsirkin wrote:
> >> - if we make it 1 when weak barriers are needed, the device also needs
> >> to nack feature negotiation (not allow setting the FEATURES_OK) if the
> >> bit is not set by the driver.
> >>  However, that is not enough.  Live
> >> migration assumes that it is okay to migrate a virtual machine from a
> >> source that doesn't support a feature to a destination that supports it.
> >>  In this case, it would assume that it is okay to migrate from software
> >> virtio to hardware virtio.  This is wrong because the destination would
> >> use weak barriers
> > 
> > You can't migrate between systems with different sets of device features
> > right now.
> 
> Yes, you can, exactly because some features are defined not by the
> machine type but rather by the host kernel.  See virtio_net_get_features
> in QEMU's hw/virtio/virtio-net.c, and virtio_set_features_nocheck in
> QEMU's hw/virtio/virtio.c.
> 
> Thanks,
> 
> Paolo

Oh you are right. Well we can just special-case that one :)

-- 
MST

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
  2018-04-19 15:59                 ` Michael S. Tsirkin
@ 2018-04-19 16:07                   ` Paolo Bonzini
  -1 siblings, 0 replies; 98+ messages in thread
From: Paolo Bonzini @ 2018-04-19 16:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Liang, Cunming, Bie, Tiwei, jasowang, alex.williamson, stefanha,
	qemu-devel, virtio-dev, Daly, Dan, Tan, Jianfeng, Wang, Zhihong,
	Wang, Xiao W

On 19/04/2018 17:59, Michael S. Tsirkin wrote:
> On Thu, Apr 19, 2018 at 05:51:51PM +0200, Paolo Bonzini wrote:
>> On 19/04/2018 17:19, Michael S. Tsirkin wrote:
>>>> - if we make it 1 when weak barriers are needed, the device also needs
>>>> to nack feature negotiation (not allow setting the FEATURES_OK) if the
>>>> bit is not set by the driver.
>>>>  However, that is not enough.  Live
>>>> migration assumes that it is okay to migrate a virtual machine from a
>>>> source that doesn't support a feature to a destination that supports it.
>>>>  In this case, it would assume that it is okay to migrate from software
>>>> virtio to hardware virtio.  This is wrong because the destination would
>>>> use weak barriers
>>>
>>> You can't migrate between systems with different sets of device features
>>> right now.
>>
>> Yes, you can, exactly because some features are defined not by the
>> machine type but rather by the host kernel.  See virtio_net_get_features
>> in QEMU's hw/virtio/virtio-net.c, and virtio_set_features_nocheck in
>> QEMU's hw/virtio/virtio.c.
> 
> Oh you are right. Well we can just special-case that one :)

Anything wrong with bumping the PCI revision?

Paolo

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
  2018-04-19 15:19             ` Michael S. Tsirkin
@ 2018-04-19 16:24               ` Liang, Cunming
  -1 siblings, 0 replies; 98+ messages in thread
From: Liang, Cunming @ 2018-04-19 16:24 UTC (permalink / raw)
  To: Michael S. Tsirkin, Paolo Bonzini
  Cc: Bie, Tiwei, jasowang, alex.williamson, stefanha, qemu-devel,
	virtio-dev, Daly, Dan, Tan, Jianfeng, Wang, Zhihong, Wang,
	Xiao W



> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst@redhat.com]
> Sent: Thursday, April 19, 2018 11:19 PM
> To: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Liang, Cunming <cunming.liang@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>;
> jasowang@redhat.com; alex.williamson@redhat.com; stefanha@redhat.com;
> qemu-devel@nongnu.org; virtio-dev@lists.oasis-open.org; Daly, Dan
> <dan.daly@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>; Wang, Zhihong
> <zhihong.wang@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>
> Subject: Re: [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering
> external host notifiers
> 
> On Thu, Apr 19, 2018 at 03:02:40PM +0200, Paolo Bonzini wrote:
> > On 19/04/2018 14:43, Liang, Cunming wrote:
> > >> 2. Memory barriers. Right now after updating the avail idx, virtio
> > >> does smp_wmb() and then the MMIO write. Normal hardware drivers do
> > >> wmb() which is an sfence. Can a PCI device read bypass index write
> > >> and see a stale index value?
> > >
> > > A compiler barrier is enough on strongly-ordered memory platform. As
> > > it doesn't re-order store, PCI device won't see a stale index value.
> > > But a weakly-ordered memory needs sfence.
> >
> > That is complicated then.  We need to define a feature bit and (in the
> > Linux driver) propagate it to vring_create_virtqueue's weak_barrier
> > argument.  However:
> >
> > - if we make it 1 when weak barriers are needed, the device also needs
> > to nack feature negotiation (not allow setting the FEATURES_OK) if the
> > bit is not set by the driver.
> >  However, that is not enough.  Live
> > migration assumes that it is okay to migrate a virtual machine from a
> > source that doesn't support a feature to a destination that supports it.
> >  In this case, it would assume that it is okay to migrate from
> > software virtio to hardware virtio.  This is wrong because the
> > destination would use weak barriers
> 
> You can't migrate between systems with different sets of device features right
> now.
> 
> > - if we make it 1 when strong barriers are enough, software virtio
> > devices needs to be updated to expose the bit.  This works, including
> > live migration, but updated drivers will now go slower when run
> > against an old device that doesn't know the feature bit.
> >
> > Maybe bump the PCI revision, so that only the new revision has the bit?
> >
> > Thanks,
> >
> > Paolo
> 
> As a first step, if you want to migrate to a HW offloaded solution then you need
> to enable the feature.

> It does mean it will go a bit slower when run with software,
> so it's only good if most systems in your cluster do have the HW offload.
To clarify a bit more: it's suboptimal to always use mandatory barriers before the MMIO notify. On strongly-ordered memory, 'weak barriers' (smp_wmb) are already good enough for MMIO, so the trade-off doesn't always apply; software and HW offload can end up on the same page.
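
To make the ordering question concrete, a minimal sketch of the driver's notify path (not the actual QEMU/Linux code; smp_wmb(), wmb() and writel() stand in for the usual kernel primitives, and the flag name is illustrative):

    /* Publish a new avail index, then kick the device. */
    static void vring_publish_and_kick(volatile uint16_t *avail_idx,
                                       uint16_t new_idx,
                                       void *doorbell, uint16_t queue_index,
                                       bool hw_device)
    {
        *avail_idx = new_idx;           /* expose the new descriptors */

        if (!hw_device) {
            smp_wmb();                  /* software backend on the same coherent
                                           host: compiles to a plain compiler
                                           barrier on x86 (TSO) */
        } else {
            wmb();                      /* real PCI device, weakly ordered
                                           platform or WC-mapped ring: a store
                                           fence (sfence class) is needed so the
                                           device cannot read a stale index */
        }

        writel(queue_index, doorbell);  /* MMIO notify */
    }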

> I think we can start by getting that working and think about ways to improve
> down the road.
> 
> 
> That's the usecase we designed FEATURES_OK for though, so I do think/hope it's
> enough and we don't need to play with revisions.
> 
> 
> --
> MST

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [PATCH v3 6/6] vhost-user: support registering external host notifiers
  2018-04-19 15:42           ` [virtio-dev] " Michael S. Tsirkin
@ 2018-04-19 16:27             ` Liang, Cunming
  -1 siblings, 0 replies; 98+ messages in thread
From: Liang, Cunming @ 2018-04-19 16:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Bie, Tiwei, jasowang, alex.williamson, pbonzini, stefanha,
	qemu-devel, virtio-dev, Daly, Dan, Tan, Jianfeng, Wang, Zhihong,
	Wang, Xiao W



> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst@redhat.com]
> Sent: Thursday, April 19, 2018 11:43 PM
> To: Liang, Cunming <cunming.liang@intel.com>
> Cc: Bie, Tiwei <tiwei.bie@intel.com>; jasowang@redhat.com;
> alex.williamson@redhat.com; pbonzini@redhat.com; stefanha@redhat.com;
> qemu-devel@nongnu.org; virtio-dev@lists.oasis-open.org; Daly, Dan
> <dan.daly@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>; Wang, Zhihong
> <zhihong.wang@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>
> Subject: Re: [PATCH v3 6/6] vhost-user: support registering external host
> notifiers
> 
> On Thu, Apr 19, 2018 at 12:43:42PM +0000, Liang, Cunming wrote:
> >
> >
> > > -----Original Message-----
> > > From: Bie, Tiwei
> > > Sent: Thursday, April 19, 2018 7:15 PM
> > > To: Michael S. Tsirkin <mst@redhat.com>
> > > Cc: jasowang@redhat.com; alex.williamson@redhat.com;
> > > pbonzini@redhat.com; stefanha@redhat.com; qemu-devel@nongnu.org;
> > > virtio-dev@lists.oasis- open.org; Liang, Cunming
> > > <cunming.liang@intel.com>; Daly, Dan <dan.daly@intel.com>; Tan,
> > > Jianfeng <jianfeng.tan@intel.com>; Wang, Zhihong
> > > <zhihong.wang@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>
> > > Subject: Re: [PATCH v3 6/6] vhost-user: support registering external
> > > host notifiers
> > >
> > > On Wed, Apr 18, 2018 at 07:34:06PM +0300, Michael S. Tsirkin wrote:
> > > > On Thu, Apr 12, 2018 at 11:12:32PM +0800, Tiwei Bie wrote:
> > > > > This patch introduces VHOST_USER_PROTOCOL_F_HOST_NOTIFIER.
> > > > > With this feature negotiated, vhost-user backend can register
> > > > > memory region based host notifiers. And it will allow the guest
> > > > > driver in the VM to notify the hardware accelerator at the
> > > > > vhost-user backend directly.
> > > > >
> > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > >
> > > > Overall I think we can merge this approach, but I have two main
> > > > concerns about this:
> > > >
> > > > 1. Testing. Most people do not have the virtio hardware
> > > >    so how to make sure this does not bit rot?
> > > >
> > > >    I have an idea: add an option like this to libvhost-user.
> > > >    Naturally libvhost-user can not get notified about a write
> > > >    to an mmapped area, but it can write a special value there
> > > >    (e.g. all-ones?) and then poll it to detect VQ # writes.
> > > >
> > > >    Then include a vhost user bridge test with an option like this.
> > > >
> > > >    I'd like to see a patch doing this.
> > >
> > > Sure, I'll do it. Thanks for the suggestion!
> > >
> > > >
> > > > 2. Memory barriers. Right now after updating the avail idx,
> > > >    virtio does smp_wmb() and then the MMIO write.
> > > >    Normal hardware drivers do wmb() which is an sfence.
> > > >    Can a PCI device read bypass index write and see a stale
> > > >    index value?
> > A compiler barrier is enough on strongly-ordered memory platform. As it
> doesn't re-order store, PCI device won't see a stale index value. But a weakly-
> ordered memory needs sfence.
> 
> 
> Oh you are right.
> 
> So it's only needed for non-intel platforms or when packets are in WC memory
> then. And I don't know whether dpdk ever puts packets in WC memory.

No, we haven't used WC memory.

> 
> I guess we'll cross this bridge when we get to it.
> 
> 
> > >
> > > It depends on arch's memory model. Cunming will provide more details
> > > about this later.
> > >
> > > >    To make virtio pci do wmb() we would need a new feature bit.
> > > >    Alternatively I guess we could maybe look at subsystem vendor/device id.
> > > >
> > > >    I'd like to see a patch doing one of these things.
> > >
> > > We prefer to add a new feature bit as it's a more robust way to do
> > > this. I'll send out some patches soon.
> > >
> > > Thank you very much! :)
> > >
> > > Best regards,
> > > Tiwei Bie
> > >
> > > >
> > > > Thanks!
> > > >
> > > > > ---
> > > > >  docs/interop/vhost-user.txt    |  33 +++++++++++
> > > > >  hw/virtio/vhost-user.c         | 123
> > > +++++++++++++++++++++++++++++++++++++++++
> > > > >  include/hw/virtio/vhost-user.h |   8 +++
> > > > >  3 files changed, 164 insertions(+)
> > > > >
> > > > > diff --git a/docs/interop/vhost-user.txt
> > > > > b/docs/interop/vhost-user.txt index 534caab18a..9e57b36b20
> > > > > 100644
> > > > > --- a/docs/interop/vhost-user.txt
> > > > > +++ b/docs/interop/vhost-user.txt
> > > > > @@ -132,6 +132,16 @@ Depending on the request type, payload can be:
> > > > >     Payload: Size bytes array holding the contents of the virtio
> > > > >         device's configuration space
> > > > >
> > > > > + * Vring area description
> > > > > +   -----------------------
> > > > > +   | u64 | size | offset |
> > > > > +   -----------------------
> > > > > +
> > > > > +   u64: a 64-bit integer contains vring index and flags
> > > > > +   Size: a 64-bit size of this area
> > > > > +   Offset: a 64-bit offset of this area from the start of the
> > > > > +       supplied file descriptor
> > > > > +
> > > > >  In QEMU the vhost-user message is implemented with the following
> struct:
> > > > >
> > > > >  typedef struct VhostUserMsg {
> > > > > @@ -146,6 +156,7 @@ typedef struct VhostUserMsg {
> > > > >          VhostUserLog log;
> > > > >          struct vhost_iotlb_msg iotlb;
> > > > >          VhostUserConfig config;
> > > > > +        VhostUserVringArea area;
> > > > >      };
> > > > >  } QEMU_PACKED VhostUserMsg;
> > > > >
> > > > > @@ -380,6 +391,7 @@ Protocol features  #define
> > > > > VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
> > > > >  #define VHOST_USER_PROTOCOL_F_PAGEFAULT      8
> > > > >  #define VHOST_USER_PROTOCOL_F_CONFIG         9
> > > > > +#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  10
> > > > >
> > > > >  Master message types
> > > > >  --------------------
> > > > > @@ -777,6 +789,27 @@ Slave message types
> > > > >       the VHOST_USER_NEED_REPLY flag, master must respond with zero
> when
> > > > >       operation is successfully completed, or non-zero otherwise.
> > > > >
> > > > > + * VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG
> > > > > +
> > > > > +      Id: 3
> > > > > +      Equivalent ioctl: N/A
> > > > > +      Slave payload: vring area description
> > > > > +      Master payload: N/A
> > > > > +
> > > > > +      Sets host notifier for a specified queue. The queue index is contained
> > > > > +      in the u64 field of the vring area description. The host notifier is
> > > > > +      described by the file descriptor (typically it's a VFIO device fd) which
> > > > > +      is passed as ancillary data and the size (which is mmap size and should
> > > > > +      be the same as host page size) and offset (which is mmap offset)
> carried
> > > > > +      in the vring area description. QEMU can mmap the file descriptor
> based
> > > > > +      on the size and offset to get a memory range. Registering a host
> notifier
> > > > > +      means mapping this memory range to the VM as the
> > > > > + specified queue's
> > > notify
> > > > > +      MMIO region. Slave sends this request to tell QEMU to de-register
> the
> > > > > +      existing notifier if any and register the new notifier if the request is
> > > > > +      sent with a file descriptor.
> > > > > +      This request should be sent only when
> > > VHOST_USER_PROTOCOL_F_HOST_NOTIFIER
> > > > > +      protocol feature has been successfully negotiated.
> > > > > +
> > > > >  VHOST_USER_PROTOCOL_F_REPLY_ACK:
> > > > >  -------------------------------  The original vhost-user
> > > > > specification only demands replies for certain diff --git
> > > > > a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index
> > > > > 791e0a4763..1cd9c7276b 100644
> > > > > --- a/hw/virtio/vhost-user.c
> > > > > +++ b/hw/virtio/vhost-user.c
> > > > > @@ -13,6 +13,7 @@
> > > > >  #include "hw/virtio/vhost.h"
> > > > >  #include "hw/virtio/vhost-user.h"
> > > > >  #include "hw/virtio/vhost-backend.h"
> > > > > +#include "hw/virtio/virtio.h"
> > > > >  #include "hw/virtio/virtio-net.h"
> > > > >  #include "chardev/char-fe.h"
> > > > >  #include "sysemu/kvm.h"
> > > > > @@ -48,6 +49,7 @@ enum VhostUserProtocolFeature {
> > > > >      VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
> > > > >      VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
> > > > >      VHOST_USER_PROTOCOL_F_CONFIG = 9,
> > > > > +    VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 10,
> > > > >      VHOST_USER_PROTOCOL_F_MAX
> > > > >  };
> > > > >
> > > > > @@ -92,6 +94,7 @@ typedef enum VhostUserSlaveRequest {
> > > > >      VHOST_USER_SLAVE_NONE = 0,
> > > > >      VHOST_USER_SLAVE_IOTLB_MSG = 1,
> > > > >      VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
> > > > > +    VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
> > > > >      VHOST_USER_SLAVE_MAX
> > > > >  }  VhostUserSlaveRequest;
> > > > >
> > > > > @@ -136,6 +139,12 @@ static VhostUserConfig c __attribute__
> ((unused));
> > > > >                                     + sizeof(c.size) \
> > > > >                                     + sizeof(c.flags))
> > > > >
> > > > > +typedef struct VhostUserVringArea {
> > > > > +    uint64_t u64;
> > > > > +    uint64_t size;
> > > > > +    uint64_t offset;
> > > > > +} VhostUserVringArea;
> > > > > +
> > > > >  typedef struct {
> > > > >      VhostUserRequest request;
> > > > >
> > > > > @@ -157,6 +166,7 @@ typedef union {
> > > > >          struct vhost_iotlb_msg iotlb;
> > > > >          VhostUserConfig config;
> > > > >          VhostUserCryptoSession session;
> > > > > +        VhostUserVringArea area;
> > > > >  } VhostUserPayload;
> > > > >
> > > > >  typedef struct VhostUserMsg {
> > > > > @@ -638,9 +648,37 @@ static int vhost_user_set_vring_num(struct
> > > vhost_dev *dev,
> > > > >      return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM,
> > > > > ring);  }
> > > > >
> > > > > +static void vhost_user_host_notifier_restore(struct vhost_dev *dev,
> > > > > +                                             int queue_idx) {
> > > > > +    struct vhost_user *u = dev->opaque;
> > > > > +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> > > > > +    VirtIODevice *vdev = dev->vdev;
> > > > > +
> > > > > +    if (n->addr && !n->set) {
> > > > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true);
> > > > > +        n->set = true;
> > > > > +    }
> > > > > +}
> > > > > +
> > > > > +static void vhost_user_host_notifier_remove(struct vhost_dev *dev,
> > > > > +                                            int queue_idx) {
> > > > > +    struct vhost_user *u = dev->opaque;
> > > > > +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> > > > > +    VirtIODevice *vdev = dev->vdev;
> > > > > +
> > > > > +    if (n->addr && n->set) {
> > > > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr,
> false);
> > > > > +        n->set = false;
> > > > > +    }
> > > > > +}
> > > > > +
> > > > >  static int vhost_user_set_vring_base(struct vhost_dev *dev,
> > > > >                                       struct vhost_vring_state
> > > > > *ring)  {
> > > > > +    vhost_user_host_notifier_restore(dev, ring->index);
> > > > > +
> > > > >      return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE,
> > > > > ring); }
> > > > >
> > > > > @@ -674,6 +712,8 @@ static int vhost_user_get_vring_base(struct
> > > vhost_dev *dev,
> > > > >          .hdr.size = sizeof(msg.payload.state),
> > > > >      };
> > > > >
> > > > > +    vhost_user_host_notifier_remove(dev, ring->index);
> > > > > +
> > > > >      if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> > > > >          return -1;
> > > > >      }
> > > > > @@ -847,6 +887,76 @@ static int
> > > vhost_user_slave_handle_config_change(struct vhost_dev *dev)
> > > > >      return ret;
> > > > >  }
> > > > >
> > > > > +static int vhost_user_slave_handle_vring_host_notifier(struct
> > > > > +vhost_dev
> > > *dev,
> > > > > +                                                       VhostUserVringArea *area,
> > > > > +                                                       int fd) {
> > > > > +    int queue_idx = area->u64 & VHOST_USER_VRING_IDX_MASK;
> > > > > +    size_t page_size = qemu_real_host_page_size;
> > > > > +    struct vhost_user *u = dev->opaque;
> > > > > +    VhostUserState *user = u->user;
> > > > > +    VirtIODevice *vdev = dev->vdev;
> > > > > +    VhostUserHostNotifier *n;
> > > > > +    int ret = 0;
> > > > > +    void *addr;
> > > > > +    char *name;
> > > > > +
> > > > > +    if (!virtio_has_feature(dev->protocol_features,
> > > > > +                            VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) ||
> > > > > +        vdev == NULL || queue_idx >= virtio_get_num_queues(vdev)) {
> > > > > +        ret = -1;
> > > > > +        goto out;
> > > > > +    }
> > > > > +
> > > > > +    n = &user->notifier[queue_idx];
> > > > > +
> > > > > +    if (n->addr) {
> > > > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr,
> false);
> > > > > +        object_unparent(OBJECT(&n->mr));
> > > > > +        munmap(n->addr, page_size);
> > > > > +        n->addr = NULL;
> > > > > +    }
> > > > > +
> > > > > +    if (area->u64 & VHOST_USER_VRING_NOFD_MASK) {
> > > > > +        goto out;
> > > > > +    }
> > > > > +
> > > > > +    /* Sanity check. */
> > > > > +    if (area->size != page_size) {
> > > > > +        ret = -1;
> > > > > +        goto out;
> > > > > +    }
> > > > > +
> > > > > +    addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
> MAP_SHARED,
> > > > > +                fd, area->offset);
> > > > > +    if (addr == MAP_FAILED) {
> > > > > +        ret = -1;
> > > > > +        goto out;
> > > > > +    }
> > > > > +
> > > > > +    name = g_strdup_printf("vhost-user/host-notifier@%p mmaps[%d]",
> > > > > +                           user, queue_idx);
> > > > > +    memory_region_init_ram_device_ptr(&n->mr, OBJECT(vdev), name,
> > > > > +                                      page_size, addr);
> > > > > +    g_free(name);
> > > > > +
> > > > > +    if (virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr,
> true)) {
> > > > > +        munmap(addr, page_size);
> > > > > +        ret = -1;
> > > > > +        goto out;
> > > > > +    }
> > > > > +
> > > > > +    n->addr = addr;
> > > > > +    n->set = true;
> > > > > +
> > > > > +out:
> > > > > +    /* Always close the fd. */
> > > > > +    if (fd != -1) {
> > > > > +        close(fd);
> > > > > +    }
> > > > > +    return ret;
> > > > > +}
> > > > > +
> > > > >  static void slave_read(void *opaque)  {
> > > > >      struct vhost_dev *dev = opaque; @@ -913,6 +1023,10 @@
> > > > > static void slave_read(void *opaque)
> > > > >      case VHOST_USER_SLAVE_CONFIG_CHANGE_MSG :
> > > > >          ret = vhost_user_slave_handle_config_change(dev);
> > > > >          break;
> > > > > +    case VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG:
> > > > > +        ret = vhost_user_slave_handle_vring_host_notifier(dev,
> &payload.area,
> > > > > +                                                          fd);
> > > > > +        break;
> > > > >      default:
> > > > >          error_report("Received unexpected msg type.");
> > > > >          if (fd != -1) {
> > > > > @@ -1641,6 +1755,15 @@ VhostUserState *vhost_user_init(void)
> > > > >
> > > > >  void vhost_user_cleanup(VhostUserState *user)  {
> > > > > +    int i;
> > > > > +
> > > > > +    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> > > > > +        if (user->notifier[i].addr) {
> > > > > +            object_unparent(OBJECT(&user->notifier[i].mr));
> > > > > +            munmap(user->notifier[i].addr, qemu_real_host_page_size);
> > > > > +            user->notifier[i].addr = NULL;
> > > > > +        }
> > > > > +    }
> > > > >  }
> > > > >
> > > > >  const VhostOps user_ops = {
> > > > > diff --git a/include/hw/virtio/vhost-user.h
> > > > > b/include/hw/virtio/vhost-user.h index eb8bc0d90d..fd660393a0
> > > > > 100644
> > > > > --- a/include/hw/virtio/vhost-user.h
> > > > > +++ b/include/hw/virtio/vhost-user.h
> > > > > @@ -9,9 +9,17 @@
> > > > >  #define HW_VIRTIO_VHOST_USER_H
> > > > >
> > > > >  #include "chardev/char-fe.h"
> > > > > +#include "hw/virtio/virtio.h"
> > > > > +
> > > > > +typedef struct VhostUserHostNotifier {
> > > > > +    MemoryRegion mr;
> > > > > +    void *addr;
> > > > > +    bool set;
> > > > > +} VhostUserHostNotifier;
> > > > >
> > > > >  typedef struct VhostUserState {
> > > > >      CharBackend *chr;
> > > > > +    VhostUserHostNotifier notifier[VIRTIO_QUEUE_MAX];
> > > > >  } VhostUserState;
> > > > >
> > > > >  VhostUserState *vhost_user_init(void);
> > > > > --
> > > > > 2.11.0
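
To illustrate the backend (slave) side of the VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG request documented in the quoted patch, here is a rough C sketch of sending it with the notify fd as ancillary data. Only the request id and the vring area layout come from the patch; the wire framing mirrors the VhostUserMsg shown above, and the helper name, flag handling and error handling are illustrative assumptions:

    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    #define VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG 3   /* from the patch */
    #define VHOST_USER_VERSION 0x1                       /* header version bits */

    struct vhost_user_vring_area {       /* vring area description */
        uint64_t u64;                    /* vring index (plus flags like NOFD) */
        uint64_t size;                   /* mmap size, i.e. host page size */
        uint64_t offset;                 /* mmap offset into the passed fd */
    };

    struct vhost_user_msg {              /* same framing as VhostUserMsg */
        uint32_t request;
        uint32_t flags;
        uint32_t size;
        struct vhost_user_vring_area area;
    } __attribute__((packed));

    /* Ask QEMU to map 'notify_fd' (e.g. a VFIO device fd) at 'offset' as the
     * notify region of queue 'qidx'. 'slave_fd' is the slave channel socket. */
    static int send_host_notifier(int slave_fd, uint16_t qidx, int notify_fd,
                                  uint64_t page_size, uint64_t offset)
    {
        struct vhost_user_msg msg = {
            .request = VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG,
            .flags   = VHOST_USER_VERSION,
            .size    = sizeof(msg.area),
            .area    = { .u64 = qidx, .size = page_size, .offset = offset },
        };
        char control[CMSG_SPACE(sizeof(int))];
        struct iovec iov = { .iov_base = &msg, .iov_len = sizeof(msg) };
        struct msghdr mh = {
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = control, .msg_controllen = sizeof(control),
        };
        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&mh);

        memset(control, 0, sizeof(control));
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type  = SCM_RIGHTS;
        cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &notify_fd, sizeof(int));

        return sendmsg(slave_fd, &mh, 0) < 0 ? -1 : 0;
    }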

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [PATCH v3 6/6] vhost-user: support registering external host notifiers
  2018-04-19 15:52             ` [virtio-dev] " Paolo Bonzini
@ 2018-04-19 16:34               ` Michael S. Tsirkin
  -1 siblings, 0 replies; 98+ messages in thread
From: Michael S. Tsirkin @ 2018-04-19 16:34 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Liang, Cunming, Bie, Tiwei, jasowang, alex.williamson, stefanha,
	qemu-devel, virtio-dev, Daly, Dan, Tan, Jianfeng, Wang, Zhihong,
	Wang, Xiao W

On Thu, Apr 19, 2018 at 05:52:23PM +0200, Paolo Bonzini wrote:
> On 19/04/2018 17:42, Michael S. Tsirkin wrote:
> >> A compiler barrier is enough on strongly-ordered memory platform.
> >> As it doesn't re-order store, PCI device won't see a stale index
> >> value. But a weakly-ordered memory needs sfence.
> > 
> > Oh you are right.
> > 
> > So it's only needed for non-intel platforms or when packets are in
> > WC memory then. And I don't know whether dpdk ever puts packets in WC
> > memory.
> > 
> > I guess we'll cross this bridge when we get to it.
> 
> Non-TSO architectures seem important...
> 
> Paolo

Only if there are virtio offload engines on platforms with these
architectures :) Worth working on for sure but doesn't have to block
this patchset.

-- 
MST

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
  2018-04-19 16:07                   ` Paolo Bonzini
@ 2018-04-19 16:48                     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 98+ messages in thread
From: Michael S. Tsirkin @ 2018-04-19 16:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Liang, Cunming, Bie, Tiwei, jasowang, alex.williamson, stefanha,
	qemu-devel, virtio-dev, Daly, Dan, Tan, Jianfeng, Wang, Zhihong,
	Wang, Xiao W

On Thu, Apr 19, 2018 at 06:07:07PM +0200, Paolo Bonzini wrote:
> On 19/04/2018 17:59, Michael S. Tsirkin wrote:
> > On Thu, Apr 19, 2018 at 05:51:51PM +0200, Paolo Bonzini wrote:
> >> On 19/04/2018 17:19, Michael S. Tsirkin wrote:
> >>>> - if we make it 1 when weak barriers are needed, the device also needs
> >>>> to nack feature negotiation (not allow setting the FEATURES_OK) if the
> >>>> bit is not set by the driver.
> >>>>  However, that is not enough.  Live
> >>>> migration assumes that it is okay to migrate a virtual machine from a
> >>>> source that doesn't support a feature to a destination that supports it.
> >>>>  In this case, it would assume that it is okay to migrate from software
> >>>> virtio to hardware virtio.  This is wrong because the destination would
> >>>> use weak barriers
> >>>
> >>> You can't migrate between systems with different sets of device features
> >>> right now.
> >>
> >> Yes, you can, exactly because some features are defined not by the
> >> machine type but rather by the host kernel.  See virtio_net_get_features
> >> in QEMU's hw/virtio/virtio-net.c, and virtio_set_features_nocheck in
> >> QEMU's hw/virtio/virtio.c.
> > 
> > Oh you are right. Well we can just special-case that one :)
> 
> Anything wrong with bumping the PCI revision?
> 
> Paolo

Nothing, except that in virtio 1 bumping the revision does not prevent
loading the old driver.

-- 
MST

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [PATCH v3 6/6] vhost-user: support registering external host notifiers
  2018-04-19 15:52             ` [virtio-dev] " Paolo Bonzini
@ 2018-04-19 16:52               ` Liang, Cunming
  -1 siblings, 0 replies; 98+ messages in thread
From: Liang, Cunming @ 2018-04-19 16:52 UTC (permalink / raw)
  To: Paolo Bonzini, Michael S. Tsirkin
  Cc: Bie, Tiwei, jasowang, alex.williamson, stefanha, qemu-devel,
	virtio-dev, Daly, Dan, Tan, Jianfeng, Wang, Zhihong, Wang,
	Xiao W



> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Thursday, April 19, 2018 11:52 PM
> To: Michael S. Tsirkin <mst@redhat.com>; Liang, Cunming
> <cunming.liang@intel.com>
> Cc: Bie, Tiwei <tiwei.bie@intel.com>; jasowang@redhat.com;
> alex.williamson@redhat.com; stefanha@redhat.com; qemu-devel@nongnu.org;
> virtio-dev@lists.oasis-open.org; Daly, Dan <dan.daly@intel.com>; Tan, Jianfeng
> <jianfeng.tan@intel.com>; Wang, Zhihong <zhihong.wang@intel.com>; Wang,
> Xiao W <xiao.w.wang@intel.com>
> Subject: Re: [PATCH v3 6/6] vhost-user: support registering external host
> notifiers
> 
> On 19/04/2018 17:42, Michael S. Tsirkin wrote:
> >> A compiler barrier is enough on strongly-ordered memory platform.
> >> As it doesn't re-order store, PCI device won't see a stale index
> >> value. But a weakly-ordered memory needs sfence.
> >
> > Oh you are right.
> >
> > So it's only needed for non-intel platforms or when packets are in WC
> > memory then. And I don't know whether dpdk ever puts packets in WC
> > memory.
> >
> > I guess we'll cross this bridge when we get to it.
> 
> Non-TSO architectures seem important...

I'm not familiar with non-TSO; I'm trying to understand the difference in terms of the feature set. Let's say non-TSO architectures do not set 'weak_barriers'; then the mandatory barrier is used for software. Would HW offload on that platform choose a different feature set than software?
If not, are we essentially worried about live migration from a TSO platform to a non-TSO platform?
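
For reference, a simplified sketch (paraphrased from memory, not verbatim kernel
code) of how 'weak_barriers' selects the barrier in the Linux virtio ring
helpers; the weak variant reduces to a compiler barrier on x86, while the
mandatory one is a real fence:

    /* include/linux/virtio_ring.h, simplified for illustration */
    static inline void virtio_wmb(bool weak_barriers)
    {
            if (weak_barriers)
                    virt_wmb();   /* SMP-scope ordering; barrier() on x86 */
            else
                    wmb();        /* mandatory barrier; sfence on x86 */
    }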

> 
> Paolo

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
  2018-04-19 16:24               ` Liang, Cunming
@ 2018-04-19 16:55                 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 98+ messages in thread
From: Michael S. Tsirkin @ 2018-04-19 16:55 UTC (permalink / raw)
  To: Liang, Cunming
  Cc: Paolo Bonzini, Bie, Tiwei, jasowang, alex.williamson, stefanha,
	qemu-devel, virtio-dev, Daly, Dan, Tan, Jianfeng, Wang, Zhihong,
	Wang, Xiao W

On Thu, Apr 19, 2018 at 04:24:29PM +0000, Liang, Cunming wrote:
> 
> 
> > -----Original Message-----
> > From: Michael S. Tsirkin [mailto:mst@redhat.com]
> > Sent: Thursday, April 19, 2018 11:19 PM
> > To: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Liang, Cunming <cunming.liang@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>;
> > jasowang@redhat.com; alex.williamson@redhat.com; stefanha@redhat.com;
> > qemu-devel@nongnu.org; virtio-dev@lists.oasis-open.org; Daly, Dan
> > <dan.daly@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>; Wang, Zhihong
> > <zhihong.wang@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>
> > Subject: Re: [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering
> > external host notifiers
> > 
> > On Thu, Apr 19, 2018 at 03:02:40PM +0200, Paolo Bonzini wrote:
> > > On 19/04/2018 14:43, Liang, Cunming wrote:
> > > >> 2. Memory barriers. Right now after updating the avail idx, virtio
> > > >> does smp_wmb() and then the MMIO write. Normal hardware drivers do
> > > >> wmb() which is an sfence. Can a PCI device read bypass index write
> > > >> and see a stale index value?
> > > >
> > > > A compiler barrier is enough on strongly-ordered memory platform. As
> > > > it doesn't re-order store, PCI device won't see a stale index value.
> > > > But a weakly-ordered memory needs sfence.
> > >
> > > That is complicated then.  We need to define a feature bit and (in the
> > > Linux driver) propagate it to vring_create_virtqueue's weak_barrier
> > > argument.  However:
> > >
> > > - if we make it 1 when weak barriers are needed, the device also needs
> > > to nack feature negotiation (not allow setting the FEATURES_OK) if the
> > > bit is not set by the driver.
> > >  However, that is not enough.  Live
> > > migration assumes that it is okay to migrate a virtual machine from a
> > > source that doesn't support a feature to a destination that supports it.
> > >  In this case, it would assume that it is okay to migrate from
> > > software virtio to hardware virtio.  This is wrong because the
> > > destination would use weak barriers
> > 
> > You can't migrate between systems with different sets of device features right
> > now.
> > 
> > > - if we make it 1 when strong barriers are enough, software virtio
> > > devices needs to be updated to expose the bit.  This works, including
> > > live migration, but updated drivers will now go slower when run
> > > against an old device that doesn't know the feature bit.
> > >
> > > Maybe bump the PCI revision, so that only the new revision has the bit?
> > >
> > > Thanks,
> > >
> > > Paolo
> > 
> > As a first step, if you want to migrate to a HW offloaded solution then you need
> > to enable the feature.
> 
> > It does mean it will go a bit slower when run with software,
> > so it's only good if most systems in your cluster do have the HW offload.
> To clarify a bit more, it's suboptimal to always use mandatory barriers for MMIO. Per strongly-order memory, 'weak barriers' (smp_wmb) is pretty good for MMIO. The tradeoff doesn't always happen, software and HW offload can align on the same page.

I agree to all of the above except where you say smp_wmb.

smp_wmb is for controlling SMP effects on Linux, and I suspect
it will not do the right thing on some non-Intel architectures.

The claim is, I think, correct for Intel/AMD platforms, and probably
other strongly ordered ones. I suspect it's incorrect for ARM and
Power.

Replace smp_wmb with 'asm volatile ("") on Intel' and I'll agree.
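
In other words, on x86 the two choices boil down to roughly the following
(simplified from the kernel's barrier definitions, illustrative only):

    /* x86: stores are not reordered with other stores, so the SMP write
     * barrier is just a compiler barrier, while the mandatory write
     * barrier is a real fence instruction. */
    #define smp_wmb()   asm volatile(""       ::: "memory")   /* barrier() */
    #define wmb()       asm volatile("sfence" ::: "memory")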



> > I think we can start by getting that working and think about ways to improve
> > down the road.
> > 
> > 
> > That's the usecase we designed FEATURES_OK for though, so I do think/hope it's
> > enough and we don't need to play with revisions.
> > 
> > 
> > --
> > MST

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
  2018-04-19 16:52               ` [virtio-dev] " Liang, Cunming
@ 2018-04-19 16:59                 ` Paolo Bonzini
  -1 siblings, 0 replies; 98+ messages in thread
From: Paolo Bonzini @ 2018-04-19 16:59 UTC (permalink / raw)
  To: Liang, Cunming, Michael S. Tsirkin
  Cc: Bie, Tiwei, jasowang, alex.williamson, stefanha, qemu-devel,
	virtio-dev, Daly, Dan, Tan, Jianfeng, Wang, Zhihong, Wang,
	Xiao W

On 19/04/2018 18:52, Liang, Cunming wrote:
>>> Oh you are right.
>>> 
>>> So it's only needed for non-intel platforms or when packets are
>>> in WC memory then. And I don't know whether dpdk ever puts
>>> packets in WC memory.
>>> 
>>> I guess we'll cross this bridge when we get to it.
>> Non-TSO architectures seem important...
>
> I'm not familiar with Non-TSO, trying to understand the difference
> according to the feature set. Let's say non-TSO architectures do not
> set 'weak_barriers'. Then mandatory barrier is used for software. HW
> offload on that platform would choose different feature set against
> software? If it's not, essentially we're worried about live migration
> from a TSO to a non-TSO architectures platform?

I'm worried about live migration from software virtio to hardware virtio
on non-TSO architectures.  For example, on ARM you would have a "dmb
ishst" (smp_wmb) for software virtio and a "dsb st" (wmb) or "dmb oshst"
(dma_wmb) for hardware virtio.

For this to work, you would have to set up the VM so that it uses the
heavier barriers from the beginning, even when backed by software virtio.

Paolo

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [PATCH v3 6/6] vhost-user: support registering external host notifiers
  2018-04-19 16:52               ` [virtio-dev] " Liang, Cunming
@ 2018-04-19 17:00                 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 98+ messages in thread
From: Michael S. Tsirkin @ 2018-04-19 17:00 UTC (permalink / raw)
  To: Liang, Cunming
  Cc: Paolo Bonzini, Bie, Tiwei, jasowang, alex.williamson, stefanha,
	qemu-devel, virtio-dev, Daly, Dan, Tan, Jianfeng, Wang, Zhihong,
	Wang, Xiao W

On Thu, Apr 19, 2018 at 04:52:20PM +0000, Liang, Cunming wrote:
> 
> 
> > -----Original Message-----
> > From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> > Sent: Thursday, April 19, 2018 11:52 PM
> > To: Michael S. Tsirkin <mst@redhat.com>; Liang, Cunming
> > <cunming.liang@intel.com>
> > Cc: Bie, Tiwei <tiwei.bie@intel.com>; jasowang@redhat.com;
> > alex.williamson@redhat.com; stefanha@redhat.com; qemu-devel@nongnu.org;
> > virtio-dev@lists.oasis-open.org; Daly, Dan <dan.daly@intel.com>; Tan, Jianfeng
> > <jianfeng.tan@intel.com>; Wang, Zhihong <zhihong.wang@intel.com>; Wang,
> > Xiao W <xiao.w.wang@intel.com>
> > Subject: Re: [PATCH v3 6/6] vhost-user: support registering external host
> > notifiers
> > 
> > On 19/04/2018 17:42, Michael S. Tsirkin wrote:
> > >> A compiler barrier is enough on strongly-ordered memory platform.
> > >> As it doesn't re-order store, PCI device won't see a stale index
> > >> value. But a weakly-ordered memory needs sfence.
> > >
> > > Oh you are right.
> > >
> > > So it's only needed for non-intel platforms or when packets are in WC
> > > memory then. And I don't know whether dpdk ever puts packets in WC
> > > memory.
> > >
> > > I guess we'll cross this bridge when we get to it.
> > 
> > Non-TSO architectures seem important...
> 
> I'm not familiar with Non-TSO, trying to understand the difference
> according to the feature set. Let's say non-TSO architectures do not
> set 'weak_barriers'. Then mandatory barrier is used for software. HW
> offload on that platform would choose different feature set against
> software? 

Right. We'll need a flag for this feature for starters. It doesn't exist
:) Paolo also points out that we should then add code to disallow
migration between setups with and without the feature.
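
As a sketch of what the guest-driver side could look like once such a flag
exists (the feature bit name and the exact vring_create_virtqueue() arguments
below are hypothetical / from memory, nothing here has been merged):

    /* Hypothetical: if the device offers a "strong barriers" style bit,
     * create the ring with weak_barriers = false so it uses wmb()/dma_wmb()
     * from the start, even while still backed by software virtio. */
    bool weak_barriers = !virtio_has_feature(vdev, VIRTIO_F_STRONG_BARRIERS);

    vq = vring_create_virtqueue(index, num, vring_align, vdev,
                                weak_barriers, may_reduce_num, ctx,
                                notify, callback, name);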

> If it's not, essentially we're worried about live migration from a TSO to a non-TSO architectures platform?

Probably not.

> > 
> > Paolo

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
  2018-04-19 16:59                 ` Paolo Bonzini
@ 2018-04-19 17:27                   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 98+ messages in thread
From: Michael S. Tsirkin @ 2018-04-19 17:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Liang, Cunming, Bie, Tiwei, jasowang, alex.williamson, stefanha,
	qemu-devel, virtio-dev, Daly, Dan, Tan, Jianfeng, Wang, Zhihong,
	Wang, Xiao W

On Thu, Apr 19, 2018 at 06:59:39PM +0200, Paolo Bonzini wrote:
> On 19/04/2018 18:52, Liang, Cunming wrote:
> >>> Oh you are right.
> >>> 
> >>> So it's only needed for non-intel platforms or when packets are
> >>> in WC memory then. And I don't know whether dpdk ever puts
> >>> packets in WC memory.
> >>> 
> >>> I guess we'll cross this bridge when we get to it.
> >> Non-TSO architectures seem important...
> >
> > I'm not familiar with Non-TSO, trying to understand the difference
> > according to the feature set. Let's say non-TSO architectures do not
> > set 'weak_barriers'. Then mandatory barrier is used for software. HW
> > offload on that platform would choose different feature set against
> > software? If it's not, essentially we're worried about live migration
> > from a TSO to a non-TSO architectures platform?
> 
> I'm worried about live migration from software virtio to hardware virtio
> on non-TSO architectures.  For example, on ARM you would have a "dmb
> ishst" (smp_wmb) for software virtio and a "dsb st" (wmb) or "dmb oshst"
> (dma_wmb) for hardware virtio.
> 
> For this to work, you would have to set up the VM so that it uses the
> heavier barriers from the beginning, even when backed by software virtio.
> 
> Paolo

Right. Or disallow hardware to software migrations.

But generally, the mandatory and even the dma barriers in Linux are often overkill.

See ARM for example: 

#if defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP)
#define mb()            __arm_heavy_mb()
#define rmb()           dsb()
#define wmb()           __arm_heavy_mb(st)
#define dma_rmb()       dmb(osh)
#define dma_wmb()       dmb(oshst)
#else
#define mb()            barrier()
#define rmb()           barrier()
#define wmb()           barrier()
#define dma_rmb()       barrier()
#define dma_wmb()       barrier()
#endif

That CONFIG_SMP here is clearly wrong but I don't really know what
to set it to. Also, we probably should switch virtio_wmb to dma_XX
barriers.

That's actually easy. Will try to do.
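
A rough sketch of what that switch could look like (hypothetical, not an actual
patch):

    /* include/linux/virtio_ring.h, illustrative only: use the DMA write
     * barrier instead of the full mandatory one for the !weak_barriers
     * case. */
    static inline void virtio_wmb(bool weak_barriers)
    {
            if (weak_barriers)
                    virt_wmb();    /* unchanged: SMP-scope ordering */
            else
                    dma_wmb();     /* was wmb(); orders stores against a
                                      DMA-capable device observer */
    }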

-- 
MST

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
  2018-04-19 17:27                   ` Michael S. Tsirkin
@ 2018-04-19 17:35                     ` Paolo Bonzini
  -1 siblings, 0 replies; 98+ messages in thread
From: Paolo Bonzini @ 2018-04-19 17:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Liang, Cunming, Bie, Tiwei, jasowang, alex.williamson, stefanha,
	qemu-devel, virtio-dev, Daly, Dan, Tan, Jianfeng, Wang, Zhihong,
	Wang, Xiao W

On 19/04/2018 19:27, Michael S. Tsirkin wrote:
> 
> That CONFIG_SMP here is clearly wrong but I don't really know what
> to set it to. Also, we probably should switch virtio_wmb to dma_XX
> barriers.
> 
> That's actually easy. Will try to do.

Should it be dma_wmb() before updating the indices, and wmb() before
writing the "kick virtqueue" register?
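
Spelled out, that ordering would be roughly (pseudocode sketch, not actual
virtio_ring.c code):

    vq->vring.desc[head] = ...;         /* 1. fill in the descriptors */
    dma_wmb();                          /* 2. descriptors visible before the index update */
    vq->vring.avail->idx = new_idx;     /* 3. publish the new avail index */
    wmb();                              /* 4. index visible before the doorbell write */
    iowrite16(queue_index, notify_reg); /* 5. kick the virtqueue */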

Paolo

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
  2018-04-19 17:35                     ` Paolo Bonzini
@ 2018-04-19 17:39                       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 98+ messages in thread
From: Michael S. Tsirkin @ 2018-04-19 17:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Liang, Cunming, Bie, Tiwei, jasowang, alex.williamson, stefanha,
	qemu-devel, virtio-dev, Daly, Dan, Tan, Jianfeng, Wang, Zhihong,
	Wang, Xiao W

On Thu, Apr 19, 2018 at 07:35:57PM +0200, Paolo Bonzini wrote:
> On 19/04/2018 19:27, Michael S. Tsirkin wrote:
> > 
> > That CONFIG_SMP here is clearly wrong but I don't really know what
> > to set it to. Also, we probably should switch virtio_wmb to dma_XX
> > barriers.
> > 
> > That's actually easy. Will try to do.
> 
> Should it be dma_wmb() before updating the indices, and wmb() before
> writing the "kick virtqueue" register?
> 
> Paolo

If anything, the kick-virtqueue path should do wmb() internally
within virtio-pci.

No one uses strong barriers with virtio-pci right now,
so we don't do it.
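
That would amount to something like the following (a sketch modelled on
vp_notify() with the hypothetical fence added; not the current virtio_pci code):

    /* virtio_pci notify path, if it did the fence itself */
    bool vp_notify(struct virtqueue *vq)
    {
            wmb();  /* order the avail index store before the doorbell write */
            /* write the queue's selector into the notification register */
            iowrite16(vq->index, (void __iomem *)vq->priv);
            return true;
    }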

-- 
MST

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [PATCH v3 6/6] vhost-user: support registering external host notifiers
  2018-04-19 17:00                 ` [virtio-dev] " Michael S. Tsirkin
@ 2018-04-19 23:05                   ` Liang, Cunming
  -1 siblings, 0 replies; 98+ messages in thread
From: Liang, Cunming @ 2018-04-19 23:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Paolo Bonzini, Bie, Tiwei, jasowang, alex.williamson, stefanha,
	qemu-devel, virtio-dev, Daly, Dan, Tan, Jianfeng, Wang, Zhihong,
	Wang, Xiao W



> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst@redhat.com]
> Sent: Friday, April 20, 2018 1:01 AM
> To: Liang, Cunming <cunming.liang@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>; Bie, Tiwei <tiwei.bie@intel.com>;
> jasowang@redhat.com; alex.williamson@redhat.com; stefanha@redhat.com;
> qemu-devel@nongnu.org; virtio-dev@lists.oasis-open.org; Daly, Dan
> <dan.daly@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>; Wang, Zhihong
> <zhihong.wang@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>
> Subject: Re: [PATCH v3 6/6] vhost-user: support registering external host
> notifiers
> 
> On Thu, Apr 19, 2018 at 04:52:20PM +0000, Liang, Cunming wrote:
> >
> >
> > > -----Original Message-----
> > > From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> > > Sent: Thursday, April 19, 2018 11:52 PM
> > > To: Michael S. Tsirkin <mst@redhat.com>; Liang, Cunming
> > > <cunming.liang@intel.com>
> > > Cc: Bie, Tiwei <tiwei.bie@intel.com>; jasowang@redhat.com;
> > > alex.williamson@redhat.com; stefanha@redhat.com;
> > > qemu-devel@nongnu.org; virtio-dev@lists.oasis-open.org; Daly, Dan
> > > <dan.daly@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>; Wang,
> > > Zhihong <zhihong.wang@intel.com>; Wang, Xiao W
> > > <xiao.w.wang@intel.com>
> > > Subject: Re: [PATCH v3 6/6] vhost-user: support registering external
> > > host notifiers
> > >
> > > On 19/04/2018 17:42, Michael S. Tsirkin wrote:
> > > >> A compiler barrier is enough on strongly-ordered memory platform.
> > > >> As it doesn't re-order store, PCI device won't see a stale index
> > > >> value. But a weakly-ordered memory needs sfence.
> > > >
> > > > Oh you are right.
> > > >
> > > > So it's only needed for non-intel platforms or when packets are in
> > > > WC memory then. And I don't know whether dpdk ever puts packets in
> > > > WC memory.
> > > >
> > > > I guess we'll cross this bridge when we get to it.
> > >
> > > Non-TSO architectures seem important...
> >
> > I'm not familiar with Non-TSO, trying to understand the difference
> > according to the feature set. Let's say non-TSO architectures do not
> > set 'weak_barriers'. Then mandatory barrier is used for software. HW
> > offload on that platform would choose different feature set against
> > software?
> 
> Right. We'll need a flag for this feature for starters. It doesn't exist
> :) Paolo also points out that we should then add code to disallow migration
> between setups with and without the feature.
I see. Thanks.

> 
> > If it's not, essentially we're worried about live migration from a TSO to a non-
> TSO architectures platform?
> 
> Probably not.
> 
> > >
> > > Paolo

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
  2018-04-19 16:55                 ` Michael S. Tsirkin
@ 2018-04-20  3:01                   ` Liang, Cunming
  -1 siblings, 0 replies; 98+ messages in thread
From: Liang, Cunming @ 2018-04-20  3:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Paolo Bonzini, Bie, Tiwei, jasowang, alex.williamson, stefanha,
	qemu-devel, virtio-dev, Daly, Dan, Tan, Jianfeng, Wang, Zhihong,
	Wang, Xiao W



> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst@redhat.com]
> Sent: Friday, April 20, 2018 12:56 AM
> To: Liang, Cunming <cunming.liang@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>; Bie, Tiwei <tiwei.bie@intel.com>;
> jasowang@redhat.com; alex.williamson@redhat.com; stefanha@redhat.com;
> qemu-devel@nongnu.org; virtio-dev@lists.oasis-open.org; Daly, Dan
> <dan.daly@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>; Wang, Zhihong
> <zhihong.wang@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>
> Subject: Re: [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering
> external host notifiers
> 
> On Thu, Apr 19, 2018 at 04:24:29PM +0000, Liang, Cunming wrote:
> >
> >
> > > -----Original Message-----
> > > From: Michael S. Tsirkin [mailto:mst@redhat.com]
> > > Sent: Thursday, April 19, 2018 11:19 PM
> > > To: Paolo Bonzini <pbonzini@redhat.com>
> > > Cc: Liang, Cunming <cunming.liang@intel.com>; Bie, Tiwei
> > > <tiwei.bie@intel.com>; jasowang@redhat.com;
> > > alex.williamson@redhat.com; stefanha@redhat.com;
> > > qemu-devel@nongnu.org; virtio-dev@lists.oasis-open.org; Daly, Dan
> > > <dan.daly@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>; Wang,
> > > Zhihong <zhihong.wang@intel.com>; Wang, Xiao W
> > > <xiao.w.wang@intel.com>
> > > Subject: Re: [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support
> > > registering external host notifiers
> > >
> > > On Thu, Apr 19, 2018 at 03:02:40PM +0200, Paolo Bonzini wrote:
> > > > On 19/04/2018 14:43, Liang, Cunming wrote:
> > > > >> 2. Memory barriers. Right now after updating the avail idx,
> > > > >> virtio does smp_wmb() and then the MMIO write. Normal hardware
> > > > >> drivers do
> > > > >> wmb() which is an sfence. Can a PCI device read bypass index
> > > > >> write and see a stale index value?
> > > > >
> > > > > A compiler barrier is enough on strongly-ordered memory
> > > > > platform. As it doesn't re-order store, PCI device won't see a stale index
> value.
> > > > > But a weakly-ordered memory needs sfence.
> > > >
> > > > That is complicated then.  We need to define a feature bit and (in
> > > > the Linux driver) propagate it to vring_create_virtqueue's
> > > > weak_barrier argument.  However:
> > > >
> > > > - if we make it 1 when weak barriers are needed, the device also
> > > > needs to nack feature negotiation (not allow setting the
> > > > FEATURES_OK) if the bit is not set by the driver.
> > > >  However, that is not enough.  Live migration assumes that it is
> > > > okay to migrate a virtual machine from a source that doesn't
> > > > support a feature to a destination that supports it.
> > > >  In this case, it would assume that it is okay to migrate from
> > > > software virtio to hardware virtio.  This is wrong because the
> > > > destination would use weak barriers
> > >
> > > You can't migrate between systems with different sets of device
> > > features right now.
> > >
> > > > - if we make it 1 when strong barriers are enough, software virtio
> > > > devices needs to be updated to expose the bit.  This works,
> > > > including live migration, but updated drivers will now go slower
> > > > when run against an old device that doesn't know the feature bit.
> > > >
> > > > Maybe bump the PCI revision, so that only the new revision has the bit?
> > > >
> > > > Thanks,
> > > >
> > > > Paolo
> > >
> > > As a first step, if you want to migrate to a HW offloaded solution
> > > then you need to enable the feature.
> >
> > > It does mean it will go a bit slower when run with software, so it's
> > > only good if most systems in your cluster do have the HW offload.
> > To clarify a bit more, it's suboptimal to always use mandatory barriers for MMIO.
> Per strongly-order memory, 'weak barriers' (smp_wmb) is pretty good for MMIO.
> The tradeoff doesn't always happen, software and HW offload can align on the
> same page.
> 
> I agree to all of the above except where you say smp_wmb.
> 
> smp_wmb is for controlling SMP effects on Linux, and I suspect it will not do the
> right thing on some non-Intel architectures.
> 
> The claim is I think correct for Intel/AMD platforms, and probably other strongly
> ordered ones. I suspect it's incorrect for ARM and power.
> 
> Replace smp_wmb with 'asm volatile ("") on Intel' and I'll agree.

Yeah, that's more accurate. 

> 
> 
> 
> > > I think we can start by getting that working and think about ways to
> > > improve down the road.
> > >
> > >
> > > That's the usecase we designed FEATURES_OK for though, so I do
> > > think/hope it's enough and we don't need to play with revisions.
> > >
> > >
> > > --
> > > MST

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [PATCH v3 6/6] vhost-user: support registering external host notifiers
  2018-04-19 11:14       ` [virtio-dev] " Tiwei Bie
@ 2018-05-02 10:32         ` Tiwei Bie
  -1 siblings, 0 replies; 98+ messages in thread
From: Tiwei Bie @ 2018-05-02 10:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, Apr 19, 2018 at 07:14:39PM +0800, Tiwei Bie wrote:
> On Wed, Apr 18, 2018 at 07:34:06PM +0300, Michael S. Tsirkin wrote:
> > On Thu, Apr 12, 2018 at 11:12:32PM +0800, Tiwei Bie wrote:
> > > This patch introduces VHOST_USER_PROTOCOL_F_HOST_NOTIFIER.
> > > With this feature negotiated, vhost-user backend can register
> > > memory region based host notifiers. And it will allow the guest
> > > driver in the VM to notify the hardware accelerator at the
> > > vhost-user backend directly.
> > > 
> > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > 
> > Overall I think we can merge this approach, but I have two main concerns
> > about this:
> > 
> > 1. Testing. Most people do not have the virtio hardware
> >    so how to make sure this does not bit rot?
> > 
> >    I have an idea: add an option like this to libvhost-user.
> >    Naturally libvhost-user can not get notified about a write
> >    to an mmapped area, but it can write a special value there
> >    (e.g. all-ones?) and then poll it to detect VQ # writes.
> > 
> >    Then include a vhost user bridge test with an option like this.
> > 
> >    I'd like to see a patch doing this.
> 
> Sure, I'll do it. Thanks for the suggestion!
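
A minimal sketch of the polling approach suggested above (hypothetical; not the
code that was eventually written for libvhost-user):

    #include <stdint.h>

    /* Seed the mmap'ed notify area with a value the guest will never
     * write, then poll it for a queue index written by the guest. */
    #define NOTIFY_UNSET 0xffff

    static void notifier_arm(volatile uint16_t *notify)
    {
        *notify = NOTIFY_UNSET;
    }

    /* Returns the kicked queue index, or -1 if no kick has happened yet. */
    static int notifier_poll(volatile uint16_t *notify)
    {
        uint16_t val = *notify;

        if (val == NOTIFY_UNSET) {
            return -1;
        }
        *notify = NOTIFY_UNSET;   /* re-arm for the next kick */
        return val;
    }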
> 
> > 
> > 2. Memory barriers. Right now after updating the avail idx,
> >    virtio does smp_wmb() and then the MMIO write.
> >    Normal hardware drivers do wmb() which is an sfence.
> >    Can a PCI device read bypass index write and see a stale
> >    index value?
> 
> It depends on arch's memory model. Cunming will provide
> more details about this later.
> 
> >    To make virtio pci do wmb() we would need a new feature bit.
> >    Alternatively I guess we could maybe look at subsystem vendor/device id.
> > 
> >    I'd like to see a patch doing one of these things.
> 
> We prefer to add a new feature bit as it's a more robust
> way to do this. I'll send out some patches soon.
> 
> Thank you very much! :)

Hi Michael,

It seems that we have started merging patches for QEMU 2.13.
Currently, all your concerns about this patch set have been
addressed or are in progress. Do you think it's possible to get this
patch set merged first? Or do you have any other concerns? Thanks!

Best regards,
Tiwei Bie


> 
> Best regards,
> Tiwei Bie
> 
> > 
> > Thanks!
> > 
> > > ---
> > >  docs/interop/vhost-user.txt    |  33 +++++++++++
> > >  hw/virtio/vhost-user.c         | 123 +++++++++++++++++++++++++++++++++++++++++
> > >  include/hw/virtio/vhost-user.h |   8 +++
> > >  3 files changed, 164 insertions(+)
> > > 
> > > diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> > > index 534caab18a..9e57b36b20 100644
> > > --- a/docs/interop/vhost-user.txt
> > > +++ b/docs/interop/vhost-user.txt
> > > @@ -132,6 +132,16 @@ Depending on the request type, payload can be:
> > >     Payload: Size bytes array holding the contents of the virtio
> > >         device's configuration space
> > >  
> > > + * Vring area description
> > > +   -----------------------
> > > +   | u64 | size | offset |
> > > +   -----------------------
> > > +
> > > +   u64: a 64-bit integer contains vring index and flags
> > > +   Size: a 64-bit size of this area
> > > +   Offset: a 64-bit offset of this area from the start of the
> > > +       supplied file descriptor
> > > +
> > >  In QEMU the vhost-user message is implemented with the following struct:
> > >  
> > >  typedef struct VhostUserMsg {
> > > @@ -146,6 +156,7 @@ typedef struct VhostUserMsg {
> > >          VhostUserLog log;
> > >          struct vhost_iotlb_msg iotlb;
> > >          VhostUserConfig config;
> > > +        VhostUserVringArea area;
> > >      };
> > >  } QEMU_PACKED VhostUserMsg;
> > >  
> > > @@ -380,6 +391,7 @@ Protocol features
> > >  #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
> > >  #define VHOST_USER_PROTOCOL_F_PAGEFAULT      8
> > >  #define VHOST_USER_PROTOCOL_F_CONFIG         9
> > > +#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  10
> > >  
> > >  Master message types
> > >  --------------------
> > > @@ -777,6 +789,27 @@ Slave message types
> > >       the VHOST_USER_NEED_REPLY flag, master must respond with zero when
> > >       operation is successfully completed, or non-zero otherwise.
> > >  
> > > + * VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG
> > > +
> > > +      Id: 3
> > > +      Equivalent ioctl: N/A
> > > +      Slave payload: vring area description
> > > +      Master payload: N/A
> > > +
> > > +      Sets host notifier for a specified queue. The queue index is contained
> > > +      in the u64 field of the vring area description. The host notifier is
> > > +      described by the file descriptor (typically it's a VFIO device fd) which
> > > +      is passed as ancillary data and the size (which is mmap size and should
> > > +      be the same as host page size) and offset (which is mmap offset) carried
> > > +      in the vring area description. QEMU can mmap the file descriptor based
> > > +      on the size and offset to get a memory range. Registering a host notifier
> > > +      means mapping this memory range to the VM as the specified queue's notify
> > > +      MMIO region. Slave sends this request to tell QEMU to de-register the
> > > +      existing notifier if any and register the new notifier if the request is
> > > +      sent with a file descriptor.
> > > +      This request should be sent only when VHOST_USER_PROTOCOL_F_HOST_NOTIFIER
> > > +      protocol feature has been successfully negotiated.
> > > +
> > >  VHOST_USER_PROTOCOL_F_REPLY_ACK:
> > >  -------------------------------
> > >  The original vhost-user specification only demands replies for certain
> > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > index 791e0a4763..1cd9c7276b 100644
> > > --- a/hw/virtio/vhost-user.c
> > > +++ b/hw/virtio/vhost-user.c
> > > @@ -13,6 +13,7 @@
> > >  #include "hw/virtio/vhost.h"
> > >  #include "hw/virtio/vhost-user.h"
> > >  #include "hw/virtio/vhost-backend.h"
> > > +#include "hw/virtio/virtio.h"
> > >  #include "hw/virtio/virtio-net.h"
> > >  #include "chardev/char-fe.h"
> > >  #include "sysemu/kvm.h"
> > > @@ -48,6 +49,7 @@ enum VhostUserProtocolFeature {
> > >      VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
> > >      VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
> > >      VHOST_USER_PROTOCOL_F_CONFIG = 9,
> > > +    VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 10,
> > >      VHOST_USER_PROTOCOL_F_MAX
> > >  };
> > >  
> > > @@ -92,6 +94,7 @@ typedef enum VhostUserSlaveRequest {
> > >      VHOST_USER_SLAVE_NONE = 0,
> > >      VHOST_USER_SLAVE_IOTLB_MSG = 1,
> > >      VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
> > > +    VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
> > >      VHOST_USER_SLAVE_MAX
> > >  }  VhostUserSlaveRequest;
> > >  
> > > @@ -136,6 +139,12 @@ static VhostUserConfig c __attribute__ ((unused));
> > >                                     + sizeof(c.size) \
> > >                                     + sizeof(c.flags))
> > >  
> > > +typedef struct VhostUserVringArea {
> > > +    uint64_t u64;
> > > +    uint64_t size;
> > > +    uint64_t offset;
> > > +} VhostUserVringArea;
> > > +
> > >  typedef struct {
> > >      VhostUserRequest request;
> > >  
> > > @@ -157,6 +166,7 @@ typedef union {
> > >          struct vhost_iotlb_msg iotlb;
> > >          VhostUserConfig config;
> > >          VhostUserCryptoSession session;
> > > +        VhostUserVringArea area;
> > >  } VhostUserPayload;
> > >  
> > >  typedef struct VhostUserMsg {
> > > @@ -638,9 +648,37 @@ static int vhost_user_set_vring_num(struct vhost_dev *dev,
> > >      return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM, ring);
> > >  }
> > >  
> > > +static void vhost_user_host_notifier_restore(struct vhost_dev *dev,
> > > +                                             int queue_idx)
> > > +{
> > > +    struct vhost_user *u = dev->opaque;
> > > +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> > > +    VirtIODevice *vdev = dev->vdev;
> > > +
> > > +    if (n->addr && !n->set) {
> > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true);
> > > +        n->set = true;
> > > +    }
> > > +}
> > > +
> > > +static void vhost_user_host_notifier_remove(struct vhost_dev *dev,
> > > +                                            int queue_idx)
> > > +{
> > > +    struct vhost_user *u = dev->opaque;
> > > +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> > > +    VirtIODevice *vdev = dev->vdev;
> > > +
> > > +    if (n->addr && n->set) {
> > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
> > > +        n->set = false;
> > > +    }
> > > +}
> > > +
> > >  static int vhost_user_set_vring_base(struct vhost_dev *dev,
> > >                                       struct vhost_vring_state *ring)
> > >  {
> > > +    vhost_user_host_notifier_restore(dev, ring->index);
> > > +
> > >      return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE, ring);
> > >  }
> > >  
> > > @@ -674,6 +712,8 @@ static int vhost_user_get_vring_base(struct vhost_dev *dev,
> > >          .hdr.size = sizeof(msg.payload.state),
> > >      };
> > >  
> > > +    vhost_user_host_notifier_remove(dev, ring->index);
> > > +
> > >      if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> > >          return -1;
> > >      }
> > > @@ -847,6 +887,76 @@ static int vhost_user_slave_handle_config_change(struct vhost_dev *dev)
> > >      return ret;
> > >  }
> > >  
> > > +static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
> > > +                                                       VhostUserVringArea *area,
> > > +                                                       int fd)
> > > +{
> > > +    int queue_idx = area->u64 & VHOST_USER_VRING_IDX_MASK;
> > > +    size_t page_size = qemu_real_host_page_size;
> > > +    struct vhost_user *u = dev->opaque;
> > > +    VhostUserState *user = u->user;
> > > +    VirtIODevice *vdev = dev->vdev;
> > > +    VhostUserHostNotifier *n;
> > > +    int ret = 0;
> > > +    void *addr;
> > > +    char *name;
> > > +
> > > +    if (!virtio_has_feature(dev->protocol_features,
> > > +                            VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) ||
> > > +        vdev == NULL || queue_idx >= virtio_get_num_queues(vdev)) {
> > > +        ret = -1;
> > > +        goto out;
> > > +    }
> > > +
> > > +    n = &user->notifier[queue_idx];
> > > +
> > > +    if (n->addr) {
> > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
> > > +        object_unparent(OBJECT(&n->mr));
> > > +        munmap(n->addr, page_size);
> > > +        n->addr = NULL;
> > > +    }
> > > +
> > > +    if (area->u64 & VHOST_USER_VRING_NOFD_MASK) {
> > > +        goto out;
> > > +    }
> > > +
> > > +    /* Sanity check. */
> > > +    if (area->size != page_size) {
> > > +        ret = -1;
> > > +        goto out;
> > > +    }
> > > +
> > > +    addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
> > > +                fd, area->offset);
> > > +    if (addr == MAP_FAILED) {
> > > +        ret = -1;
> > > +        goto out;
> > > +    }
> > > +
> > > +    name = g_strdup_printf("vhost-user/host-notifier@%p mmaps[%d]",
> > > +                           user, queue_idx);
> > > +    memory_region_init_ram_device_ptr(&n->mr, OBJECT(vdev), name,
> > > +                                      page_size, addr);
> > > +    g_free(name);
> > > +
> > > +    if (virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true)) {
> > > +        munmap(addr, page_size);
> > > +        ret = -1;
> > > +        goto out;
> > > +    }
> > > +
> > > +    n->addr = addr;
> > > +    n->set = true;
> > > +
> > > +out:
> > > +    /* Always close the fd. */
> > > +    if (fd != -1) {
> > > +        close(fd);
> > > +    }
> > > +    return ret;
> > > +}
> > > +
> > >  static void slave_read(void *opaque)
> > >  {
> > >      struct vhost_dev *dev = opaque;
> > > @@ -913,6 +1023,10 @@ static void slave_read(void *opaque)
> > >      case VHOST_USER_SLAVE_CONFIG_CHANGE_MSG :
> > >          ret = vhost_user_slave_handle_config_change(dev);
> > >          break;
> > > +    case VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG:
> > > +        ret = vhost_user_slave_handle_vring_host_notifier(dev, &payload.area,
> > > +                                                          fd);
> > > +        break;
> > >      default:
> > >          error_report("Received unexpected msg type.");
> > >          if (fd != -1) {
> > > @@ -1641,6 +1755,15 @@ VhostUserState *vhost_user_init(void)
> > >  
> > >  void vhost_user_cleanup(VhostUserState *user)
> > >  {
> > > +    int i;
> > > +
> > > +    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> > > +        if (user->notifier[i].addr) {
> > > +            object_unparent(OBJECT(&user->notifier[i].mr));
> > > +            munmap(user->notifier[i].addr, qemu_real_host_page_size);
> > > +            user->notifier[i].addr = NULL;
> > > +        }
> > > +    }
> > >  }
> > >  
> > >  const VhostOps user_ops = {
> > > diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
> > > index eb8bc0d90d..fd660393a0 100644
> > > --- a/include/hw/virtio/vhost-user.h
> > > +++ b/include/hw/virtio/vhost-user.h
> > > @@ -9,9 +9,17 @@
> > >  #define HW_VIRTIO_VHOST_USER_H
> > >  
> > >  #include "chardev/char-fe.h"
> > > +#include "hw/virtio/virtio.h"
> > > +
> > > +typedef struct VhostUserHostNotifier {
> > > +    MemoryRegion mr;
> > > +    void *addr;
> > > +    bool set;
> > > +} VhostUserHostNotifier;
> > >  
> > >  typedef struct VhostUserState {
> > >      CharBackend *chr;
> > > +    VhostUserHostNotifier notifier[VIRTIO_QUEUE_MAX];
> > >  } VhostUserState;
> > >  
> > >  VhostUserState *vhost_user_init(void);
> > > -- 
> > > 2.11.0


* [virtio-dev] Re: [PATCH v3 6/6] vhost-user: support registering external host notifiers
@ 2018-05-02 10:32         ` Tiwei Bie
  0 siblings, 0 replies; 98+ messages in thread
From: Tiwei Bie @ 2018-05-02 10:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, Apr 19, 2018 at 07:14:39PM +0800, Tiwei Bie wrote:
> On Wed, Apr 18, 2018 at 07:34:06PM +0300, Michael S. Tsirkin wrote:
> > On Thu, Apr 12, 2018 at 11:12:32PM +0800, Tiwei Bie wrote:
> > > This patch introduces VHOST_USER_PROTOCOL_F_HOST_NOTIFIER.
> > > With this feature negotiated, vhost-user backend can register
> > > memory region based host notifiers. And it will allow the guest
> > > driver in the VM to notify the hardware accelerator at the
> > > vhost-user backend directly.
> > > 
> > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > 
> > Overall I think we can merge this approach, but I have two main concerns
> > about this:
> > 
> > 1. Testing. Most people do not have the virtio hardware
> >    so how to make sure this does not bit rot?
> > 
> >    I have an idea: add an option like this to libvhost-user.
> >    Naturally libvhost-user can not get notified about a write
> >    to an mmapped area, but it can write a special value there
> >    (e.g. all-ones?) and then poll it to detect VQ # writes.
> > 
> >    Then include a vhost user bridge test with an option like this.
> > 
> >    I'd like to see a patch doing this.
> 
> Sure, I'll do it. Thanks for the suggestion!
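As a rough illustration of the polling idea (a sketch only, not part of this
series; the names below are made up), a backend without real hardware could
seed its own mapping of the notify page with a sentinel value and then poll
it for the guest's VQ index writes:

#include <stdbool.h>
#include <stdint.h>

#define NOTIFY_SENTINEL 0xffffu   /* "no notification seen yet" */

/* Called once after handing the page to QEMU; `page` is the backend's
 * own mapping of the same area. */
static void notify_poll_init(volatile uint16_t *page)
{
    *page = NOTIFY_SENTINEL;
}

/* Returns true and stores the queue index if the guest wrote one since
 * the last call; re-arms the sentinel afterwards. */
static bool notify_poll(volatile uint16_t *page, uint16_t *vq_idx)
{
    uint16_t v = *page;

    if (v == NOTIFY_SENTINEL) {
        return false;
    }
    *vq_idx = v;
    *page = NOTIFY_SENTINEL;
    return true;
}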
> 
> > 
> > 2. Memory barriers. Right now after updating the avail idx,
> >    virtio does smp_wmb() and then the MMIO write.
> >    Normal hardware drivers do wmb() which is an sfence.
> >    Can a PCI device read bypass index write and see a stale
> >    index value?
> 
> It depends on arch's memory model. Cunming will provide
> more details about this later.
> 
> >    To make virtio pci do wmb() we would need a new feature bit.
> >    Alternatively I guess we could maybe look at subsystem vendor/device id.
> > 
> >    I'd like to see a patch doing one of these things.
> 
> We prefer to add a new feature bit as it's a more robust
> way to do this. I'll send out some patches soon.
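To make the barrier point concrete, here is an illustrative kernel-style
sketch (not from this series; the helper names are invented and endianness
handling is omitted) of the two notification paths being compared:

#include <linux/types.h>
#include <linux/io.h>

/* Today's virtio path: smp_wmb() only orders the avail index update
 * against other CPUs before the doorbell MMIO write. */
static void notify_sw_backend(u16 *avail_idx, u16 new_idx,
                              void __iomem *doorbell, u16 queue_id)
{
    *avail_idx = new_idx;
    smp_wmb();
    iowrite16(queue_id, doorbell);
}

/* What a driver for real hardware would normally do: wmb() (an sfence
 * on x86), so the device cannot observe the doorbell before the avail
 * index write has reached memory. */
static void notify_hw_backend(u16 *avail_idx, u16 new_idx,
                              void __iomem *doorbell, u16 queue_id)
{
    *avail_idx = new_idx;
    wmb();
    iowrite16(queue_id, doorbell);
}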
> 
> Thank you very much! :)

Hi Michael,

It seems that the merging of patches for QEMU 2.13 has started.
All of your concerns about this patch set have now been addressed
or are work in progress. Do you think it's possible to get this
patch set merged first? Or do you have any other concerns? Thanks!

Best regards,
Tiwei Bie


> 
> Best regards,
> Tiwei Bie
> 
> > 
> > Thanks!
> > 
> > > ---
> > >  docs/interop/vhost-user.txt    |  33 +++++++++++
> > >  hw/virtio/vhost-user.c         | 123 +++++++++++++++++++++++++++++++++++++++++
> > >  include/hw/virtio/vhost-user.h |   8 +++
> > >  3 files changed, 164 insertions(+)
> > > 
> > > diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> > > index 534caab18a..9e57b36b20 100644
> > > --- a/docs/interop/vhost-user.txt
> > > +++ b/docs/interop/vhost-user.txt
> > > @@ -132,6 +132,16 @@ Depending on the request type, payload can be:
> > >     Payload: Size bytes array holding the contents of the virtio
> > >         device's configuration space
> > >  
> > > + * Vring area description
> > > +   -----------------------
> > > +   | u64 | size | offset |
> > > +   -----------------------
> > > +
> > > +   u64: a 64-bit integer contains vring index and flags
> > > +   Size: a 64-bit size of this area
> > > +   Offset: a 64-bit offset of this area from the start of the
> > > +       supplied file descriptor
> > > +
> > >  In QEMU the vhost-user message is implemented with the following struct:
> > >  
> > >  typedef struct VhostUserMsg {
> > > @@ -146,6 +156,7 @@ typedef struct VhostUserMsg {
> > >          VhostUserLog log;
> > >          struct vhost_iotlb_msg iotlb;
> > >          VhostUserConfig config;
> > > +        VhostUserVringArea area;
> > >      };
> > >  } QEMU_PACKED VhostUserMsg;
> > >  
> > > @@ -380,6 +391,7 @@ Protocol features
> > >  #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
> > >  #define VHOST_USER_PROTOCOL_F_PAGEFAULT      8
> > >  #define VHOST_USER_PROTOCOL_F_CONFIG         9
> > > +#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  10
> > >  
> > >  Master message types
> > >  --------------------
> > > @@ -777,6 +789,27 @@ Slave message types
> > >       the VHOST_USER_NEED_REPLY flag, master must respond with zero when
> > >       operation is successfully completed, or non-zero otherwise.
> > >  
> > > + * VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG
> > > +
> > > +      Id: 3
> > > +      Equivalent ioctl: N/A
> > > +      Slave payload: vring area description
> > > +      Master payload: N/A
> > > +
> > > +      Sets host notifier for a specified queue. The queue index is contained
> > > +      in the u64 field of the vring area description. The host notifier is
> > > +      described by the file descriptor (typically it's a VFIO device fd) which
> > > +      is passed as ancillary data and the size (which is mmap size and should
> > > +      be the same as host page size) and offset (which is mmap offset) carried
> > > +      in the vring area description. QEMU can mmap the file descriptor based
> > > +      on the size and offset to get a memory range. Registering a host notifier
> > > +      means mapping this memory range to the VM as the specified queue's notify
> > > +      MMIO region. Slave sends this request to tell QEMU to de-register the
> > > +      existing notifier if any and register the new notifier if the request is
> > > +      sent with a file descriptor.
> > > +      This request should be sent only when VHOST_USER_PROTOCOL_F_HOST_NOTIFIER
> > > +      protocol feature has been successfully negotiated.
> > > +
> > >  VHOST_USER_PROTOCOL_F_REPLY_ACK:
> > >  -------------------------------
> > >  The original vhost-user specification only demands replies for certain
> > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > index 791e0a4763..1cd9c7276b 100644
> > > --- a/hw/virtio/vhost-user.c
> > > +++ b/hw/virtio/vhost-user.c
> > > @@ -13,6 +13,7 @@
> > >  #include "hw/virtio/vhost.h"
> > >  #include "hw/virtio/vhost-user.h"
> > >  #include "hw/virtio/vhost-backend.h"
> > > +#include "hw/virtio/virtio.h"
> > >  #include "hw/virtio/virtio-net.h"
> > >  #include "chardev/char-fe.h"
> > >  #include "sysemu/kvm.h"
> > > @@ -48,6 +49,7 @@ enum VhostUserProtocolFeature {
> > >      VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
> > >      VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
> > >      VHOST_USER_PROTOCOL_F_CONFIG = 9,
> > > +    VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 10,
> > >      VHOST_USER_PROTOCOL_F_MAX
> > >  };
> > >  
> > > @@ -92,6 +94,7 @@ typedef enum VhostUserSlaveRequest {
> > >      VHOST_USER_SLAVE_NONE = 0,
> > >      VHOST_USER_SLAVE_IOTLB_MSG = 1,
> > >      VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
> > > +    VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
> > >      VHOST_USER_SLAVE_MAX
> > >  }  VhostUserSlaveRequest;
> > >  
> > > @@ -136,6 +139,12 @@ static VhostUserConfig c __attribute__ ((unused));
> > >                                     + sizeof(c.size) \
> > >                                     + sizeof(c.flags))
> > >  
> > > +typedef struct VhostUserVringArea {
> > > +    uint64_t u64;
> > > +    uint64_t size;
> > > +    uint64_t offset;
> > > +} VhostUserVringArea;
> > > +
> > >  typedef struct {
> > >      VhostUserRequest request;
> > >  
> > > @@ -157,6 +166,7 @@ typedef union {
> > >          struct vhost_iotlb_msg iotlb;
> > >          VhostUserConfig config;
> > >          VhostUserCryptoSession session;
> > > +        VhostUserVringArea area;
> > >  } VhostUserPayload;
> > >  
> > >  typedef struct VhostUserMsg {
> > > @@ -638,9 +648,37 @@ static int vhost_user_set_vring_num(struct vhost_dev *dev,
> > >      return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM, ring);
> > >  }
> > >  
> > > +static void vhost_user_host_notifier_restore(struct vhost_dev *dev,
> > > +                                             int queue_idx)
> > > +{
> > > +    struct vhost_user *u = dev->opaque;
> > > +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> > > +    VirtIODevice *vdev = dev->vdev;
> > > +
> > > +    if (n->addr && !n->set) {
> > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true);
> > > +        n->set = true;
> > > +    }
> > > +}
> > > +
> > > +static void vhost_user_host_notifier_remove(struct vhost_dev *dev,
> > > +                                            int queue_idx)
> > > +{
> > > +    struct vhost_user *u = dev->opaque;
> > > +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> > > +    VirtIODevice *vdev = dev->vdev;
> > > +
> > > +    if (n->addr && n->set) {
> > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
> > > +        n->set = false;
> > > +    }
> > > +}
> > > +
> > >  static int vhost_user_set_vring_base(struct vhost_dev *dev,
> > >                                       struct vhost_vring_state *ring)
> > >  {
> > > +    vhost_user_host_notifier_restore(dev, ring->index);
> > > +
> > >      return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE, ring);
> > >  }
> > >  
> > > @@ -674,6 +712,8 @@ static int vhost_user_get_vring_base(struct vhost_dev *dev,
> > >          .hdr.size = sizeof(msg.payload.state),
> > >      };
> > >  
> > > +    vhost_user_host_notifier_remove(dev, ring->index);
> > > +
> > >      if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> > >          return -1;
> > >      }
> > > @@ -847,6 +887,76 @@ static int vhost_user_slave_handle_config_change(struct vhost_dev *dev)
> > >      return ret;
> > >  }
> > >  
> > > +static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
> > > +                                                       VhostUserVringArea *area,
> > > +                                                       int fd)
> > > +{
> > > +    int queue_idx = area->u64 & VHOST_USER_VRING_IDX_MASK;
> > > +    size_t page_size = qemu_real_host_page_size;
> > > +    struct vhost_user *u = dev->opaque;
> > > +    VhostUserState *user = u->user;
> > > +    VirtIODevice *vdev = dev->vdev;
> > > +    VhostUserHostNotifier *n;
> > > +    int ret = 0;
> > > +    void *addr;
> > > +    char *name;
> > > +
> > > +    if (!virtio_has_feature(dev->protocol_features,
> > > +                            VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) ||
> > > +        vdev == NULL || queue_idx >= virtio_get_num_queues(vdev)) {
> > > +        ret = -1;
> > > +        goto out;
> > > +    }
> > > +
> > > +    n = &user->notifier[queue_idx];
> > > +
> > > +    if (n->addr) {
> > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
> > > +        object_unparent(OBJECT(&n->mr));
> > > +        munmap(n->addr, page_size);
> > > +        n->addr = NULL;
> > > +    }
> > > +
> > > +    if (area->u64 & VHOST_USER_VRING_NOFD_MASK) {
> > > +        goto out;
> > > +    }
> > > +
> > > +    /* Sanity check. */
> > > +    if (area->size != page_size) {
> > > +        ret = -1;
> > > +        goto out;
> > > +    }
> > > +
> > > +    addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
> > > +                fd, area->offset);
> > > +    if (addr == MAP_FAILED) {
> > > +        ret = -1;
> > > +        goto out;
> > > +    }
> > > +
> > > +    name = g_strdup_printf("vhost-user/host-notifier@%p mmaps[%d]",
> > > +                           user, queue_idx);
> > > +    memory_region_init_ram_device_ptr(&n->mr, OBJECT(vdev), name,
> > > +                                      page_size, addr);
> > > +    g_free(name);
> > > +
> > > +    if (virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true)) {
> > > +        munmap(addr, page_size);
> > > +        ret = -1;
> > > +        goto out;
> > > +    }
> > > +
> > > +    n->addr = addr;
> > > +    n->set = true;
> > > +
> > > +out:
> > > +    /* Always close the fd. */
> > > +    if (fd != -1) {
> > > +        close(fd);
> > > +    }
> > > +    return ret;
> > > +}
> > > +
> > >  static void slave_read(void *opaque)
> > >  {
> > >      struct vhost_dev *dev = opaque;
> > > @@ -913,6 +1023,10 @@ static void slave_read(void *opaque)
> > >      case VHOST_USER_SLAVE_CONFIG_CHANGE_MSG :
> > >          ret = vhost_user_slave_handle_config_change(dev);
> > >          break;
> > > +    case VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG:
> > > +        ret = vhost_user_slave_handle_vring_host_notifier(dev, &payload.area,
> > > +                                                          fd);
> > > +        break;
> > >      default:
> > >          error_report("Received unexpected msg type.");
> > >          if (fd != -1) {
> > > @@ -1641,6 +1755,15 @@ VhostUserState *vhost_user_init(void)
> > >  
> > >  void vhost_user_cleanup(VhostUserState *user)
> > >  {
> > > +    int i;
> > > +
> > > +    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> > > +        if (user->notifier[i].addr) {
> > > +            object_unparent(OBJECT(&user->notifier[i].mr));
> > > +            munmap(user->notifier[i].addr, qemu_real_host_page_size);
> > > +            user->notifier[i].addr = NULL;
> > > +        }
> > > +    }
> > >  }
> > >  
> > >  const VhostOps user_ops = {
> > > diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
> > > index eb8bc0d90d..fd660393a0 100644
> > > --- a/include/hw/virtio/vhost-user.h
> > > +++ b/include/hw/virtio/vhost-user.h
> > > @@ -9,9 +9,17 @@
> > >  #define HW_VIRTIO_VHOST_USER_H
> > >  
> > >  #include "chardev/char-fe.h"
> > > +#include "hw/virtio/virtio.h"
> > > +
> > > +typedef struct VhostUserHostNotifier {
> > > +    MemoryRegion mr;
> > > +    void *addr;
> > > +    bool set;
> > > +} VhostUserHostNotifier;
> > >  
> > >  typedef struct VhostUserState {
> > >      CharBackend *chr;
> > > +    VhostUserHostNotifier notifier[VIRTIO_QUEUE_MAX];
> > >  } VhostUserState;
> > >  
> > >  VhostUserState *vhost_user_init(void);
> > > -- 
> > > 2.11.0
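For reference, a minimal sketch (not part of this patch; the helper and the
hand-written header struct below are illustrative, assuming the usual
vhost-user wire header of u32 request/flags/size) of how a backend might send
the new slave message for one queue, passing the VFIO fd as ancillary data:

#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Message id and payload layout as introduced by this patch. */
#define SLAVE_VRING_HOST_NOTIFIER_MSG 3

struct vring_area {
    uint64_t u64;      /* queue index (and flags) */
    uint64_t size;     /* mmap size, one host page */
    uint64_t offset;   /* mmap offset into the fd */
};

static int send_host_notifier(int slave_fd, int vfio_fd, unsigned queue_idx,
                              uint64_t offset, uint64_t page_size)
{
    struct __attribute__((packed)) {
        uint32_t request;
        uint32_t flags;
        uint32_t size;
        struct vring_area area;
    } msg = {
        .request = SLAVE_VRING_HOST_NOTIFIER_MSG,
        .flags   = 0x1,                     /* protocol version 1 */
        .size    = sizeof(struct vring_area),
        .area    = { .u64 = queue_idx, .size = page_size, .offset = offset },
    };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct iovec iov = { .iov_base = &msg, .iov_len = sizeof(msg) };
    struct msghdr mh = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
    };
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    cmsg = CMSG_FIRSTHDR(&mh);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &vfio_fd, sizeof(int));

    return sendmsg(slave_fd, &mh, 0) < 0 ? -1 : 0;
}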


* Re: [Qemu-devel] [PATCH v3 0/6] Extend vhost-user to support registering external host notifiers
  2018-04-12 15:12 ` [virtio-dev] " Tiwei Bie
@ 2018-05-16  1:41   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 98+ messages in thread
From: Michael S. Tsirkin @ 2018-05-16  1:41 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, Apr 12, 2018 at 11:12:26PM +0800, Tiwei Bie wrote:
> The original subject is: Extend vhost-user to support VFIO based accelerators
> 
> Update notes
> ============
> 
> Now, this patch set just focuses on adding the support for
> registering memory region based host notifiers. With this
> support, guest driver in the VM will be able to notify the
> hardware device at the vhost backend directly.
> 
> It's one of the most important things in vDPA -- the host
> notification offload. Because, normally, the hardware device
> heavily depends on the notifications. Without this support,
> there will be a lot of VM-Exit happen due to the notifications
> from guest driver (it will drop the VM performance) and a
> lot of CPU resources wasted to do the notification relay
> (it will make the hardware offload less attractive, because
> one important goal of hardware offload is to free the CPU
> resources).
> 
> More backgrounds of this patch set can be found from the
> cover letter of the previous versions:
> 
> RFC: http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg04844.html
> v1:  http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg06028.html
> v2:  http://lists.nongnu.org/archive/html/qemu-devel/2018-03/msg05009.html

So I don't think there are any outstanding issues here.
Did I miss anything?

> v2 -> v3:
> - A better implementation of the shared vhost-user state (MST);
> - Use bus callback to add/delete subregions for notification (MST);
> - Refine APIs' names which add/delete subregions for notification (MST);
> - Refine the doc of the new vhost-user types and messages (MST);
> - Separate host notification offload from the guest notification offload (MST);
> - Drop the guest notification offload support from this patch;
> - Add memory filter for vhost backend to filter the sections they can handle;
> 
> v1 -> v2:
> - Add some explanations about why extend vhost-user in commit log (Paolo);
> - Bug fix in slave_read() according to Stefan's fix in DPDK;
> - Remove IOMMU feature check and related commit log;
> - Some minor refinements;
> - Rebase to the latest QEMU;
> 
> RFC -> v1:
> - Add some details about how vDPA works in cover letter (Alexey)
> - Add some details about the OVS offload use-case in cover letter (Jason)
> - Move PCI specific stuffs out of vhost-user (Jason)
> - Handle the virtual IOMMU case (Jason)
> - Move VFIO group management code into vfio/common.c (Alex)
> - Various refinements;
> (approximately sorted by comment posting time)
> 
> Tiwei Bie (6):
>   vhost-user: add Net prefix to internal state structure
>   vhost-user: introduce shared vhost-user state
>   vhost-user: support receiving file descriptors in slave_read
>   virtio: support setting memory region based host notifier
>   vhost: allow backends to filter memory sections
>   vhost-user: support registering external host notifiers
> 
>  backends/cryptodev-vhost-user.c     |  20 +++-
>  docs/interop/vhost-user.txt         |  33 ++++++
>  hw/block/vhost-user-blk.c           |  22 +++-
>  hw/scsi/vhost-user-scsi.c           |  20 +++-
>  hw/virtio/Makefile.objs             |   2 +-
>  hw/virtio/vhost-stub.c              |  10 ++
>  hw/virtio/vhost-user.c              | 206 ++++++++++++++++++++++++++++++++++--
>  hw/virtio/vhost.c                   |   9 +-
>  hw/virtio/virtio-pci.c              |  22 ++++
>  hw/virtio/virtio.c                  |  13 +++
>  include/hw/virtio/vhost-backend.h   |   4 +
>  include/hw/virtio/vhost-user-blk.h  |   2 +
>  include/hw/virtio/vhost-user-scsi.h |   2 +
>  include/hw/virtio/vhost-user.h      |  28 +++++
>  include/hw/virtio/virtio-bus.h      |   2 +
>  include/hw/virtio/virtio.h          |   2 +
>  net/vhost-user.c                    |  78 +++++++++-----
>  17 files changed, 433 insertions(+), 42 deletions(-)
>  create mode 100644 include/hw/virtio/vhost-user.h
> 
> -- 
> 2.11.0



* Re: [Qemu-devel] [PATCH v3 0/6] Extend vhost-user to support registering external host notifiers
  2018-05-16  1:41   ` [virtio-dev] " Michael S. Tsirkin
@ 2018-05-16  1:56     ` Tiwei Bie
  -1 siblings, 0 replies; 98+ messages in thread
From: Tiwei Bie @ 2018-05-16  1:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Wed, May 16, 2018 at 04:41:48AM +0300, Michael S. Tsirkin wrote:
> On Thu, Apr 12, 2018 at 11:12:26PM +0800, Tiwei Bie wrote:
> > The original subject is: Extend vhost-user to support VFIO based accelerators
> > 
> > Update notes
> > ============
> > 
> > Now, this patch set just focuses on adding the support for
> > registering memory region based host notifiers. With this
> > support, guest driver in the VM will be able to notify the
> > hardware device at the vhost backend directly.
> > 
> > It's one of the most important things in vDPA -- the host
> > notification offload. Because, normally, the hardware device
> > heavily depends on the notifications. Without this support,
> > there will be a lot of VM-Exit happen due to the notifications
> > from guest driver (it will drop the VM performance) and a
> > lot of CPU resources wasted to do the notification relay
> > (it will make the hardware offload less attractive, because
> > one important goal of hardware offload is to free the CPU
> > resources).
> > 
> > More backgrounds of this patch set can be found from the
> > cover letter of the previous versions:
> > 
> > RFC: http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg04844.html
> > v1:  http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg06028.html
> > v2:  http://lists.nongnu.org/archive/html/qemu-devel/2018-03/msg05009.html
> 
> So I don't think there are any outstanding issues here.
> Did I miss anything?

For the host notifier support in vhost-user, this is
everything!

The support in libvhost-user and vhost-user-bridge is
being done in another patch set, and we are also working
on the vIOMMU support. If any new issues do show up in
the future, I'll fix them promptly with small patches.

Thank you very much!

Best regards,
Tiwei Bie

> 
> > v2 -> v3:
> > - A better implementation of the shared vhost-user state (MST);
> > - Use bus callback to add/delete subregions for notification (MST);
> > - Refine APIs' names which add/delete subregions for notification (MST);
> > - Refine the doc of the new vhost-user types and messages (MST);
> > - Separate host notification offload from the guest notification offload (MST);
> > - Drop the guest notification offload support from this patch;
> > - Add memory filter for vhost backend to filter the sections they can handle;
> > 
> > v1 -> v2:
> > - Add some explanations about why extend vhost-user in commit log (Paolo);
> > - Bug fix in slave_read() according to Stefan's fix in DPDK;
> > - Remove IOMMU feature check and related commit log;
> > - Some minor refinements;
> > - Rebase to the latest QEMU;
> > 
> > RFC -> v1:
> > - Add some details about how vDPA works in cover letter (Alexey)
> > - Add some details about the OVS offload use-case in cover letter (Jason)
> > - Move PCI specific stuffs out of vhost-user (Jason)
> > - Handle the virtual IOMMU case (Jason)
> > - Move VFIO group management code into vfio/common.c (Alex)
> > - Various refinements;
> > (approximately sorted by comment posting time)
> > 
> > Tiwei Bie (6):
> >   vhost-user: add Net prefix to internal state structure
> >   vhost-user: introduce shared vhost-user state
> >   vhost-user: support receiving file descriptors in slave_read
> >   virtio: support setting memory region based host notifier
> >   vhost: allow backends to filter memory sections
> >   vhost-user: support registering external host notifiers
> > 
> >  backends/cryptodev-vhost-user.c     |  20 +++-
> >  docs/interop/vhost-user.txt         |  33 ++++++
> >  hw/block/vhost-user-blk.c           |  22 +++-
> >  hw/scsi/vhost-user-scsi.c           |  20 +++-
> >  hw/virtio/Makefile.objs             |   2 +-
> >  hw/virtio/vhost-stub.c              |  10 ++
> >  hw/virtio/vhost-user.c              | 206 ++++++++++++++++++++++++++++++++++--
> >  hw/virtio/vhost.c                   |   9 +-
> >  hw/virtio/virtio-pci.c              |  22 ++++
> >  hw/virtio/virtio.c                  |  13 +++
> >  include/hw/virtio/vhost-backend.h   |   4 +
> >  include/hw/virtio/vhost-user-blk.h  |   2 +
> >  include/hw/virtio/vhost-user-scsi.h |   2 +
> >  include/hw/virtio/vhost-user.h      |  28 +++++
> >  include/hw/virtio/virtio-bus.h      |   2 +
> >  include/hw/virtio/virtio.h          |   2 +
> >  net/vhost-user.c                    |  78 +++++++++-----
> >  17 files changed, 433 insertions(+), 42 deletions(-)
> >  create mode 100644 include/hw/virtio/vhost-user.h
> > 
> > -- 
> > 2.11.0



* Re: [Qemu-devel] [PATCH v3 2/6] vhost-user: introduce shared vhost-user state
  2018-04-12 15:12   ` [virtio-dev] " Tiwei Bie
@ 2018-05-23 13:44     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 98+ messages in thread
From: Michael S. Tsirkin @ 2018-05-23 13:44 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, Apr 12, 2018 at 11:12:28PM +0800, Tiwei Bie wrote:
> When multi queue is enabled e.g. for a virtio-net device,
> each queue pair will have a vhost_dev, and the only thing
> shared between vhost devs currently is the chardev. This
> patch introduces a vhost-user state structure which will
> be shared by all vhost devs of the same virtio device.
> 
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>

Unfortunately this patch seems to cause crashes.
To reproduce, simply run:
make check-qtest-x86_64

Sorry that it took me a while to find - it triggers in about 90% of runs
but not 100%, which complicates bisection somewhat.

> ---
>  backends/cryptodev-vhost-user.c     | 20 ++++++++++++++++++-
>  hw/block/vhost-user-blk.c           | 22 +++++++++++++++++++-
>  hw/scsi/vhost-user-scsi.c           | 20 ++++++++++++++++++-
>  hw/virtio/Makefile.objs             |  2 +-
>  hw/virtio/vhost-stub.c              | 10 ++++++++++
>  hw/virtio/vhost-user.c              | 31 +++++++++++++++++++---------
>  include/hw/virtio/vhost-user-blk.h  |  2 ++
>  include/hw/virtio/vhost-user-scsi.h |  2 ++
>  include/hw/virtio/vhost-user.h      | 20 +++++++++++++++++++
>  net/vhost-user.c                    | 40 ++++++++++++++++++++++++++++++-------
>  10 files changed, 149 insertions(+), 20 deletions(-)
>  create mode 100644 include/hw/virtio/vhost-user.h
> 
> diff --git a/backends/cryptodev-vhost-user.c b/backends/cryptodev-vhost-user.c
> index 862d4f2580..d52daccfcd 100644
> --- a/backends/cryptodev-vhost-user.c
> +++ b/backends/cryptodev-vhost-user.c
> @@ -26,6 +26,7 @@
>  #include "qapi/error.h"
>  #include "qapi/qmp/qerror.h"
>  #include "qemu/error-report.h"
> +#include "hw/virtio/vhost-user.h"
>  #include "standard-headers/linux/virtio_crypto.h"
>  #include "sysemu/cryptodev-vhost.h"
>  #include "chardev/char-fe.h"
> @@ -46,6 +47,7 @@
>  typedef struct CryptoDevBackendVhostUser {
>      CryptoDevBackend parent_obj;
>  
> +    VhostUserState *vhost_user;
>      CharBackend chr;
>      char *chr_name;
>      bool opened;
> @@ -102,7 +104,7 @@ cryptodev_vhost_user_start(int queues,
>              continue;
>          }
>  
> -        options.opaque = &s->chr;
> +        options.opaque = s->vhost_user;
>          options.backend_type = VHOST_BACKEND_TYPE_USER;
>          options.cc = b->conf.peers.ccs[i];
>          s->vhost_crypto[i] = cryptodev_vhost_init(&options);
> @@ -185,6 +187,7 @@ static void cryptodev_vhost_user_init(
>      size_t i;
>      Error *local_err = NULL;
>      Chardev *chr;
> +    VhostUserState *user;
>      CryptoDevBackendClient *cc;
>      CryptoDevBackendVhostUser *s =
>                        CRYPTODEV_BACKEND_VHOST_USER(backend);
> @@ -215,6 +218,15 @@ static void cryptodev_vhost_user_init(
>          }
>      }
>  
> +    user = vhost_user_init();
> +    if (!user) {
> +        error_setg(errp, "Failed to init vhost_user");
> +        return;
> +    }
> +
> +    user->chr = &s->chr;
> +    s->vhost_user = user;
> +
>      qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
>                       cryptodev_vhost_user_event, NULL, s, NULL, true);
>  
> @@ -299,6 +311,12 @@ static void cryptodev_vhost_user_cleanup(
>              backend->conf.peers.ccs[i] = NULL;
>          }
>      }
> +
> +    if (s->vhost_user) {
> +        vhost_user_cleanup(s->vhost_user);
> +        g_free(s->vhost_user);
> +        s->vhost_user = NULL;
> +    }
>  }
>  
>  static void cryptodev_vhost_user_set_chardev(Object *obj,
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 262baca432..4021d71c31 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -229,6 +229,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>  {
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> +    VhostUserState *user;
>      int i, ret;
>  
>      if (!s->chardev.chr) {
> @@ -246,6 +247,15 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>          return;
>      }
>  
> +    user = vhost_user_init();
> +    if (!user) {
> +        error_setg(errp, "vhost-user-blk: failed to init vhost_user");
> +        return;
> +    }
> +
> +    user->chr = &s->chardev;
> +    s->vhost_user = user;
> +
>      virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK,
>                  sizeof(struct virtio_blk_config));
>  
> @@ -261,7 +271,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>  
>      vhost_dev_set_config_notifier(&s->dev, &blk_ops);
>  
> -    ret = vhost_dev_init(&s->dev, &s->chardev, VHOST_BACKEND_TYPE_USER, 0);
> +    ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
>      if (ret < 0) {
>          error_setg(errp, "vhost-user-blk: vhost initialization failed: %s",
>                     strerror(-ret));
> @@ -286,6 +296,10 @@ vhost_err:
>  virtio_err:
>      g_free(s->dev.vqs);
>      virtio_cleanup(vdev);
> +
> +    vhost_user_cleanup(user);
> +    g_free(user);
> +    s->vhost_user = NULL;
>  }
>  
>  static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
> @@ -297,6 +311,12 @@ static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
>      vhost_dev_cleanup(&s->dev);
>      g_free(s->dev.vqs);
>      virtio_cleanup(vdev);
> +
> +    if (s->vhost_user) {
> +        vhost_user_cleanup(s->vhost_user);
> +        g_free(s->vhost_user);
> +        s->vhost_user = NULL;
> +    }
>  }
>  
>  static void vhost_user_blk_instance_init(Object *obj)
> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> index 9389ed48e0..9355cfdf07 100644
> --- a/hw/scsi/vhost-user-scsi.c
> +++ b/hw/scsi/vhost-user-scsi.c
> @@ -69,6 +69,7 @@ static void vhost_user_scsi_realize(DeviceState *dev, Error **errp)
>      VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
>      VHostUserSCSI *s = VHOST_USER_SCSI(dev);
>      VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> +    VhostUserState *user;
>      Error *err = NULL;
>      int ret;
>  
> @@ -85,19 +86,30 @@ static void vhost_user_scsi_realize(DeviceState *dev, Error **errp)
>          return;
>      }
>  
> +    user = vhost_user_init();
> +    if (!user) {
> +        error_setg(errp, "vhost-user-scsi: failed to init vhost_user");
> +        return;
> +    }
> +    user->chr = &vs->conf.chardev;
> +
>      vsc->dev.nvqs = 2 + vs->conf.num_queues;
>      vsc->dev.vqs = g_new(struct vhost_virtqueue, vsc->dev.nvqs);
>      vsc->dev.vq_index = 0;
>      vsc->dev.backend_features = 0;
>  
> -    ret = vhost_dev_init(&vsc->dev, (void *)&vs->conf.chardev,
> +    ret = vhost_dev_init(&vsc->dev, user,
>                           VHOST_BACKEND_TYPE_USER, 0);
>      if (ret < 0) {
>          error_setg(errp, "vhost-user-scsi: vhost initialization failed: %s",
>                     strerror(-ret));
> +        vhost_user_cleanup(user);
> +        g_free(user);
>          return;
>      }
>  
> +    s->vhost_user = user;
> +
>      /* Channel and lun both are 0 for bootable vhost-user-scsi disk */
>      vsc->channel = 0;
>      vsc->lun = 0;
> @@ -117,6 +129,12 @@ static void vhost_user_scsi_unrealize(DeviceState *dev, Error **errp)
>      g_free(vsc->dev.vqs);
>  
>      virtio_scsi_common_unrealize(dev, errp);
> +
> +    if (s->vhost_user) {
> +        vhost_user_cleanup(s->vhost_user);
> +        g_free(s->vhost_user);
> +        s->vhost_user = NULL;
> +    }
>  }
>  
>  static uint64_t vhost_user_scsi_get_features(VirtIODevice *vdev,
> diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
> index 765d363c1f..030969e28c 100644
> --- a/hw/virtio/Makefile.objs
> +++ b/hw/virtio/Makefile.objs
> @@ -11,5 +11,5 @@ obj-y += virtio-crypto.o
>  obj-$(CONFIG_VIRTIO_PCI) += virtio-crypto-pci.o
>  endif
>  
> -common-obj-$(call lnot,$(CONFIG_LINUX)) += vhost-stub.o
> +common-obj-$(call lnot,$(call land,$(CONFIG_VIRTIO),$(CONFIG_LINUX))) += vhost-stub.o
>  common-obj-$(CONFIG_ALL) += vhost-stub.o
> diff --git a/hw/virtio/vhost-stub.c b/hw/virtio/vhost-stub.c
> index 2d76cdebdc..049089b5e2 100644
> --- a/hw/virtio/vhost-stub.c
> +++ b/hw/virtio/vhost-stub.c
> @@ -1,7 +1,17 @@
>  #include "qemu/osdep.h"
>  #include "hw/virtio/vhost.h"
> +#include "hw/virtio/vhost-user.h"
>  
>  bool vhost_has_free_slot(void)
>  {
>      return true;
>  }
> +
> +VhostUserState *vhost_user_init(void)
> +{
> +    return NULL;
> +}
> +
> +void vhost_user_cleanup(VhostUserState *user)
> +{
> +}
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 38da8692bb..91edd95453 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -11,6 +11,7 @@
>  #include "qemu/osdep.h"
>  #include "qapi/error.h"
>  #include "hw/virtio/vhost.h"
> +#include "hw/virtio/vhost-user.h"
>  #include "hw/virtio/vhost-backend.h"
>  #include "hw/virtio/virtio-net.h"
>  #include "chardev/char-fe.h"
> @@ -173,7 +174,8 @@ static VhostUserMsg m __attribute__ ((unused));
>  
>  struct vhost_user {
>      struct vhost_dev *dev;
> -    CharBackend *chr;
> +    /* Shared between vhost devs of the same virtio device */
> +    VhostUserState *user;
>      int slave_fd;
>      NotifierWithReturn postcopy_notifier;
>      struct PostCopyFD  postcopy_fd;
> @@ -199,7 +201,7 @@ static bool ioeventfd_enabled(void)
>  static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg)
>  {
>      struct vhost_user *u = dev->opaque;
> -    CharBackend *chr = u->chr;
> +    CharBackend *chr = u->user->chr;
>      uint8_t *p = (uint8_t *) msg;
>      int r, size = VHOST_USER_HDR_SIZE;
>  
> @@ -285,7 +287,7 @@ static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg,
>                              int *fds, int fd_num)
>  {
>      struct vhost_user *u = dev->opaque;
> -    CharBackend *chr = u->chr;
> +    CharBackend *chr = u->user->chr;
>      int ret, size = VHOST_USER_HDR_SIZE + msg->hdr.size;
>  
>      /*
> @@ -1044,7 +1046,7 @@ static int vhost_user_postcopy_waker(struct PostCopyFD *pcfd, RAMBlock *rb,
>  static int vhost_user_postcopy_advise(struct vhost_dev *dev, Error **errp)
>  {
>      struct vhost_user *u = dev->opaque;
> -    CharBackend *chr = u->chr;
> +    CharBackend *chr = u->user->chr;
>      int ufd;
>      VhostUserMsg msg = {
>          .hdr.request = VHOST_USER_POSTCOPY_ADVISE,
> @@ -1182,7 +1184,7 @@ static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
>      return 0;
>  }
>  
> -static int vhost_user_init(struct vhost_dev *dev, void *opaque)
> +static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque)
>  {
>      uint64_t features, protocol_features;
>      struct vhost_user *u;
> @@ -1191,7 +1193,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
>      assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
>  
>      u = g_new0(struct vhost_user, 1);
> -    u->chr = opaque;
> +    u->user = opaque;
>      u->slave_fd = -1;
>      u->dev = dev;
>      dev->opaque = u;
> @@ -1267,7 +1269,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
>      return 0;
>  }
>  
> -static int vhost_user_cleanup(struct vhost_dev *dev)
> +static int vhost_user_backend_cleanup(struct vhost_dev *dev)
>  {
>      struct vhost_user *u;
>  
> @@ -1581,10 +1583,21 @@ vhost_user_crypto_close_session(struct vhost_dev *dev, uint64_t session_id)
>      return 0;
>  }
>  
> +VhostUserState *vhost_user_init(void)
> +{
> +    VhostUserState *user = g_new0(struct VhostUserState, 1);
> +
> +    return user;
> +}
> +
> +void vhost_user_cleanup(VhostUserState *user)
> +{
> +}
> +
>  const VhostOps user_ops = {
>          .backend_type = VHOST_BACKEND_TYPE_USER,
> -        .vhost_backend_init = vhost_user_init,
> -        .vhost_backend_cleanup = vhost_user_cleanup,
> +        .vhost_backend_init = vhost_user_backend_init,
> +        .vhost_backend_cleanup = vhost_user_backend_cleanup,
>          .vhost_backend_memslots_limit = vhost_user_memslots_limit,
>          .vhost_set_log_base = vhost_user_set_log_base,
>          .vhost_set_mem_table = vhost_user_set_mem_table,
> diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h
> index 5804cc904a..f1258ae545 100644
> --- a/include/hw/virtio/vhost-user-blk.h
> +++ b/include/hw/virtio/vhost-user-blk.h
> @@ -21,6 +21,7 @@
>  #include "hw/block/block.h"
>  #include "chardev/char-fe.h"
>  #include "hw/virtio/vhost.h"
> +#include "hw/virtio/vhost-user.h"
>  
>  #define TYPE_VHOST_USER_BLK "vhost-user-blk"
>  #define VHOST_USER_BLK(obj) \
> @@ -36,6 +37,7 @@ typedef struct VHostUserBlk {
>      uint32_t config_wce;
>      uint32_t config_ro;
>      struct vhost_dev dev;
> +    VhostUserState *vhost_user;
>  } VHostUserBlk;
>  
>  #endif
> diff --git a/include/hw/virtio/vhost-user-scsi.h b/include/hw/virtio/vhost-user-scsi.h
> index 01861f78d0..3ec34ae867 100644
> --- a/include/hw/virtio/vhost-user-scsi.h
> +++ b/include/hw/virtio/vhost-user-scsi.h
> @@ -21,6 +21,7 @@
>  #include "hw/qdev.h"
>  #include "hw/virtio/virtio-scsi.h"
>  #include "hw/virtio/vhost.h"
> +#include "hw/virtio/vhost-user.h"
>  #include "hw/virtio/vhost-scsi-common.h"
>  
>  #define TYPE_VHOST_USER_SCSI "vhost-user-scsi"
> @@ -30,6 +31,7 @@
>  typedef struct VHostUserSCSI {
>      VHostSCSICommon parent_obj;
>      uint64_t host_features;
> +    VhostUserState *vhost_user;
>  } VHostUserSCSI;
>  
>  #endif /* VHOST_USER_SCSI_H */
> diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
> new file mode 100644
> index 0000000000..eb8bc0d90d
> --- /dev/null
> +++ b/include/hw/virtio/vhost-user.h
> @@ -0,0 +1,20 @@
> +/*
> + * Copyright (c) 2017-2018 Intel Corporation
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef HW_VIRTIO_VHOST_USER_H
> +#define HW_VIRTIO_VHOST_USER_H
> +
> +#include "chardev/char-fe.h"
> +
> +typedef struct VhostUserState {
> +    CharBackend *chr;
> +} VhostUserState;
> +
> +VhostUserState *vhost_user_init(void);
> +void vhost_user_cleanup(VhostUserState *user);
> +
> +#endif
> diff --git a/net/vhost-user.c b/net/vhost-user.c
> index fa28aad12d..525a061acf 100644
> --- a/net/vhost-user.c
> +++ b/net/vhost-user.c
> @@ -12,6 +12,7 @@
>  #include "clients.h"
>  #include "net/vhost_net.h"
>  #include "net/vhost-user.h"
> +#include "hw/virtio/vhost-user.h"
>  #include "chardev/char-fe.h"
>  #include "qapi/error.h"
>  #include "qapi/qapi-commands-net.h"
> @@ -23,6 +24,7 @@
>  typedef struct NetVhostUserState {
>      NetClientState nc;
>      CharBackend chr; /* only queue index 0 */
> +    VhostUserState *vhost_user;
>      VHostNetState *vhost_net;
>      guint watch;
>      uint64_t acked_features;
> @@ -64,7 +66,8 @@ static void vhost_user_stop(int queues, NetClientState *ncs[])
>      }
>  }
>  
> -static int vhost_user_start(int queues, NetClientState *ncs[], CharBackend *be)
> +static int vhost_user_start(int queues, NetClientState *ncs[],
> +                            VhostUserState *be)
>  {
>      VhostNetOptions options;
>      struct vhost_net *net = NULL;
> @@ -144,7 +147,7 @@ static ssize_t vhost_user_receive(NetClientState *nc, const uint8_t *buf,
>      return size;
>  }
>  
> -static void vhost_user_cleanup(NetClientState *nc)
> +static void net_vhost_user_cleanup(NetClientState *nc)
>  {
>      NetVhostUserState *s = DO_UPCAST(NetVhostUserState, nc, nc);
>  
> @@ -153,6 +156,11 @@ static void vhost_user_cleanup(NetClientState *nc)
>          g_free(s->vhost_net);
>          s->vhost_net = NULL;
>      }
> +    if (s->vhost_user) {
> +        vhost_user_cleanup(s->vhost_user);
> +        g_free(s->vhost_user);
> +        s->vhost_user = NULL;
> +    }
>      if (nc->queue_index == 0) {
>          if (s->watch) {
>              g_source_remove(s->watch);
> @@ -182,7 +190,7 @@ static NetClientInfo net_vhost_user_info = {
>          .type = NET_CLIENT_DRIVER_VHOST_USER,
>          .size = sizeof(NetVhostUserState),
>          .receive = vhost_user_receive,
> -        .cleanup = vhost_user_cleanup,
> +        .cleanup = net_vhost_user_cleanup,
>          .has_vnet_hdr = vhost_user_has_vnet_hdr,
>          .has_ufo = vhost_user_has_ufo,
>  };
> @@ -244,7 +252,7 @@ static void net_vhost_user_event(void *opaque, int event)
>      trace_vhost_user_event(chr->label, event);
>      switch (event) {
>      case CHR_EVENT_OPENED:
> -        if (vhost_user_start(queues, ncs, &s->chr) < 0) {
> +        if (vhost_user_start(queues, ncs, s->vhost_user) < 0) {
>              qemu_chr_fe_disconnect(&s->chr);
>              return;
>          }
> @@ -283,12 +291,19 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
>  {
>      Error *err = NULL;
>      NetClientState *nc, *nc0 = NULL;
> +    VhostUserState *user = NULL;
>      NetVhostUserState *s;
>      int i;
>  
>      assert(name);
>      assert(queues > 0);
>  
> +    user = vhost_user_init();
> +    if (!user) {
> +        error_report("failed to init vhost_user");
> +        goto err;
> +    }
> +
>      for (i = 0; i < queues; i++) {
>          nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
>          snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user%d to %s",
> @@ -299,17 +314,19 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
>              s = DO_UPCAST(NetVhostUserState, nc, nc);
>              if (!qemu_chr_fe_init(&s->chr, chr, &err)) {
>                  error_report_err(err);
> -                return -1;
> +                goto err;
>              }
> +            user->chr = &s->chr;
>          }
> -
> +        s = DO_UPCAST(NetVhostUserState, nc, nc);
> +        s->vhost_user = user;
>      }
>  
>      s = DO_UPCAST(NetVhostUserState, nc, nc0);
>      do {
>          if (qemu_chr_fe_wait_connected(&s->chr, &err) < 0) {
>              error_report_err(err);
> -            return -1;
> +            goto err;
>          }
>          qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
>                                   net_vhost_user_event, NULL, nc0->name, NULL,
> @@ -319,6 +336,15 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
>      assert(s->vhost_net);
>  
>      return 0;
> +
> +err:
> +    if (user) {
> +        vhost_user_cleanup(user);
> +        g_free(user);
> +        s->vhost_user = NULL;
> +    }
> +
> +    return -1;
>  }
>  
>  static Chardev *net_vhost_claim_chardev(
> -- 
> 2.11.0


* [virtio-dev] Re: [PATCH v3 2/6] vhost-user: introduce shared vhost-user state
@ 2018-05-23 13:44     ` Michael S. Tsirkin
  0 siblings, 0 replies; 98+ messages in thread
From: Michael S. Tsirkin @ 2018-05-23 13:44 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, Apr 12, 2018 at 11:12:28PM +0800, Tiwei Bie wrote:
> When multi queue is enabled e.g. for a virtio-net device,
> each queue pair will have a vhost_dev, and the only thing
> shared between vhost devs currently is the chardev. This
> patch introduces a vhost-user state structure which will
> be shared by all vhost devs of the same virtio device.
> 
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>

Unfortunately this patch seems to cause crashes.
To reproduce, simply run
make check-qtest-x86_64

Sorry that it took me a while to find - it triggers on about 90% of runs
but not 100%, which complicates bisection somewhat.

> ---
>  backends/cryptodev-vhost-user.c     | 20 ++++++++++++++++++-
>  hw/block/vhost-user-blk.c           | 22 +++++++++++++++++++-
>  hw/scsi/vhost-user-scsi.c           | 20 ++++++++++++++++++-
>  hw/virtio/Makefile.objs             |  2 +-
>  hw/virtio/vhost-stub.c              | 10 ++++++++++
>  hw/virtio/vhost-user.c              | 31 +++++++++++++++++++---------
>  include/hw/virtio/vhost-user-blk.h  |  2 ++
>  include/hw/virtio/vhost-user-scsi.h |  2 ++
>  include/hw/virtio/vhost-user.h      | 20 +++++++++++++++++++
>  net/vhost-user.c                    | 40 ++++++++++++++++++++++++++++++-------
>  10 files changed, 149 insertions(+), 20 deletions(-)
>  create mode 100644 include/hw/virtio/vhost-user.h
> 
> diff --git a/backends/cryptodev-vhost-user.c b/backends/cryptodev-vhost-user.c
> index 862d4f2580..d52daccfcd 100644
> --- a/backends/cryptodev-vhost-user.c
> +++ b/backends/cryptodev-vhost-user.c
> @@ -26,6 +26,7 @@
>  #include "qapi/error.h"
>  #include "qapi/qmp/qerror.h"
>  #include "qemu/error-report.h"
> +#include "hw/virtio/vhost-user.h"
>  #include "standard-headers/linux/virtio_crypto.h"
>  #include "sysemu/cryptodev-vhost.h"
>  #include "chardev/char-fe.h"
> @@ -46,6 +47,7 @@
>  typedef struct CryptoDevBackendVhostUser {
>      CryptoDevBackend parent_obj;
>  
> +    VhostUserState *vhost_user;
>      CharBackend chr;
>      char *chr_name;
>      bool opened;
> @@ -102,7 +104,7 @@ cryptodev_vhost_user_start(int queues,
>              continue;
>          }
>  
> -        options.opaque = &s->chr;
> +        options.opaque = s->vhost_user;
>          options.backend_type = VHOST_BACKEND_TYPE_USER;
>          options.cc = b->conf.peers.ccs[i];
>          s->vhost_crypto[i] = cryptodev_vhost_init(&options);
> @@ -185,6 +187,7 @@ static void cryptodev_vhost_user_init(
>      size_t i;
>      Error *local_err = NULL;
>      Chardev *chr;
> +    VhostUserState *user;
>      CryptoDevBackendClient *cc;
>      CryptoDevBackendVhostUser *s =
>                        CRYPTODEV_BACKEND_VHOST_USER(backend);
> @@ -215,6 +218,15 @@ static void cryptodev_vhost_user_init(
>          }
>      }
>  
> +    user = vhost_user_init();
> +    if (!user) {
> +        error_setg(errp, "Failed to init vhost_user");
> +        return;
> +    }
> +
> +    user->chr = &s->chr;
> +    s->vhost_user = user;
> +
>      qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
>                       cryptodev_vhost_user_event, NULL, s, NULL, true);
>  
> @@ -299,6 +311,12 @@ static void cryptodev_vhost_user_cleanup(
>              backend->conf.peers.ccs[i] = NULL;
>          }
>      }
> +
> +    if (s->vhost_user) {
> +        vhost_user_cleanup(s->vhost_user);
> +        g_free(s->vhost_user);
> +        s->vhost_user = NULL;
> +    }
>  }
>  
>  static void cryptodev_vhost_user_set_chardev(Object *obj,
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 262baca432..4021d71c31 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -229,6 +229,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>  {
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> +    VhostUserState *user;
>      int i, ret;
>  
>      if (!s->chardev.chr) {
> @@ -246,6 +247,15 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>          return;
>      }
>  
> +    user = vhost_user_init();
> +    if (!user) {
> +        error_setg(errp, "vhost-user-blk: failed to init vhost_user");
> +        return;
> +    }
> +
> +    user->chr = &s->chardev;
> +    s->vhost_user = user;
> +
>      virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK,
>                  sizeof(struct virtio_blk_config));
>  
> @@ -261,7 +271,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>  
>      vhost_dev_set_config_notifier(&s->dev, &blk_ops);
>  
> -    ret = vhost_dev_init(&s->dev, &s->chardev, VHOST_BACKEND_TYPE_USER, 0);
> +    ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
>      if (ret < 0) {
>          error_setg(errp, "vhost-user-blk: vhost initialization failed: %s",
>                     strerror(-ret));
> @@ -286,6 +296,10 @@ vhost_err:
>  virtio_err:
>      g_free(s->dev.vqs);
>      virtio_cleanup(vdev);
> +
> +    vhost_user_cleanup(user);
> +    g_free(user);
> +    s->vhost_user = NULL;
>  }
>  
>  static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
> @@ -297,6 +311,12 @@ static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
>      vhost_dev_cleanup(&s->dev);
>      g_free(s->dev.vqs);
>      virtio_cleanup(vdev);
> +
> +    if (s->vhost_user) {
> +        vhost_user_cleanup(s->vhost_user);
> +        g_free(s->vhost_user);
> +        s->vhost_user = NULL;
> +    }
>  }
>  
>  static void vhost_user_blk_instance_init(Object *obj)
> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> index 9389ed48e0..9355cfdf07 100644
> --- a/hw/scsi/vhost-user-scsi.c
> +++ b/hw/scsi/vhost-user-scsi.c
> @@ -69,6 +69,7 @@ static void vhost_user_scsi_realize(DeviceState *dev, Error **errp)
>      VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
>      VHostUserSCSI *s = VHOST_USER_SCSI(dev);
>      VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> +    VhostUserState *user;
>      Error *err = NULL;
>      int ret;
>  
> @@ -85,19 +86,30 @@ static void vhost_user_scsi_realize(DeviceState *dev, Error **errp)
>          return;
>      }
>  
> +    user = vhost_user_init();
> +    if (!user) {
> +        error_setg(errp, "vhost-user-scsi: failed to init vhost_user");
> +        return;
> +    }
> +    user->chr = &vs->conf.chardev;
> +
>      vsc->dev.nvqs = 2 + vs->conf.num_queues;
>      vsc->dev.vqs = g_new(struct vhost_virtqueue, vsc->dev.nvqs);
>      vsc->dev.vq_index = 0;
>      vsc->dev.backend_features = 0;
>  
> -    ret = vhost_dev_init(&vsc->dev, (void *)&vs->conf.chardev,
> +    ret = vhost_dev_init(&vsc->dev, user,
>                           VHOST_BACKEND_TYPE_USER, 0);
>      if (ret < 0) {
>          error_setg(errp, "vhost-user-scsi: vhost initialization failed: %s",
>                     strerror(-ret));
> +        vhost_user_cleanup(user);
> +        g_free(user);
>          return;
>      }
>  
> +    s->vhost_user = user;
> +
>      /* Channel and lun both are 0 for bootable vhost-user-scsi disk */
>      vsc->channel = 0;
>      vsc->lun = 0;
> @@ -117,6 +129,12 @@ static void vhost_user_scsi_unrealize(DeviceState *dev, Error **errp)
>      g_free(vsc->dev.vqs);
>  
>      virtio_scsi_common_unrealize(dev, errp);
> +
> +    if (s->vhost_user) {
> +        vhost_user_cleanup(s->vhost_user);
> +        g_free(s->vhost_user);
> +        s->vhost_user = NULL;
> +    }
>  }
>  
>  static uint64_t vhost_user_scsi_get_features(VirtIODevice *vdev,
> diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
> index 765d363c1f..030969e28c 100644
> --- a/hw/virtio/Makefile.objs
> +++ b/hw/virtio/Makefile.objs
> @@ -11,5 +11,5 @@ obj-y += virtio-crypto.o
>  obj-$(CONFIG_VIRTIO_PCI) += virtio-crypto-pci.o
>  endif
>  
> -common-obj-$(call lnot,$(CONFIG_LINUX)) += vhost-stub.o
> +common-obj-$(call lnot,$(call land,$(CONFIG_VIRTIO),$(CONFIG_LINUX))) += vhost-stub.o
>  common-obj-$(CONFIG_ALL) += vhost-stub.o
> diff --git a/hw/virtio/vhost-stub.c b/hw/virtio/vhost-stub.c
> index 2d76cdebdc..049089b5e2 100644
> --- a/hw/virtio/vhost-stub.c
> +++ b/hw/virtio/vhost-stub.c
> @@ -1,7 +1,17 @@
>  #include "qemu/osdep.h"
>  #include "hw/virtio/vhost.h"
> +#include "hw/virtio/vhost-user.h"
>  
>  bool vhost_has_free_slot(void)
>  {
>      return true;
>  }
> +
> +VhostUserState *vhost_user_init(void)
> +{
> +    return NULL;
> +}
> +
> +void vhost_user_cleanup(VhostUserState *user)
> +{
> +}
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 38da8692bb..91edd95453 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -11,6 +11,7 @@
>  #include "qemu/osdep.h"
>  #include "qapi/error.h"
>  #include "hw/virtio/vhost.h"
> +#include "hw/virtio/vhost-user.h"
>  #include "hw/virtio/vhost-backend.h"
>  #include "hw/virtio/virtio-net.h"
>  #include "chardev/char-fe.h"
> @@ -173,7 +174,8 @@ static VhostUserMsg m __attribute__ ((unused));
>  
>  struct vhost_user {
>      struct vhost_dev *dev;
> -    CharBackend *chr;
> +    /* Shared between vhost devs of the same virtio device */
> +    VhostUserState *user;
>      int slave_fd;
>      NotifierWithReturn postcopy_notifier;
>      struct PostCopyFD  postcopy_fd;
> @@ -199,7 +201,7 @@ static bool ioeventfd_enabled(void)
>  static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg)
>  {
>      struct vhost_user *u = dev->opaque;
> -    CharBackend *chr = u->chr;
> +    CharBackend *chr = u->user->chr;
>      uint8_t *p = (uint8_t *) msg;
>      int r, size = VHOST_USER_HDR_SIZE;
>  
> @@ -285,7 +287,7 @@ static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg,
>                              int *fds, int fd_num)
>  {
>      struct vhost_user *u = dev->opaque;
> -    CharBackend *chr = u->chr;
> +    CharBackend *chr = u->user->chr;
>      int ret, size = VHOST_USER_HDR_SIZE + msg->hdr.size;
>  
>      /*
> @@ -1044,7 +1046,7 @@ static int vhost_user_postcopy_waker(struct PostCopyFD *pcfd, RAMBlock *rb,
>  static int vhost_user_postcopy_advise(struct vhost_dev *dev, Error **errp)
>  {
>      struct vhost_user *u = dev->opaque;
> -    CharBackend *chr = u->chr;
> +    CharBackend *chr = u->user->chr;
>      int ufd;
>      VhostUserMsg msg = {
>          .hdr.request = VHOST_USER_POSTCOPY_ADVISE,
> @@ -1182,7 +1184,7 @@ static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
>      return 0;
>  }
>  
> -static int vhost_user_init(struct vhost_dev *dev, void *opaque)
> +static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque)
>  {
>      uint64_t features, protocol_features;
>      struct vhost_user *u;
> @@ -1191,7 +1193,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
>      assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
>  
>      u = g_new0(struct vhost_user, 1);
> -    u->chr = opaque;
> +    u->user = opaque;
>      u->slave_fd = -1;
>      u->dev = dev;
>      dev->opaque = u;
> @@ -1267,7 +1269,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
>      return 0;
>  }
>  
> -static int vhost_user_cleanup(struct vhost_dev *dev)
> +static int vhost_user_backend_cleanup(struct vhost_dev *dev)
>  {
>      struct vhost_user *u;
>  
> @@ -1581,10 +1583,21 @@ vhost_user_crypto_close_session(struct vhost_dev *dev, uint64_t session_id)
>      return 0;
>  }
>  
> +VhostUserState *vhost_user_init(void)
> +{
> +    VhostUserState *user = g_new0(struct VhostUserState, 1);
> +
> +    return user;
> +}
> +
> +void vhost_user_cleanup(VhostUserState *user)
> +{
> +}
> +
>  const VhostOps user_ops = {
>          .backend_type = VHOST_BACKEND_TYPE_USER,
> -        .vhost_backend_init = vhost_user_init,
> -        .vhost_backend_cleanup = vhost_user_cleanup,
> +        .vhost_backend_init = vhost_user_backend_init,
> +        .vhost_backend_cleanup = vhost_user_backend_cleanup,
>          .vhost_backend_memslots_limit = vhost_user_memslots_limit,
>          .vhost_set_log_base = vhost_user_set_log_base,
>          .vhost_set_mem_table = vhost_user_set_mem_table,
> diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h
> index 5804cc904a..f1258ae545 100644
> --- a/include/hw/virtio/vhost-user-blk.h
> +++ b/include/hw/virtio/vhost-user-blk.h
> @@ -21,6 +21,7 @@
>  #include "hw/block/block.h"
>  #include "chardev/char-fe.h"
>  #include "hw/virtio/vhost.h"
> +#include "hw/virtio/vhost-user.h"
>  
>  #define TYPE_VHOST_USER_BLK "vhost-user-blk"
>  #define VHOST_USER_BLK(obj) \
> @@ -36,6 +37,7 @@ typedef struct VHostUserBlk {
>      uint32_t config_wce;
>      uint32_t config_ro;
>      struct vhost_dev dev;
> +    VhostUserState *vhost_user;
>  } VHostUserBlk;
>  
>  #endif
> diff --git a/include/hw/virtio/vhost-user-scsi.h b/include/hw/virtio/vhost-user-scsi.h
> index 01861f78d0..3ec34ae867 100644
> --- a/include/hw/virtio/vhost-user-scsi.h
> +++ b/include/hw/virtio/vhost-user-scsi.h
> @@ -21,6 +21,7 @@
>  #include "hw/qdev.h"
>  #include "hw/virtio/virtio-scsi.h"
>  #include "hw/virtio/vhost.h"
> +#include "hw/virtio/vhost-user.h"
>  #include "hw/virtio/vhost-scsi-common.h"
>  
>  #define TYPE_VHOST_USER_SCSI "vhost-user-scsi"
> @@ -30,6 +31,7 @@
>  typedef struct VHostUserSCSI {
>      VHostSCSICommon parent_obj;
>      uint64_t host_features;
> +    VhostUserState *vhost_user;
>  } VHostUserSCSI;
>  
>  #endif /* VHOST_USER_SCSI_H */
> diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
> new file mode 100644
> index 0000000000..eb8bc0d90d
> --- /dev/null
> +++ b/include/hw/virtio/vhost-user.h
> @@ -0,0 +1,20 @@
> +/*
> + * Copyright (c) 2017-2018 Intel Corporation
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef HW_VIRTIO_VHOST_USER_H
> +#define HW_VIRTIO_VHOST_USER_H
> +
> +#include "chardev/char-fe.h"
> +
> +typedef struct VhostUserState {
> +    CharBackend *chr;
> +} VhostUserState;
> +
> +VhostUserState *vhost_user_init(void);
> +void vhost_user_cleanup(VhostUserState *user);
> +
> +#endif
> diff --git a/net/vhost-user.c b/net/vhost-user.c
> index fa28aad12d..525a061acf 100644
> --- a/net/vhost-user.c
> +++ b/net/vhost-user.c
> @@ -12,6 +12,7 @@
>  #include "clients.h"
>  #include "net/vhost_net.h"
>  #include "net/vhost-user.h"
> +#include "hw/virtio/vhost-user.h"
>  #include "chardev/char-fe.h"
>  #include "qapi/error.h"
>  #include "qapi/qapi-commands-net.h"
> @@ -23,6 +24,7 @@
>  typedef struct NetVhostUserState {
>      NetClientState nc;
>      CharBackend chr; /* only queue index 0 */
> +    VhostUserState *vhost_user;
>      VHostNetState *vhost_net;
>      guint watch;
>      uint64_t acked_features;
> @@ -64,7 +66,8 @@ static void vhost_user_stop(int queues, NetClientState *ncs[])
>      }
>  }
>  
> -static int vhost_user_start(int queues, NetClientState *ncs[], CharBackend *be)
> +static int vhost_user_start(int queues, NetClientState *ncs[],
> +                            VhostUserState *be)
>  {
>      VhostNetOptions options;
>      struct vhost_net *net = NULL;
> @@ -144,7 +147,7 @@ static ssize_t vhost_user_receive(NetClientState *nc, const uint8_t *buf,
>      return size;
>  }
>  
> -static void vhost_user_cleanup(NetClientState *nc)
> +static void net_vhost_user_cleanup(NetClientState *nc)
>  {
>      NetVhostUserState *s = DO_UPCAST(NetVhostUserState, nc, nc);
>  
> @@ -153,6 +156,11 @@ static void vhost_user_cleanup(NetClientState *nc)
>          g_free(s->vhost_net);
>          s->vhost_net = NULL;
>      }
> +    if (s->vhost_user) {
> +        vhost_user_cleanup(s->vhost_user);
> +        g_free(s->vhost_user);
> +        s->vhost_user = NULL;
> +    }
>      if (nc->queue_index == 0) {
>          if (s->watch) {
>              g_source_remove(s->watch);
> @@ -182,7 +190,7 @@ static NetClientInfo net_vhost_user_info = {
>          .type = NET_CLIENT_DRIVER_VHOST_USER,
>          .size = sizeof(NetVhostUserState),
>          .receive = vhost_user_receive,
> -        .cleanup = vhost_user_cleanup,
> +        .cleanup = net_vhost_user_cleanup,
>          .has_vnet_hdr = vhost_user_has_vnet_hdr,
>          .has_ufo = vhost_user_has_ufo,
>  };
> @@ -244,7 +252,7 @@ static void net_vhost_user_event(void *opaque, int event)
>      trace_vhost_user_event(chr->label, event);
>      switch (event) {
>      case CHR_EVENT_OPENED:
> -        if (vhost_user_start(queues, ncs, &s->chr) < 0) {
> +        if (vhost_user_start(queues, ncs, s->vhost_user) < 0) {
>              qemu_chr_fe_disconnect(&s->chr);
>              return;
>          }
> @@ -283,12 +291,19 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
>  {
>      Error *err = NULL;
>      NetClientState *nc, *nc0 = NULL;
> +    VhostUserState *user = NULL;
>      NetVhostUserState *s;
>      int i;
>  
>      assert(name);
>      assert(queues > 0);
>  
> +    user = vhost_user_init();
> +    if (!user) {
> +        error_report("failed to init vhost_user");
> +        goto err;
> +    }
> +
>      for (i = 0; i < queues; i++) {
>          nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
>          snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user%d to %s",
> @@ -299,17 +314,19 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
>              s = DO_UPCAST(NetVhostUserState, nc, nc);
>              if (!qemu_chr_fe_init(&s->chr, chr, &err)) {
>                  error_report_err(err);
> -                return -1;
> +                goto err;
>              }
> +            user->chr = &s->chr;
>          }
> -
> +        s = DO_UPCAST(NetVhostUserState, nc, nc);
> +        s->vhost_user = user;
>      }
>  
>      s = DO_UPCAST(NetVhostUserState, nc, nc0);
>      do {
>          if (qemu_chr_fe_wait_connected(&s->chr, &err) < 0) {
>              error_report_err(err);
> -            return -1;
> +            goto err;
>          }
>          qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
>                                   net_vhost_user_event, NULL, nc0->name, NULL,
> @@ -319,6 +336,15 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
>      assert(s->vhost_net);
>  
>      return 0;
> +
> +err:
> +    if (user) {
> +        vhost_user_cleanup(user);
> +        g_free(user);
> +        s->vhost_user = NULL;
> +    }
> +
> +    return -1;
>  }
>  
>  static Chardev *net_vhost_claim_chardev(
> -- 
> 2.11.0




* Re: [Qemu-devel] [PATCH v3 2/6] vhost-user: introduce shared vhost-user state
  2018-05-23 13:44     ` [virtio-dev] " Michael S. Tsirkin
@ 2018-05-23 15:36       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 98+ messages in thread
From: Michael S. Tsirkin @ 2018-05-23 15:36 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Wed, May 23, 2018 at 04:44:51PM +0300, Michael S. Tsirkin wrote:
> On Thu, Apr 12, 2018 at 11:12:28PM +0800, Tiwei Bie wrote:
> > When multi queue is enabled e.g. for a virtio-net device,
> > each queue pair will have a vhost_dev, and the only thing
> > shared between vhost devs currently is the chardev. This
> > patch introduces a vhost-user state structure which will
> > be shared by all vhost devs of the same virtio device.
> > 
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> 
> Unfortunately this patch seems to cause crashes.
> To reproduce, simply run
> make check-qtest-x86_64
> 
> Sorry that it took me a while to find - it triggers on about 90% of runs
> but not 100%, which complicates bisection somewhat.
> 
> > ---
> >  backends/cryptodev-vhost-user.c     | 20 ++++++++++++++++++-
> >  hw/block/vhost-user-blk.c           | 22 +++++++++++++++++++-
> >  hw/scsi/vhost-user-scsi.c           | 20 ++++++++++++++++++-
> >  hw/virtio/Makefile.objs             |  2 +-
> >  hw/virtio/vhost-stub.c              | 10 ++++++++++
> >  hw/virtio/vhost-user.c              | 31 +++++++++++++++++++---------
> >  include/hw/virtio/vhost-user-blk.h  |  2 ++
> >  include/hw/virtio/vhost-user-scsi.h |  2 ++
> >  include/hw/virtio/vhost-user.h      | 20 +++++++++++++++++++
> >  net/vhost-user.c                    | 40 ++++++++++++++++++++++++++++++-------
> >  10 files changed, 149 insertions(+), 20 deletions(-)
> >  create mode 100644 include/hw/virtio/vhost-user.h
> > 
> > diff --git a/backends/cryptodev-vhost-user.c b/backends/cryptodev-vhost-user.c
> > index 862d4f2580..d52daccfcd 100644
> > --- a/backends/cryptodev-vhost-user.c
> > +++ b/backends/cryptodev-vhost-user.c
> > @@ -26,6 +26,7 @@
> >  #include "qapi/error.h"
> >  #include "qapi/qmp/qerror.h"
> >  #include "qemu/error-report.h"
> > +#include "hw/virtio/vhost-user.h"
> >  #include "standard-headers/linux/virtio_crypto.h"
> >  #include "sysemu/cryptodev-vhost.h"
> >  #include "chardev/char-fe.h"
> > @@ -46,6 +47,7 @@
> >  typedef struct CryptoDevBackendVhostUser {
> >      CryptoDevBackend parent_obj;
> >  
> > +    VhostUserState *vhost_user;
> >      CharBackend chr;
> >      char *chr_name;
> >      bool opened;
> > @@ -102,7 +104,7 @@ cryptodev_vhost_user_start(int queues,
> >              continue;
> >          }
> >  
> > -        options.opaque = &s->chr;
> > +        options.opaque = s->vhost_user;
> >          options.backend_type = VHOST_BACKEND_TYPE_USER;
> >          options.cc = b->conf.peers.ccs[i];
> >          s->vhost_crypto[i] = cryptodev_vhost_init(&options);
> > @@ -185,6 +187,7 @@ static void cryptodev_vhost_user_init(
> >      size_t i;
> >      Error *local_err = NULL;
> >      Chardev *chr;
> > +    VhostUserState *user;
> >      CryptoDevBackendClient *cc;
> >      CryptoDevBackendVhostUser *s =
> >                        CRYPTODEV_BACKEND_VHOST_USER(backend);
> > @@ -215,6 +218,15 @@ static void cryptodev_vhost_user_init(
> >          }
> >      }
> >  
> > +    user = vhost_user_init();
> > +    if (!user) {
> > +        error_setg(errp, "Failed to init vhost_user");
> > +        return;
> > +    }
> > +
> > +    user->chr = &s->chr;
> > +    s->vhost_user = user;
> > +
> >      qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
> >                       cryptodev_vhost_user_event, NULL, s, NULL, true);
> >  
> > @@ -299,6 +311,12 @@ static void cryptodev_vhost_user_cleanup(
> >              backend->conf.peers.ccs[i] = NULL;
> >          }
> >      }
> > +
> > +    if (s->vhost_user) {
> > +        vhost_user_cleanup(s->vhost_user);
> > +        g_free(s->vhost_user);
> > +        s->vhost_user = NULL;
> > +    }
> >  }
> >  
> >  static void cryptodev_vhost_user_set_chardev(Object *obj,
> > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> > index 262baca432..4021d71c31 100644
> > --- a/hw/block/vhost-user-blk.c
> > +++ b/hw/block/vhost-user-blk.c
> > @@ -229,6 +229,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
> >  {
> >      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> >      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> > +    VhostUserState *user;
> >      int i, ret;
> >  
> >      if (!s->chardev.chr) {
> > @@ -246,6 +247,15 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
> >          return;
> >      }
> >  
> > +    user = vhost_user_init();
> > +    if (!user) {
> > +        error_setg(errp, "vhost-user-blk: failed to init vhost_user");
> > +        return;
> > +    }
> > +
> > +    user->chr = &s->chardev;
> > +    s->vhost_user = user;
> > +
> >      virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK,
> >                  sizeof(struct virtio_blk_config));
> >  
> > @@ -261,7 +271,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
> >  
> >      vhost_dev_set_config_notifier(&s->dev, &blk_ops);
> >  
> > -    ret = vhost_dev_init(&s->dev, &s->chardev, VHOST_BACKEND_TYPE_USER, 0);
> > +    ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
> >      if (ret < 0) {
> >          error_setg(errp, "vhost-user-blk: vhost initialization failed: %s",
> >                     strerror(-ret));
> > @@ -286,6 +296,10 @@ vhost_err:
> >  virtio_err:
> >      g_free(s->dev.vqs);
> >      virtio_cleanup(vdev);
> > +
> > +    vhost_user_cleanup(user);
> > +    g_free(user);
> > +    s->vhost_user = NULL;
> >  }
> >  
> >  static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
> > @@ -297,6 +311,12 @@ static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
> >      vhost_dev_cleanup(&s->dev);
> >      g_free(s->dev.vqs);
> >      virtio_cleanup(vdev);
> > +
> > +    if (s->vhost_user) {
> > +        vhost_user_cleanup(s->vhost_user);
> > +        g_free(s->vhost_user);
> > +        s->vhost_user = NULL;
> > +    }
> >  }
> >  
> >  static void vhost_user_blk_instance_init(Object *obj)
> > diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> > index 9389ed48e0..9355cfdf07 100644
> > --- a/hw/scsi/vhost-user-scsi.c
> > +++ b/hw/scsi/vhost-user-scsi.c
> > @@ -69,6 +69,7 @@ static void vhost_user_scsi_realize(DeviceState *dev, Error **errp)
> >      VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
> >      VHostUserSCSI *s = VHOST_USER_SCSI(dev);
> >      VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> > +    VhostUserState *user;
> >      Error *err = NULL;
> >      int ret;
> >  
> > @@ -85,19 +86,30 @@ static void vhost_user_scsi_realize(DeviceState *dev, Error **errp)
> >          return;
> >      }
> >  
> > +    user = vhost_user_init();
> > +    if (!user) {
> > +        error_setg(errp, "vhost-user-scsi: failed to init vhost_user");
> > +        return;
> > +    }
> > +    user->chr = &vs->conf.chardev;
> > +
> >      vsc->dev.nvqs = 2 + vs->conf.num_queues;
> >      vsc->dev.vqs = g_new(struct vhost_virtqueue, vsc->dev.nvqs);
> >      vsc->dev.vq_index = 0;
> >      vsc->dev.backend_features = 0;
> >  
> > -    ret = vhost_dev_init(&vsc->dev, (void *)&vs->conf.chardev,
> > +    ret = vhost_dev_init(&vsc->dev, user,
> >                           VHOST_BACKEND_TYPE_USER, 0);
> >      if (ret < 0) {
> >          error_setg(errp, "vhost-user-scsi: vhost initialization failed: %s",
> >                     strerror(-ret));
> > +        vhost_user_cleanup(user);
> > +        g_free(user);
> >          return;
> >      }
> >  
> > +    s->vhost_user = user;
> > +
> >      /* Channel and lun both are 0 for bootable vhost-user-scsi disk */
> >      vsc->channel = 0;
> >      vsc->lun = 0;
> > @@ -117,6 +129,12 @@ static void vhost_user_scsi_unrealize(DeviceState *dev, Error **errp)
> >      g_free(vsc->dev.vqs);
> >  
> >      virtio_scsi_common_unrealize(dev, errp);
> > +
> > +    if (s->vhost_user) {
> > +        vhost_user_cleanup(s->vhost_user);
> > +        g_free(s->vhost_user);
> > +        s->vhost_user = NULL;
> > +    }
> >  }
> >  
> >  static uint64_t vhost_user_scsi_get_features(VirtIODevice *vdev,
> > diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
> > index 765d363c1f..030969e28c 100644
> > --- a/hw/virtio/Makefile.objs
> > +++ b/hw/virtio/Makefile.objs
> > @@ -11,5 +11,5 @@ obj-y += virtio-crypto.o
> >  obj-$(CONFIG_VIRTIO_PCI) += virtio-crypto-pci.o
> >  endif
> >  
> > -common-obj-$(call lnot,$(CONFIG_LINUX)) += vhost-stub.o
> > +common-obj-$(call lnot,$(call land,$(CONFIG_VIRTIO),$(CONFIG_LINUX))) += vhost-stub.o
> >  common-obj-$(CONFIG_ALL) += vhost-stub.o
> > diff --git a/hw/virtio/vhost-stub.c b/hw/virtio/vhost-stub.c
> > index 2d76cdebdc..049089b5e2 100644
> > --- a/hw/virtio/vhost-stub.c
> > +++ b/hw/virtio/vhost-stub.c
> > @@ -1,7 +1,17 @@
> >  #include "qemu/osdep.h"
> >  #include "hw/virtio/vhost.h"
> > +#include "hw/virtio/vhost-user.h"
> >  
> >  bool vhost_has_free_slot(void)
> >  {
> >      return true;
> >  }
> > +
> > +VhostUserState *vhost_user_init(void)
> > +{
> > +    return NULL;
> > +}
> > +
> > +void vhost_user_cleanup(VhostUserState *user)
> > +{
> > +}
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index 38da8692bb..91edd95453 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -11,6 +11,7 @@
> >  #include "qemu/osdep.h"
> >  #include "qapi/error.h"
> >  #include "hw/virtio/vhost.h"
> > +#include "hw/virtio/vhost-user.h"
> >  #include "hw/virtio/vhost-backend.h"
> >  #include "hw/virtio/virtio-net.h"
> >  #include "chardev/char-fe.h"
> > @@ -173,7 +174,8 @@ static VhostUserMsg m __attribute__ ((unused));
> >  
> >  struct vhost_user {
> >      struct vhost_dev *dev;
> > -    CharBackend *chr;
> > +    /* Shared between vhost devs of the same virtio device */
> > +    VhostUserState *user;
> >      int slave_fd;
> >      NotifierWithReturn postcopy_notifier;
> >      struct PostCopyFD  postcopy_fd;
> > @@ -199,7 +201,7 @@ static bool ioeventfd_enabled(void)
> >  static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg)
> >  {
> >      struct vhost_user *u = dev->opaque;
> > -    CharBackend *chr = u->chr;
> > +    CharBackend *chr = u->user->chr;
> >      uint8_t *p = (uint8_t *) msg;
> >      int r, size = VHOST_USER_HDR_SIZE;
> >  
> > @@ -285,7 +287,7 @@ static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg,
> >                              int *fds, int fd_num)
> >  {
> >      struct vhost_user *u = dev->opaque;
> > -    CharBackend *chr = u->chr;
> > +    CharBackend *chr = u->user->chr;
> >      int ret, size = VHOST_USER_HDR_SIZE + msg->hdr.size;
> >  
> >      /*
> > @@ -1044,7 +1046,7 @@ static int vhost_user_postcopy_waker(struct PostCopyFD *pcfd, RAMBlock *rb,
> >  static int vhost_user_postcopy_advise(struct vhost_dev *dev, Error **errp)
> >  {
> >      struct vhost_user *u = dev->opaque;
> > -    CharBackend *chr = u->chr;
> > +    CharBackend *chr = u->user->chr;
> >      int ufd;
> >      VhostUserMsg msg = {
> >          .hdr.request = VHOST_USER_POSTCOPY_ADVISE,
> > @@ -1182,7 +1184,7 @@ static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
> >      return 0;
> >  }
> >  
> > -static int vhost_user_init(struct vhost_dev *dev, void *opaque)
> > +static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque)
> >  {
> >      uint64_t features, protocol_features;
> >      struct vhost_user *u;
> > @@ -1191,7 +1193,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
> >      assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
> >  
> >      u = g_new0(struct vhost_user, 1);
> > -    u->chr = opaque;
> > +    u->user = opaque;
> >      u->slave_fd = -1;
> >      u->dev = dev;
> >      dev->opaque = u;
> > @@ -1267,7 +1269,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
> >      return 0;
> >  }
> >  
> > -static int vhost_user_cleanup(struct vhost_dev *dev)
> > +static int vhost_user_backend_cleanup(struct vhost_dev *dev)
> >  {
> >      struct vhost_user *u;
> >  
> > @@ -1581,10 +1583,21 @@ vhost_user_crypto_close_session(struct vhost_dev *dev, uint64_t session_id)
> >      return 0;
> >  }
> >  
> > +VhostUserState *vhost_user_init(void)
> > +{
> > +    VhostUserState *user = g_new0(struct VhostUserState, 1);
> > +
> > +    return user;
> > +}
> > +
> > +void vhost_user_cleanup(VhostUserState *user)
> > +{
> > +}
> > +
> >  const VhostOps user_ops = {
> >          .backend_type = VHOST_BACKEND_TYPE_USER,
> > -        .vhost_backend_init = vhost_user_init,
> > -        .vhost_backend_cleanup = vhost_user_cleanup,
> > +        .vhost_backend_init = vhost_user_backend_init,
> > +        .vhost_backend_cleanup = vhost_user_backend_cleanup,
> >          .vhost_backend_memslots_limit = vhost_user_memslots_limit,
> >          .vhost_set_log_base = vhost_user_set_log_base,
> >          .vhost_set_mem_table = vhost_user_set_mem_table,
> > diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h
> > index 5804cc904a..f1258ae545 100644
> > --- a/include/hw/virtio/vhost-user-blk.h
> > +++ b/include/hw/virtio/vhost-user-blk.h
> > @@ -21,6 +21,7 @@
> >  #include "hw/block/block.h"
> >  #include "chardev/char-fe.h"
> >  #include "hw/virtio/vhost.h"
> > +#include "hw/virtio/vhost-user.h"
> >  
> >  #define TYPE_VHOST_USER_BLK "vhost-user-blk"
> >  #define VHOST_USER_BLK(obj) \
> > @@ -36,6 +37,7 @@ typedef struct VHostUserBlk {
> >      uint32_t config_wce;
> >      uint32_t config_ro;
> >      struct vhost_dev dev;
> > +    VhostUserState *vhost_user;
> >  } VHostUserBlk;
> >  
> >  #endif
> > diff --git a/include/hw/virtio/vhost-user-scsi.h b/include/hw/virtio/vhost-user-scsi.h
> > index 01861f78d0..3ec34ae867 100644
> > --- a/include/hw/virtio/vhost-user-scsi.h
> > +++ b/include/hw/virtio/vhost-user-scsi.h
> > @@ -21,6 +21,7 @@
> >  #include "hw/qdev.h"
> >  #include "hw/virtio/virtio-scsi.h"
> >  #include "hw/virtio/vhost.h"
> > +#include "hw/virtio/vhost-user.h"
> >  #include "hw/virtio/vhost-scsi-common.h"
> >  
> >  #define TYPE_VHOST_USER_SCSI "vhost-user-scsi"
> > @@ -30,6 +31,7 @@
> >  typedef struct VHostUserSCSI {
> >      VHostSCSICommon parent_obj;
> >      uint64_t host_features;
> > +    VhostUserState *vhost_user;
> >  } VHostUserSCSI;
> >  
> >  #endif /* VHOST_USER_SCSI_H */
> > diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
> > new file mode 100644
> > index 0000000000..eb8bc0d90d
> > --- /dev/null
> > +++ b/include/hw/virtio/vhost-user.h
> > @@ -0,0 +1,20 @@
> > +/*
> > + * Copyright (c) 2017-2018 Intel Corporation
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +
> > +#ifndef HW_VIRTIO_VHOST_USER_H
> > +#define HW_VIRTIO_VHOST_USER_H
> > +
> > +#include "chardev/char-fe.h"
> > +
> > +typedef struct VhostUserState {
> > +    CharBackend *chr;
> > +} VhostUserState;
> > +
> > +VhostUserState *vhost_user_init(void);
> > +void vhost_user_cleanup(VhostUserState *user);
> > +
> > +#endif
> > diff --git a/net/vhost-user.c b/net/vhost-user.c
> > index fa28aad12d..525a061acf 100644
> > --- a/net/vhost-user.c
> > +++ b/net/vhost-user.c
> > @@ -12,6 +12,7 @@
> >  #include "clients.h"
> >  #include "net/vhost_net.h"
> >  #include "net/vhost-user.h"
> > +#include "hw/virtio/vhost-user.h"
> >  #include "chardev/char-fe.h"
> >  #include "qapi/error.h"
> >  #include "qapi/qapi-commands-net.h"
> > @@ -23,6 +24,7 @@
> >  typedef struct NetVhostUserState {
> >      NetClientState nc;
> >      CharBackend chr; /* only queue index 0 */
> > +    VhostUserState *vhost_user;
> >      VHostNetState *vhost_net;
> >      guint watch;
> >      uint64_t acked_features;
> > @@ -64,7 +66,8 @@ static void vhost_user_stop(int queues, NetClientState *ncs[])
> >      }
> >  }
> >  
> > -static int vhost_user_start(int queues, NetClientState *ncs[], CharBackend *be)
> > +static int vhost_user_start(int queues, NetClientState *ncs[],
> > +                            VhostUserState *be)
> >  {
> >      VhostNetOptions options;
> >      struct vhost_net *net = NULL;
> > @@ -144,7 +147,7 @@ static ssize_t vhost_user_receive(NetClientState *nc, const uint8_t *buf,
> >      return size;
> >  }
> >  
> > -static void vhost_user_cleanup(NetClientState *nc)
> > +static void net_vhost_user_cleanup(NetClientState *nc)
> >  {
> >      NetVhostUserState *s = DO_UPCAST(NetVhostUserState, nc, nc);
> >  
> > @@ -153,6 +156,11 @@ static void vhost_user_cleanup(NetClientState *nc)
> >          g_free(s->vhost_net);
> >          s->vhost_net = NULL;
> >      }
> > +    if (s->vhost_user) {
> > +        vhost_user_cleanup(s->vhost_user);
> > +        g_free(s->vhost_user);
> > +        s->vhost_user = NULL;
> > +    }
> >      if (nc->queue_index == 0) {
> >          if (s->watch) {
> >              g_source_remove(s->watch);
> > @@ -182,7 +190,7 @@ static NetClientInfo net_vhost_user_info = {
> >          .type = NET_CLIENT_DRIVER_VHOST_USER,
> >          .size = sizeof(NetVhostUserState),
> >          .receive = vhost_user_receive,
> > -        .cleanup = vhost_user_cleanup,
> > +        .cleanup = net_vhost_user_cleanup,
> >          .has_vnet_hdr = vhost_user_has_vnet_hdr,
> >          .has_ufo = vhost_user_has_ufo,
> >  };
> > @@ -244,7 +252,7 @@ static void net_vhost_user_event(void *opaque, int event)
> >      trace_vhost_user_event(chr->label, event);
> >      switch (event) {
> >      case CHR_EVENT_OPENED:
> > -        if (vhost_user_start(queues, ncs, &s->chr) < 0) {
> > +        if (vhost_user_start(queues, ncs, s->vhost_user) < 0) {
> >              qemu_chr_fe_disconnect(&s->chr);
> >              return;
> >          }
> > @@ -283,12 +291,19 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> >  {
> >      Error *err = NULL;
> >      NetClientState *nc, *nc0 = NULL;
> > +    VhostUserState *user = NULL;
> >      NetVhostUserState *s;
> >      int i;
> >  
> >      assert(name);
> >      assert(queues > 0);
> >  
> > +    user = vhost_user_init();
> > +    if (!user) {
> > +        error_report("failed to init vhost_user");
> > +        goto err;
> > +    }
> > +
> >      for (i = 0; i < queues; i++) {
> >          nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
> >          snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user%d to %s",
> > @@ -299,17 +314,19 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> >              s = DO_UPCAST(NetVhostUserState, nc, nc);
> >              if (!qemu_chr_fe_init(&s->chr, chr, &err)) {
> >                  error_report_err(err);
> > -                return -1;
> > +                goto err;
> >              }
> > +            user->chr = &s->chr;
> >          }
> > -
> > +        s = DO_UPCAST(NetVhostUserState, nc, nc);
> > +        s->vhost_user = user;
> >      }
> >  
> >      s = DO_UPCAST(NetVhostUserState, nc, nc0);
> >      do {
> >          if (qemu_chr_fe_wait_connected(&s->chr, &err) < 0) {
> >              error_report_err(err);
> > -            return -1;
> > +            goto err;
> >          }
> >          qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
> >                                   net_vhost_user_event, NULL, nc0->name, NULL,
> > @@ -319,6 +336,15 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> >      assert(s->vhost_net);
> >  
> >      return 0;
> > +
> > +err:
> > +    if (user) {
> > +        vhost_user_cleanup(user);
> > +        g_free(user);
> > +        s->vhost_user = NULL;
> > +    }
> > +
> > +    return -1;
> >  }
> >  
> >  static Chardev *net_vhost_claim_chardev(
> > -- 
> > 2.11.0

So far I have figured out that commenting out the free of
the structure removes the crash, so we seem to be dealing
with a use-after-free here.
I suspect that in a multi-queue situation, one queue gets
closed and frees the structure while the other queues are
still using it.

diff --git a/net/vhost-user.c b/net/vhost-user.c
index 525a061..6a1573b 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -157,8 +157,8 @@ static void net_vhost_user_cleanup(NetClientState *nc)
         s->vhost_net = NULL;
     }
     if (s->vhost_user) {
-        vhost_user_cleanup(s->vhost_user);
-        g_free(s->vhost_user);
+        //vhost_user_cleanup(s->vhost_user);
+        //g_free(s->vhost_user);
         s->vhost_user = NULL;
     }
     if (nc->queue_index == 0) {
@@ -339,8 +339,8 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
 
 err:
     if (user) {
-        vhost_user_cleanup(user);
-        g_free(user);
+        //vhost_user_cleanup(user);
+        //g_free(user);
         s->vhost_user = NULL;
     }
 

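If that is what is happening, one possible direction - just a rough
sketch to illustrate the idea, not a tested fix - would be to let only
the queue-0 net client own and release the shared state, so the other
queues never free memory they do not own:

static void net_vhost_user_cleanup(NetClientState *nc)
{
    NetVhostUserState *s = DO_UPCAST(NetVhostUserState, nc, nc);

    if (s->vhost_net) {
        vhost_net_cleanup(s->vhost_net);
        g_free(s->vhost_net);
        s->vhost_net = NULL;
    }
    /* Sketch: only queue 0 frees the shared VhostUserState;
     * the other queues merely drop their pointer, so a later
     * per-queue cleanup cannot touch already-freed memory. */
    if (nc->queue_index == 0 && s->vhost_user) {
        vhost_user_cleanup(s->vhost_user);
        g_free(s->vhost_user);
    }
    s->vhost_user = NULL;
    /* the existing watch/chardev cleanup below stays unchanged */
}

Another option would be reference counting, but since the state is only
shared between the queues of a single device, tying its lifetime to
queue 0 (where the chardev already lives) looks simpler.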

> >              s = DO_UPCAST(NetVhostUserState, nc, nc);
> >              if (!qemu_chr_fe_init(&s->chr, chr, &err)) {
> >                  error_report_err(err);
> > -                return -1;
> > +                goto err;
> >              }
> > +            user->chr = &s->chr;
> >          }
> > -
> > +        s = DO_UPCAST(NetVhostUserState, nc, nc);
> > +        s->vhost_user = user;
> >      }
> >  
> >      s = DO_UPCAST(NetVhostUserState, nc, nc0);
> >      do {
> >          if (qemu_chr_fe_wait_connected(&s->chr, &err) < 0) {
> >              error_report_err(err);
> > -            return -1;
> > +            goto err;
> >          }
> >          qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
> >                                   net_vhost_user_event, NULL, nc0->name, NULL,
> > @@ -319,6 +336,15 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> >      assert(s->vhost_net);
> >  
> >      return 0;
> > +
> > +err:
> > +    if (user) {
> > +        vhost_user_cleanup(user);
> > +        g_free(user);
> > +        s->vhost_user = NULL;
> > +    }
> > +
> > +    return -1;
> >  }
> >  
> >  static Chardev *net_vhost_claim_chardev(
> > -- 
> > 2.11.0

So far I have figured out that commenting out the free of the
structure removes the crash, so we seem to be dealing with a
use-after-free here. I suspect that in a multiqueue (MQ) setup,
one queue gets closed and frees the structure while the other
queues are still using it.
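
To spell out the suspected pattern in isolation, here is a minimal
standalone sketch (plain C, invented names, not QEMU code) of several
per-queue states sharing a single allocation:

    #include <stdlib.h>

    /* Stand-ins for VhostUserState / NetVhostUserState; names invented. */
    struct shared_state { int dummy; };
    struct queue_state  { struct shared_state *shared; };

    int main(void)
    {
        struct shared_state *user = calloc(1, sizeof(*user));
        struct queue_state q[2] = { { user }, { user } };

        /* Cleanup of one queue frees the structure shared by both ... */
        free(q[1].shared);
        q[1].shared = NULL;

        /* ... so any later access through the other queue, e.g.
         *     q[0].shared->dummy
         * is a use-after-free: q[0].shared still points at freed memory. */
        return 0;
    }

The hack below simply comments out the frees to confirm that diagnosis: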

diff --git a/net/vhost-user.c b/net/vhost-user.c
index 525a061..6a1573b 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -157,8 +157,8 @@ static void net_vhost_user_cleanup(NetClientState *nc)
         s->vhost_net = NULL;
     }
     if (s->vhost_user) {
-        vhost_user_cleanup(s->vhost_user);
-        g_free(s->vhost_user);
+        //vhost_user_cleanup(s->vhost_user);
+        //g_free(s->vhost_user);
         s->vhost_user = NULL;
     }
     if (nc->queue_index == 0) {
@@ -339,8 +339,8 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
 
 err:
     if (user) {
-        vhost_user_cleanup(user);
-        g_free(user);
+        //vhost_user_cleanup(user);
+        //g_free(user);
         s->vhost_user = NULL;
     }
 

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/6] vhost-user: introduce shared vhost-user state
  2018-05-23 15:36       ` [virtio-dev] " Michael S. Tsirkin
@ 2018-05-23 15:43         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 98+ messages in thread
From: Michael S. Tsirkin @ 2018-05-23 15:43 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Wed, May 23, 2018 at 06:36:05PM +0300, Michael S. Tsirkin wrote:
> On Wed, May 23, 2018 at 04:44:51PM +0300, Michael S. Tsirkin wrote:
> > On Thu, Apr 12, 2018 at 11:12:28PM +0800, Tiwei Bie wrote:
> > > When multi queue is enabled e.g. for a virtio-net device,
> > > each queue pair will have a vhost_dev, and the only thing
> > > shared between vhost devs currently is the chardev. This
> > > patch introduces a vhost-user state structure which will
> > > be shared by all vhost devs of the same virtio device.
> > > 
> > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > 
> > Unfortunately this patch seems to cause crashes.
> > To reproduce, simply run
> > make check-qtest-x86_64
> > 
> > Sorry that it took me a while to find - it triggers 90% of runs but not
> > 100% which complicates bisection somewhat.
> > 
> > > ---
> > >  backends/cryptodev-vhost-user.c     | 20 ++++++++++++++++++-
> > >  hw/block/vhost-user-blk.c           | 22 +++++++++++++++++++-
> > >  hw/scsi/vhost-user-scsi.c           | 20 ++++++++++++++++++-
> > >  hw/virtio/Makefile.objs             |  2 +-
> > >  hw/virtio/vhost-stub.c              | 10 ++++++++++
> > >  hw/virtio/vhost-user.c              | 31 +++++++++++++++++++---------
> > >  include/hw/virtio/vhost-user-blk.h  |  2 ++
> > >  include/hw/virtio/vhost-user-scsi.h |  2 ++
> > >  include/hw/virtio/vhost-user.h      | 20 +++++++++++++++++++
> > >  net/vhost-user.c                    | 40 ++++++++++++++++++++++++++++++-------
> > >  10 files changed, 149 insertions(+), 20 deletions(-)
> > >  create mode 100644 include/hw/virtio/vhost-user.h
> > > 
> > > diff --git a/backends/cryptodev-vhost-user.c b/backends/cryptodev-vhost-user.c
> > > index 862d4f2580..d52daccfcd 100644
> > > --- a/backends/cryptodev-vhost-user.c
> > > +++ b/backends/cryptodev-vhost-user.c
> > > @@ -26,6 +26,7 @@
> > >  #include "qapi/error.h"
> > >  #include "qapi/qmp/qerror.h"
> > >  #include "qemu/error-report.h"
> > > +#include "hw/virtio/vhost-user.h"
> > >  #include "standard-headers/linux/virtio_crypto.h"
> > >  #include "sysemu/cryptodev-vhost.h"
> > >  #include "chardev/char-fe.h"
> > > @@ -46,6 +47,7 @@
> > >  typedef struct CryptoDevBackendVhostUser {
> > >      CryptoDevBackend parent_obj;
> > >  
> > > +    VhostUserState *vhost_user;
> > >      CharBackend chr;
> > >      char *chr_name;
> > >      bool opened;
> > > @@ -102,7 +104,7 @@ cryptodev_vhost_user_start(int queues,
> > >              continue;
> > >          }
> > >  
> > > -        options.opaque = &s->chr;
> > > +        options.opaque = s->vhost_user;
> > >          options.backend_type = VHOST_BACKEND_TYPE_USER;
> > >          options.cc = b->conf.peers.ccs[i];
> > >          s->vhost_crypto[i] = cryptodev_vhost_init(&options);
> > > @@ -185,6 +187,7 @@ static void cryptodev_vhost_user_init(
> > >      size_t i;
> > >      Error *local_err = NULL;
> > >      Chardev *chr;
> > > +    VhostUserState *user;
> > >      CryptoDevBackendClient *cc;
> > >      CryptoDevBackendVhostUser *s =
> > >                        CRYPTODEV_BACKEND_VHOST_USER(backend);
> > > @@ -215,6 +218,15 @@ static void cryptodev_vhost_user_init(
> > >          }
> > >      }
> > >  
> > > +    user = vhost_user_init();
> > > +    if (!user) {
> > > +        error_setg(errp, "Failed to init vhost_user");
> > > +        return;
> > > +    }
> > > +
> > > +    user->chr = &s->chr;
> > > +    s->vhost_user = user;
> > > +
> > >      qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
> > >                       cryptodev_vhost_user_event, NULL, s, NULL, true);
> > >  
> > > @@ -299,6 +311,12 @@ static void cryptodev_vhost_user_cleanup(
> > >              backend->conf.peers.ccs[i] = NULL;
> > >          }
> > >      }
> > > +
> > > +    if (s->vhost_user) {
> > > +        vhost_user_cleanup(s->vhost_user);
> > > +        g_free(s->vhost_user);
> > > +        s->vhost_user = NULL;
> > > +    }
> > >  }
> > >  
> > >  static void cryptodev_vhost_user_set_chardev(Object *obj,
> > > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> > > index 262baca432..4021d71c31 100644
> > > --- a/hw/block/vhost-user-blk.c
> > > +++ b/hw/block/vhost-user-blk.c
> > > @@ -229,6 +229,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
> > >  {
> > >      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > >      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> > > +    VhostUserState *user;
> > >      int i, ret;
> > >  
> > >      if (!s->chardev.chr) {
> > > @@ -246,6 +247,15 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
> > >          return;
> > >      }
> > >  
> > > +    user = vhost_user_init();
> > > +    if (!user) {
> > > +        error_setg(errp, "vhost-user-blk: failed to init vhost_user");
> > > +        return;
> > > +    }
> > > +
> > > +    user->chr = &s->chardev;
> > > +    s->vhost_user = user;
> > > +
> > >      virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK,
> > >                  sizeof(struct virtio_blk_config));
> > >  
> > > @@ -261,7 +271,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
> > >  
> > >      vhost_dev_set_config_notifier(&s->dev, &blk_ops);
> > >  
> > > -    ret = vhost_dev_init(&s->dev, &s->chardev, VHOST_BACKEND_TYPE_USER, 0);
> > > +    ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
> > >      if (ret < 0) {
> > >          error_setg(errp, "vhost-user-blk: vhost initialization failed: %s",
> > >                     strerror(-ret));
> > > @@ -286,6 +296,10 @@ vhost_err:
> > >  virtio_err:
> > >      g_free(s->dev.vqs);
> > >      virtio_cleanup(vdev);
> > > +
> > > +    vhost_user_cleanup(user);
> > > +    g_free(user);
> > > +    s->vhost_user = NULL;
> > >  }
> > >  
> > >  static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
> > > @@ -297,6 +311,12 @@ static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
> > >      vhost_dev_cleanup(&s->dev);
> > >      g_free(s->dev.vqs);
> > >      virtio_cleanup(vdev);
> > > +
> > > +    if (s->vhost_user) {
> > > +        vhost_user_cleanup(s->vhost_user);
> > > +        g_free(s->vhost_user);
> > > +        s->vhost_user = NULL;
> > > +    }
> > >  }
> > >  
> > >  static void vhost_user_blk_instance_init(Object *obj)
> > > diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> > > index 9389ed48e0..9355cfdf07 100644
> > > --- a/hw/scsi/vhost-user-scsi.c
> > > +++ b/hw/scsi/vhost-user-scsi.c
> > > @@ -69,6 +69,7 @@ static void vhost_user_scsi_realize(DeviceState *dev, Error **errp)
> > >      VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
> > >      VHostUserSCSI *s = VHOST_USER_SCSI(dev);
> > >      VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> > > +    VhostUserState *user;
> > >      Error *err = NULL;
> > >      int ret;
> > >  
> > > @@ -85,19 +86,30 @@ static void vhost_user_scsi_realize(DeviceState *dev, Error **errp)
> > >          return;
> > >      }
> > >  
> > > +    user = vhost_user_init();
> > > +    if (!user) {
> > > +        error_setg(errp, "vhost-user-scsi: failed to init vhost_user");
> > > +        return;
> > > +    }
> > > +    user->chr = &vs->conf.chardev;
> > > +
> > >      vsc->dev.nvqs = 2 + vs->conf.num_queues;
> > >      vsc->dev.vqs = g_new(struct vhost_virtqueue, vsc->dev.nvqs);
> > >      vsc->dev.vq_index = 0;
> > >      vsc->dev.backend_features = 0;
> > >  
> > > -    ret = vhost_dev_init(&vsc->dev, (void *)&vs->conf.chardev,
> > > +    ret = vhost_dev_init(&vsc->dev, user,
> > >                           VHOST_BACKEND_TYPE_USER, 0);
> > >      if (ret < 0) {
> > >          error_setg(errp, "vhost-user-scsi: vhost initialization failed: %s",
> > >                     strerror(-ret));
> > > +        vhost_user_cleanup(user);
> > > +        g_free(user);
> > >          return;
> > >      }
> > >  
> > > +    s->vhost_user = user;
> > > +
> > >      /* Channel and lun both are 0 for bootable vhost-user-scsi disk */
> > >      vsc->channel = 0;
> > >      vsc->lun = 0;
> > > @@ -117,6 +129,12 @@ static void vhost_user_scsi_unrealize(DeviceState *dev, Error **errp)
> > >      g_free(vsc->dev.vqs);
> > >  
> > >      virtio_scsi_common_unrealize(dev, errp);
> > > +
> > > +    if (s->vhost_user) {
> > > +        vhost_user_cleanup(s->vhost_user);
> > > +        g_free(s->vhost_user);
> > > +        s->vhost_user = NULL;
> > > +    }
> > >  }
> > >  
> > >  static uint64_t vhost_user_scsi_get_features(VirtIODevice *vdev,
> > > diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
> > > index 765d363c1f..030969e28c 100644
> > > --- a/hw/virtio/Makefile.objs
> > > +++ b/hw/virtio/Makefile.objs
> > > @@ -11,5 +11,5 @@ obj-y += virtio-crypto.o
> > >  obj-$(CONFIG_VIRTIO_PCI) += virtio-crypto-pci.o
> > >  endif
> > >  
> > > -common-obj-$(call lnot,$(CONFIG_LINUX)) += vhost-stub.o
> > > +common-obj-$(call lnot,$(call land,$(CONFIG_VIRTIO),$(CONFIG_LINUX))) += vhost-stub.o
> > >  common-obj-$(CONFIG_ALL) += vhost-stub.o
> > > diff --git a/hw/virtio/vhost-stub.c b/hw/virtio/vhost-stub.c
> > > index 2d76cdebdc..049089b5e2 100644
> > > --- a/hw/virtio/vhost-stub.c
> > > +++ b/hw/virtio/vhost-stub.c
> > > @@ -1,7 +1,17 @@
> > >  #include "qemu/osdep.h"
> > >  #include "hw/virtio/vhost.h"
> > > +#include "hw/virtio/vhost-user.h"
> > >  
> > >  bool vhost_has_free_slot(void)
> > >  {
> > >      return true;
> > >  }
> > > +
> > > +VhostUserState *vhost_user_init(void)
> > > +{
> > > +    return NULL;
> > > +}
> > > +
> > > +void vhost_user_cleanup(VhostUserState *user)
> > > +{
> > > +}
> > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > index 38da8692bb..91edd95453 100644
> > > --- a/hw/virtio/vhost-user.c
> > > +++ b/hw/virtio/vhost-user.c
> > > @@ -11,6 +11,7 @@
> > >  #include "qemu/osdep.h"
> > >  #include "qapi/error.h"
> > >  #include "hw/virtio/vhost.h"
> > > +#include "hw/virtio/vhost-user.h"
> > >  #include "hw/virtio/vhost-backend.h"
> > >  #include "hw/virtio/virtio-net.h"
> > >  #include "chardev/char-fe.h"
> > > @@ -173,7 +174,8 @@ static VhostUserMsg m __attribute__ ((unused));
> > >  
> > >  struct vhost_user {
> > >      struct vhost_dev *dev;
> > > -    CharBackend *chr;
> > > +    /* Shared between vhost devs of the same virtio device */
> > > +    VhostUserState *user;
> > >      int slave_fd;
> > >      NotifierWithReturn postcopy_notifier;
> > >      struct PostCopyFD  postcopy_fd;
> > > @@ -199,7 +201,7 @@ static bool ioeventfd_enabled(void)
> > >  static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg)
> > >  {
> > >      struct vhost_user *u = dev->opaque;
> > > -    CharBackend *chr = u->chr;
> > > +    CharBackend *chr = u->user->chr;
> > >      uint8_t *p = (uint8_t *) msg;
> > >      int r, size = VHOST_USER_HDR_SIZE;
> > >  
> > > @@ -285,7 +287,7 @@ static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg,
> > >                              int *fds, int fd_num)
> > >  {
> > >      struct vhost_user *u = dev->opaque;
> > > -    CharBackend *chr = u->chr;
> > > +    CharBackend *chr = u->user->chr;
> > >      int ret, size = VHOST_USER_HDR_SIZE + msg->hdr.size;
> > >  
> > >      /*
> > > @@ -1044,7 +1046,7 @@ static int vhost_user_postcopy_waker(struct PostCopyFD *pcfd, RAMBlock *rb,
> > >  static int vhost_user_postcopy_advise(struct vhost_dev *dev, Error **errp)
> > >  {
> > >      struct vhost_user *u = dev->opaque;
> > > -    CharBackend *chr = u->chr;
> > > +    CharBackend *chr = u->user->chr;
> > >      int ufd;
> > >      VhostUserMsg msg = {
> > >          .hdr.request = VHOST_USER_POSTCOPY_ADVISE,
> > > @@ -1182,7 +1184,7 @@ static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
> > >      return 0;
> > >  }
> > >  
> > > -static int vhost_user_init(struct vhost_dev *dev, void *opaque)
> > > +static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque)
> > >  {
> > >      uint64_t features, protocol_features;
> > >      struct vhost_user *u;
> > > @@ -1191,7 +1193,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
> > >      assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
> > >  
> > >      u = g_new0(struct vhost_user, 1);
> > > -    u->chr = opaque;
> > > +    u->user = opaque;
> > >      u->slave_fd = -1;
> > >      u->dev = dev;
> > >      dev->opaque = u;
> > > @@ -1267,7 +1269,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
> > >      return 0;
> > >  }
> > >  
> > > -static int vhost_user_cleanup(struct vhost_dev *dev)
> > > +static int vhost_user_backend_cleanup(struct vhost_dev *dev)
> > >  {
> > >      struct vhost_user *u;
> > >  
> > > @@ -1581,10 +1583,21 @@ vhost_user_crypto_close_session(struct vhost_dev *dev, uint64_t session_id)
> > >      return 0;
> > >  }
> > >  
> > > +VhostUserState *vhost_user_init(void)
> > > +{
> > > +    VhostUserState *user = g_new0(struct VhostUserState, 1);
> > > +
> > > +    return user;
> > > +}
> > > +
> > > +void vhost_user_cleanup(VhostUserState *user)
> > > +{
> > > +}
> > > +
> > >  const VhostOps user_ops = {
> > >          .backend_type = VHOST_BACKEND_TYPE_USER,
> > > -        .vhost_backend_init = vhost_user_init,
> > > -        .vhost_backend_cleanup = vhost_user_cleanup,
> > > +        .vhost_backend_init = vhost_user_backend_init,
> > > +        .vhost_backend_cleanup = vhost_user_backend_cleanup,
> > >          .vhost_backend_memslots_limit = vhost_user_memslots_limit,
> > >          .vhost_set_log_base = vhost_user_set_log_base,
> > >          .vhost_set_mem_table = vhost_user_set_mem_table,
> > > diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h
> > > index 5804cc904a..f1258ae545 100644
> > > --- a/include/hw/virtio/vhost-user-blk.h
> > > +++ b/include/hw/virtio/vhost-user-blk.h
> > > @@ -21,6 +21,7 @@
> > >  #include "hw/block/block.h"
> > >  #include "chardev/char-fe.h"
> > >  #include "hw/virtio/vhost.h"
> > > +#include "hw/virtio/vhost-user.h"
> > >  
> > >  #define TYPE_VHOST_USER_BLK "vhost-user-blk"
> > >  #define VHOST_USER_BLK(obj) \
> > > @@ -36,6 +37,7 @@ typedef struct VHostUserBlk {
> > >      uint32_t config_wce;
> > >      uint32_t config_ro;
> > >      struct vhost_dev dev;
> > > +    VhostUserState *vhost_user;
> > >  } VHostUserBlk;
> > >  
> > >  #endif
> > > diff --git a/include/hw/virtio/vhost-user-scsi.h b/include/hw/virtio/vhost-user-scsi.h
> > > index 01861f78d0..3ec34ae867 100644
> > > --- a/include/hw/virtio/vhost-user-scsi.h
> > > +++ b/include/hw/virtio/vhost-user-scsi.h
> > > @@ -21,6 +21,7 @@
> > >  #include "hw/qdev.h"
> > >  #include "hw/virtio/virtio-scsi.h"
> > >  #include "hw/virtio/vhost.h"
> > > +#include "hw/virtio/vhost-user.h"
> > >  #include "hw/virtio/vhost-scsi-common.h"
> > >  
> > >  #define TYPE_VHOST_USER_SCSI "vhost-user-scsi"
> > > @@ -30,6 +31,7 @@
> > >  typedef struct VHostUserSCSI {
> > >      VHostSCSICommon parent_obj;
> > >      uint64_t host_features;
> > > +    VhostUserState *vhost_user;
> > >  } VHostUserSCSI;
> > >  
> > >  #endif /* VHOST_USER_SCSI_H */
> > > diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
> > > new file mode 100644
> > > index 0000000000..eb8bc0d90d
> > > --- /dev/null
> > > +++ b/include/hw/virtio/vhost-user.h
> > > @@ -0,0 +1,20 @@
> > > +/*
> > > + * Copyright (c) 2017-2018 Intel Corporation
> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2.
> > > + * See the COPYING file in the top-level directory.
> > > + */
> > > +
> > > +#ifndef HW_VIRTIO_VHOST_USER_H
> > > +#define HW_VIRTIO_VHOST_USER_H
> > > +
> > > +#include "chardev/char-fe.h"
> > > +
> > > +typedef struct VhostUserState {
> > > +    CharBackend *chr;
> > > +} VhostUserState;
> > > +
> > > +VhostUserState *vhost_user_init(void);
> > > +void vhost_user_cleanup(VhostUserState *user);
> > > +
> > > +#endif
> > > diff --git a/net/vhost-user.c b/net/vhost-user.c
> > > index fa28aad12d..525a061acf 100644
> > > --- a/net/vhost-user.c
> > > +++ b/net/vhost-user.c
> > > @@ -12,6 +12,7 @@
> > >  #include "clients.h"
> > >  #include "net/vhost_net.h"
> > >  #include "net/vhost-user.h"
> > > +#include "hw/virtio/vhost-user.h"
> > >  #include "chardev/char-fe.h"
> > >  #include "qapi/error.h"
> > >  #include "qapi/qapi-commands-net.h"
> > > @@ -23,6 +24,7 @@
> > >  typedef struct NetVhostUserState {
> > >      NetClientState nc;
> > >      CharBackend chr; /* only queue index 0 */
> > > +    VhostUserState *vhost_user;
> > >      VHostNetState *vhost_net;
> > >      guint watch;
> > >      uint64_t acked_features;
> > > @@ -64,7 +66,8 @@ static void vhost_user_stop(int queues, NetClientState *ncs[])
> > >      }
> > >  }
> > >  
> > > -static int vhost_user_start(int queues, NetClientState *ncs[], CharBackend *be)
> > > +static int vhost_user_start(int queues, NetClientState *ncs[],
> > > +                            VhostUserState *be)
> > >  {
> > >      VhostNetOptions options;
> > >      struct vhost_net *net = NULL;
> > > @@ -144,7 +147,7 @@ static ssize_t vhost_user_receive(NetClientState *nc, const uint8_t *buf,
> > >      return size;
> > >  }
> > >  
> > > -static void vhost_user_cleanup(NetClientState *nc)
> > > +static void net_vhost_user_cleanup(NetClientState *nc)
> > >  {
> > >      NetVhostUserState *s = DO_UPCAST(NetVhostUserState, nc, nc);
> > >  
> > > @@ -153,6 +156,11 @@ static void vhost_user_cleanup(NetClientState *nc)
> > >          g_free(s->vhost_net);
> > >          s->vhost_net = NULL;
> > >      }
> > > +    if (s->vhost_user) {
> > > +        vhost_user_cleanup(s->vhost_user);
> > > +        g_free(s->vhost_user);
> > > +        s->vhost_user = NULL;
> > > +    }
> > >      if (nc->queue_index == 0) {
> > >          if (s->watch) {
> > >              g_source_remove(s->watch);
> > > @@ -182,7 +190,7 @@ static NetClientInfo net_vhost_user_info = {
> > >          .type = NET_CLIENT_DRIVER_VHOST_USER,
> > >          .size = sizeof(NetVhostUserState),
> > >          .receive = vhost_user_receive,
> > > -        .cleanup = vhost_user_cleanup,
> > > +        .cleanup = net_vhost_user_cleanup,
> > >          .has_vnet_hdr = vhost_user_has_vnet_hdr,
> > >          .has_ufo = vhost_user_has_ufo,
> > >  };
> > > @@ -244,7 +252,7 @@ static void net_vhost_user_event(void *opaque, int event)
> > >      trace_vhost_user_event(chr->label, event);
> > >      switch (event) {
> > >      case CHR_EVENT_OPENED:
> > > -        if (vhost_user_start(queues, ncs, &s->chr) < 0) {
> > > +        if (vhost_user_start(queues, ncs, s->vhost_user) < 0) {
> > >              qemu_chr_fe_disconnect(&s->chr);
> > >              return;
> > >          }
> > > @@ -283,12 +291,19 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> > >  {
> > >      Error *err = NULL;
> > >      NetClientState *nc, *nc0 = NULL;
> > > +    VhostUserState *user = NULL;
> > >      NetVhostUserState *s;
> > >      int i;
> > >  
> > >      assert(name);
> > >      assert(queues > 0);
> > >  
> > > +    user = vhost_user_init();
> > > +    if (!user) {
> > > +        error_report("failed to init vhost_user");
> > > +        goto err;
> > > +    }
> > > +
> > >      for (i = 0; i < queues; i++) {
> > >          nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
> > >          snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user%d to %s",
> > > @@ -299,17 +314,19 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> > >              s = DO_UPCAST(NetVhostUserState, nc, nc);
> > >              if (!qemu_chr_fe_init(&s->chr, chr, &err)) {
> > >                  error_report_err(err);
> > > -                return -1;
> > > +                goto err;
> > >              }
> > > +            user->chr = &s->chr;
> > >          }
> > > -
> > > +        s = DO_UPCAST(NetVhostUserState, nc, nc);
> > > +        s->vhost_user = user;
> > >      }
> > >  
> > >      s = DO_UPCAST(NetVhostUserState, nc, nc0);
> > >      do {
> > >          if (qemu_chr_fe_wait_connected(&s->chr, &err) < 0) {
> > >              error_report_err(err);
> > > -            return -1;
> > > +            goto err;
> > >          }
> > >          qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
> > >                                   net_vhost_user_event, NULL, nc0->name, NULL,
> > > @@ -319,6 +336,15 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> > >      assert(s->vhost_net);
> > >  
> > >      return 0;
> > > +
> > > +err:
> > > +    if (user) {
> > > +        vhost_user_cleanup(user);
> > > +        g_free(user);
> > > +        s->vhost_user = NULL;
> > > +    }
> > > +
> > > +    return -1;
> > >  }
> > >  
> > >  static Chardev *net_vhost_claim_chardev(
> > > -- 
> > > 2.11.0
> 
> So far I have figured out that commenting out the free of the
> structure removes the crash, so we seem to be dealing with a
> use-after-free here. I suspect that in a multiqueue (MQ) setup,
> one queue gets closed and frees the structure while the other
> queues are still using it.
> 
> diff --git a/net/vhost-user.c b/net/vhost-user.c
> index 525a061..6a1573b 100644
> --- a/net/vhost-user.c
> +++ b/net/vhost-user.c
> @@ -157,8 +157,8 @@ static void net_vhost_user_cleanup(NetClientState *nc)
>          s->vhost_net = NULL;
>      }
>      if (s->vhost_user) {
> -        vhost_user_cleanup(s->vhost_user);
> -        g_free(s->vhost_user);
> +        //vhost_user_cleanup(s->vhost_user);
> +        //g_free(s->vhost_user);
>          s->vhost_user = NULL;
>      }
>      if (nc->queue_index == 0) {
> @@ -339,8 +339,8 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
>  
>  err:
>      if (user) {
> -        vhost_user_cleanup(user);
> -        g_free(user);
> +        //vhost_user_cleanup(user);
> +        //g_free(user);
>          s->vhost_user = NULL;
>      }
>  


So the following at least gets rid of the crashes.
I am not sure it does not leak memory, though, and I am not
sure there is no configuration where the first queue gets
cleaned up before the others.

Thoughts?

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

---

diff --git a/net/vhost-user.c b/net/vhost-user.c
index 525a061..7549d25 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -156,19 +156,20 @@ static void net_vhost_user_cleanup(NetClientState *nc)
         g_free(s->vhost_net);
         s->vhost_net = NULL;
     }
-    if (s->vhost_user) {
-        vhost_user_cleanup(s->vhost_user);
-        g_free(s->vhost_user);
-        s->vhost_user = NULL;
-    }
     if (nc->queue_index == 0) {
         if (s->watch) {
             g_source_remove(s->watch);
             s->watch = 0;
         }
         qemu_chr_fe_deinit(&s->chr, true);
+        if (s->vhost_user) {
+            vhost_user_cleanup(s->vhost_user);
+            g_free(s->vhost_user);
+        }
     }
 
+    s->vhost_user = NULL;
+
     qemu_purge_queued_packets(nc);
 }
 
@@ -341,7 +342,6 @@ err:
     if (user) {
         vhost_user_cleanup(user);
         g_free(user);
-        s->vhost_user = NULL;
     }
 
     return -1;
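
Another option, which would not depend on queue 0 being torn down
last, would be to reference-count the shared state instead of tying
its lifetime to a particular queue. A rough sketch, with invented
names (this is not what the diff above does):

    #include <stdlib.h>

    /* Hypothetical refcounted variant of the shared state; field and
     * helper names are made up for illustration. */
    typedef struct VhostUserStateRC {
        void *chr;              /* CharBackend * in the real structure */
        unsigned refcount;
    } VhostUserStateRC;

    static VhostUserStateRC *vhost_user_state_ref(VhostUserStateRC *u)
    {
        u->refcount++;
        return u;
    }

    static void vhost_user_state_unref(VhostUserStateRC *u)
    {
        if (u && --u->refcount == 0) {
            free(u);            /* g_free() in QEMU proper */
        }
    }

    /* Each queue's init takes a reference and each queue's cleanup drops
     * one, so the allocation only goes away with the last queue,
     * regardless of teardown order. */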

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/6] vhost-user: support receiving file descriptors in slave_read
  2018-04-12 15:12   ` [virtio-dev] " Tiwei Bie
@ 2018-05-23 21:25     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 98+ messages in thread
From: Michael S. Tsirkin @ 2018-05-23 21:25 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, Apr 12, 2018 at 11:12:29PM +0800, Tiwei Bie wrote:
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>


Thinking about it, I think we should add a protocol feature
for this. This way the remote can find out whether it is safe
to send file descriptors to us along with these messages.
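
Roughly, the master would advertise such a bit and the slave would
only attach descriptors once it has been negotiated; a sketch with an
invented feature name (the real name and bit number would need to be
added to docs/interop/vhost-user.txt):

    #include <stdint.h>

    /* Invented name and bit number, purely to illustrate the idea. */
    #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD_EXAMPLE  31

    static inline int slave_fd_passing_negotiated(uint64_t protocol_features)
    {
        return !!(protocol_features &
                  (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD_EXAMPLE));
    }

    /* The slave would attach descriptors via SCM_RIGHTS only when this
     * bit has been negotiated, and the master would know it may receive
     * them in slave_read(). */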

> ---
>  hw/virtio/vhost-user.c | 41 ++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 91edd95453..9cea2c8c51 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -854,14 +854,44 @@ static void slave_read(void *opaque)
>      VhostUserHeader hdr = { 0, };
>      VhostUserPayload payload = { 0, };
>      int size, ret = 0;
> +    struct iovec iov;
> +    struct msghdr msgh;
> +    int fd = -1;
> +    char control[CMSG_SPACE(sizeof(fd))];
> +    struct cmsghdr *cmsg;
> +    size_t fdsize;
> +
> +    memset(&msgh, 0, sizeof(msgh));
> +    msgh.msg_iov = &iov;
> +    msgh.msg_iovlen = 1;
> +    msgh.msg_control = control;
> +    msgh.msg_controllen = sizeof(control);
>  
>      /* Read header */
> -    size = read(u->slave_fd, &hdr, VHOST_USER_HDR_SIZE);
> +    iov.iov_base = &hdr;
> +    iov.iov_len = VHOST_USER_HDR_SIZE;
> +
> +    size = recvmsg(u->slave_fd, &msgh, 0);
>      if (size != VHOST_USER_HDR_SIZE) {
>          error_report("Failed to read from slave.");
>          goto err;
>      }
>  
> +    if (msgh.msg_flags & MSG_CTRUNC) {
> +        error_report("Truncated message.");
> +        goto err;
> +    }
> +
> +    for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL;
> +         cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
> +            if (cmsg->cmsg_level == SOL_SOCKET &&
> +                cmsg->cmsg_type == SCM_RIGHTS) {
> +                    fdsize = cmsg->cmsg_len - CMSG_LEN(0);
> +                    memcpy(&fd, CMSG_DATA(cmsg), fdsize);
> +                    break;
> +            }
> +    }
> +
>      if (hdr.size > VHOST_USER_PAYLOAD_SIZE) {
>          error_report("Failed to read msg header."
>                  " Size %d exceeds the maximum %zu.", hdr.size,
> @@ -885,9 +915,15 @@ static void slave_read(void *opaque)
>          break;
>      default:
>          error_report("Received unexpected msg type.");
> +        if (fd != -1) {
> +            close(fd);
> +        }
>          ret = -EINVAL;
>      }
>  
> +    /* Message handlers need to make sure that fd will be consumed. */
> +    fd = -1;
> +
>      /*
>       * REPLY_ACK feature handling. Other reply types has to be managed
>       * directly in their request handlers.
> @@ -920,6 +956,9 @@ err:
>      qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
>      close(u->slave_fd);
>      u->slave_fd = -1;
> +    if (fd != -1) {
> +        close(fd);
> +    }
>      return;
>  }
>  
> -- 
> 2.11.0

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [virtio-dev] Re: [PATCH v3 3/6] vhost-user: support receiving file descriptors in slave_read
@ 2018-05-23 21:25     ` Michael S. Tsirkin
  0 siblings, 0 replies; 98+ messages in thread
From: Michael S. Tsirkin @ 2018-05-23 21:25 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, Apr 12, 2018 at 11:12:29PM +0800, Tiwei Bie wrote:
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>


Thinking about it, I think we should add a protocol
feature for this. This way the remote can find out
whether it's safe to send this data to us.

> ---
>  hw/virtio/vhost-user.c | 41 ++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 91edd95453..9cea2c8c51 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -854,14 +854,44 @@ static void slave_read(void *opaque)
>      VhostUserHeader hdr = { 0, };
>      VhostUserPayload payload = { 0, };
>      int size, ret = 0;
> +    struct iovec iov;
> +    struct msghdr msgh;
> +    int fd = -1;
> +    char control[CMSG_SPACE(sizeof(fd))];
> +    struct cmsghdr *cmsg;
> +    size_t fdsize;
> +
> +    memset(&msgh, 0, sizeof(msgh));
> +    msgh.msg_iov = &iov;
> +    msgh.msg_iovlen = 1;
> +    msgh.msg_control = control;
> +    msgh.msg_controllen = sizeof(control);
>  
>      /* Read header */
> -    size = read(u->slave_fd, &hdr, VHOST_USER_HDR_SIZE);
> +    iov.iov_base = &hdr;
> +    iov.iov_len = VHOST_USER_HDR_SIZE;
> +
> +    size = recvmsg(u->slave_fd, &msgh, 0);
>      if (size != VHOST_USER_HDR_SIZE) {
>          error_report("Failed to read from slave.");
>          goto err;
>      }
>  
> +    if (msgh.msg_flags & MSG_CTRUNC) {
> +        error_report("Truncated message.");
> +        goto err;
> +    }
> +
> +    for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL;
> +         cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
> +            if (cmsg->cmsg_level == SOL_SOCKET &&
> +                cmsg->cmsg_type == SCM_RIGHTS) {
> +                    fdsize = cmsg->cmsg_len - CMSG_LEN(0);
> +                    memcpy(&fd, CMSG_DATA(cmsg), fdsize);
> +                    break;
> +            }
> +    }
> +
>      if (hdr.size > VHOST_USER_PAYLOAD_SIZE) {
>          error_report("Failed to read msg header."
>                  " Size %d exceeds the maximum %zu.", hdr.size,
> @@ -885,9 +915,15 @@ static void slave_read(void *opaque)
>          break;
>      default:
>          error_report("Received unexpected msg type.");
> +        if (fd != -1) {
> +            close(fd);
> +        }
>          ret = -EINVAL;
>      }
>  
> +    /* Message handlers need to make sure that fd will be consumed. */
> +    fd = -1;
> +
>      /*
>       * REPLY_ACK feature handling. Other reply types has to be managed
>       * directly in their request handlers.
> @@ -920,6 +956,9 @@ err:
>      qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
>      close(u->slave_fd);
>      u->slave_fd = -1;
> +    if (fd != -1) {
> +        close(fd);
> +    }
>      return;
>  }
>  
> -- 
> 2.11.0

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 3/6] vhost-user: support receiving file descriptors in slave_read
  2018-05-23 21:25     ` [virtio-dev] " Michael S. Tsirkin
@ 2018-05-23 23:12       ` Tiwei Bie
  -1 siblings, 0 replies; 98+ messages in thread
From: Tiwei Bie @ 2018-05-23 23:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, May 24, 2018 at 12:25:23AM +0300, Michael S. Tsirkin wrote:
> On Thu, Apr 12, 2018 at 11:12:29PM +0800, Tiwei Bie wrote:
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> 
> 
> Thinking about it, I think we should add a protocol
> feature for this. This way remote can find out whether
> it's safe to send this data to us.

Okay, I can add a protocol feature for this.
Do you think it's OK to keep this patch as is
(it just extends slave_read() to support
receiving file descriptors) and introduce the
new protocol feature in a separate patch that
allows backends to send file descriptors?

Best regards,
Tiwei Bie

> 
> > ---
> >  hw/virtio/vhost-user.c | 41 ++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 40 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index 91edd95453..9cea2c8c51 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -854,14 +854,44 @@ static void slave_read(void *opaque)
> >      VhostUserHeader hdr = { 0, };
> >      VhostUserPayload payload = { 0, };
> >      int size, ret = 0;
> > +    struct iovec iov;
> > +    struct msghdr msgh;
> > +    int fd = -1;
> > +    char control[CMSG_SPACE(sizeof(fd))];
> > +    struct cmsghdr *cmsg;
> > +    size_t fdsize;
> > +
> > +    memset(&msgh, 0, sizeof(msgh));
> > +    msgh.msg_iov = &iov;
> > +    msgh.msg_iovlen = 1;
> > +    msgh.msg_control = control;
> > +    msgh.msg_controllen = sizeof(control);
> >  
> >      /* Read header */
> > -    size = read(u->slave_fd, &hdr, VHOST_USER_HDR_SIZE);
> > +    iov.iov_base = &hdr;
> > +    iov.iov_len = VHOST_USER_HDR_SIZE;
> > +
> > +    size = recvmsg(u->slave_fd, &msgh, 0);
> >      if (size != VHOST_USER_HDR_SIZE) {
> >          error_report("Failed to read from slave.");
> >          goto err;
> >      }
> >  
> > +    if (msgh.msg_flags & MSG_CTRUNC) {
> > +        error_report("Truncated message.");
> > +        goto err;
> > +    }
> > +
> > +    for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL;
> > +         cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
> > +            if (cmsg->cmsg_level == SOL_SOCKET &&
> > +                cmsg->cmsg_type == SCM_RIGHTS) {
> > +                    fdsize = cmsg->cmsg_len - CMSG_LEN(0);
> > +                    memcpy(&fd, CMSG_DATA(cmsg), fdsize);
> > +                    break;
> > +            }
> > +    }
> > +
> >      if (hdr.size > VHOST_USER_PAYLOAD_SIZE) {
> >          error_report("Failed to read msg header."
> >                  " Size %d exceeds the maximum %zu.", hdr.size,
> > @@ -885,9 +915,15 @@ static void slave_read(void *opaque)
> >          break;
> >      default:
> >          error_report("Received unexpected msg type.");
> > +        if (fd != -1) {
> > +            close(fd);
> > +        }
> >          ret = -EINVAL;
> >      }
> >  
> > +    /* Message handlers need to make sure that fd will be consumed. */
> > +    fd = -1;
> > +
> >      /*
> >       * REPLY_ACK feature handling. Other reply types has to be managed
> >       * directly in their request handlers.
> > @@ -920,6 +956,9 @@ err:
> >      qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
> >      close(u->slave_fd);
> >      u->slave_fd = -1;
> > +    if (fd != -1) {
> > +        close(fd);
> > +    }
> >      return;
> >  }
> >  
> > -- 
> > 2.11.0
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> 

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [virtio-dev] Re: [PATCH v3 3/6] vhost-user: support receiving file descriptors in slave_read
@ 2018-05-23 23:12       ` Tiwei Bie
  0 siblings, 0 replies; 98+ messages in thread
From: Tiwei Bie @ 2018-05-23 23:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, May 24, 2018 at 12:25:23AM +0300, Michael S. Tsirkin wrote:
> On Thu, Apr 12, 2018 at 11:12:29PM +0800, Tiwei Bie wrote:
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> 
> 
> Thinking about it, I think we should add a protocol
> feature for this. This way remote can find out whether
> it's safe to send this data to us.

Okay, I can add a protocol feature for this.
Do you think it's OK to keep this patch as is
(it just extends slave_read() to support
receiving file descriptors) and introduce the
new protocol feature in a separate patch that
allows backends to send file descriptors?

Best regards,
Tiwei Bie

> 
> > ---
> >  hw/virtio/vhost-user.c | 41 ++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 40 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index 91edd95453..9cea2c8c51 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -854,14 +854,44 @@ static void slave_read(void *opaque)
> >      VhostUserHeader hdr = { 0, };
> >      VhostUserPayload payload = { 0, };
> >      int size, ret = 0;
> > +    struct iovec iov;
> > +    struct msghdr msgh;
> > +    int fd = -1;
> > +    char control[CMSG_SPACE(sizeof(fd))];
> > +    struct cmsghdr *cmsg;
> > +    size_t fdsize;
> > +
> > +    memset(&msgh, 0, sizeof(msgh));
> > +    msgh.msg_iov = &iov;
> > +    msgh.msg_iovlen = 1;
> > +    msgh.msg_control = control;
> > +    msgh.msg_controllen = sizeof(control);
> >  
> >      /* Read header */
> > -    size = read(u->slave_fd, &hdr, VHOST_USER_HDR_SIZE);
> > +    iov.iov_base = &hdr;
> > +    iov.iov_len = VHOST_USER_HDR_SIZE;
> > +
> > +    size = recvmsg(u->slave_fd, &msgh, 0);
> >      if (size != VHOST_USER_HDR_SIZE) {
> >          error_report("Failed to read from slave.");
> >          goto err;
> >      }
> >  
> > +    if (msgh.msg_flags & MSG_CTRUNC) {
> > +        error_report("Truncated message.");
> > +        goto err;
> > +    }
> > +
> > +    for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL;
> > +         cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
> > +            if (cmsg->cmsg_level == SOL_SOCKET &&
> > +                cmsg->cmsg_type == SCM_RIGHTS) {
> > +                    fdsize = cmsg->cmsg_len - CMSG_LEN(0);
> > +                    memcpy(&fd, CMSG_DATA(cmsg), fdsize);
> > +                    break;
> > +            }
> > +    }
> > +
> >      if (hdr.size > VHOST_USER_PAYLOAD_SIZE) {
> >          error_report("Failed to read msg header."
> >                  " Size %d exceeds the maximum %zu.", hdr.size,
> > @@ -885,9 +915,15 @@ static void slave_read(void *opaque)
> >          break;
> >      default:
> >          error_report("Received unexpected msg type.");
> > +        if (fd != -1) {
> > +            close(fd);
> > +        }
> >          ret = -EINVAL;
> >      }
> >  
> > +    /* Message handlers need to make sure that fd will be consumed. */
> > +    fd = -1;
> > +
> >      /*
> >       * REPLY_ACK feature handling. Other reply types has to be managed
> >       * directly in their request handlers.
> > @@ -920,6 +956,9 @@ err:
> >      qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
> >      close(u->slave_fd);
> >      u->slave_fd = -1;
> > +    if (fd != -1) {
> > +        close(fd);
> > +    }
> >      return;
> >  }
> >  
> > -- 
> > 2.11.0
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/6] vhost-user: introduce shared vhost-user state
  2018-05-23 15:43         ` [virtio-dev] " Michael S. Tsirkin
@ 2018-05-23 23:21           ` Tiwei Bie
  -1 siblings, 0 replies; 98+ messages in thread
From: Tiwei Bie @ 2018-05-23 23:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Wed, May 23, 2018 at 06:43:29PM +0300, Michael S. Tsirkin wrote:
> On Wed, May 23, 2018 at 06:36:05PM +0300, Michael S. Tsirkin wrote:
> > On Wed, May 23, 2018 at 04:44:51PM +0300, Michael S. Tsirkin wrote:
> > > On Thu, Apr 12, 2018 at 11:12:28PM +0800, Tiwei Bie wrote:
> > > > When multi queue is enabled e.g. for a virtio-net device,
> > > > each queue pair will have a vhost_dev, and the only thing
> > > > shared between vhost devs currently is the chardev. This
> > > > patch introduces a vhost-user state structure which will
> > > > be shared by all vhost devs of the same virtio device.
> > > > 
> > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > 
> > > Unfortunately this patch seems to cause crashes.
> > > To reproduce, simply run
> > > make check-qtest-x86_64
> > > 
> > > Sorry that it took me a while to find - it triggers 90% of runs but not
> > > 100% which complicates bisection somewhat.

It's my fault for not noticing this bug earlier.
I'm very sorry. Thank you so much for finding
the root cause!

> > > 
> > > > ---
> > > >  backends/cryptodev-vhost-user.c     | 20 ++++++++++++++++++-
> > > >  hw/block/vhost-user-blk.c           | 22 +++++++++++++++++++-
> > > >  hw/scsi/vhost-user-scsi.c           | 20 ++++++++++++++++++-
> > > >  hw/virtio/Makefile.objs             |  2 +-
> > > >  hw/virtio/vhost-stub.c              | 10 ++++++++++
> > > >  hw/virtio/vhost-user.c              | 31 +++++++++++++++++++---------
> > > >  include/hw/virtio/vhost-user-blk.h  |  2 ++
> > > >  include/hw/virtio/vhost-user-scsi.h |  2 ++
> > > >  include/hw/virtio/vhost-user.h      | 20 +++++++++++++++++++
> > > >  net/vhost-user.c                    | 40 ++++++++++++++++++++++++++++++-------
> > > >  10 files changed, 149 insertions(+), 20 deletions(-)
> > > >  create mode 100644 include/hw/virtio/vhost-user.h
[...]
> > > >          qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
> > > >                                   net_vhost_user_event, NULL, nc0->name, NULL,
> > > > @@ -319,6 +336,15 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> > > >      assert(s->vhost_net);
> > > >  
> > > >      return 0;
> > > > +
> > > > +err:
> > > > +    if (user) {
> > > > +        vhost_user_cleanup(user);
> > > > +        g_free(user);
> > > > +        s->vhost_user = NULL;
> > > > +    }
> > > > +
> > > > +    return -1;
> > > >  }
> > > >  
> > > >  static Chardev *net_vhost_claim_chardev(
> > > > -- 
> > > > 2.11.0
> > 
> > So far I figured out that commenting the free of
> > the structure removes the crash, so we seem to
> > be dealing with a use-after free here.
> > I suspect that in a MQ situation, one queue gets
> > closed and attempts to free the structure
> > while others still use it.
> > 
> > diff --git a/net/vhost-user.c b/net/vhost-user.c
> > index 525a061..6a1573b 100644
> > --- a/net/vhost-user.c
> > +++ b/net/vhost-user.c
> > @@ -157,8 +157,8 @@ static void net_vhost_user_cleanup(NetClientState *nc)
> >          s->vhost_net = NULL;
> >      }
> >      if (s->vhost_user) {
> > -        vhost_user_cleanup(s->vhost_user);
> > -        g_free(s->vhost_user);
> > +        //vhost_user_cleanup(s->vhost_user);
> > +        //g_free(s->vhost_user);
> >          s->vhost_user = NULL;
> >      }
> >      if (nc->queue_index == 0) {
> > @@ -339,8 +339,8 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> >  
> >  err:
> >      if (user) {
> > -        vhost_user_cleanup(user);
> > -        g_free(user);
> > +        //vhost_user_cleanup(user);
> > +        //g_free(user);
> >          s->vhost_user = NULL;
> >      }
> >  
> 
> 
> So the following at least gets rid of the crashes.
> I am not sure it does not leak memory though,
> and not sure there aren't any configurations where
> the 1st queue gets cleaned up first.
> 
> Thoughts?

Thank you so much for catching it and fixing
it! I'll keep your SoB there. I really
appreciate it!

You are right. This structure is freed multiple
times when multi-queue is enabled.

I think it's safe to let the first queue pair
free the vhost-user structure, because it won't
be touched by the other queue pairs during cleanup.

Best regards,
Tiwei Bie


> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> ---
> 
> diff --git a/net/vhost-user.c b/net/vhost-user.c
> index 525a061..7549d25 100644
> --- a/net/vhost-user.c
> +++ b/net/vhost-user.c
> @@ -156,19 +156,20 @@ static void net_vhost_user_cleanup(NetClientState *nc)
>          g_free(s->vhost_net);
>          s->vhost_net = NULL;
>      }
> -    if (s->vhost_user) {
> -        vhost_user_cleanup(s->vhost_user);
> -        g_free(s->vhost_user);
> -        s->vhost_user = NULL;
> -    }
>      if (nc->queue_index == 0) {
>          if (s->watch) {
>              g_source_remove(s->watch);
>              s->watch = 0;
>          }
>          qemu_chr_fe_deinit(&s->chr, true);
> +        if (s->vhost_user) {
> +            vhost_user_cleanup(s->vhost_user);
> +            g_free(s->vhost_user);
> +        }
>      }
>  
> +    s->vhost_user = NULL;
> +
>      qemu_purge_queued_packets(nc);
>  }
>  
> @@ -341,7 +342,6 @@ err:
>      if (user) {
>          vhost_user_cleanup(user);
>          g_free(user);
> -        s->vhost_user = NULL;
>      }
>  
>      return -1;

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [virtio-dev] Re: [PATCH v3 2/6] vhost-user: introduce shared vhost-user state
@ 2018-05-23 23:21           ` Tiwei Bie
  0 siblings, 0 replies; 98+ messages in thread
From: Tiwei Bie @ 2018-05-23 23:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Wed, May 23, 2018 at 06:43:29PM +0300, Michael S. Tsirkin wrote:
> On Wed, May 23, 2018 at 06:36:05PM +0300, Michael S. Tsirkin wrote:
> > On Wed, May 23, 2018 at 04:44:51PM +0300, Michael S. Tsirkin wrote:
> > > On Thu, Apr 12, 2018 at 11:12:28PM +0800, Tiwei Bie wrote:
> > > > When multi queue is enabled e.g. for a virtio-net device,
> > > > each queue pair will have a vhost_dev, and the only thing
> > > > shared between vhost devs currently is the chardev. This
> > > > patch introduces a vhost-user state structure which will
> > > > be shared by all vhost devs of the same virtio device.
> > > > 
> > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > 
> > > Unfortunately this patch seems to cause crashes.
> > > To reproduce, simply run
> > > make check-qtest-x86_64
> > > 
> > > Sorry that it took me a while to find - it triggers 90% of runs but not
> > > 100% which complicates bisection somewhat.

It's my fault for not noticing this bug earlier.
I'm very sorry. Thank you so much for finding
the root cause!

> > > 
> > > > ---
> > > >  backends/cryptodev-vhost-user.c     | 20 ++++++++++++++++++-
> > > >  hw/block/vhost-user-blk.c           | 22 +++++++++++++++++++-
> > > >  hw/scsi/vhost-user-scsi.c           | 20 ++++++++++++++++++-
> > > >  hw/virtio/Makefile.objs             |  2 +-
> > > >  hw/virtio/vhost-stub.c              | 10 ++++++++++
> > > >  hw/virtio/vhost-user.c              | 31 +++++++++++++++++++---------
> > > >  include/hw/virtio/vhost-user-blk.h  |  2 ++
> > > >  include/hw/virtio/vhost-user-scsi.h |  2 ++
> > > >  include/hw/virtio/vhost-user.h      | 20 +++++++++++++++++++
> > > >  net/vhost-user.c                    | 40 ++++++++++++++++++++++++++++++-------
> > > >  10 files changed, 149 insertions(+), 20 deletions(-)
> > > >  create mode 100644 include/hw/virtio/vhost-user.h
[...]
> > > >          qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
> > > >                                   net_vhost_user_event, NULL, nc0->name, NULL,
> > > > @@ -319,6 +336,15 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> > > >      assert(s->vhost_net);
> > > >  
> > > >      return 0;
> > > > +
> > > > +err:
> > > > +    if (user) {
> > > > +        vhost_user_cleanup(user);
> > > > +        g_free(user);
> > > > +        s->vhost_user = NULL;
> > > > +    }
> > > > +
> > > > +    return -1;
> > > >  }
> > > >  
> > > >  static Chardev *net_vhost_claim_chardev(
> > > > -- 
> > > > 2.11.0
> > 
> > So far I figured out that commenting the free of
> > the structure removes the crash, so we seem to
> > be dealing with a use-after free here.
> > I suspect that in a MQ situation, one queue gets
> > closed and attempts to free the structure
> > while others still use it.
> > 
> > diff --git a/net/vhost-user.c b/net/vhost-user.c
> > index 525a061..6a1573b 100644
> > --- a/net/vhost-user.c
> > +++ b/net/vhost-user.c
> > @@ -157,8 +157,8 @@ static void net_vhost_user_cleanup(NetClientState *nc)
> >          s->vhost_net = NULL;
> >      }
> >      if (s->vhost_user) {
> > -        vhost_user_cleanup(s->vhost_user);
> > -        g_free(s->vhost_user);
> > +        //vhost_user_cleanup(s->vhost_user);
> > +        //g_free(s->vhost_user);
> >          s->vhost_user = NULL;
> >      }
> >      if (nc->queue_index == 0) {
> > @@ -339,8 +339,8 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> >  
> >  err:
> >      if (user) {
> > -        vhost_user_cleanup(user);
> > -        g_free(user);
> > +        //vhost_user_cleanup(user);
> > +        //g_free(user);
> >          s->vhost_user = NULL;
> >      }
> >  
> 
> 
> So the following at least gets rid of the crashes.
> I am not sure it does not leak memory though,
> and not sure there aren't any configurations where
> the 1st queue gets cleaned up first.
> 
> Thoughts?

Thank you so much for catching it and fixing
it! I'll keep your SoB there. I really
appreciate it!

You are right. This structure is freed multiple
times when multi-queue is enabled.

I think it's safe to let the first queue pair
free the vhost-user structure, because it won't
be touched by the other queue pairs during cleanup.

Best regards,
Tiwei Bie


> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> ---
> 
> diff --git a/net/vhost-user.c b/net/vhost-user.c
> index 525a061..7549d25 100644
> --- a/net/vhost-user.c
> +++ b/net/vhost-user.c
> @@ -156,19 +156,20 @@ static void net_vhost_user_cleanup(NetClientState *nc)
>          g_free(s->vhost_net);
>          s->vhost_net = NULL;
>      }
> -    if (s->vhost_user) {
> -        vhost_user_cleanup(s->vhost_user);
> -        g_free(s->vhost_user);
> -        s->vhost_user = NULL;
> -    }
>      if (nc->queue_index == 0) {
>          if (s->watch) {
>              g_source_remove(s->watch);
>              s->watch = 0;
>          }
>          qemu_chr_fe_deinit(&s->chr, true);
> +        if (s->vhost_user) {
> +            vhost_user_cleanup(s->vhost_user);
> +            g_free(s->vhost_user);
> +        }
>      }
>  
> +    s->vhost_user = NULL;
> +
>      qemu_purge_queued_packets(nc);
>  }
>  
> @@ -341,7 +342,6 @@ err:
>      if (user) {
>          vhost_user_cleanup(user);
>          g_free(user);
> -        s->vhost_user = NULL;
>      }
>  
>      return -1;

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/6] vhost-user: introduce shared vhost-user state
  2018-05-23 23:21           ` [virtio-dev] " Tiwei Bie
@ 2018-05-24  2:24             ` Tiwei Bie
  -1 siblings, 0 replies; 98+ messages in thread
From: Tiwei Bie @ 2018-05-24  2:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, May 24, 2018 at 07:21:01AM +0800, Tiwei Bie wrote:
> On Wed, May 23, 2018 at 06:43:29PM +0300, Michael S. Tsirkin wrote:
> > On Wed, May 23, 2018 at 06:36:05PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, May 23, 2018 at 04:44:51PM +0300, Michael S. Tsirkin wrote:
> > > > On Thu, Apr 12, 2018 at 11:12:28PM +0800, Tiwei Bie wrote:
> > > > > When multi queue is enabled e.g. for a virtio-net device,
> > > > > each queue pair will have a vhost_dev, and the only thing
> > > > > shared between vhost devs currently is the chardev. This
> > > > > patch introduces a vhost-user state structure which will
> > > > > be shared by all vhost devs of the same virtio device.
> > > > > 
> > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > 
> > > > Unfortunately this patch seems to cause crashes.
> > > > To reproduce, simply run
> > > > make check-qtest-x86_64
> > > > 
> > > > Sorry that it took me a while to find - it triggers 90% of runs but not
> > > > 100% which complicates bisection somewhat.
> 
> It's my fault to not notice this bug before.
> I'm very sorry. Thank you so much for finding
> the root cause!
> 
> > > > 
> > > > > ---
> > > > >  backends/cryptodev-vhost-user.c     | 20 ++++++++++++++++++-
> > > > >  hw/block/vhost-user-blk.c           | 22 +++++++++++++++++++-
> > > > >  hw/scsi/vhost-user-scsi.c           | 20 ++++++++++++++++++-
> > > > >  hw/virtio/Makefile.objs             |  2 +-
> > > > >  hw/virtio/vhost-stub.c              | 10 ++++++++++
> > > > >  hw/virtio/vhost-user.c              | 31 +++++++++++++++++++---------
> > > > >  include/hw/virtio/vhost-user-blk.h  |  2 ++
> > > > >  include/hw/virtio/vhost-user-scsi.h |  2 ++
> > > > >  include/hw/virtio/vhost-user.h      | 20 +++++++++++++++++++
> > > > >  net/vhost-user.c                    | 40 ++++++++++++++++++++++++++++++-------
> > > > >  10 files changed, 149 insertions(+), 20 deletions(-)
> > > > >  create mode 100644 include/hw/virtio/vhost-user.h
> [...]
> > > > >          qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
> > > > >                                   net_vhost_user_event, NULL, nc0->name, NULL,
> > > > > @@ -319,6 +336,15 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> > > > >      assert(s->vhost_net);
> > > > >  
> > > > >      return 0;
> > > > > +
> > > > > +err:
> > > > > +    if (user) {
> > > > > +        vhost_user_cleanup(user);
> > > > > +        g_free(user);
> > > > > +        s->vhost_user = NULL;
> > > > > +    }
> > > > > +
> > > > > +    return -1;
> > > > >  }
> > > > >  
> > > > >  static Chardev *net_vhost_claim_chardev(
> > > > > -- 
> > > > > 2.11.0
> > > 
> > > So far I figured out that commenting the free of
> > > the structure removes the crash, so we seem to
> > > be dealing with a use-after free here.
> > > I suspect that in a MQ situation, one queue gets
> > > closed and attempts to free the structure
> > > while others still use it.
> > > 
> > > diff --git a/net/vhost-user.c b/net/vhost-user.c
> > > index 525a061..6a1573b 100644
> > > --- a/net/vhost-user.c
> > > +++ b/net/vhost-user.c
> > > @@ -157,8 +157,8 @@ static void net_vhost_user_cleanup(NetClientState *nc)
> > >          s->vhost_net = NULL;
> > >      }
> > >      if (s->vhost_user) {
> > > -        vhost_user_cleanup(s->vhost_user);
> > > -        g_free(s->vhost_user);
> > > +        //vhost_user_cleanup(s->vhost_user);
> > > +        //g_free(s->vhost_user);
> > >          s->vhost_user = NULL;
> > >      }
> > >      if (nc->queue_index == 0) {
> > > @@ -339,8 +339,8 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> > >  
> > >  err:
> > >      if (user) {
> > > -        vhost_user_cleanup(user);
> > > -        g_free(user);
> > > +        //vhost_user_cleanup(user);
> > > +        //g_free(user);
> > >          s->vhost_user = NULL;
> > >      }
> > >  
> > 
> > 
> > So the following at least gets rid of the crashes.
> > I am not sure it does not leak memory though,
> > and not sure there aren't any configurations where
> > the 1st queue gets cleaned up first.
> > 
> > Thoughts?
> 
> Thank you so much for catching it and fixing
> it! I'll keep your SoB there. Thank you so
> much! I do appreciate it!
> 
> You are right. This structure is freed multiple
> times when multi-queue is enabled.

After digging deeper, I get your point now.
It could be a use-after-free instead of a double
free. As it's safe to deinit the char, which is
shared by all queue pairs, when cleaning up the
1st queue pair, it should be safe to free the
vhost-user structure there too.
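
In other words, a schematic, compilable sketch of that idea (the struct
and function names below are made up for illustration and are not the
QEMU code):

#include <stdio.h>
#include <stdlib.h>

/* 'shared' stands in for the vhost-user state that all queue pairs of
 * one device point at. */
struct shared_state { int dummy; };

struct queue_state {
    int queue_index;
    struct shared_state *shared;
};

static void queue_cleanup(struct queue_state *q)
{
    /* Per-queue resources would be released here for every queue pair. */

    /* The shared state is released exactly once, from queue pair 0,
     * i.e. the same place where the shared chardev is deinitialized. */
    if (q->queue_index == 0) {
        free(q->shared);
    }
    q->shared = NULL; /* every queue pair drops its reference */
}

int main(void)
{
    struct shared_state *shared = calloc(1, sizeof(*shared));
    struct queue_state q0 = { 0, shared };
    struct queue_state q1 = { 1, shared };

    /* Only q0 frees, so there is no double free regardless of order. */
    queue_cleanup(&q1);
    queue_cleanup(&q0);
    printf("cleaned up exactly once\n");
    return 0;
}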

> 
> I think it's safe to let the first queue pair
> free the vhost-user structure, because it won't
> be touched by other queue pairs during cleanup.
> 
> Best regards,
> Tiwei Bie
> 
> 
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > ---
> > 
> > diff --git a/net/vhost-user.c b/net/vhost-user.c
> > index 525a061..7549d25 100644
> > --- a/net/vhost-user.c
> > +++ b/net/vhost-user.c
> > @@ -156,19 +156,20 @@ static void net_vhost_user_cleanup(NetClientState *nc)
> >          g_free(s->vhost_net);
> >          s->vhost_net = NULL;
> >      }
> > -    if (s->vhost_user) {
> > -        vhost_user_cleanup(s->vhost_user);
> > -        g_free(s->vhost_user);
> > -        s->vhost_user = NULL;
> > -    }
> >      if (nc->queue_index == 0) {
> >          if (s->watch) {
> >              g_source_remove(s->watch);
> >              s->watch = 0;
> >          }
> >          qemu_chr_fe_deinit(&s->chr, true);
> > +        if (s->vhost_user) {
> > +            vhost_user_cleanup(s->vhost_user);
> > +            g_free(s->vhost_user);
> > +        }
> >      }
> >  
> > +    s->vhost_user = NULL;

Maybe we should move the above line into the if block, like:

     if (nc->queue_index == 0) {
         if (s->watch) {
             g_source_remove(s->watch);
             s->watch = 0;
         }
         qemu_chr_fe_deinit(&s->chr, true);
+        if (s->vhost_user) {
+            vhost_user_cleanup(s->vhost_user);
+            g_free(s->vhost_user);
+            s->vhost_user = NULL;
+        }
     }

otherwise s->vhost_user may not be freed.

> > +
> >      qemu_purge_queued_packets(nc);
> >  }
> >  
> > @@ -341,7 +342,6 @@ err:
> >      if (user) {
> >          vhost_user_cleanup(user);
> >          g_free(user);
> > -        s->vhost_user = NULL;

I don't get why we can't zero it in this case.

> >      }
> >  
> >      return -1;

Best regards,
Tiwei Bie

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [virtio-dev] Re: [PATCH v3 2/6] vhost-user: introduce shared vhost-user state
@ 2018-05-24  2:24             ` Tiwei Bie
  0 siblings, 0 replies; 98+ messages in thread
From: Tiwei Bie @ 2018-05-24  2:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, May 24, 2018 at 07:21:01AM +0800, Tiwei Bie wrote:
> On Wed, May 23, 2018 at 06:43:29PM +0300, Michael S. Tsirkin wrote:
> > On Wed, May 23, 2018 at 06:36:05PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, May 23, 2018 at 04:44:51PM +0300, Michael S. Tsirkin wrote:
> > > > On Thu, Apr 12, 2018 at 11:12:28PM +0800, Tiwei Bie wrote:
> > > > > When multi queue is enabled e.g. for a virtio-net device,
> > > > > each queue pair will have a vhost_dev, and the only thing
> > > > > shared between vhost devs currently is the chardev. This
> > > > > patch introduces a vhost-user state structure which will
> > > > > be shared by all vhost devs of the same virtio device.
> > > > > 
> > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > 
> > > > Unfortunately this patch seems to cause crashes.
> > > > To reproduce, simply run
> > > > make check-qtest-x86_64
> > > > 
> > > > Sorry that it took me a while to find - it triggers 90% of runs but not
> > > > 100% which complicates bisection somewhat.
> 
> It's my fault to not notice this bug before.
> I'm very sorry. Thank you so much for finding
> the root cause!
> 
> > > > 
> > > > > ---
> > > > >  backends/cryptodev-vhost-user.c     | 20 ++++++++++++++++++-
> > > > >  hw/block/vhost-user-blk.c           | 22 +++++++++++++++++++-
> > > > >  hw/scsi/vhost-user-scsi.c           | 20 ++++++++++++++++++-
> > > > >  hw/virtio/Makefile.objs             |  2 +-
> > > > >  hw/virtio/vhost-stub.c              | 10 ++++++++++
> > > > >  hw/virtio/vhost-user.c              | 31 +++++++++++++++++++---------
> > > > >  include/hw/virtio/vhost-user-blk.h  |  2 ++
> > > > >  include/hw/virtio/vhost-user-scsi.h |  2 ++
> > > > >  include/hw/virtio/vhost-user.h      | 20 +++++++++++++++++++
> > > > >  net/vhost-user.c                    | 40 ++++++++++++++++++++++++++++++-------
> > > > >  10 files changed, 149 insertions(+), 20 deletions(-)
> > > > >  create mode 100644 include/hw/virtio/vhost-user.h
> [...]
> > > > >          qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
> > > > >                                   net_vhost_user_event, NULL, nc0->name, NULL,
> > > > > @@ -319,6 +336,15 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> > > > >      assert(s->vhost_net);
> > > > >  
> > > > >      return 0;
> > > > > +
> > > > > +err:
> > > > > +    if (user) {
> > > > > +        vhost_user_cleanup(user);
> > > > > +        g_free(user);
> > > > > +        s->vhost_user = NULL;
> > > > > +    }
> > > > > +
> > > > > +    return -1;
> > > > >  }
> > > > >  
> > > > >  static Chardev *net_vhost_claim_chardev(
> > > > > -- 
> > > > > 2.11.0
> > > 
> > > So far I figured out that commenting the free of
> > > the structure removes the crash, so we seem to
> > > be dealing with a use-after free here.
> > > I suspect that in a MQ situation, one queue gets
> > > closed and attempts to free the structure
> > > while others still use it.
> > > 
> > > diff --git a/net/vhost-user.c b/net/vhost-user.c
> > > index 525a061..6a1573b 100644
> > > --- a/net/vhost-user.c
> > > +++ b/net/vhost-user.c
> > > @@ -157,8 +157,8 @@ static void net_vhost_user_cleanup(NetClientState *nc)
> > >          s->vhost_net = NULL;
> > >      }
> > >      if (s->vhost_user) {
> > > -        vhost_user_cleanup(s->vhost_user);
> > > -        g_free(s->vhost_user);
> > > +        //vhost_user_cleanup(s->vhost_user);
> > > +        //g_free(s->vhost_user);
> > >          s->vhost_user = NULL;
> > >      }
> > >      if (nc->queue_index == 0) {
> > > @@ -339,8 +339,8 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> > >  
> > >  err:
> > >      if (user) {
> > > -        vhost_user_cleanup(user);
> > > -        g_free(user);
> > > +        //vhost_user_cleanup(user);
> > > +        //g_free(user);
> > >          s->vhost_user = NULL;
> > >      }
> > >  
> > 
> > 
> > So the following at least gets rid of the crashes.
> > I am not sure it does not leak memory though,
> > and not sure there aren't any configurations where
> > the 1st queue gets cleaned up first.
> > 
> > Thoughts?
> 
> Thank you so much for catching it and fixing
> it! I'll keep your SoB there. Thank you so
> much! I do appreciate it!
> 
> You are right. This structure is freed multiple
> times when multi-queue is enabled.

After digging deeper, I get your point now.
It could be a use-after-free instead of a double
free. As it's safe to deinit the char, which is
shared by all queue pairs, when cleaning up the
1st queue pair, it should be safe to free the
vhost-user structure there too.

> 
> I think it's safe to let the first queue pair
> free the vhost-user structure, because it won't
> be touched by other queue pairs during cleanup.
> 
> Best regards,
> Tiwei Bie
> 
> 
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > ---
> > 
> > diff --git a/net/vhost-user.c b/net/vhost-user.c
> > index 525a061..7549d25 100644
> > --- a/net/vhost-user.c
> > +++ b/net/vhost-user.c
> > @@ -156,19 +156,20 @@ static void net_vhost_user_cleanup(NetClientState *nc)
> >          g_free(s->vhost_net);
> >          s->vhost_net = NULL;
> >      }
> > -    if (s->vhost_user) {
> > -        vhost_user_cleanup(s->vhost_user);
> > -        g_free(s->vhost_user);
> > -        s->vhost_user = NULL;
> > -    }
> >      if (nc->queue_index == 0) {
> >          if (s->watch) {
> >              g_source_remove(s->watch);
> >              s->watch = 0;
> >          }
> >          qemu_chr_fe_deinit(&s->chr, true);
> > +        if (s->vhost_user) {
> > +            vhost_user_cleanup(s->vhost_user);
> > +            g_free(s->vhost_user);
> > +        }
> >      }
> >  
> > +    s->vhost_user = NULL;

Maybe we should move the above line into the if block, like:

     if (nc->queue_index == 0) {
         if (s->watch) {
             g_source_remove(s->watch);
             s->watch = 0;
         }
         qemu_chr_fe_deinit(&s->chr, true);
+        if (s->vhost_user) {
+            vhost_user_cleanup(s->vhost_user);
+            g_free(s->vhost_user);
+            s->vhost_user = NULL;
+        }
     }

otherwise s->vhost_user may not be freed.

> > +
> >      qemu_purge_queued_packets(nc);
> >  }
> >  
> > @@ -341,7 +342,6 @@ err:
> >      if (user) {
> >          vhost_user_cleanup(user);
> >          g_free(user);
> > -        s->vhost_user = NULL;

I don't get why we can't zero it in this case.

> >      }
> >  
> >      return -1;

Best regards,
Tiwei Bie

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/6] vhost-user: introduce shared vhost-user state
  2018-05-24  2:24             ` [virtio-dev] " Tiwei Bie
@ 2018-05-24  7:03               ` Tiwei Bie
  -1 siblings, 0 replies; 98+ messages in thread
From: Tiwei Bie @ 2018-05-24  7:03 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, May 24, 2018 at 10:24:40AM +0800, Tiwei Bie wrote:
[...]
> > > +
> > >      qemu_purge_queued_packets(nc);
> > >  }
> > >  
> > > @@ -341,7 +342,6 @@ err:
> > >      if (user) {
> > >          vhost_user_cleanup(user);
> > >          g_free(user);
> > > -        s->vhost_user = NULL;
> 
> I don't get why cannot zero it in this case.

Hmm... please just ignore my comment above.
Thanks for catching this bug!

Best regards,
Tiwei Bie

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [virtio-dev] Re: [PATCH v3 2/6] vhost-user: introduce shared vhost-user state
@ 2018-05-24  7:03               ` Tiwei Bie
  0 siblings, 0 replies; 98+ messages in thread
From: Tiwei Bie @ 2018-05-24  7:03 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, May 24, 2018 at 10:24:40AM +0800, Tiwei Bie wrote:
[...]
> > > +
> > >      qemu_purge_queued_packets(nc);
> > >  }
> > >  
> > > @@ -341,7 +342,6 @@ err:
> > >      if (user) {
> > >          vhost_user_cleanup(user);
> > >          g_free(user);
> > > -        s->vhost_user = NULL;
> 
> I don't get why cannot zero it in this case.

Hmm... please just ignore my comment above.
Thanks for catching this bug!

Best regards,
Tiwei Bie

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/6] vhost-user: introduce shared vhost-user state
  2018-05-24  2:24             ` [virtio-dev] " Tiwei Bie
@ 2018-05-24 10:59               ` Tiwei Bie
  -1 siblings, 0 replies; 98+ messages in thread
From: Tiwei Bie @ 2018-05-24 10:59 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, May 24, 2018 at 10:24:40AM +0800, Tiwei Bie wrote:
> On Thu, May 24, 2018 at 07:21:01AM +0800, Tiwei Bie wrote:
> > On Wed, May 23, 2018 at 06:43:29PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, May 23, 2018 at 06:36:05PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, May 23, 2018 at 04:44:51PM +0300, Michael S. Tsirkin wrote:
> > > > > On Thu, Apr 12, 2018 at 11:12:28PM +0800, Tiwei Bie wrote:
> > > > > > When multi queue is enabled e.g. for a virtio-net device,
> > > > > > each queue pair will have a vhost_dev, and the only thing
> > > > > > shared between vhost devs currently is the chardev. This
> > > > > > patch introduces a vhost-user state structure which will
> > > > > > be shared by all vhost devs of the same virtio device.
> > > > > > 
> > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > 
> > > > > Unfortunately this patch seems to cause crashes.
> > > > > To reproduce, simply run
> > > > > make check-qtest-x86_64
> > > > > 
> > > > > Sorry that it took me a while to find - it triggers 90% of runs but not
> > > > > 100% which complicates bisection somewhat.
> > 
> > It's my fault to not notice this bug before.
> > I'm very sorry. Thank you so much for finding
> > the root cause!
> > 
> > > > > 
> > > > > > ---
> > > > > >  backends/cryptodev-vhost-user.c     | 20 ++++++++++++++++++-
> > > > > >  hw/block/vhost-user-blk.c           | 22 +++++++++++++++++++-
> > > > > >  hw/scsi/vhost-user-scsi.c           | 20 ++++++++++++++++++-
> > > > > >  hw/virtio/Makefile.objs             |  2 +-
> > > > > >  hw/virtio/vhost-stub.c              | 10 ++++++++++
> > > > > >  hw/virtio/vhost-user.c              | 31 +++++++++++++++++++---------
> > > > > >  include/hw/virtio/vhost-user-blk.h  |  2 ++
> > > > > >  include/hw/virtio/vhost-user-scsi.h |  2 ++
> > > > > >  include/hw/virtio/vhost-user.h      | 20 +++++++++++++++++++
> > > > > >  net/vhost-user.c                    | 40 ++++++++++++++++++++++++++++++-------
> > > > > >  10 files changed, 149 insertions(+), 20 deletions(-)
> > > > > >  create mode 100644 include/hw/virtio/vhost-user.h
> > [...]
> > > > > >          qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
> > > > > >                                   net_vhost_user_event, NULL, nc0->name, NULL,
> > > > > > @@ -319,6 +336,15 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> > > > > >      assert(s->vhost_net);
> > > > > >  
> > > > > >      return 0;
> > > > > > +
> > > > > > +err:
> > > > > > +    if (user) {
> > > > > > +        vhost_user_cleanup(user);
> > > > > > +        g_free(user);
> > > > > > +        s->vhost_user = NULL;
> > > > > > +    }
> > > > > > +
> > > > > > +    return -1;
> > > > > >  }
> > > > > >  
> > > > > >  static Chardev *net_vhost_claim_chardev(
> > > > > > -- 
> > > > > > 2.11.0
> > > > 
> > > > So far I figured out that commenting the free of
> > > > the structure removes the crash, so we seem to
> > > > be dealing with a use-after free here.
> > > > I suspect that in a MQ situation, one queue gets
> > > > closed and attempts to free the structure
> > > > while others still use it.
> > > > 
> > > > diff --git a/net/vhost-user.c b/net/vhost-user.c
> > > > index 525a061..6a1573b 100644
> > > > --- a/net/vhost-user.c
> > > > +++ b/net/vhost-user.c
> > > > @@ -157,8 +157,8 @@ static void net_vhost_user_cleanup(NetClientState *nc)
> > > >          s->vhost_net = NULL;
> > > >      }
> > > >      if (s->vhost_user) {
> > > > -        vhost_user_cleanup(s->vhost_user);
> > > > -        g_free(s->vhost_user);
> > > > +        //vhost_user_cleanup(s->vhost_user);
> > > > +        //g_free(s->vhost_user);
> > > >          s->vhost_user = NULL;
> > > >      }
> > > >      if (nc->queue_index == 0) {
> > > > @@ -339,8 +339,8 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> > > >  
> > > >  err:
> > > >      if (user) {
> > > > -        vhost_user_cleanup(user);
> > > > -        g_free(user);
> > > > +        //vhost_user_cleanup(user);
> > > > +        //g_free(user);
> > > >          s->vhost_user = NULL;
> > > >      }
> > > >  
> > > 
> > > 
> > > So the following at least gets rid of the crashes.
> > > I am not sure it does not leak memory though,
> > > and not sure there aren't any configurations where
> > > the 1st queue gets cleaned up first.
> > > 
> > > Thoughts?
> > 
> > Thank you so much for catching it and fixing
> > it! I'll keep your SoB there. Thank you so
> > much! I do appreciate it!
> > 
> > You are right. This structure is freed multiple
> > times when multi-queue is enabled.
> 
> After a deeper digging, I got your point now..
> It could be a use-after-free instead of a double
> free.. As it's safe to deinit the char which is
> shared by all queue pairs when cleanup the 1st
> queue pair, it should be safe to free vhost-user
> structure there too.

One thing that worries me is that I can't reproduce
the crash with `make check-qtest-x86_64`. I ran it
many times, but the only output I got each time was:

        CHK version_gen.h
  GTESTER check-qtest-x86_64

I took a quick glance at the `check-qtest-x86_64`
target in the makefile; it seems that the relevant
test is `tests/vhost-user-test`. So I also tried
running this test directly:

make tests/vhost-user-test
while true; do
  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 ./tests/vhost-user-test
done

And the only output I got in each loop was:

/x86_64/vhost-user/migrate: OK
/x86_64/vhost-user/multiqueue: OK
/x86_64/vhost-user/read-guest-mem/memfd: OK
/x86_64/vhost-user/read-guest-mem/memfile: OK

So I'm still not quite sure what caused the crash
on your side. But it does make more sense to free
the vhost-user structure only when cleaning up the
1st queue pair (i.e. where the `chr` is deinit-ed).

I have sent a new patch set with the above change:

http://lists.gnu.org/archive/html/qemu-devel/2018-05/msg05508.html
https://patchwork.kernel.org/bundle/btw/vhost-user-host-notifiers/

Because the above change comes from your diff
and is based on your analysis, I kept your SoB
in that patch (if you have any concerns about it,
please let me know).

In this patch set, I also introduced a protocol
feature to allow the slave to send fds to the
master via the slave channel.
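
As a rough illustration of what that enables on the backend side (plain
POSIX sockets; the helper name and the feature flag parameter are
assumptions, not the QEMU or DPDK code), the slave would attach an fd
with SCM_RIGHTS only when the new protocol feature has been negotiated:

#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send 'buf' on the slave channel, attaching 'fd' only when allowed. */
static ssize_t slave_send_msg(int sock, const void *buf, size_t len,
                              int fd, bool fd_passing_negotiated)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr msgh = { .msg_iov = &iov, .msg_iovlen = 1 };
    char control[CMSG_SPACE(sizeof(fd))];
    struct cmsghdr *cmsg;

    if (fd_passing_negotiated && fd >= 0) {
        memset(control, 0, sizeof(control));
        msgh.msg_control = control;
        msgh.msg_controllen = sizeof(control);

        cmsg = CMSG_FIRSTHDR(&msgh);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));
    }

    return sendmsg(sock, &msgh, 0);
}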

If you still see crashes with the new patch set,
please send me a bit more detail, e.g. the
crash message. Thanks a lot!

Best regards,
Tiwei Bie

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [virtio-dev] Re: [PATCH v3 2/6] vhost-user: introduce shared vhost-user state
@ 2018-05-24 10:59               ` Tiwei Bie
  0 siblings, 0 replies; 98+ messages in thread
From: Tiwei Bie @ 2018-05-24 10:59 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, May 24, 2018 at 10:24:40AM +0800, Tiwei Bie wrote:
> On Thu, May 24, 2018 at 07:21:01AM +0800, Tiwei Bie wrote:
> > On Wed, May 23, 2018 at 06:43:29PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, May 23, 2018 at 06:36:05PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, May 23, 2018 at 04:44:51PM +0300, Michael S. Tsirkin wrote:
> > > > > On Thu, Apr 12, 2018 at 11:12:28PM +0800, Tiwei Bie wrote:
> > > > > > When multi queue is enabled e.g. for a virtio-net device,
> > > > > > each queue pair will have a vhost_dev, and the only thing
> > > > > > shared between vhost devs currently is the chardev. This
> > > > > > patch introduces a vhost-user state structure which will
> > > > > > be shared by all vhost devs of the same virtio device.
> > > > > > 
> > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > 
> > > > > Unfortunately this patch seems to cause crashes.
> > > > > To reproduce, simply run
> > > > > make check-qtest-x86_64
> > > > > 
> > > > > Sorry that it took me a while to find - it triggers 90% of runs but not
> > > > > 100% which complicates bisection somewhat.
> > 
> > It's my fault to not notice this bug before.
> > I'm very sorry. Thank you so much for finding
> > the root cause!
> > 
> > > > > 
> > > > > > ---
> > > > > >  backends/cryptodev-vhost-user.c     | 20 ++++++++++++++++++-
> > > > > >  hw/block/vhost-user-blk.c           | 22 +++++++++++++++++++-
> > > > > >  hw/scsi/vhost-user-scsi.c           | 20 ++++++++++++++++++-
> > > > > >  hw/virtio/Makefile.objs             |  2 +-
> > > > > >  hw/virtio/vhost-stub.c              | 10 ++++++++++
> > > > > >  hw/virtio/vhost-user.c              | 31 +++++++++++++++++++---------
> > > > > >  include/hw/virtio/vhost-user-blk.h  |  2 ++
> > > > > >  include/hw/virtio/vhost-user-scsi.h |  2 ++
> > > > > >  include/hw/virtio/vhost-user.h      | 20 +++++++++++++++++++
> > > > > >  net/vhost-user.c                    | 40 ++++++++++++++++++++++++++++++-------
> > > > > >  10 files changed, 149 insertions(+), 20 deletions(-)
> > > > > >  create mode 100644 include/hw/virtio/vhost-user.h
> > [...]
> > > > > >          qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
> > > > > >                                   net_vhost_user_event, NULL, nc0->name, NULL,
> > > > > > @@ -319,6 +336,15 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> > > > > >      assert(s->vhost_net);
> > > > > >  
> > > > > >      return 0;
> > > > > > +
> > > > > > +err:
> > > > > > +    if (user) {
> > > > > > +        vhost_user_cleanup(user);
> > > > > > +        g_free(user);
> > > > > > +        s->vhost_user = NULL;
> > > > > > +    }
> > > > > > +
> > > > > > +    return -1;
> > > > > >  }
> > > > > >  
> > > > > >  static Chardev *net_vhost_claim_chardev(
> > > > > > -- 
> > > > > > 2.11.0
> > > > 
> > > > So far I figured out that commenting the free of
> > > > the structure removes the crash, so we seem to
> > > > be dealing with a use-after free here.
> > > > I suspect that in a MQ situation, one queue gets
> > > > closed and attempts to free the structure
> > > > while others still use it.
> > > > 
> > > > diff --git a/net/vhost-user.c b/net/vhost-user.c
> > > > index 525a061..6a1573b 100644
> > > > --- a/net/vhost-user.c
> > > > +++ b/net/vhost-user.c
> > > > @@ -157,8 +157,8 @@ static void net_vhost_user_cleanup(NetClientState *nc)
> > > >          s->vhost_net = NULL;
> > > >      }
> > > >      if (s->vhost_user) {
> > > > -        vhost_user_cleanup(s->vhost_user);
> > > > -        g_free(s->vhost_user);
> > > > +        //vhost_user_cleanup(s->vhost_user);
> > > > +        //g_free(s->vhost_user);
> > > >          s->vhost_user = NULL;
> > > >      }
> > > >      if (nc->queue_index == 0) {
> > > > @@ -339,8 +339,8 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> > > >  
> > > >  err:
> > > >      if (user) {
> > > > -        vhost_user_cleanup(user);
> > > > -        g_free(user);
> > > > +        //vhost_user_cleanup(user);
> > > > +        //g_free(user);
> > > >          s->vhost_user = NULL;
> > > >      }
> > > >  
> > > 
> > > 
> > > So the following at least gets rid of the crashes.
> > > I am not sure it does not leak memory though,
> > > and not sure there aren't any configurations where
> > > the 1st queue gets cleaned up first.
> > > 
> > > Thoughts?
> > 
> > Thank you so much for catching it and fixing
> > it! I'll keep your SoB there. Thank you so
> > much! I do appreciate it!
> > 
> > You are right. This structure is freed multiple
> > times when multi-queue is enabled.
> 
> After a deeper digging, I got your point now..
> It could be a use-after-free instead of a double
> free.. As it's safe to deinit the char which is
> shared by all queue pairs when cleanup the 1st
> queue pair, it should be safe to free vhost-user
> structure there too.

One thing that worries me is that I can't reproduce
the crash with `make check-qtest-x86_64`. I ran it
many times, but the only output I got each time was:

        CHK version_gen.h
  GTESTER check-qtest-x86_64

I took a quick glance at the `check-qtest-x86_64`
target in the makefile; it seems that the relevant
test is `tests/vhost-user-test`. So I also tried
running this test directly:

make tests/vhost-user-test
while true; do
  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 ./tests/vhost-user-test
done

And the only output I got in each loop was:

/x86_64/vhost-user/migrate: OK
/x86_64/vhost-user/multiqueue: OK
/x86_64/vhost-user/read-guest-mem/memfd: OK
/x86_64/vhost-user/read-guest-mem/memfile: OK

So I'm still not quite sure what caused the crash
on your side. But it does make more sense to free
the vhost-user structure only when cleaning up the
1st queue pair (i.e. where the `chr` is deinit-ed).

I have sent a new patch set with the above change:

http://lists.gnu.org/archive/html/qemu-devel/2018-05/msg05508.html
https://patchwork.kernel.org/bundle/btw/vhost-user-host-notifiers/

Because the above change comes from your diff
and is based on your analysis, I kept your SoB
in that patch (if you have any concerns about it,
please let me know).

In this patch set, I also introduced a protocol
feature to allow the slave to send fds to the
master via the slave channel.

If you still see crashes with the new patch set,
please send me a bit more detail, e.g. the
crash message. Thanks a lot!

Best regards,
Tiwei Bie

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 3/6] vhost-user: support receiving file descriptors in slave_read
  2018-05-23 23:12       ` Tiwei Bie
@ 2018-05-24 13:48         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 98+ messages in thread
From: Michael S. Tsirkin @ 2018-05-24 13:48 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, May 24, 2018 at 07:12:15AM +0800, Tiwei Bie wrote:
> On Thu, May 24, 2018 at 12:25:23AM +0300, Michael S. Tsirkin wrote:
> > On Thu, Apr 12, 2018 at 11:12:29PM +0800, Tiwei Bie wrote:
> > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > 
> > 
> > Thinking about it, I think we should add a protocol
> > feature for this. This way remote can find out whether
> > it's safe to send this data to us.
> 
> Okay, I can add a protocol feature for this.
> Do you think it's OK to keep this patch as is
> (this patch just extends slave_read() to support
> receiving file descriptors) and introduce the
> new protocol feature in a new patch to allow
> backends to send file descriptors?
> 
> Best regards,
> Tiwei Bie

I already merged the patch as is; we can add a patch on top that limits
this to a protocol feature.

> > 
> > > ---
> > >  hw/virtio/vhost-user.c | 41 ++++++++++++++++++++++++++++++++++++++++-
> > >  1 file changed, 40 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > index 91edd95453..9cea2c8c51 100644
> > > --- a/hw/virtio/vhost-user.c
> > > +++ b/hw/virtio/vhost-user.c
> > > @@ -854,14 +854,44 @@ static void slave_read(void *opaque)
> > >      VhostUserHeader hdr = { 0, };
> > >      VhostUserPayload payload = { 0, };
> > >      int size, ret = 0;
> > > +    struct iovec iov;
> > > +    struct msghdr msgh;
> > > +    int fd = -1;
> > > +    char control[CMSG_SPACE(sizeof(fd))];
> > > +    struct cmsghdr *cmsg;
> > > +    size_t fdsize;
> > > +
> > > +    memset(&msgh, 0, sizeof(msgh));
> > > +    msgh.msg_iov = &iov;
> > > +    msgh.msg_iovlen = 1;
> > > +    msgh.msg_control = control;
> > > +    msgh.msg_controllen = sizeof(control);
> > >  
> > >      /* Read header */
> > > -    size = read(u->slave_fd, &hdr, VHOST_USER_HDR_SIZE);
> > > +    iov.iov_base = &hdr;
> > > +    iov.iov_len = VHOST_USER_HDR_SIZE;
> > > +
> > > +    size = recvmsg(u->slave_fd, &msgh, 0);
> > >      if (size != VHOST_USER_HDR_SIZE) {
> > >          error_report("Failed to read from slave.");
> > >          goto err;
> > >      }
> > >  
> > > +    if (msgh.msg_flags & MSG_CTRUNC) {
> > > +        error_report("Truncated message.");
> > > +        goto err;
> > > +    }
> > > +
> > > +    for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL;
> > > +         cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
> > > +            if (cmsg->cmsg_level == SOL_SOCKET &&
> > > +                cmsg->cmsg_type == SCM_RIGHTS) {
> > > +                    fdsize = cmsg->cmsg_len - CMSG_LEN(0);
> > > +                    memcpy(&fd, CMSG_DATA(cmsg), fdsize);
> > > +                    break;
> > > +            }
> > > +    }
> > > +
> > >      if (hdr.size > VHOST_USER_PAYLOAD_SIZE) {
> > >          error_report("Failed to read msg header."
> > >                  " Size %d exceeds the maximum %zu.", hdr.size,
> > > @@ -885,9 +915,15 @@ static void slave_read(void *opaque)
> > >          break;
> > >      default:
> > >          error_report("Received unexpected msg type.");
> > > +        if (fd != -1) {
> > > +            close(fd);
> > > +        }
> > >          ret = -EINVAL;
> > >      }
> > >  
> > > +    /* Message handlers need to make sure that fd will be consumed. */
> > > +    fd = -1;
> > > +
> > >      /*
> > >       * REPLY_ACK feature handling. Other reply types has to be managed
> > >       * directly in their request handlers.
> > > @@ -920,6 +956,9 @@ err:
> > >      qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
> > >      close(u->slave_fd);
> > >      u->slave_fd = -1;
> > > +    if (fd != -1) {
> > > +        close(fd);
> > > +    }
> > >      return;
> > >  }
> > >  
> > > -- 
> > > 2.11.0
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > 

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/6] vhost-user: introduce shared vhost-user state
  2018-05-24  2:24             ` [virtio-dev] " Tiwei Bie
@ 2018-05-24 13:50               ` Michael S. Tsirkin
  -1 siblings, 0 replies; 98+ messages in thread
From: Michael S. Tsirkin @ 2018-05-24 13:50 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, May 24, 2018 at 10:24:40AM +0800, Tiwei Bie wrote:
> On Thu, May 24, 2018 at 07:21:01AM +0800, Tiwei Bie wrote:
> > On Wed, May 23, 2018 at 06:43:29PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, May 23, 2018 at 06:36:05PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, May 23, 2018 at 04:44:51PM +0300, Michael S. Tsirkin wrote:
> > > > > On Thu, Apr 12, 2018 at 11:12:28PM +0800, Tiwei Bie wrote:
> > > > > > When multi queue is enabled e.g. for a virtio-net device,
> > > > > > each queue pair will have a vhost_dev, and the only thing
> > > > > > shared between vhost devs currently is the chardev. This
> > > > > > patch introduces a vhost-user state structure which will
> > > > > > be shared by all vhost devs of the same virtio device.
> > > > > > 
> > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > 
> > > > > Unfortunately this patch seems to cause crashes.
> > > > > To reproduce, simply run
> > > > > make check-qtest-x86_64
> > > > > 
> > > > > Sorry that it took me a while to find - it triggers 90% of runs but not
> > > > > 100% which complicates bisection somewhat.
> > 
> > It's my fault to not notice this bug before.
> > I'm very sorry. Thank you so much for finding
> > the root cause!
> > 
> > > > > 
> > > > > > ---
> > > > > >  backends/cryptodev-vhost-user.c     | 20 ++++++++++++++++++-
> > > > > >  hw/block/vhost-user-blk.c           | 22 +++++++++++++++++++-
> > > > > >  hw/scsi/vhost-user-scsi.c           | 20 ++++++++++++++++++-
> > > > > >  hw/virtio/Makefile.objs             |  2 +-
> > > > > >  hw/virtio/vhost-stub.c              | 10 ++++++++++
> > > > > >  hw/virtio/vhost-user.c              | 31 +++++++++++++++++++---------
> > > > > >  include/hw/virtio/vhost-user-blk.h  |  2 ++
> > > > > >  include/hw/virtio/vhost-user-scsi.h |  2 ++
> > > > > >  include/hw/virtio/vhost-user.h      | 20 +++++++++++++++++++
> > > > > >  net/vhost-user.c                    | 40 ++++++++++++++++++++++++++++++-------
> > > > > >  10 files changed, 149 insertions(+), 20 deletions(-)
> > > > > >  create mode 100644 include/hw/virtio/vhost-user.h
> > [...]
> > > > > >          qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
> > > > > >                                   net_vhost_user_event, NULL, nc0->name, NULL,
> > > > > > @@ -319,6 +336,15 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> > > > > >      assert(s->vhost_net);
> > > > > >  
> > > > > >      return 0;
> > > > > > +
> > > > > > +err:
> > > > > > +    if (user) {
> > > > > > +        vhost_user_cleanup(user);
> > > > > > +        g_free(user);
> > > > > > +        s->vhost_user = NULL;
> > > > > > +    }
> > > > > > +
> > > > > > +    return -1;
> > > > > >  }
> > > > > >  
> > > > > >  static Chardev *net_vhost_claim_chardev(
> > > > > > -- 
> > > > > > 2.11.0
> > > > 
> > > > So far I figured out that commenting the free of
> > > > the structure removes the crash, so we seem to
> > > > be dealing with a use-after free here.
> > > > I suspect that in a MQ situation, one queue gets
> > > > closed and attempts to free the structure
> > > > while others still use it.
> > > > 
> > > > diff --git a/net/vhost-user.c b/net/vhost-user.c
> > > > index 525a061..6a1573b 100644
> > > > --- a/net/vhost-user.c
> > > > +++ b/net/vhost-user.c
> > > > @@ -157,8 +157,8 @@ static void net_vhost_user_cleanup(NetClientState *nc)
> > > >          s->vhost_net = NULL;
> > > >      }
> > > >      if (s->vhost_user) {
> > > > -        vhost_user_cleanup(s->vhost_user);
> > > > -        g_free(s->vhost_user);
> > > > +        //vhost_user_cleanup(s->vhost_user);
> > > > +        //g_free(s->vhost_user);
> > > >          s->vhost_user = NULL;
> > > >      }
> > > >      if (nc->queue_index == 0) {
> > > > @@ -339,8 +339,8 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> > > >  
> > > >  err:
> > > >      if (user) {
> > > > -        vhost_user_cleanup(user);
> > > > -        g_free(user);
> > > > +        //vhost_user_cleanup(user);
> > > > +        //g_free(user);
> > > >          s->vhost_user = NULL;
> > > >      }
> > > >  
> > > 
> > > 
> > > So the following at least gets rid of the crashes.
> > > I am not sure it does not leak memory though,
> > > and not sure there aren't any configurations where
> > > the 1st queue gets cleaned up first.
> > > 
> > > Thoughts?
> > 
> > Thank you so much for catching it and fixing
> > it! I'll keep your SoB there. Thank you so
> > much! I do appreciate it!
> > 
> > You are right. This structure is freed multiple
> > times when multi-queue is enabled.
> 
> After a deeper digging, I got your point now..
> It could be a use-after-free instead of a double
> free.. As it's safe to deinit the char which is
> shared by all queue pairs when cleanup the 1st
> queue pair, it should be safe to free vhost-user
> structure there too.
> 
> > 
> > I think it's safe to let the first queue pair
> > free the vhost-user structure, because it won't
> > be touched by other queue pairs during cleanup.
> > 
> > Best regards,
> > Tiwei Bie
> > 
> > 
> > > 
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > 
> > > ---
> > > 
> > > diff --git a/net/vhost-user.c b/net/vhost-user.c
> > > index 525a061..7549d25 100644
> > > --- a/net/vhost-user.c
> > > +++ b/net/vhost-user.c
> > > @@ -156,19 +156,20 @@ static void net_vhost_user_cleanup(NetClientState *nc)
> > >          g_free(s->vhost_net);
> > >          s->vhost_net = NULL;
> > >      }
> > > -    if (s->vhost_user) {
> > > -        vhost_user_cleanup(s->vhost_user);
> > > -        g_free(s->vhost_user);
> > > -        s->vhost_user = NULL;
> > > -    }
> > >      if (nc->queue_index == 0) {
> > >          if (s->watch) {
> > >              g_source_remove(s->watch);
> > >              s->watch = 0;
> > >          }
> > >          qemu_chr_fe_deinit(&s->chr, true);
> > > +        if (s->vhost_user) {
> > > +            vhost_user_cleanup(s->vhost_user);
> > > +            g_free(s->vhost_user);
> > > +        }
> > >      }
> > >  
> > > +    s->vhost_user = NULL;
> 
> Maybe we should move above line, like:
> 
>      if (nc->queue_index == 0) {
>          if (s->watch) {
>              g_source_remove(s->watch);
>              s->watch = 0;
>          }
>          qemu_chr_fe_deinit(&s->chr, true);
> +        if (s->vhost_user) {
> +            vhost_user_cleanup(s->vhost_user);
> +            g_free(s->vhost_user);
> +            s->vhost_user = NULL;
> +        }
>      }
> 
> otherwise s->vhost_user may not be freed.
> 
> > > +
> > >      qemu_purge_queued_packets(nc);
> > >  }
> > >  
> > > @@ -341,7 +342,6 @@ err:
> > >      if (user) {
> > >          vhost_user_cleanup(user);
> > >          g_free(user);
> > > -        s->vhost_user = NULL;
> 
> I don't get why cannot zero it in this case.

You don't even know s is initialized.
Just make sure s->vhost_user is only set after you know
init succeeded.
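
Schematically, the pattern being suggested is something like this (a
hedged, simplified sketch with stand-in names, not the actual
net/vhost-user.c code):

```
/* Hedged sketch of "publish only on success": build the state in a
 * local variable and store it in the shared pointer only after every
 * step that can fail has succeeded; on error only the local is freed,
 * so s->vhost_user is never left pointing at freed memory. */
#include <stdlib.h>

struct user_state { int placeholder; };        /* stand-in struct */
struct net_state  { struct user_state *vhost_user; };

static int do_init(struct user_state *u) { (void)u; return 0; /* may fail */ }

static int init_one(struct net_state *s)
{
    struct user_state *user = calloc(1, sizeof(*user));

    if (!user || do_init(user) != 0) {
        free(user);                            /* s->vhost_user untouched */
        return -1;
    }

    s->vhost_user = user;                      /* publish only on success */
    return 0;
}
```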

> > >      }
> > >  
> > >      return -1;
> 
> Best regards,
> Tiwei Bie

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/6] vhost-user: introduce shared vhost-user state
  2018-05-24 10:59               ` [virtio-dev] " Tiwei Bie
@ 2018-05-24 13:55                 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 98+ messages in thread
From: Michael S. Tsirkin @ 2018-05-24 13:55 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, May 24, 2018 at 06:59:36PM +0800, Tiwei Bie wrote:
> On Thu, May 24, 2018 at 10:24:40AM +0800, Tiwei Bie wrote:
> > On Thu, May 24, 2018 at 07:21:01AM +0800, Tiwei Bie wrote:
> > > On Wed, May 23, 2018 at 06:43:29PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, May 23, 2018 at 06:36:05PM +0300, Michael S. Tsirkin wrote:
> > > > > On Wed, May 23, 2018 at 04:44:51PM +0300, Michael S. Tsirkin wrote:
> > > > > > On Thu, Apr 12, 2018 at 11:12:28PM +0800, Tiwei Bie wrote:
> > > > > > > When multi queue is enabled e.g. for a virtio-net device,
> > > > > > > each queue pair will have a vhost_dev, and the only thing
> > > > > > > shared between vhost devs currently is the chardev. This
> > > > > > > patch introduces a vhost-user state structure which will
> > > > > > > be shared by all vhost devs of the same virtio device.
> > > > > > > 
> > > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > > 
> > > > > > Unfortunately this patch seems to cause crashes.
> > > > > > To reproduce, simply run
> > > > > > make check-qtest-x86_64
> > > > > > 
> > > > > > Sorry that it took me a while to find - it triggers 90% of runs but not
> > > > > > 100% which complicates bisection somewhat.
> > > 
> > > It's my fault to not notice this bug before.
> > > I'm very sorry. Thank you so much for finding
> > > the root cause!
> > > 
> > > > > > 
> > > > > > > ---
> > > > > > >  backends/cryptodev-vhost-user.c     | 20 ++++++++++++++++++-
> > > > > > >  hw/block/vhost-user-blk.c           | 22 +++++++++++++++++++-
> > > > > > >  hw/scsi/vhost-user-scsi.c           | 20 ++++++++++++++++++-
> > > > > > >  hw/virtio/Makefile.objs             |  2 +-
> > > > > > >  hw/virtio/vhost-stub.c              | 10 ++++++++++
> > > > > > >  hw/virtio/vhost-user.c              | 31 +++++++++++++++++++---------
> > > > > > >  include/hw/virtio/vhost-user-blk.h  |  2 ++
> > > > > > >  include/hw/virtio/vhost-user-scsi.h |  2 ++
> > > > > > >  include/hw/virtio/vhost-user.h      | 20 +++++++++++++++++++
> > > > > > >  net/vhost-user.c                    | 40 ++++++++++++++++++++++++++++++-------
> > > > > > >  10 files changed, 149 insertions(+), 20 deletions(-)
> > > > > > >  create mode 100644 include/hw/virtio/vhost-user.h
> > > [...]
> > > > > > >          qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
> > > > > > >                                   net_vhost_user_event, NULL, nc0->name, NULL,
> > > > > > > @@ -319,6 +336,15 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> > > > > > >      assert(s->vhost_net);
> > > > > > >  
> > > > > > >      return 0;
> > > > > > > +
> > > > > > > +err:
> > > > > > > +    if (user) {
> > > > > > > +        vhost_user_cleanup(user);
> > > > > > > +        g_free(user);
> > > > > > > +        s->vhost_user = NULL;
> > > > > > > +    }
> > > > > > > +
> > > > > > > +    return -1;
> > > > > > >  }
> > > > > > >  
> > > > > > >  static Chardev *net_vhost_claim_chardev(
> > > > > > > -- 
> > > > > > > 2.11.0
> > > > > 
> > > > > So far I figured out that commenting the free of
> > > > > the structure removes the crash, so we seem to
> > > > > be dealing with a use-after free here.
> > > > > I suspect that in a MQ situation, one queue gets
> > > > > closed and attempts to free the structure
> > > > > while others still use it.
> > > > > 
> > > > > diff --git a/net/vhost-user.c b/net/vhost-user.c
> > > > > index 525a061..6a1573b 100644
> > > > > --- a/net/vhost-user.c
> > > > > +++ b/net/vhost-user.c
> > > > > @@ -157,8 +157,8 @@ static void net_vhost_user_cleanup(NetClientState *nc)
> > > > >          s->vhost_net = NULL;
> > > > >      }
> > > > >      if (s->vhost_user) {
> > > > > -        vhost_user_cleanup(s->vhost_user);
> > > > > -        g_free(s->vhost_user);
> > > > > +        //vhost_user_cleanup(s->vhost_user);
> > > > > +        //g_free(s->vhost_user);
> > > > >          s->vhost_user = NULL;
> > > > >      }
> > > > >      if (nc->queue_index == 0) {
> > > > > @@ -339,8 +339,8 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> > > > >  
> > > > >  err:
> > > > >      if (user) {
> > > > > -        vhost_user_cleanup(user);
> > > > > -        g_free(user);
> > > > > +        //vhost_user_cleanup(user);
> > > > > +        //g_free(user);
> > > > >          s->vhost_user = NULL;
> > > > >      }
> > > > >  
> > > > 
> > > > 
> > > > So the following at least gets rid of the crashes.
> > > > I am not sure it does not leak memory though,
> > > > and not sure there aren't any configurations where
> > > > the 1st queue gets cleaned up first.
> > > > 
> > > > Thoughts?
> > > 
> > > Thank you so much for catching it and fixing
> > > it! I'll keep your SoB there. Thank you so
> > > much! I do appreciate it!
> > > 
> > > You are right. This structure is freed multiple
> > > times when multi-queue is enabled.
> > 
> > After a deeper digging, I got your point now..
> > It could be a use-after-free instead of a double
> > free.. As it's safe to deinit the char which is
> > shared by all queue pairs when cleanup the 1st
> > queue pair, it should be safe to free vhost-user
> > structure there too.
> 
> One thing worried me is that, I can't reproduce
> the crash with `make check-qtest-x86_64`. I tried
> to run it a lot of times, but the only output I
> got each time was:
> 
>         CHK version_gen.h
>   GTESTER check-qtest-x86_64

The issue triggers on cleanup, which is why the test passes. Enable
core dumps, and you will see the core file afterwards.

We should probably teach the tests to check qemu's
return status; that would catch this faster.
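
As a sketch of that idea (not the actual qtest harness code; the
function name here is illustrative), the test could reap the QEMU
child after shutdown and fail if it died on a signal:

```
/* Hedged sketch: after asking QEMU to quit, reap the child and fail
 * if it was killed by a signal, e.g. SIGSEGV/SIGABRT from a
 * use-after-free during cleanup. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

static void expect_clean_qemu_exit(pid_t qemu_pid)
{
    int status = 0;

    if (waitpid(qemu_pid, &status, 0) != qemu_pid) {
        perror("waitpid");
        exit(EXIT_FAILURE);
    }
    if (WIFSIGNALED(status)) {
        fprintf(stderr, "qemu terminated by signal %d\n", WTERMSIG(status));
        exit(EXIT_FAILURE);            /* fail the test run */
    }
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        fprintf(stderr, "qemu exited with status %d\n", WEXITSTATUS(status));
        exit(EXIT_FAILURE);
    }
}
```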


> I did a quick glance of the `check-qtest-x86_64`
> target in the makefile, it seems that the relevant
> test is `tests/vhost-user-test`. So I also tried
> to run this test directly:
> 
> make tests/vhost-user-test
> while true; do
>   QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 ./tests/vhost-user-test
> done
> 
> And the only output in each loop I got was:
> 
> /x86_64/vhost-user/migrate: OK
> /x86_64/vhost-user/multiqueue: OK
> /x86_64/vhost-user/read-guest-mem/memfd: OK
> /x86_64/vhost-user/read-guest-mem/memfile: OK
> 
> So I'm still not quite sure what caused the crash
> on your side. But it does make more sense to free
> the vhost-user structure only when cleanup the
> 1st queue pair (i.e. where the `chr` is deinit-ed).
> 
> I have sent a new patch set with above change:
> 
> http://lists.gnu.org/archive/html/qemu-devel/2018-05/msg05508.html
> https://patchwork.kernel.org/bundle/btw/vhost-user-host-notifiers/
> 
> Because the above change is got from your diff
> and also based on your analysis, I kept your SoB
> in that patch (if you have any concern about it,
> please let me know).
> 
> In this patch set, I also introduced a protocol
> feature to allow slave to send fds to master via
> the slave channel.
> 
> If you still see crashes with the new patch set,
> please provide me a bit more details, e.g. the
> crash message. Thanks a lot!
> 
> Best regards,
> Tiwei Bie

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/6] vhost-user: introduce shared vhost-user state
  2018-05-24 10:59               ` [virtio-dev] " Tiwei Bie
@ 2018-05-24 14:30                 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 98+ messages in thread
From: Michael S. Tsirkin @ 2018-05-24 14:30 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, May 24, 2018 at 06:59:36PM +0800, Tiwei Bie wrote:
> On Thu, May 24, 2018 at 10:24:40AM +0800, Tiwei Bie wrote:
> > On Thu, May 24, 2018 at 07:21:01AM +0800, Tiwei Bie wrote:
> > > On Wed, May 23, 2018 at 06:43:29PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, May 23, 2018 at 06:36:05PM +0300, Michael S. Tsirkin wrote:
> > > > > On Wed, May 23, 2018 at 04:44:51PM +0300, Michael S. Tsirkin wrote:
> > > > > > On Thu, Apr 12, 2018 at 11:12:28PM +0800, Tiwei Bie wrote:
> > > > > > > When multi queue is enabled e.g. for a virtio-net device,
> > > > > > > each queue pair will have a vhost_dev, and the only thing
> > > > > > > shared between vhost devs currently is the chardev. This
> > > > > > > patch introduces a vhost-user state structure which will
> > > > > > > be shared by all vhost devs of the same virtio device.
> > > > > > > 
> > > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > > 
> > > > > > Unfortunately this patch seems to cause crashes.
> > > > > > To reproduce, simply run
> > > > > > make check-qtest-x86_64
> > > > > > 
> > > > > > Sorry that it took me a while to find - it triggers 90% of runs but not
> > > > > > 100% which complicates bisection somewhat.
> > > 
> > > It's my fault to not notice this bug before.
> > > I'm very sorry. Thank you so much for finding
> > > the root cause!
> > > 
> > > > > > 
> > > > > > > ---
> > > > > > >  backends/cryptodev-vhost-user.c     | 20 ++++++++++++++++++-
> > > > > > >  hw/block/vhost-user-blk.c           | 22 +++++++++++++++++++-
> > > > > > >  hw/scsi/vhost-user-scsi.c           | 20 ++++++++++++++++++-
> > > > > > >  hw/virtio/Makefile.objs             |  2 +-
> > > > > > >  hw/virtio/vhost-stub.c              | 10 ++++++++++
> > > > > > >  hw/virtio/vhost-user.c              | 31 +++++++++++++++++++---------
> > > > > > >  include/hw/virtio/vhost-user-blk.h  |  2 ++
> > > > > > >  include/hw/virtio/vhost-user-scsi.h |  2 ++
> > > > > > >  include/hw/virtio/vhost-user.h      | 20 +++++++++++++++++++
> > > > > > >  net/vhost-user.c                    | 40 ++++++++++++++++++++++++++++++-------
> > > > > > >  10 files changed, 149 insertions(+), 20 deletions(-)
> > > > > > >  create mode 100644 include/hw/virtio/vhost-user.h
> > > [...]
> > > > > > >          qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
> > > > > > >                                   net_vhost_user_event, NULL, nc0->name, NULL,
> > > > > > > @@ -319,6 +336,15 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> > > > > > >      assert(s->vhost_net);
> > > > > > >  
> > > > > > >      return 0;
> > > > > > > +
> > > > > > > +err:
> > > > > > > +    if (user) {
> > > > > > > +        vhost_user_cleanup(user);
> > > > > > > +        g_free(user);
> > > > > > > +        s->vhost_user = NULL;
> > > > > > > +    }
> > > > > > > +
> > > > > > > +    return -1;
> > > > > > >  }
> > > > > > >  
> > > > > > >  static Chardev *net_vhost_claim_chardev(
> > > > > > > -- 
> > > > > > > 2.11.0
> > > > > 
> > > > > So far I figured out that commenting the free of
> > > > > the structure removes the crash, so we seem to
> > > > > be dealing with a use-after free here.
> > > > > I suspect that in a MQ situation, one queue gets
> > > > > closed and attempts to free the structure
> > > > > while others still use it.
> > > > > 
> > > > > diff --git a/net/vhost-user.c b/net/vhost-user.c
> > > > > index 525a061..6a1573b 100644
> > > > > --- a/net/vhost-user.c
> > > > > +++ b/net/vhost-user.c
> > > > > @@ -157,8 +157,8 @@ static void net_vhost_user_cleanup(NetClientState *nc)
> > > > >          s->vhost_net = NULL;
> > > > >      }
> > > > >      if (s->vhost_user) {
> > > > > -        vhost_user_cleanup(s->vhost_user);
> > > > > -        g_free(s->vhost_user);
> > > > > +        //vhost_user_cleanup(s->vhost_user);
> > > > > +        //g_free(s->vhost_user);
> > > > >          s->vhost_user = NULL;
> > > > >      }
> > > > >      if (nc->queue_index == 0) {
> > > > > @@ -339,8 +339,8 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> > > > >  
> > > > >  err:
> > > > >      if (user) {
> > > > > -        vhost_user_cleanup(user);
> > > > > -        g_free(user);
> > > > > +        //vhost_user_cleanup(user);
> > > > > +        //g_free(user);
> > > > >          s->vhost_user = NULL;
> > > > >      }
> > > > >  
> > > > 
> > > > 
> > > > So the following at least gets rid of the crashes.
> > > > I am not sure it does not leak memory though,
> > > > and not sure there aren't any configurations where
> > > > the 1st queue gets cleaned up first.
> > > > 
> > > > Thoughts?
> > > 
> > > Thank you so much for catching it and fixing
> > > it! I'll keep your SoB there. Thank you so
> > > much! I do appreciate it!
> > > 
> > > You are right. This structure is freed multiple
> > > times when multi-queue is enabled.
> > 
> > After a deeper digging, I got your point now..
> > It could be a use-after-free instead of a double
> > free.. As it's safe to deinit the char which is
> > shared by all queue pairs when cleanup the 1st
> > queue pair, it should be safe to free vhost-user
> > structure there too.
> 
> One thing worried me is that, I can't reproduce
> the crash with `make check-qtest-x86_64`.

I sent a patch that will make the test fail on a qemu coredump.

> I tried
> to run it a lot of times, but the only output I
> got each time was:
> 
>         CHK version_gen.h
>   GTESTER check-qtest-x86_64
> 
> I did a quick glance of the `check-qtest-x86_64`
> target in the makefile, it seems that the relevant
> test is `tests/vhost-user-test`. So I also tried
> to run this test directly:
> 
> make tests/vhost-user-test
> while true; do
>   QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 ./tests/vhost-user-test
> done
> 
> And the only output in each loop I got was:
> 
> /x86_64/vhost-user/migrate: OK
> /x86_64/vhost-user/multiqueue: OK
> /x86_64/vhost-user/read-guest-mem/memfd: OK
> /x86_64/vhost-user/read-guest-mem/memfile: OK
> 
> So I'm still not quite sure what caused the crash
> on your side. But it does make more sense to free
> the vhost-user structure only when cleanup the
> 1st queue pair (i.e. where the `chr` is deinit-ed).
> 
> I have sent a new patch set with above change:
> 
> http://lists.gnu.org/archive/html/qemu-devel/2018-05/msg05508.html
> https://patchwork.kernel.org/bundle/btw/vhost-user-host-notifiers/
> 
> Because the above change is got from your diff
> and also based on your analysis, I kept your SoB
> in that patch (if you have any concern about it,
> please let me know).
> 
> In this patch set, I also introduced a protocol
> feature to allow slave to send fds to master via
> the slave channel.
> 
> If you still see crashes with the new patch set,
> please provide me a bit more details, e.g. the
> crash message. Thanks a lot!
> 
> Best regards,
> Tiwei Bie

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 2/6] vhost-user: introduce shared vhost-user state
  2018-05-24 13:55                 ` [virtio-dev] " Michael S. Tsirkin
@ 2018-05-24 14:54                   ` Tiwei Bie
  -1 siblings, 0 replies; 98+ messages in thread
From: Tiwei Bie @ 2018-05-24 14:54 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, May 24, 2018 at 04:55:04PM +0300, Michael S. Tsirkin wrote:
> On Thu, May 24, 2018 at 06:59:36PM +0800, Tiwei Bie wrote:
> > On Thu, May 24, 2018 at 10:24:40AM +0800, Tiwei Bie wrote:
> > > On Thu, May 24, 2018 at 07:21:01AM +0800, Tiwei Bie wrote:
> > > > On Wed, May 23, 2018 at 06:43:29PM +0300, Michael S. Tsirkin wrote:
> > > > > On Wed, May 23, 2018 at 06:36:05PM +0300, Michael S. Tsirkin wrote:
> > > > > > On Wed, May 23, 2018 at 04:44:51PM +0300, Michael S. Tsirkin wrote:
> > > > > > > On Thu, Apr 12, 2018 at 11:12:28PM +0800, Tiwei Bie wrote:
> > > > > > > > When multi queue is enabled e.g. for a virtio-net device,
> > > > > > > > each queue pair will have a vhost_dev, and the only thing
> > > > > > > > shared between vhost devs currently is the chardev. This
> > > > > > > > patch introduces a vhost-user state structure which will
> > > > > > > > be shared by all vhost devs of the same virtio device.
> > > > > > > > 
> > > > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > > > 
> > > > > > > Unfortunately this patch seems to cause crashes.
> > > > > > > To reproduce, simply run
> > > > > > > make check-qtest-x86_64
> > > > > > > 
> > > > > > > Sorry that it took me a while to find - it triggers 90% of runs but not
> > > > > > > 100% which complicates bisection somewhat.
> > > > 
> > > > It's my fault to not notice this bug before.
> > > > I'm very sorry. Thank you so much for finding
> > > > the root cause!
> > > > 
> > > > > > > 
> > > > > > > > ---
> > > > > > > >  backends/cryptodev-vhost-user.c     | 20 ++++++++++++++++++-
> > > > > > > >  hw/block/vhost-user-blk.c           | 22 +++++++++++++++++++-
> > > > > > > >  hw/scsi/vhost-user-scsi.c           | 20 ++++++++++++++++++-
> > > > > > > >  hw/virtio/Makefile.objs             |  2 +-
> > > > > > > >  hw/virtio/vhost-stub.c              | 10 ++++++++++
> > > > > > > >  hw/virtio/vhost-user.c              | 31 +++++++++++++++++++---------
> > > > > > > >  include/hw/virtio/vhost-user-blk.h  |  2 ++
> > > > > > > >  include/hw/virtio/vhost-user-scsi.h |  2 ++
> > > > > > > >  include/hw/virtio/vhost-user.h      | 20 +++++++++++++++++++
> > > > > > > >  net/vhost-user.c                    | 40 ++++++++++++++++++++++++++++++-------
> > > > > > > >  10 files changed, 149 insertions(+), 20 deletions(-)
> > > > > > > >  create mode 100644 include/hw/virtio/vhost-user.h
> > > > [...]
> > > > > > > >          qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
> > > > > > > >                                   net_vhost_user_event, NULL, nc0->name, NULL,
> > > > > > > > @@ -319,6 +336,15 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> > > > > > > >      assert(s->vhost_net);
> > > > > > > >  
> > > > > > > >      return 0;
> > > > > > > > +
> > > > > > > > +err:
> > > > > > > > +    if (user) {
> > > > > > > > +        vhost_user_cleanup(user);
> > > > > > > > +        g_free(user);
> > > > > > > > +        s->vhost_user = NULL;
> > > > > > > > +    }
> > > > > > > > +
> > > > > > > > +    return -1;
> > > > > > > >  }
> > > > > > > >  
> > > > > > > >  static Chardev *net_vhost_claim_chardev(
> > > > > > > > -- 
> > > > > > > > 2.11.0
> > > > > > 
> > > > > > So far I figured out that commenting the free of
> > > > > > the structure removes the crash, so we seem to
> > > > > > be dealing with a use-after free here.
> > > > > > I suspect that in a MQ situation, one queue gets
> > > > > > closed and attempts to free the structure
> > > > > > while others still use it.
> > > > > > 
> > > > > > diff --git a/net/vhost-user.c b/net/vhost-user.c
> > > > > > index 525a061..6a1573b 100644
> > > > > > --- a/net/vhost-user.c
> > > > > > +++ b/net/vhost-user.c
> > > > > > @@ -157,8 +157,8 @@ static void net_vhost_user_cleanup(NetClientState *nc)
> > > > > >          s->vhost_net = NULL;
> > > > > >      }
> > > > > >      if (s->vhost_user) {
> > > > > > -        vhost_user_cleanup(s->vhost_user);
> > > > > > -        g_free(s->vhost_user);
> > > > > > +        //vhost_user_cleanup(s->vhost_user);
> > > > > > +        //g_free(s->vhost_user);
> > > > > >          s->vhost_user = NULL;
> > > > > >      }
> > > > > >      if (nc->queue_index == 0) {
> > > > > > @@ -339,8 +339,8 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> > > > > >  
> > > > > >  err:
> > > > > >      if (user) {
> > > > > > -        vhost_user_cleanup(user);
> > > > > > -        g_free(user);
> > > > > > +        //vhost_user_cleanup(user);
> > > > > > +        //g_free(user);
> > > > > >          s->vhost_user = NULL;
> > > > > >      }
> > > > > >  
> > > > > 
> > > > > 
> > > > > So the following at least gets rid of the crashes.
> > > > > I am not sure it does not leak memory though,
> > > > > and not sure there aren't any configurations where
> > > > > the 1st queue gets cleaned up first.
> > > > > 
> > > > > Thoughts?
> > > > 
> > > > Thank you so much for catching it and fixing
> > > > it! I'll keep your SoB there. Thank you so
> > > > much! I do appreciate it!
> > > > 
> > > > You are right. This structure is freed multiple
> > > > times when multi-queue is enabled.
> > > 
> > > After some deeper digging, I got your point now.
> > > It could be a use-after-free instead of a double
> > > free. As it's safe to deinit the chardev, which is
> > > shared by all queue pairs, when cleaning up the 1st
> > > queue pair, it should be safe to free the vhost-user
> > > structure there too.
> > 
> > One thing that worried me is that I can't reproduce
> > the crash with `make check-qtest-x86_64`. I tried
> > to run it a lot of times, but the only output I
> > got each time was:
> > 
> >         CHK version_gen.h
> >   GTESTER check-qtest-x86_64
> 
> The issue triggers on cleanup, this is why the test passes. Enable
> coredump, then you will see the core file afterwards.
> 
> We should probably teach the tests to check qemu
> return status, that will catch this faster.

Awesome! You're right. After enabling core dumps
(e.g. `ulimit -c unlimited`), I hit the coredump on my
first try with my old patch set.

I checked my new patch set with the script below (it
assumes a crash leaves a `core` file in the current
directory), and didn't hit the coredump any more:

```
SUCCESS_CNT=0
FAILURE_CNT=0

while true; do
    echo "-------"
    QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 ./tests/vhost-user-test
    if [[ -e core ]]; then
        FAILURE_CNT=$((FAILURE_CNT+1))
        rm -f core
    else
        SUCCESS_CNT=$((SUCCESS_CNT+1))
    fi
    echo "Success: $SUCCESS_CNT Failure: $FAILURE_CNT"
done
```

......
/x86_64/vhost-user/migrate: OK
/x86_64/vhost-user/multiqueue: OK
/x86_64/vhost-user/read-guest-mem/memfd: OK
/x86_64/vhost-user/read-guest-mem/memfile: OK
Success: 111 Failure: 0

That is, after 111 runs of the test with the new patch
set, I didn't see any crash.

But when I used the same script to test my old patch
set, it always failed:

......
/x86_64/vhost-user/migrate: OK
/x86_64/vhost-user/multiqueue: OK
/x86_64/vhost-user/read-guest-mem/memfd: OK
/x86_64/vhost-user/read-guest-mem/memfile: OK
Success: 0 Failure: 26

I think the crash has been fixed in the new patch set.

FYI, below is the new patch set:
http://lists.gnu.org/archive/html/qemu-devel/2018-05/msg05508.html
And below is the fixed version of this patch in the new patch set:
http://lists.gnu.org/archive/html/qemu-devel/2018-05/msg05506.html

Thank you so much!

Best regards,
Tiwei Bie


> 
> 
> > I did a quick glance of the `check-qtest-x86_64`
> > target in the makefile, it seems that the relevant
> > test is `tests/vhost-user-test`. So I also tried
> > to run this test directly:
> > 
> > make tests/vhost-user-test
> > while true; do
> >   QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 ./tests/vhost-user-test
> > done
> > 
> > And the only output in each loop I got was:
> > 
> > /x86_64/vhost-user/migrate: OK
> > /x86_64/vhost-user/multiqueue: OK
> > /x86_64/vhost-user/read-guest-mem/memfd: OK
> > /x86_64/vhost-user/read-guest-mem/memfile: OK
> > 
> > So I'm still not quite sure what caused the crash
> > on your side. But it does make more sense to free
> > the vhost-user structure only when cleanup the
> > 1st queue pair (i.e. where the `chr` is deinit-ed).
> > 
> > I have sent a new patch set with above change:
> > 
> > http://lists.gnu.org/archive/html/qemu-devel/2018-05/msg05508.html
> > https://patchwork.kernel.org/bundle/btw/vhost-user-host-notifiers/
> > 
> > Because the above change is got from your diff
> > and also based on your analysis, I kept your SoB
> > in that patch (if you have any concern about it,
> > please let me know).
> > 
> > In this patch set, I also introduced a protocol
> > feature to allow slave to send fds to master via
> > the slave channel.
> > 
> > If you still see crashes with the new patch set,
> > please provide me a bit more details, e.g. the
> > crash message. Thanks a lot!
> > 
> > Best regards,
> > Tiwei Bie
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> 
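
For readers following the thread: below is a minimal, self-contained C
sketch of the lifetime problem discussed above. It is an illustration
only, not QEMU code; the struct names, NUM_QUEUES and the cleanup order
are assumptions. Several per-queue states share one heap-allocated
object, and freeing it from every queue's cleanup path gives exactly the
double free / use-after-free seen in the test. Freeing it only from
queue 0's cleanup (the approach taken in the new patch set) is safe as
long as queue 0 is not cleaned up while other queues still use the
shared state, which is the open question raised above.

```c
#include <stdio.h>
#include <stdlib.h>

#define NUM_QUEUES 4

struct shared_state {              /* stands in for the shared vhost-user state */
    int dummy;
};

struct queue {                     /* stands in for one per-queue-pair state */
    int index;
    struct shared_state *shared;
};

static void queue_cleanup(struct queue *q)
{
    /* The buggy variant freed q->shared unconditionally here,
     * so every queue pair freed the same object. */
    if (q->index == 0 && q->shared) {
        free(q->shared);           /* freed exactly once, by queue 0 */
    }
    q->shared = NULL;
}

int main(void)
{
    struct shared_state *shared = calloc(1, sizeof(*shared));
    struct queue queues[NUM_QUEUES];

    for (int i = 0; i < NUM_QUEUES; i++) {
        queues[i] = (struct queue){ .index = i, .shared = shared };
    }
    /* Clean up queue 0 last, as the discussion above assumes. */
    for (int i = NUM_QUEUES - 1; i >= 0; i--) {
        queue_cleanup(&queues[i]);
    }
    printf("cleanup finished without a double free\n");
    return 0;
}
```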

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 3/6] vhost-user: support receiving file descriptors in slave_read
  2018-05-24 13:48         ` Michael S. Tsirkin
@ 2018-05-24 14:56           ` Tiwei Bie
  -1 siblings, 0 replies; 98+ messages in thread
From: Tiwei Bie @ 2018-05-24 14:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, May 24, 2018 at 04:48:09PM +0300, Michael S. Tsirkin wrote:
> On Thu, May 24, 2018 at 07:12:15AM +0800, Tiwei Bie wrote:
> > On Thu, May 24, 2018 at 12:25:23AM +0300, Michael S. Tsirkin wrote:
> > > On Thu, Apr 12, 2018 at 11:12:29PM +0800, Tiwei Bie wrote:
> > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > 
> > > 
> > > Thinking about it, I think we should add a protocol
> > > feature for this. This way remote can find out whether
> > > it's safe to send this data to us.
> > 
> > Okay, I can add a protocol feature for this.
> > Do you think it's OK to keep this patch as is
> > (this patch just extends slave_read() to support
> > receiving file descriptors) and introduce the
> > new protocol feature in a new patch to allow
> > backends to send file descriptors?
> > 
> > Best regards,
> > Tiwei Bie
> 
> I already merged patch as is, we can add a patch on top that limits this
> to a protocol feature.

Got it. Thanks a lot!

Best regards,
Tiwei Bie

> 
> > > 
> > > > ---
> > > >  hw/virtio/vhost-user.c | 41 ++++++++++++++++++++++++++++++++++++++++-
> > > >  1 file changed, 40 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > > index 91edd95453..9cea2c8c51 100644
> > > > --- a/hw/virtio/vhost-user.c
> > > > +++ b/hw/virtio/vhost-user.c
> > > > @@ -854,14 +854,44 @@ static void slave_read(void *opaque)
> > > >      VhostUserHeader hdr = { 0, };
> > > >      VhostUserPayload payload = { 0, };
> > > >      int size, ret = 0;
> > > > +    struct iovec iov;
> > > > +    struct msghdr msgh;
> > > > +    int fd = -1;
> > > > +    char control[CMSG_SPACE(sizeof(fd))];
> > > > +    struct cmsghdr *cmsg;
> > > > +    size_t fdsize;
> > > > +
> > > > +    memset(&msgh, 0, sizeof(msgh));
> > > > +    msgh.msg_iov = &iov;
> > > > +    msgh.msg_iovlen = 1;
> > > > +    msgh.msg_control = control;
> > > > +    msgh.msg_controllen = sizeof(control);
> > > >  
> > > >      /* Read header */
> > > > -    size = read(u->slave_fd, &hdr, VHOST_USER_HDR_SIZE);
> > > > +    iov.iov_base = &hdr;
> > > > +    iov.iov_len = VHOST_USER_HDR_SIZE;
> > > > +
> > > > +    size = recvmsg(u->slave_fd, &msgh, 0);
> > > >      if (size != VHOST_USER_HDR_SIZE) {
> > > >          error_report("Failed to read from slave.");
> > > >          goto err;
> > > >      }
> > > >  
> > > > +    if (msgh.msg_flags & MSG_CTRUNC) {
> > > > +        error_report("Truncated message.");
> > > > +        goto err;
> > > > +    }
> > > > +
> > > > +    for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL;
> > > > +         cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
> > > > +            if (cmsg->cmsg_level == SOL_SOCKET &&
> > > > +                cmsg->cmsg_type == SCM_RIGHTS) {
> > > > +                    fdsize = cmsg->cmsg_len - CMSG_LEN(0);
> > > > +                    memcpy(&fd, CMSG_DATA(cmsg), fdsize);
> > > > +                    break;
> > > > +            }
> > > > +    }
> > > > +
> > > >      if (hdr.size > VHOST_USER_PAYLOAD_SIZE) {
> > > >          error_report("Failed to read msg header."
> > > >                  " Size %d exceeds the maximum %zu.", hdr.size,
> > > > @@ -885,9 +915,15 @@ static void slave_read(void *opaque)
> > > >          break;
> > > >      default:
> > > >          error_report("Received unexpected msg type.");
> > > > +        if (fd != -1) {
> > > > +            close(fd);
> > > > +        }
> > > >          ret = -EINVAL;
> > > >      }
> > > >  
> > > > +    /* Message handlers need to make sure that fd will be consumed. */
> > > > +    fd = -1;
> > > > +
> > > >      /*
> > > >       * REPLY_ACK feature handling. Other reply types has to be managed
> > > >       * directly in their request handlers.
> > > > @@ -920,6 +956,9 @@ err:
> > > >      qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
> > > >      close(u->slave_fd);
> > > >      u->slave_fd = -1;
> > > > +    if (fd != -1) {
> > > > +        close(fd);
> > > > +    }
> > > >      return;
> > > >  }
> > > >  
> > > > -- 
> > > > 2.11.0
> > > 
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > 
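
The hunk above switches slave_read() from read() to recvmsg() so that an
SCM_RIGHTS ancillary message (a passed file descriptor) can arrive
together with the vhost-user header. As a standalone reference, here is a
minimal sketch of that technique outside of QEMU; the helper name, the
error handling and the single-fd assumption are mine, chosen to match the
quoted patch:

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Read up to 'len' payload bytes from a UNIX domain socket. On return,
 * *pfd holds the received descriptor, or -1 if none accompanied the
 * message. Returns the number of payload bytes, or -1 on error. */
ssize_t recv_with_fd(int sock, void *buf, size_t len, int *pfd)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    char control[CMSG_SPACE(sizeof(int))];
    struct msghdr msgh = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = control,
        .msg_controllen = sizeof(control),
    };
    struct cmsghdr *cmsg;
    ssize_t n;

    *pfd = -1;
    n = recvmsg(sock, &msgh, 0);
    if (n <= 0 || (msgh.msg_flags & MSG_CTRUNC)) {
        return -1;                 /* error, EOF, or truncated control data */
    }
    for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg; cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS) {
            memcpy(pfd, CMSG_DATA(cmsg), sizeof(int));
            break;
        }
    }
    return n;
}
```

The peer passes the descriptor with sendmsg() and a matching SCM_RIGHTS
control message. The protocol feature discussed above would then let the
slave check a negotiated feature bit before attaching any fd, so an older
master is never handed ancillary data it does not expect.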

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/6] vhost-user: introduce shared vhost-user state
  2018-05-24 14:30                 ` [virtio-dev] " Michael S. Tsirkin
@ 2018-05-24 15:22                   ` Tiwei Bie
  -1 siblings, 0 replies; 98+ messages in thread
From: Tiwei Bie @ 2018-05-24 15:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, alex.williamson, pbonzini, stefanha, qemu-devel,
	virtio-dev, cunming.liang, dan.daly, jianfeng.tan, zhihong.wang,
	xiao.w.wang

On Thu, May 24, 2018 at 05:30:45PM +0300, Michael S. Tsirkin wrote:
> On Thu, May 24, 2018 at 06:59:36PM +0800, Tiwei Bie wrote:
> > On Thu, May 24, 2018 at 10:24:40AM +0800, Tiwei Bie wrote:
> > > On Thu, May 24, 2018 at 07:21:01AM +0800, Tiwei Bie wrote:
> > > > On Wed, May 23, 2018 at 06:43:29PM +0300, Michael S. Tsirkin wrote:
> > > > > On Wed, May 23, 2018 at 06:36:05PM +0300, Michael S. Tsirkin wrote:
> > > > > > On Wed, May 23, 2018 at 04:44:51PM +0300, Michael S. Tsirkin wrote:
> > > > > > > On Thu, Apr 12, 2018 at 11:12:28PM +0800, Tiwei Bie wrote:
> > > > > > > > When multi queue is enabled e.g. for a virtio-net device,
> > > > > > > > each queue pair will have a vhost_dev, and the only thing
> > > > > > > > shared between vhost devs currently is the chardev. This
> > > > > > > > patch introduces a vhost-user state structure which will
> > > > > > > > be shared by all vhost devs of the same virtio device.
> > > > > > > > 
> > > > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > > > 
> > > > > > > Unfortunately this patch seems to cause crashes.
> > > > > > > To reproduce, simply run
> > > > > > > make check-qtest-x86_64
> > > > > > > 
> > > > > > > Sorry that it took me a while to find - it triggers 90% of runs but not
> > > > > > > 100% which complicates bisection somewhat.
> > > > 
> > > > It's my fault to not notice this bug before.
> > > > I'm very sorry. Thank you so much for finding
> > > > the root cause!
> > > > 
> > > > > > > 
> > > > > > > > ---
> > > > > > > >  backends/cryptodev-vhost-user.c     | 20 ++++++++++++++++++-
> > > > > > > >  hw/block/vhost-user-blk.c           | 22 +++++++++++++++++++-
> > > > > > > >  hw/scsi/vhost-user-scsi.c           | 20 ++++++++++++++++++-
> > > > > > > >  hw/virtio/Makefile.objs             |  2 +-
> > > > > > > >  hw/virtio/vhost-stub.c              | 10 ++++++++++
> > > > > > > >  hw/virtio/vhost-user.c              | 31 +++++++++++++++++++---------
> > > > > > > >  include/hw/virtio/vhost-user-blk.h  |  2 ++
> > > > > > > >  include/hw/virtio/vhost-user-scsi.h |  2 ++
> > > > > > > >  include/hw/virtio/vhost-user.h      | 20 +++++++++++++++++++
> > > > > > > >  net/vhost-user.c                    | 40 ++++++++++++++++++++++++++++++-------
> > > > > > > >  10 files changed, 149 insertions(+), 20 deletions(-)
> > > > > > > >  create mode 100644 include/hw/virtio/vhost-user.h
> > > > [...]
> > > > > > > >          qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
> > > > > > > >                                   net_vhost_user_event, NULL, nc0->name, NULL,
> > > > > > > > @@ -319,6 +336,15 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> > > > > > > >      assert(s->vhost_net);
> > > > > > > >  
> > > > > > > >      return 0;
> > > > > > > > +
> > > > > > > > +err:
> > > > > > > > +    if (user) {
> > > > > > > > +        vhost_user_cleanup(user);
> > > > > > > > +        g_free(user);
> > > > > > > > +        s->vhost_user = NULL;
> > > > > > > > +    }
> > > > > > > > +
> > > > > > > > +    return -1;
> > > > > > > >  }
> > > > > > > >  
> > > > > > > >  static Chardev *net_vhost_claim_chardev(
> > > > > > > > -- 
> > > > > > > > 2.11.0
> > > > > > 
> > > > > > So far I figured out that commenting the free of
> > > > > > the structure removes the crash, so we seem to
> > > > > > be dealing with a use-after free here.
> > > > > > I suspect that in a MQ situation, one queue gets
> > > > > > closed and attempts to free the structure
> > > > > > while others still use it.
> > > > > > 
> > > > > > diff --git a/net/vhost-user.c b/net/vhost-user.c
> > > > > > index 525a061..6a1573b 100644
> > > > > > --- a/net/vhost-user.c
> > > > > > +++ b/net/vhost-user.c
> > > > > > @@ -157,8 +157,8 @@ static void net_vhost_user_cleanup(NetClientState *nc)
> > > > > >          s->vhost_net = NULL;
> > > > > >      }
> > > > > >      if (s->vhost_user) {
> > > > > > -        vhost_user_cleanup(s->vhost_user);
> > > > > > -        g_free(s->vhost_user);
> > > > > > +        //vhost_user_cleanup(s->vhost_user);
> > > > > > +        //g_free(s->vhost_user);
> > > > > >          s->vhost_user = NULL;
> > > > > >      }
> > > > > >      if (nc->queue_index == 0) {
> > > > > > @@ -339,8 +339,8 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
> > > > > >  
> > > > > >  err:
> > > > > >      if (user) {
> > > > > > -        vhost_user_cleanup(user);
> > > > > > -        g_free(user);
> > > > > > +        //vhost_user_cleanup(user);
> > > > > > +        //g_free(user);
> > > > > >          s->vhost_user = NULL;
> > > > > >      }
> > > > > >  
> > > > > 
> > > > > 
> > > > > So the following at least gets rid of the crashes.
> > > > > I am not sure it does not leak memory though,
> > > > > and not sure there aren't any configurations where
> > > > > the 1st queue gets cleaned up first.
> > > > > 
> > > > > Thoughts?
> > > > 
> > > > Thank you so much for catching it and fixing
> > > > it! I'll keep your SoB there. Thank you so
> > > > much! I do appreciate it!
> > > > 
> > > > You are right. This structure is freed multiple
> > > > times when multi-queue is enabled.
> > > 
> > > After some deeper digging, I got your point now.
> > > It could be a use-after-free instead of a double
> > > free. As it's safe to deinit the chardev, which is
> > > shared by all queue pairs, when cleaning up the 1st
> > > queue pair, it should be safe to free the vhost-user
> > > structure there too.
> > 
> > One thing that worried me is that I can't reproduce
> > the crash with `make check-qtest-x86_64`.
> 
> I sent a patch that will make test fail on qemu coredump.

That would be very useful!

Without your explanation, I guess I wouldn't have been
able to notice such a crash in this test.

Best regards,
Tiwei Bie

> 
> > I tried
> > to run it a lot of times, but the only output I
> > got each time was:
> > 
> >         CHK version_gen.h
> >   GTESTER check-qtest-x86_64
> > 
> > I did a quick glance of the `check-qtest-x86_64`
> > target in the makefile, it seems that the relevant
> > test is `tests/vhost-user-test`. So I also tried
> > to run this test directly:
> > 
> > make tests/vhost-user-test
> > while true; do
> >   QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 ./tests/vhost-user-test
> > done
> > 
> > And the only output in each loop I got was:
> > 
> > /x86_64/vhost-user/migrate: OK
> > /x86_64/vhost-user/multiqueue: OK
> > /x86_64/vhost-user/read-guest-mem/memfd: OK
> > /x86_64/vhost-user/read-guest-mem/memfile: OK
> > 
> > So I'm still not quite sure what caused the crash
> > on your side. But it does make more sense to free
> > the vhost-user structure only when cleanup the
> > 1st queue pair (i.e. where the `chr` is deinit-ed).
> > 
> > I have sent a new patch set with above change:
> > 
> > http://lists.gnu.org/archive/html/qemu-devel/2018-05/msg05508.html
> > https://patchwork.kernel.org/bundle/btw/vhost-user-host-notifiers/
> > 
> > Because the above change is got from your diff
> > and also based on your analysis, I kept your SoB
> > in that patch (if you have any concern about it,
> > please let me know).
> > 
> > In this patch set, I also introduced a protocol
> > feature to allow slave to send fds to master via
> > the slave channel.
> > 
> > If you still see crashes with the new patch set,
> > please provide me a bit more details, e.g. the
> > crash message. Thanks a lot!
> > 
> > Best regards,
> > Tiwei Bie
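
The patch mentioned above, which makes the test fail on a QEMU coredump,
is not shown in this thread. As a generic illustration of the idea of
checking the child's exit status instead of only the test output, a
harness can inspect the wait status of the spawned QEMU process. The
helper below is hypothetical, not the actual qtest change:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/* 'pid' is assumed to be the QEMU child spawned earlier by the harness. */
void check_child_exit(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        exit(EXIT_FAILURE);
    }
    if (WIFSIGNALED(status)) {
        /* e.g. SIGSEGV or SIGABRT during cleanup: fail the whole test */
        fprintf(stderr, "child killed by signal %d\n", WTERMSIG(status));
        exit(EXIT_FAILURE);
    }
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        fprintf(stderr, "child exited with status %d\n",
                WIFEXITED(status) ? WEXITSTATUS(status) : -1);
        exit(EXIT_FAILURE);
    }
}
```

Checking WIFSIGNALED() is what turns a silent cleanup-time crash like the
one above into a visible test failure.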

^ permalink raw reply	[flat|nested] 98+ messages in thread

end of thread, other threads:[~2018-05-24 15:21 UTC | newest]

Thread overview: 98+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-12 15:12 [Qemu-devel] [PATCH v3 0/6] Extend vhost-user to support registering external host notifiers Tiwei Bie
2018-04-12 15:12 ` [virtio-dev] " Tiwei Bie
2018-04-12 15:12 ` [Qemu-devel] [PATCH v3 1/6] vhost-user: add Net prefix to internal state structure Tiwei Bie
2018-04-12 15:12   ` [virtio-dev] " Tiwei Bie
2018-04-12 15:12 ` [Qemu-devel] [PATCH v3 2/6] vhost-user: introduce shared vhost-user state Tiwei Bie
2018-04-12 15:12   ` [virtio-dev] " Tiwei Bie
2018-05-23 13:44   ` [Qemu-devel] " Michael S. Tsirkin
2018-05-23 13:44     ` [virtio-dev] " Michael S. Tsirkin
2018-05-23 15:36     ` [Qemu-devel] " Michael S. Tsirkin
2018-05-23 15:36       ` [virtio-dev] " Michael S. Tsirkin
2018-05-23 15:43       ` [Qemu-devel] " Michael S. Tsirkin
2018-05-23 15:43         ` [virtio-dev] " Michael S. Tsirkin
2018-05-23 23:21         ` [Qemu-devel] " Tiwei Bie
2018-05-23 23:21           ` [virtio-dev] " Tiwei Bie
2018-05-24  2:24           ` [Qemu-devel] " Tiwei Bie
2018-05-24  2:24             ` [virtio-dev] " Tiwei Bie
2018-05-24  7:03             ` [Qemu-devel] " Tiwei Bie
2018-05-24  7:03               ` [virtio-dev] " Tiwei Bie
2018-05-24 10:59             ` [Qemu-devel] " Tiwei Bie
2018-05-24 10:59               ` [virtio-dev] " Tiwei Bie
2018-05-24 13:55               ` [Qemu-devel] " Michael S. Tsirkin
2018-05-24 13:55                 ` [virtio-dev] " Michael S. Tsirkin
2018-05-24 14:54                 ` [Qemu-devel] " Tiwei Bie
2018-05-24 14:54                   ` Tiwei Bie
2018-05-24 14:30               ` [Qemu-devel] " Michael S. Tsirkin
2018-05-24 14:30                 ` [virtio-dev] " Michael S. Tsirkin
2018-05-24 15:22                 ` [Qemu-devel] " Tiwei Bie
2018-05-24 15:22                   ` [virtio-dev] " Tiwei Bie
2018-05-24 13:50             ` [Qemu-devel] " Michael S. Tsirkin
2018-05-24 13:50               ` [virtio-dev] " Michael S. Tsirkin
2018-04-12 15:12 ` [Qemu-devel] [PATCH v3 3/6] vhost-user: support receiving file descriptors in slave_read Tiwei Bie
2018-04-12 15:12   ` [virtio-dev] " Tiwei Bie
2018-05-23 21:25   ` [Qemu-devel] " Michael S. Tsirkin
2018-05-23 21:25     ` [virtio-dev] " Michael S. Tsirkin
2018-05-23 23:12     ` [Qemu-devel] " Tiwei Bie
2018-05-23 23:12       ` Tiwei Bie
2018-05-24 13:48       ` [Qemu-devel] " Michael S. Tsirkin
2018-05-24 13:48         ` Michael S. Tsirkin
2018-05-24 14:56         ` [Qemu-devel] " Tiwei Bie
2018-05-24 14:56           ` Tiwei Bie
2018-04-12 15:12 ` [Qemu-devel] [PATCH v3 4/6] virtio: support setting memory region based host notifier Tiwei Bie
2018-04-12 15:12   ` [virtio-dev] " Tiwei Bie
2018-04-12 15:12 ` [Qemu-devel] [PATCH v3 5/6] vhost: allow backends to filter memory sections Tiwei Bie
2018-04-12 15:12   ` [virtio-dev] " Tiwei Bie
2018-04-12 15:12 ` [Qemu-devel] [PATCH v3 6/6] vhost-user: support registering external host notifiers Tiwei Bie
2018-04-12 15:12   ` [virtio-dev] " Tiwei Bie
2018-04-18 16:34   ` [Qemu-devel] " Michael S. Tsirkin
2018-04-18 16:34     ` [virtio-dev] " Michael S. Tsirkin
2018-04-19 11:14     ` [Qemu-devel] " Tiwei Bie
2018-04-19 11:14       ` [virtio-dev] " Tiwei Bie
2018-04-19 12:43       ` [Qemu-devel] " Liang, Cunming
2018-04-19 12:43         ` [virtio-dev] " Liang, Cunming
2018-04-19 13:02         ` [Qemu-devel] " Paolo Bonzini
2018-04-19 13:02           ` Paolo Bonzini
2018-04-19 15:19           ` [Qemu-devel] " Michael S. Tsirkin
2018-04-19 15:19             ` Michael S. Tsirkin
2018-04-19 15:51             ` [Qemu-devel] " Paolo Bonzini
2018-04-19 15:51               ` Paolo Bonzini
2018-04-19 15:59               ` [Qemu-devel] " Michael S. Tsirkin
2018-04-19 15:59                 ` Michael S. Tsirkin
2018-04-19 16:07                 ` [Qemu-devel] " Paolo Bonzini
2018-04-19 16:07                   ` Paolo Bonzini
2018-04-19 16:48                   ` [Qemu-devel] " Michael S. Tsirkin
2018-04-19 16:48                     ` Michael S. Tsirkin
2018-04-19 16:24             ` [Qemu-devel] " Liang, Cunming
2018-04-19 16:24               ` Liang, Cunming
2018-04-19 16:55               ` [Qemu-devel] " Michael S. Tsirkin
2018-04-19 16:55                 ` Michael S. Tsirkin
2018-04-20  3:01                 ` [Qemu-devel] " Liang, Cunming
2018-04-20  3:01                   ` Liang, Cunming
2018-04-19 15:42         ` [Qemu-devel] " Michael S. Tsirkin
2018-04-19 15:42           ` [virtio-dev] " Michael S. Tsirkin
2018-04-19 15:52           ` [Qemu-devel] " Paolo Bonzini
2018-04-19 15:52             ` [virtio-dev] " Paolo Bonzini
2018-04-19 16:34             ` [Qemu-devel] " Michael S. Tsirkin
2018-04-19 16:34               ` [virtio-dev] " Michael S. Tsirkin
2018-04-19 16:52             ` [Qemu-devel] " Liang, Cunming
2018-04-19 16:52               ` [virtio-dev] " Liang, Cunming
2018-04-19 16:59               ` [Qemu-devel] " Paolo Bonzini
2018-04-19 16:59                 ` Paolo Bonzini
2018-04-19 17:27                 ` [Qemu-devel] " Michael S. Tsirkin
2018-04-19 17:27                   ` Michael S. Tsirkin
2018-04-19 17:35                   ` [Qemu-devel] " Paolo Bonzini
2018-04-19 17:35                     ` Paolo Bonzini
2018-04-19 17:39                     ` [Qemu-devel] " Michael S. Tsirkin
2018-04-19 17:39                       ` Michael S. Tsirkin
2018-04-19 17:00               ` [Qemu-devel] " Michael S. Tsirkin
2018-04-19 17:00                 ` [virtio-dev] " Michael S. Tsirkin
2018-04-19 23:05                 ` [Qemu-devel] " Liang, Cunming
2018-04-19 23:05                   ` [virtio-dev] " Liang, Cunming
2018-04-19 16:27           ` [Qemu-devel] " Liang, Cunming
2018-04-19 16:27             ` [virtio-dev] " Liang, Cunming
2018-05-02 10:32       ` [Qemu-devel] " Tiwei Bie
2018-05-02 10:32         ` [virtio-dev] " Tiwei Bie
2018-05-16  1:41 ` [Qemu-devel] [PATCH v3 0/6] Extend vhost-user to " Michael S. Tsirkin
2018-05-16  1:41   ` [virtio-dev] " Michael S. Tsirkin
2018-05-16  1:56   ` [Qemu-devel] " Tiwei Bie
2018-05-16  1:56     ` [virtio-dev] " Tiwei Bie
