* [RFC v2 0/9] vDPA support in qemu
@ 2020-05-08 16:32 Cindy Lu
  2020-05-08 16:32 ` [RFC v2 1/9] net: introduce qemu_get_peer Cindy Lu
                   ` (9 more replies)
  0 siblings, 10 replies; 33+ messages in thread
From: Cindy Lu @ 2020-05-08 16:32 UTC (permalink / raw)
  To: mst, armbru, eblake, cohuck, jasowang
  Cc: mhabets, qemu-devel, rob.miller, saugatm, lulu, hanand, hch,
	eperezma, jgg, shahafs, kevin.tian, parav, vmireyno,
	cunming.liang, gdawar, jiri, xiao.w.wang, stefanha, zhihong.wang,
	aadam, rdunlap, maxime.coquelin, lingshan.zhu

A vDPA device is a device whose datapath complies with the virtio
specification, but whose control path is vendor specific. vDPA devices
can either be physically located on the hardware or emulated by
software. This RFC introduces vDPA support in qemu.
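
A typical invocation (reproduced from the usage example in patch 5 of
this series; the vhostdev path depends on the host's vDPA device) is:

  qemu-system-x86_64 -cpu host -enable-kvm \
    -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-id,id=vhost-vdpa0 \
    -device virtio-net-pci,netdev=vhost-vdpa0,page-per-vq=on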

Changes from v1:
- separate the patch introducing the vhost_set_vring_ready method
- separate the patch introducing qemu_get_peer
- separate the patch introducing vhost_set_state
- introduce a new configure macro specific to vDPA
- introduce a function to pass the fd from the command line
- add documentation in qemu-options.hx
- address the other comments from the last version


Cindy Lu (3):
  net: introduce qemu_get_peer
  net: use the function qemu_get_peer
  virtio_net: introduce vhost_set_state

Jason Wang (4):
  virtio-bus: introduce queue_enabled method
  virtio-pci: implement queue_enabled method
  vhost_net: set vq ready during start if necessary
  vhost: introduce vhost_set_vring_ready method

Tiwei Bie (2):
  vhost-vdpa: introduce vhost-vdpa net client
  vhost-vdpa: implement vhost-vdpa backend

 configure                         |  21 ++
 hw/net/vhost_net-stub.c           |   4 +
 hw/net/vhost_net.c                |  77 ++++-
 hw/net/virtio-net.c               |   9 +
 hw/virtio/Makefile.objs           |   1 +
 hw/virtio/vhost-backend.c         |   5 +
 hw/virtio/vhost-vdpa.c            | 447 ++++++++++++++++++++++++++++++
 hw/virtio/vhost.c                 |  14 +
 hw/virtio/virtio-pci.c            |  13 +
 hw/virtio/virtio.c                |   6 +
 include/hw/virtio/vhost-backend.h |  10 +-
 include/hw/virtio/vhost-vdpa.h    |  25 ++
 include/hw/virtio/vhost.h         |   1 +
 include/hw/virtio/virtio-bus.h    |   4 +
 include/net/net.h                 |   1 +
 include/net/vhost-vdpa.h          |  19 ++
 include/net/vhost_net.h           |   4 +-
 net/Makefile.objs                 |   2 +-
 net/clients.h                     |   2 +
 net/net.c                         |   9 +
 net/vhost-vdpa.c                  | 227 +++++++++++++++
 qapi/net.json                     |  22 +-
 qemu-options.hx                   |  19 ++
 23 files changed, 930 insertions(+), 12 deletions(-)
 create mode 100644 hw/virtio/vhost-vdpa.c
 create mode 100644 include/hw/virtio/vhost-vdpa.h
 create mode 100644 include/net/vhost-vdpa.h
 create mode 100644 net/vhost-vdpa.c

-- 
2.21.1




* [RFC v2 1/9] net: introduce qemu_get_peer
  2020-05-08 16:32 [RFC v2 0/9] vDPA support in qemu Cindy Lu
@ 2020-05-08 16:32 ` Cindy Lu
  2020-05-08 16:32 ` [RFC v2 2/9] net: use the function qemu_get_peer Cindy Lu
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Cindy Lu @ 2020-05-08 16:32 UTC (permalink / raw)
  To: mst, armbru, eblake, cohuck, jasowang
  Cc: mhabets, qemu-devel, rob.miller, saugatm, lulu, hanand, hch,
	eperezma, jgg, shahafs, kevin.tian, parav, vmireyno,
	cunming.liang, gdawar, jiri, xiao.w.wang, stefanha, zhihong.wang,
	aadam, rdunlap, maxime.coquelin, lingshan.zhu

Introduce a small helper that returns the peer of a given NetClientState
at queue_index.
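
For example, callers that index the NetClientState array to reach a
peer can use the helper instead (a minimal sketch; patch 2 converts
hw/net/vhost_net.c to this pattern):

    NetClientState *peer = qemu_get_peer(ncs, i);
    struct vhost_net *net = get_vhost_net(peer);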

Signed-off-by: Cindy Lu <lulu@redhat.com>
---
 include/net/net.h | 1 +
 net/net.c         | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index e175ba9677..0a74324ccd 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -175,6 +175,7 @@ void hmp_info_network(Monitor *mon, const QDict *qdict);
 void net_socket_rs_init(SocketReadState *rs,
                         SocketReadStateFinalize *finalize,
                         bool vnet_hdr);
+NetClientState *qemu_get_peer(NetClientState *nc, int queue_index);
 
 /* NIC info */
 
diff --git a/net/net.c b/net/net.c
index 84aa6d8d00..b3192deaf1 100644
--- a/net/net.c
+++ b/net/net.c
@@ -324,6 +324,12 @@ void *qemu_get_nic_opaque(NetClientState *nc)
 
     return nic->opaque;
 }
+
+NetClientState *qemu_get_peer(NetClientState *nc, int queue_index)
+{
+    NetClientState *ncs;
+
+    assert(nc != NULL);
+    ncs = nc + queue_index;
+    return ncs->peer;
+}
 
 static void qemu_cleanup_net_client(NetClientState *nc)
 {
-- 
2.21.1




* [RFC v2 2/9] net: use the function qemu_get_peer
  2020-05-08 16:32 [RFC v2 0/9] vDPA support in qemu Cindy Lu
  2020-05-08 16:32 ` [RFC v2 1/9] net: introduce qemu_get_peer Cindy Lu
@ 2020-05-08 16:32 ` Cindy Lu
  2020-05-09  2:19   ` Jason Wang
  2020-05-08 16:32 ` [RFC v2 3/9] virtio_net: introduce vhost_set_state Cindy Lu
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 33+ messages in thread
From: Cindy Lu @ 2020-05-08 16:32 UTC (permalink / raw)
  To: mst, armbru, eblake, cohuck, jasowang
  Cc: mhabets, qemu-devel, rob.miller, saugatm, lulu, hanand, hch,
	eperezma, jgg, shahafs, kevin.tian, parav, vmireyno,
	cunming.liang, gdawar, jiri, xiao.w.wang, stefanha, zhihong.wang,
	aadam, rdunlap, maxime.coquelin, lingshan.zhu

Use the new qemu_get_peer() helper to replace the open-coded
ncs[i].peer accesses in vhost_net_start().

Signed-off-by: Cindy Lu <lulu@redhat.com>
---
 hw/net/vhost_net.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 6b82803fa7..d1d421e3d9 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -306,7 +306,9 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
     BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
     VirtioBusState *vbus = VIRTIO_BUS(qbus);
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
+    struct vhost_net *net;
     int r, e, i;
+    NetClientState *peer;
 
     if (!k->set_guest_notifiers) {
         error_report("binding does not support guest notifiers");
@@ -314,9 +316,9 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
     }
 
     for (i = 0; i < total_queues; i++) {
-        struct vhost_net *net;
-
-        net = get_vhost_net(ncs[i].peer);
+        peer = qemu_get_peer(ncs, i);
+        net = get_vhost_net(peer);
         vhost_net_set_vq_index(net, i * 2);
 
         /* Suppress the masking guest notifiers on vhost user
@@ -335,7 +337,8 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
     }
 
     for (i = 0; i < total_queues; i++) {
-        r = vhost_net_start_one(get_vhost_net(ncs[i].peer), dev);
+        peer = qemu_get_peer(ncs, i);
+        r = vhost_net_start_one(get_vhost_net(peer), dev);
 
         if (r < 0) {
             goto err_start;
@@ -343,7 +346,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 
         if (ncs[i].peer->vring_enable) {
             /* restore vring enable state */
-            r = vhost_set_vring_enable(ncs[i].peer, ncs[i].peer->vring_enable);
+            r = vhost_set_vring_enable(peer, peer->vring_enable);
 
             if (r < 0) {
                 goto err_start;
@@ -355,7 +358,8 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 
 err_start:
     while (--i >= 0) {
-        vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
+        peer = qemu_get_peer(ncs, i);
+        vhost_net_stop_one(get_vhost_net(peer), dev);
     }
     e = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
     if (e < 0) {
-- 
2.21.1




* [RFC v2 3/9] virtio_net: introduce vhost_set_state
  2020-05-08 16:32 [RFC v2 0/9] vDPA support in qemu Cindy Lu
  2020-05-08 16:32 ` [RFC v2 1/9] net: introduce qemu_get_peer Cindy Lu
  2020-05-08 16:32 ` [RFC v2 2/9] net: use the function qemu_get_peer Cindy Lu
@ 2020-05-08 16:32 ` Cindy Lu
  2020-05-09  2:25   ` Jason Wang
  2020-05-08 16:32 ` [RFC v2 4/9] vhost-vdpa: introduce vhost-vdpa net client Cindy Lu
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 33+ messages in thread
From: Cindy Lu @ 2020-05-08 16:32 UTC (permalink / raw)
  To: mst, armbru, eblake, cohuck, jasowang
  Cc: mhabets, qemu-devel, rob.miller, saugatm, lulu, hanand, hch,
	eperezma, jgg, shahafs, kevin.tian, parav, vmireyno,
	cunming.liang, gdawar, jiri, xiao.w.wang, stefanha, zhihong.wang,
	aadam, rdunlap, maxime.coquelin, lingshan.zhu

Introduce a function to set the virtio driver's state in the vhost
backend. vDPA needs to sync the driver's status to the NIC.
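
The resulting status propagation, as wired up in this patch, is:

    virtio_net_vhost_status()
      -> vhost_set_state(peer, status_set)
        -> hdev->vhost_ops->vhost_set_state(hdev, state)

Backends that do not implement the new vhost_set_state op are left
unaffected; vhost_set_state() simply returns 0 for them.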

Signed-off-by: Cindy Lu <lulu@redhat.com>
---
 hw/net/vhost_net.c                | 9 +++++++++
 hw/net/virtio-net.c               | 9 +++++++++
 include/hw/virtio/vhost-backend.h | 2 ++
 include/net/vhost_net.h           | 2 +-
 4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index d1d421e3d9..63b2a85d6e 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -465,3 +465,12 @@ int vhost_net_set_mtu(struct vhost_net *net, uint16_t mtu)
 
     return vhost_ops->vhost_net_set_mtu(&net->dev, mtu);
 }
+
+int vhost_set_state(NetClientState *nc, uint8_t state)
+{
+    struct vhost_net *net = get_vhost_net(nc);
+    struct vhost_dev *hdev = &net->dev;
+
+    if (hdev->vhost_ops->vhost_set_state) {
+        return hdev->vhost_ops->vhost_set_state(hdev, state);
+    }
+    return 0;
+}
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index db3d7c38e6..1bddb4b4af 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -206,6 +206,9 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
     VirtIODevice *vdev = VIRTIO_DEVICE(n);
     NetClientState *nc = qemu_get_queue(n->nic);
     int queues = n->multiqueue ? n->max_queues : 1;
+    NetClientState *peer = qemu_get_peer(nc, 0);
+    uint8_t status_set = vdev->status;
+    uint8_t vhost_started_pre = n->vhost_started;
 
     if (!get_vhost_net(nc->peer)) {
         return;
@@ -245,6 +248,7 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
                 return;
             }
         }
+        status_set = status_set | VIRTIO_CONFIG_S_DRIVER_OK;
 
         n->vhost_started = 1;
         r = vhost_net_start(vdev, n->nic->ncs, queues);
@@ -252,11 +256,16 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
             error_report("unable to start vhost net: %d: "
                          "falling back on userspace virtio", -r);
             n->vhost_started = 0;
+            status_set = status_set & ~VIRTIO_CONFIG_S_DRIVER_OK;
         }
     } else {
         vhost_net_stop(vdev, n->nic->ncs, queues);
+        status_set = status_set & ~VIRTIO_CONFIG_S_DRIVER_OK;
         n->vhost_started = 0;
     }
+    if (vhost_started_pre != n->vhost_started) {
+        vhost_set_state(peer, status_set);
+    }
 }
 
 static int virtio_net_set_vnet_endian_one(VirtIODevice *vdev,
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index 6f6670783f..f823055167 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -112,6 +112,7 @@ typedef int (*vhost_get_inflight_fd_op)(struct vhost_dev *dev,
 typedef int (*vhost_set_inflight_fd_op)(struct vhost_dev *dev,
                                         struct vhost_inflight *inflight);
 
+typedef int (*vhost_set_state_op)(struct vhost_dev *dev, uint8_t state);
 typedef struct VhostOps {
     VhostBackendType backend_type;
     vhost_backend_init vhost_backend_init;
@@ -152,6 +153,7 @@ typedef struct VhostOps {
     vhost_backend_mem_section_filter_op vhost_backend_mem_section_filter;
     vhost_get_inflight_fd_op vhost_get_inflight_fd;
     vhost_set_inflight_fd_op vhost_set_inflight_fd;
+    vhost_set_state_op vhost_set_state;
 } VhostOps;
 
 extern const VhostOps user_ops;
diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
index 77e47398c4..6548a5a105 100644
--- a/include/net/vhost_net.h
+++ b/include/net/vhost_net.h
@@ -39,5 +39,5 @@ int vhost_set_vring_enable(NetClientState * nc, int enable);
 uint64_t vhost_net_get_acked_features(VHostNetState *net);
 
 int vhost_net_set_mtu(struct vhost_net *net, uint16_t mtu);
-
+int vhost_set_state(NetClientState *nc, uint8_t state);
 #endif
-- 
2.21.1




* [RFC v2 4/9] vhost-vdpa: introduce vhost-vdpa net client
  2020-05-08 16:32 [RFC v2 0/9] vDPA support in qemu Cindy Lu
                   ` (2 preceding siblings ...)
  2020-05-08 16:32 ` [RFC v2 3/9] virtio_net: introduce vhost_set_state Cindy Lu
@ 2020-05-08 16:32 ` Cindy Lu
  2020-05-08 16:41   ` Eric Blake
  2020-05-09  2:40   ` Jason Wang
  2020-05-08 16:32 ` [RFC v2 5/9] vhost-vdpa: implement vhost-vdpa backend Cindy Lu
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 33+ messages in thread
From: Cindy Lu @ 2020-05-08 16:32 UTC (permalink / raw)
  To: mst, armbru, eblake, cohuck, jasowang
  Cc: mhabets, qemu-devel, rob.miller, saugatm, lulu, hanand, hch,
	eperezma, jgg, shahafs, kevin.tian, parav, vmireyno,
	cunming.liang, gdawar, jiri, xiao.w.wang, stefanha, zhihong.wang,
	Tiwei Bie, aadam, rdunlap, maxime.coquelin, lingshan.zhu

From: Tiwei Bie <tiwei.bie@intel.com>

This patch introduces a new net client type: vhost-vdpa. The
vhost-vdpa net client sets up a vDPA device, which is specified
by the "vhostdev" parameter.
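
A sketch of the resulting command line (based on the qemu-options.hx
documentation added by this patch; the vhostdev path is host specific):

  -netdev type=vhost-vdpa,id=net0,vhostdev=/path/to/dev \
  -device virtio-net-pci,netdev=net0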

Co-authored-by: Lingshan Zhu <lingshan.zhu@intel.com>
Signed-off-by: Cindy Lu <lulu@redhat.com>
---
 configure                |  21 ++++
 include/net/vhost-vdpa.h |  19 ++++
 include/net/vhost_net.h  |   1 +
 net/Makefile.objs        |   2 +-
 net/clients.h            |   2 +
 net/net.c                |   3 +
 net/vhost-vdpa.c         | 227 +++++++++++++++++++++++++++++++++++++++
 qapi/net.json            |  22 +++-
 qemu-options.hx          |  19 ++++
 9 files changed, 313 insertions(+), 3 deletions(-)
 create mode 100644 include/net/vhost-vdpa.h
 create mode 100644 net/vhost-vdpa.c

diff --git a/configure b/configure
index 6099be1d84..bdd732e3bb 100755
--- a/configure
+++ b/configure
@@ -1505,6 +1505,10 @@ for opt do
   ;;
   --enable-vhost-user) vhost_user="yes"
   ;;
+  --disable-vhost-vdpa) vhost_vdpa="no"
+  ;;
+  --enable-vhost-vdpa) vhost_vdpa="yes"
+  ;;
   --disable-vhost-kernel) vhost_kernel="no"
   ;;
   --enable-vhost-kernel) vhost_kernel="yes"
@@ -1780,6 +1784,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   vhost-crypto    vhost-user-crypto backend support
   vhost-kernel    vhost kernel backend support
   vhost-user      vhost-user backend support
+  vhost-vdpa      vhost-vdpa backend support
   spice           spice
   rbd             rados block device (rbd)
   libiscsi        iscsi support
@@ -2241,6 +2246,10 @@ test "$vhost_user" = "" && vhost_user=yes
 if test "$vhost_user" = "yes" && test "$mingw32" = "yes"; then
   error_exit "vhost-user isn't available on win32"
 fi
+test "$vhost_vdpa" = "" && vhost_vdpa=yes
+if test "$vhost_vdpa" = "yes" && test "$mingw32" = "yes"; then
+  error_exit "vhost-vdpa isn't available on win32"
+fi
 test "$vhost_kernel" = "" && vhost_kernel=$linux
 if test "$vhost_kernel" = "yes" && test "$linux" != "yes"; then
   error_exit "vhost-kernel is only available on Linux"
@@ -2269,6 +2278,11 @@ test "$vhost_user_fs" = "" && vhost_user_fs=$vhost_user
 if test "$vhost_user_fs" = "yes" && test "$vhost_user" = "no"; then
   error_exit "--enable-vhost-user-fs requires --enable-vhost-user"
 fi
+# vhost-vdpa backend
+test "$vhost_net_vdpa" = "" && vhost_net_vdpa=$vhost_vdpa
+if test "$vhost_net_vdpa" = "yes" && test "$vhost_vdpa" = "no"; then
+  error_exit "--enable-vhost-net-vdpa requires --enable-vhost-vdpa"
+fi
 
 # OR the vhost-kernel and vhost-user values for simplicity
 if test "$vhost_net" = ""; then
@@ -6543,6 +6557,7 @@ echo "vhost-scsi support $vhost_scsi"
 echo "vhost-vsock support $vhost_vsock"
 echo "vhost-user support $vhost_user"
 echo "vhost-user-fs support $vhost_user_fs"
+echo "vhost-vdpa support $vhost_vdpa"
 echo "Trace backends    $trace_backends"
 if have_backend "simple"; then
 echo "Trace output file $trace_file-<pid>"
@@ -7031,6 +7046,9 @@ fi
 if test "$vhost_net_user" = "yes" ; then
   echo "CONFIG_VHOST_NET_USER=y" >> $config_host_mak
 fi
+if test "$vhost_net_vdpa" = "yes" ; then
+  echo "CONFIG_VHOST_NET_VDPA=y" >> $config_host_mak
+fi
 if test "$vhost_crypto" = "yes" ; then
   echo "CONFIG_VHOST_CRYPTO=y" >> $config_host_mak
 fi
@@ -7043,6 +7061,9 @@ fi
 if test "$vhost_user" = "yes" ; then
   echo "CONFIG_VHOST_USER=y" >> $config_host_mak
 fi
+if test "$vhost_vdpa" = "yes" ; then
+  echo "CONFIG_VHOST_VDPA=y" >> $config_host_mak
+fi
 if test "$vhost_user_fs" = "yes" ; then
   echo "CONFIG_VHOST_USER_FS=y" >> $config_host_mak
 fi
diff --git a/include/net/vhost-vdpa.h b/include/net/vhost-vdpa.h
new file mode 100644
index 0000000000..6ce0d04f72
--- /dev/null
+++ b/include/net/vhost-vdpa.h
@@ -0,0 +1,19 @@
+/*
+ * vhost-vdpa.h
+ *
+ * Copyright(c) 2017-2018 Intel Corporation.
+ * Copyright(c) 2020 Red Hat, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef VHOST_VDPA_H
+#define VHOST_VDPA_H
+
+struct vhost_net;
+struct vhost_net *vhost_vdpa_get_vhost_net(NetClientState *nc);
+uint64_t vhost_vdpa_get_acked_features(NetClientState *nc);
+
+#endif /* VHOST_VDPA_H */
diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
index 6548a5a105..b47844bf29 100644
--- a/include/net/vhost_net.h
+++ b/include/net/vhost_net.h
@@ -40,4 +40,5 @@ uint64_t vhost_net_get_acked_features(VHostNetState *net);
 
 int vhost_net_set_mtu(struct vhost_net *net, uint16_t mtu);
 int vhost_set_state(NetClientState *nc, uint8_t state);
+int vhost_net_get_device_id(struct vhost_net *net, uint32_t *device_id);
 #endif
diff --git a/net/Makefile.objs b/net/Makefile.objs
index c5d076d19c..5ab45545db 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -26,7 +26,7 @@ tap-obj-$(CONFIG_SOLARIS) = tap-solaris.o
 tap-obj-y ?= tap-stub.o
 common-obj-$(CONFIG_POSIX) += tap.o $(tap-obj-y)
 common-obj-$(CONFIG_WIN32) += tap-win32.o
-
+common-obj-$(CONFIG_VHOST_NET_VDPA) += vhost-vdpa.o
 vde.o-libs = $(VDE_LIBS)
 
 common-obj-$(CONFIG_CAN_BUS) += can/
diff --git a/net/clients.h b/net/clients.h
index a6ef267e19..92f9b59aed 100644
--- a/net/clients.h
+++ b/net/clients.h
@@ -61,4 +61,6 @@ int net_init_netmap(const Netdev *netdev, const char *name,
 int net_init_vhost_user(const Netdev *netdev, const char *name,
                         NetClientState *peer, Error **errp);
 
+int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
+                        NetClientState *peer, Error **errp);
 #endif /* QEMU_NET_CLIENTS_H */
diff --git a/net/net.c b/net/net.c
index b3192deaf1..9eff1ae982 100644
--- a/net/net.c
+++ b/net/net.c
@@ -965,6 +965,9 @@ static int (* const net_client_init_fun[NET_CLIENT_DRIVER__MAX])(
 #ifdef CONFIG_VHOST_NET_USER
         [NET_CLIENT_DRIVER_VHOST_USER] = net_init_vhost_user,
 #endif
+#ifdef CONFIG_VHOST_NET_VDPA
+        [NET_CLIENT_DRIVER_VHOST_VDPA] = net_init_vhost_vdpa,
+#endif
 #ifdef CONFIG_L2TPV3
         [NET_CLIENT_DRIVER_L2TPV3]    = net_init_l2tpv3,
 #endif
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
new file mode 100644
index 0000000000..c29678fdf1
--- /dev/null
+++ b/net/vhost-vdpa.c
@@ -0,0 +1,227 @@
+/*
+ * vhost-vdpa.c
+ *
+ * Copyright(c) 2017-2018 Intel Corporation.
+ * Copyright(c) 2020 Red Hat, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "clients.h"
+#include "net/vhost_net.h"
+#include "net/vhost-vdpa.h"
+#include "hw/virtio/vhost-vdpa.h"
+#include "qemu/config-file.h"
+#include "qemu/error-report.h"
+#include "qemu/option.h"
+#include "qapi/error.h"
+#include <linux/vfio.h>
+#include <sys/ioctl.h>
+#include <err.h>
+#include <linux/virtio_net.h>
+#include "monitor/monitor.h"
+#include "qemu/sockets.h"
+#include "hw/virtio/vhost.h"
+
+/* Todo:need to add the multiqueue support here */
+typedef struct VhostVDPAState {
+    NetClientState nc;
+    struct vhost_vdpa vhost_vdpa;
+    VHostNetState *vhost_net;
+    uint64_t acked_features;
+    bool started;
+} VhostVDPAState;
+
+VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc)
+{
+    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
+    return s->vhost_net;
+}
+
+uint64_t vhost_vdpa_get_acked_features(NetClientState *nc)
+{
+    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
+    return s->acked_features;
+}
+
+static void vhost_vdpa_del(NetClientState *ncs)
+{
+    VhostVDPAState *s;
+
+    assert(ncs->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
+
+    s = DO_UPCAST(VhostVDPAState, nc, ncs);
+
+    if (s->vhost_net) {
+        /* save acked features */
+        uint64_t features = vhost_net_get_acked_features(s->vhost_net);
+        if (features) {
+            s->acked_features = features;
+        }
+        vhost_net_cleanup(s->vhost_net);
+    }
+}
+
+static int vhost_vdpa_add(NetClientState *ncs, void *be)
+{
+    VhostNetOptions options;
+    struct vhost_net *net = NULL;
+    VhostVDPAState *s;
+
+    options.backend_type = VHOST_BACKEND_TYPE_VDPA;
+
+    assert(ncs->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
+
+    s = DO_UPCAST(VhostVDPAState, nc, ncs);
+
+    options.net_backend = ncs;
+    options.opaque      = be;
+    options.busyloop_timeout = 0;
+    net = vhost_net_init(&options);
+    if (!net) {
+        error_report("failed to init vhost_net for queue");
+        goto err;
+    }
+
+    if (s->vhost_net) {
+        vhost_net_cleanup(s->vhost_net);
+        g_free(s->vhost_net);
+    }
+    s->vhost_net = net;
+
+    return 0;
+
+err:
+    if (net) {
+        vhost_net_cleanup(net);
+    }
+    vhost_vdpa_del(ncs);
+    return -1;
+}
+
+static void vhost_vdpa_cleanup(NetClientState *nc)
+{
+    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+
+    if (s->vhost_net) {
+        vhost_net_cleanup(s->vhost_net);
+        g_free(s->vhost_net);
+        s->vhost_net = NULL;
+    }
+
+    qemu_purge_queued_packets(nc);
+}
+
+static bool vhost_vdpa_has_vnet_hdr(NetClientState *nc)
+{
+    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
+
+    return true;
+}
+
+static bool vhost_vdpa_has_ufo(NetClientState *nc)
+{
+    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
+    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+    uint64_t  features = 0;
+
+    features |= (1ULL << VIRTIO_NET_F_HOST_UFO);
+    features = vhost_net_get_features(s->vhost_net, features);
+    return !!(features & (1ULL << VIRTIO_NET_F_HOST_UFO));
+
+}
+
+static int vhost_vdpa_check_device_id(NetClientState *nc)
+{
+    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+    uint32_t device_id;
+    int ret;
+
+    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
+    /* Get the device id from the hardware */
+    ret = vhost_net_get_device_id(s->vhost_net, &device_id);
+    if (ret) {
+        return ret;
+    }
+    if (device_id != VIRTIO_ID_NET) {
+        return -ENOTSUP;
+    }
+    return 0;
+}
+
+static NetClientInfo net_vhost_vdpa_info = {
+        .type = NET_CLIENT_DRIVER_VHOST_VDPA,
+        .size = sizeof(VhostVDPAState),
+        .cleanup = vhost_vdpa_cleanup,
+        .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
+        .has_ufo = vhost_vdpa_has_ufo,
+};
+
+static int net_vhost_vdpa_init(NetClientState *peer, const char *device,
+                               const char *name, const char *vhostdev,
+                               bool has_fd, char *fd)
+{
+    NetClientState *nc = NULL;
+    VhostVDPAState *s;
+    int vdpa_device_fd = -1;
+    Error *err = NULL;
+
+    assert(name);
+
+    nc = qemu_new_net_client(&net_vhost_vdpa_info, peer, device, name);
+    snprintf(nc->info_str, sizeof(nc->info_str), "vhost-vdpa");
+    nc->queue_index = 0;
+
+    s = DO_UPCAST(VhostVDPAState, nc, nc);
+
+    if (has_fd) {
+        vdpa_device_fd = monitor_fd_param(cur_mon, fd, &err);
+    } else {
+        vdpa_device_fd = open(vhostdev, O_RDWR);
+    }
+
+    if (vdpa_device_fd == -1) {
+        return -errno;
+    }
+    s->vhost_vdpa.device_fd = vdpa_device_fd;
+    vhost_vdpa_add(nc, (void *)&s->vhost_vdpa);
+    assert(s->vhost_net);
+    /* check the device id for vdpa */
+    return vhost_vdpa_check_device_id(nc);
+}
+
+static int net_vhost_check_net(void *opaque, QemuOpts *opts, Error **errp)
+{
+    const char *name = opaque;
+    const char *driver, *netdev;
+
+    driver = qemu_opt_get(opts, "driver");
+    netdev = qemu_opt_get(opts, "netdev");
+    if (!driver || !netdev) {
+        return 0;
+    }
+
+    if (strcmp(netdev, name) == 0 &&
+        !g_str_has_prefix(driver, "virtio-net-")) {
+        error_setg(errp, "vhost-vdpa requires frontend driver virtio-net-*");
+        return -1;
+    }
+
+    return 0;
+}
+
+int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
+                        NetClientState *peer, Error **errp)
+{
+    const NetdevVhostVDPAOptions *opts;
+
+    assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
+    opts = &netdev->u.vhost_vdpa;
+    /* verify net frontend */
+    if (qemu_opts_foreach(qemu_find_opts("device"), net_vhost_check_net,
+                          (char *)name, errp)) {
+        return -1;
+    }
+    return net_vhost_vdpa_init(peer, "vhost_vdpa", name, opts->vhostdev,
+                               opts->has_fd, opts->fd);
+}
diff --git a/qapi/net.json b/qapi/net.json
index 335295be50..0f4fa6fffc 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -441,6 +441,23 @@
     '*queues':        'int' } }
 
 ##
+# @NetdevVhostVDPAOptions:
+#
+# Vhost-vdpa network backend
+#
+# @vhostdev: path of the vhost-vdpa character device
+#
+# @fd: file descriptor of an already opened vhost-vdpa device
+#
+# @queues: number of queues to be created for multiqueue vhost-vdpa
+#          (default: 1) (Since 5.1)
+#
+# Since: 5.1
+##
+{ 'struct': 'NetdevVhostVDPAOptions',
+  'data': {
+    '*vhostdev':     'str',
+    '*fd':           'str',
+    '*queues':       'int' } }
+
+##
 # @NetClientDriver:
 #
 # Available netdev drivers.
@@ -451,7 +468,7 @@
 ##
 { 'enum': 'NetClientDriver',
   'data': [ 'none', 'nic', 'user', 'tap', 'l2tpv3', 'socket', 'vde',
-            'bridge', 'hubport', 'netmap', 'vhost-user' ] }
+            'bridge', 'hubport', 'netmap', 'vhost-user', 'vhost-vdpa' ] }
 
 ##
 # @Netdev:
@@ -479,7 +496,8 @@
     'bridge':   'NetdevBridgeOptions',
     'hubport':  'NetdevHubPortOptions',
     'netmap':   'NetdevNetmapOptions',
-    'vhost-user': 'NetdevVhostUserOptions' } }
+    'vhost-user': 'NetdevVhostUserOptions',
+    'vhost-vdpa': 'NetdevVhostVDPAOptions' } }
 
 ##
 # @NetLegacy:
diff --git a/qemu-options.hx b/qemu-options.hx
index 65c9473b73..08256d9d4e 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2291,6 +2291,10 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
 #ifdef CONFIG_POSIX
     "-netdev vhost-user,id=str,chardev=dev[,vhostforce=on|off]\n"
     "                configure a vhost-user network, backed by a chardev 'dev'\n"
+#endif
+#ifdef CONFIG_POSIX
+    "-netdev vhost-vdpa,id=str,vhostdev=/path/to/dev\n"
+    "                configure a vhost-vdpa network, backed by a vhostdev 'dev'\n"
 #endif
     "-netdev hubport,id=str,hubid=n[,netdev=nd]\n"
     "                configure a hub port on the hub with ID 'n'\n", QEMU_ARCH_ALL)
@@ -2310,6 +2314,9 @@ DEF("nic", HAS_ARG, QEMU_OPTION_nic,
 #endif
 #ifdef CONFIG_POSIX
     "vhost-user|"
+#endif
+#ifdef CONFIG_POSIX
+    "vhost-vdpa|"
 #endif
     "socket][,option][,...][mac=macaddr]\n"
     "                initialize an on-board / default host NIC (using MAC address\n"
@@ -2749,6 +2756,18 @@ qemu -m 512 -object memory-backend-file,id=mem,size=512M,mem-path=/hugetlbfs,sha
      -device virtio-net-pci,netdev=net0
 @end example
 
+@item -netdev vhost-vdpa,vhostdev=/path/to/dev
+Establish a vhost-vdpa netdev, backed by a vhostdev. vDPA devices have
+a datapath that complies with the virtio specification together with a
+vendor-specific control path; the vhostdev is the character device node
+of such a device, on which QEMU issues vhost ioctls to configure it.
+Example:
+@example
+qemu -m 512 -object memory-backend-file,id=mem,size=512M,mem-path=/hugetlbfs,share=on \
+     -numa node,memdev=mem \
+     -netdev type=vhost-vdpa,id=net0,vhostdev=/path/to/dev \
+     -device virtio-net-pci,netdev=net0
+@end example
 @item -netdev hubport,id=@var{id},hubid=@var{hubid}[,netdev=@var{nd}]
 
 Create a hub port on the emulated hub with ID @var{hubid}.
-- 
2.21.1




* [RFC v2 5/9] vhost-vdpa: implement vhost-vdpa backend
  2020-05-08 16:32 [RFC v2 0/9] vDPA support in qemu Cindy Lu
                   ` (3 preceding siblings ...)
  2020-05-08 16:32 ` [RFC v2 4/9] vhost-vdpa: introduce vhost-vdpa net client Cindy Lu
@ 2020-05-08 16:32 ` Cindy Lu
  2020-05-09  3:00   ` Jason Wang
  2020-05-21 12:40   ` Stefan Hajnoczi
  2020-05-08 16:32 ` [RFC v2 6/9] virtio-bus: introduce queue_enabled method Cindy Lu
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 33+ messages in thread
From: Cindy Lu @ 2020-05-08 16:32 UTC (permalink / raw)
  To: mst, armbru, eblake, cohuck, jasowang
  Cc: mhabets, qemu-devel, rob.miller, saugatm, lulu, hanand, hch,
	eperezma, jgg, shahafs, kevin.tian, parav, vmireyno,
	cunming.liang, gdawar, jiri, xiao.w.wang, stefanha, zhihong.wang,
	Tiwei Bie, aadam, rdunlap, maxime.coquelin, lingshan.zhu

From: Tiwei Bie <tiwei.bie@intel.com>

Currently we have two types of vhost backends in QEMU: vhost kernel and
vhost-user. The vDPA kernel framework provides a generic device for
vDPA purposes; it exposes to user space a non-vendor-specific
configuration interface for setting up a vhost HW accelerator. This
patch introduces a third vhost backend, vhost-vdpa, based on that vDPA
interface.

Vhost-vdpa usage:

  qemu-system-x86_64 -cpu host -enable-kvm \
    ......
  -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-id,id=vhost-vdpa0 \
  -device virtio-net-pci,netdev=vhost-vdpa0,page-per-vq=on \

Co-authored-by: Lingshan Zhu <lingshan.zhu@intel.com>
Signed-off-by: Cindy Lu <lulu@redhat.com>
---
 hw/net/vhost_net.c                |  39 ++-
 hw/virtio/Makefile.objs           |   1 +
 hw/virtio/vhost-backend.c         |   5 +
 hw/virtio/vhost-vdpa.c            | 447 ++++++++++++++++++++++++++++++
 hw/virtio/vhost.c                 |  14 +
 include/hw/virtio/vhost-backend.h |   8 +-
 include/hw/virtio/vhost-vdpa.h    |  25 ++
 include/hw/virtio/vhost.h         |   1 +
 8 files changed, 538 insertions(+), 2 deletions(-)
 create mode 100644 hw/virtio/vhost-vdpa.c
 create mode 100644 include/hw/virtio/vhost-vdpa.h

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 63b2a85d6e..1af39abaf3 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -17,8 +17,10 @@
 #include "net/net.h"
 #include "net/tap.h"
 #include "net/vhost-user.h"
+#include "net/vhost-vdpa.h"
 
 #include "standard-headers/linux/vhost_types.h"
+#include "linux-headers/linux/vhost.h"
 #include "hw/virtio/virtio-net.h"
 #include "net/vhost_net.h"
 #include "qemu/error-report.h"
@@ -85,6 +87,29 @@ static const int user_feature_bits[] = {
     VHOST_INVALID_FEATURE_BIT
 };
 
+static const int vdpa_feature_bits[] = {
+    VIRTIO_F_NOTIFY_ON_EMPTY,
+    VIRTIO_RING_F_INDIRECT_DESC,
+    VIRTIO_RING_F_EVENT_IDX,
+    VIRTIO_F_ANY_LAYOUT,
+    VIRTIO_F_VERSION_1,
+    VIRTIO_NET_F_CSUM,
+    VIRTIO_NET_F_GUEST_CSUM,
+    VIRTIO_NET_F_GSO,
+    VIRTIO_NET_F_GUEST_TSO4,
+    VIRTIO_NET_F_GUEST_TSO6,
+    VIRTIO_NET_F_GUEST_ECN,
+    VIRTIO_NET_F_GUEST_UFO,
+    VIRTIO_NET_F_HOST_TSO4,
+    VIRTIO_NET_F_HOST_TSO6,
+    VIRTIO_NET_F_HOST_ECN,
+    VIRTIO_NET_F_HOST_UFO,
+    VIRTIO_NET_F_MRG_RXBUF,
+    VIRTIO_NET_F_MTU,
+    VIRTIO_F_IOMMU_PLATFORM,
+    VIRTIO_NET_F_GUEST_ANNOUNCE,
+    VHOST_INVALID_FEATURE_BIT
+};
+
 static const int *vhost_net_get_feature_bits(struct vhost_net *net)
 {
     const int *feature_bits = 0;
@@ -96,6 +121,9 @@ static const int *vhost_net_get_feature_bits(struct vhost_net *net)
     case NET_CLIENT_DRIVER_VHOST_USER:
         feature_bits = user_feature_bits;
         break;
+    case NET_CLIENT_DRIVER_VHOST_VDPA:
+        feature_bits = vdpa_feature_bits;
+        break;
     default:
         error_report("Feature bits not defined for this type: %d",
                 net->nc->info->type);
@@ -110,7 +138,10 @@ uint64_t vhost_net_get_features(struct vhost_net *net, uint64_t features)
     return vhost_get_features(&net->dev, vhost_net_get_feature_bits(net),
             features);
 }
 
+int vhost_net_get_device_id(struct vhost_net *net, uint32_t *device_id)
+{
+    return vhost_dev_get_device_id(&net->dev, device_id);
+}
+
 void vhost_net_ack_features(struct vhost_net *net, uint64_t features)
 {
     net->dev.acked_features = net->dev.backend_features;
@@ -433,6 +464,12 @@ VHostNetState *get_vhost_net(NetClientState *nc)
         vhost_net = vhost_user_get_vhost_net(nc);
         assert(vhost_net);
         break;
+#endif
+#ifdef CONFIG_VHOST_NET_VDPA
+    case NET_CLIENT_DRIVER_VHOST_VDPA:
+        vhost_net = vhost_vdpa_get_vhost_net(nc);
+        assert(vhost_net);
+        break;
 #endif
     default:
         break;
diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
index e2f70fbb89..e7c5d4a862 100644
--- a/hw/virtio/Makefile.objs
+++ b/hw/virtio/Makefile.objs
@@ -5,6 +5,7 @@ obj-y += virtio.o
 obj-$(call lor,$(CONFIG_VHOST_USER),$(CONFIG_VHOST_KERNEL)) += vhost.o vhost-backend.o
 common-obj-$(call lnot,$(call lor,$(CONFIG_VHOST_USER),$(CONFIG_VHOST_KERNEL))) += vhost-stub.o
 obj-$(CONFIG_VHOST_USER) += vhost-user.o
+obj-$(CONFIG_VHOST_VDPA) += vhost-vdpa.o
 
 common-obj-$(CONFIG_VIRTIO_RNG) += virtio-rng.o
 common-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
index 48905383f8..069ddb423d 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -285,6 +285,11 @@ int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type)
     case VHOST_BACKEND_TYPE_USER:
         dev->vhost_ops = &user_ops;
         break;
+#endif
+#ifdef CONFIG_VHOST_VDPA
+    case VHOST_BACKEND_TYPE_VDPA:
+        dev->vhost_ops = &vdpa_ops;
+        break;
 #endif
     default:
         error_report("Unknown vhost backend type");
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
new file mode 100644
index 0000000000..bac8a8aa2a
--- /dev/null
+++ b/hw/virtio/vhost-vdpa.c
@@ -0,0 +1,447 @@
+/*
+ * vhost-vdpa
+ *
+ *  Copyright(c) 2017-2018 Intel Corporation.
+ *  Copyright(c) 2020 Red Hat, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include <linux/vhost.h>
+#include <linux/vfio.h>
+#include <sys/eventfd.h>
+#include <sys/ioctl.h>
+#include "hw/virtio/vhost.h"
+#include "hw/virtio/vhost-backend.h"
+#include "hw/virtio/virtio-net.h"
+#include "hw/virtio/vhost-vdpa.h"
+#include "qemu/main-loop.h"
+#include <linux/kvm.h>
+#include "sysemu/kvm.h"
+
+
+static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section)
+{
+    return (!memory_region_is_ram(section->mr) &&
+            !memory_region_is_iommu(section->mr)) ||
+           /*
+            * Sizing an enabled 64-bit BAR can cause spurious mappings to
+            * addresses in the upper part of the 64-bit address space.  These
+            * are never accessed by the CPU and beyond the address width of
+            * some IOMMU hardware.  TODO: VDPA should tell us the IOMMU width.
+            */
+           section->offset_within_address_space & (1ULL << 63);
+}
+
+static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
+                              void *vaddr, bool readonly)
+{
+    struct vhost_msg_v2 msg;
+    int fd = v->device_fd;
+    int ret = 0;
+
+    msg.type = v->msg_type;
+    msg.iotlb.iova = iova;
+    msg.iotlb.size = size;
+    msg.iotlb.uaddr = (uint64_t)vaddr;
+    msg.iotlb.perm = readonly ? VHOST_ACCESS_RO : VHOST_ACCESS_RW;
+    msg.iotlb.type = VHOST_IOTLB_UPDATE;
+
+    if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
+        error_report("failed to write, fd=%d, errno=%d (%s)",
+            fd, errno, strerror(errno));
+        return -EIO;
+    }
+
+    return ret;
+}
+
+static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova,
+                                hwaddr size)
+{
+    struct vhost_msg_v2 msg;
+    int fd = v->device_fd;
+    int ret = 0;
+
+    msg.type = v->msg_type;
+    msg.iotlb.iova = iova;
+    msg.iotlb.size = size;
+    msg.iotlb.type = VHOST_IOTLB_INVALIDATE;
+
+    if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
+        error_report("failed to write, fd=%d, errno=%d (%s)",
+            fd, errno, strerror(errno));
+        return -EIO;
+    }
+
+    return ret;
+}
+
+static void vhost_vdpa_listener_region_add(MemoryListener *listener,
+                                           MemoryRegionSection *section)
+{
+    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
+    hwaddr iova;
+    Int128 llend, llsize;
+    void *vaddr;
+    int ret;
+
+    if (vhost_vdpa_listener_skipped_section(section)) {
+        return;
+    }
+
+    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
+                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
+        error_report("%s received unaligned region", __func__);
+        return;
+    }
+
+    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+    llend = int128_make64(section->offset_within_address_space);
+    llend = int128_add(llend, section->size);
+    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
+
+    if (int128_ge(int128_make64(iova), llend)) {
+        return;
+    }
+
+    memory_region_ref(section->mr);
+
+    /* Here we assume that memory_region_is_ram(section->mr)==true */
+
+    vaddr = memory_region_get_ram_ptr(section->mr) +
+            section->offset_within_region +
+            (iova - section->offset_within_address_space);
+
+    llsize = int128_sub(llend, int128_make64(iova));
+
+    ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
+                             vaddr, section->readonly);
+    if (ret) {
+        error_report("vhost vdpa map fail!");
+        if (memory_region_is_ram_device(section->mr)) {
+            /* Allow unexpected mappings not to be fatal for RAM devices */
+            error_report("map ram fail!");
+            return;
+        }
+        goto fail;
+    }
+
+    return;
+
+fail:
+    if (memory_region_is_ram_device(section->mr)) {
+        error_report("failed to vdpa_dma_map. pci p2p may not work");
+        return;
+    }
+    /*
+     * On the initfn path, store the first error in the container so we
+     * can gracefully fail.  Runtime, there's not much we can do other
+     * than throw a hardware error.
+     */
+    error_report("vhost-vdpa: DMA mapping failed, unable to continue");
+}
+
+static void vhost_vdpa_listener_region_del(MemoryListener *listener,
+                                           MemoryRegionSection *section)
+{
+    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
+    hwaddr iova;
+    Int128 llend, llsize;
+    int ret;
+    bool try_unmap = true;
+
+    if (vhost_vdpa_listener_skipped_section(section)) {
+        return;
+    }
+
+    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
+                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
+        error_report("%s received unaligned region", __func__);
+        return;
+    }
+
+    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+    llend = int128_make64(section->offset_within_address_space);
+    llend = int128_add(llend, section->size);
+    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
+
+    if (int128_ge(int128_make64(iova), llend)) {
+        return;
+    }
+
+    llsize = int128_sub(llend, int128_make64(iova));
+
+    if (try_unmap) {
+        ret = vhost_vdpa_dma_unmap(v, iova, int128_get64(llsize));
+        if (ret) {
+            error_report("vhost_vdpa dma unmap error!");
+        }
+    }
+
+    memory_region_unref(section->mr);
+}
+
+static const MemoryListener vhost_vdpa_memory_listener = {
+    .region_add = vhost_vdpa_listener_region_add,
+    .region_del = vhost_vdpa_listener_region_del,
+};
+
+static int vhost_vdpa_call(struct vhost_dev *dev, unsigned long int request,
+                             void *arg)
+{
+    struct vhost_vdpa *v = dev->opaque;
+    int fd = v->device_fd;
+
+    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
+
+    return ioctl(fd, request, arg);
+}
+
+static int vhost_vdpa_get_backend_features(struct vhost_dev *dev,
+                                   uint64_t *features)
+{
+    return vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, features);
+}
+
+static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque)
+{
+    struct vhost_vdpa *v;
+    uint64_t backend_features = 0;
+    int ret = 0;
+
+    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
+
+    v = opaque;
+    dev->opaque = opaque;
+
+    v->listener = vhost_vdpa_memory_listener;
+    memory_listener_register(&v->listener, &address_space_memory);
+
+    ret = vhost_vdpa_get_backend_features(dev, &backend_features);
+    /*
+     * To stay compatible with older kernels that do not support this
+     * ioctl, default the msg_type to v2 when the query fails.
+     */
+    if (ret) {
+        v->msg_type = VHOST_IOTLB_MSG_V2;
+        return 0;
+    }
+    if (backend_features & (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)) {
+        v->msg_type = VHOST_IOTLB_MSG_V2;
+    } else {
+        v->msg_type = VHOST_IOTLB_MSG;
+    }
+    return 0;
+}
+
+static int vhost_vdpa_cleanup(struct vhost_dev *dev)
+{
+    struct vhost_vdpa *v;
+    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
+
+    v = dev->opaque;
+    memory_listener_unregister(&v->listener);
+
+    dev->opaque = NULL;
+    return 0;
+}
+
+static int vhost_vdpa_memslots_limit(struct vhost_dev *dev)
+{
+    /* Using the IOTLB API; note: raise the locked-memory ulimit before use */
+    return INT_MAX;
+}
+
+static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
+                                   struct vhost_log *log)
+{
+    return -1;
+}
+
+static int vhost_vdpa_set_mem_table(struct vhost_dev *dev,
+                                    struct vhost_memory *mem)
+{
+
+    if (mem->padding) {
+        return -1;
+    }
+
+    return 0;
+}
+
+static int vhost_vdpa_set_vring_addr(struct vhost_dev *dev,
+                                     struct vhost_vring_addr *addr)
+{
+    return vhost_vdpa_call(dev, VHOST_SET_VRING_ADDR, addr);
+}
+
+static int vhost_vdpa_set_vring_num(struct vhost_dev *dev,
+                                    struct vhost_vring_state *ring)
+{
+    return vhost_vdpa_call(dev, VHOST_SET_VRING_NUM, ring);
+}
+
+static int vhost_vdpa_set_vring_base(struct vhost_dev *dev,
+                                     struct vhost_vring_state *ring)
+{
+    return vhost_vdpa_call(dev, VHOST_SET_VRING_BASE, ring);
+}
+
+static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
+                                     struct vhost_vring_state *ring)
+{
+    return vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
+}
+
+static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
+                                     struct vhost_vring_file *file)
+{
+    return vhost_vdpa_call(dev, VHOST_SET_VRING_KICK, file);
+}
+
+static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
+                                     struct vhost_vring_file *file)
+{
+    return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
+}
+
+static int vhost_vdpa_set_features(struct vhost_dev *dev,
+                                   uint64_t features)
+{
+    return vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
+}
+
+static int vhost_vdpa_get_features(struct vhost_dev *dev,
+                                   uint64_t *features)
+{
+    return vhost_vdpa_call(dev, VHOST_GET_FEATURES, features);
+}
+
+static int vhost_vdpa_get_device_id(struct vhost_dev *dev,
+                                   uint32_t *device_id)
+{
+    return vhost_vdpa_call(dev, VHOST_VDPA_GET_DEVICE_ID, device_id);
+}
+
+static int vhost_vdpa_set_owner(struct vhost_dev *dev)
+{
+    return vhost_vdpa_call(dev, VHOST_SET_OWNER, NULL);
+}
+
+static int vhost_vdpa_reset_device(struct vhost_dev *dev)
+{
+    return vhost_vdpa_call(dev, VHOST_RESET_OWNER, NULL);
+}
+
+static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
+{
+    assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs);
+
+    return idx - dev->vq_index;
+}
+
+static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
+{
+    int i;
+    for (i = 0; i < dev->nvqs; ++i) {
+        struct vhost_vring_state state = {
+            .index = dev->vq_index + i,
+            .num = 1,
+        };
+        vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
+    }
+    return 0;
+}
+
+static int vhost_vdpa_set_config(struct vhost_dev *dev, const uint8_t *data,
+                                 uint32_t offset, uint32_t size,
+                                 uint32_t flags)
+{
+    struct vhost_vdpa_config *config;
+    int ret;
+
+    if ((size > VHOST_VDPA_MAX_CONFIG_SIZE) || (data == NULL)) {
+        return -EINVAL;
+    }
+    /* the config payload lives in a flexible-array member after off/len */
+    config = g_malloc0(sizeof(*config) + size);
+    config->off = offset;
+    config->len = size;
+    memcpy(config->buf, data, size);
+    ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_CONFIG, config);
+    g_free(config);
+    return ret;
+}
+
+static int vhost_vdpa_get_config(struct vhost_dev *dev, uint8_t *config,
+                                 uint32_t config_len)
+{
+    struct vhost_vdpa_config *v_config;
+    int ret;
+
+    if ((config_len > VHOST_VDPA_MAX_CONFIG_SIZE) || (config == NULL)) {
+        return -EINVAL;
+    }
+    v_config = g_malloc0(sizeof(*v_config) + config_len);
+    v_config->len = config_len;
+    ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_CONFIG, v_config);
+    if ((v_config->len > config_len - v_config->off) || (v_config->len == 0)) {
+        g_free(v_config);
+        return -EINVAL;
+    }
+    memcpy(config, v_config->buf + v_config->off, v_config->len);
+    g_free(v_config);
+    return ret;
+}
+
+static int vhost_vdpa_set_state(struct vhost_dev *dev, uint8_t state)
+{
+    return vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &state);
+}
+
+const VhostOps vdpa_ops = {
+        .backend_type = VHOST_BACKEND_TYPE_VDPA,
+        .vhost_backend_init = vhost_vdpa_init,
+        .vhost_backend_cleanup = vhost_vdpa_cleanup,
+        .vhost_backend_memslots_limit = vhost_vdpa_memslots_limit,
+        .vhost_set_log_base = vhost_vdpa_set_log_base,
+        .vhost_set_mem_table = vhost_vdpa_set_mem_table,
+        .vhost_set_vring_addr = vhost_vdpa_set_vring_addr,
+        .vhost_set_vring_endian = NULL,
+        .vhost_set_vring_num = vhost_vdpa_set_vring_num,
+        .vhost_set_vring_base = vhost_vdpa_set_vring_base,
+        .vhost_get_vring_base = vhost_vdpa_get_vring_base,
+        .vhost_set_vring_kick = vhost_vdpa_set_vring_kick,
+        .vhost_set_vring_call = vhost_vdpa_set_vring_call,
+        .vhost_set_features = vhost_vdpa_set_features,
+        .vhost_get_features = vhost_vdpa_get_features,
+        .vhost_set_owner = vhost_vdpa_set_owner,
+        .vhost_reset_device = vhost_vdpa_reset_device,
+        .vhost_get_vq_index = vhost_vdpa_get_vq_index,
+        .vhost_set_vring_ready = vhost_vdpa_set_vring_ready,
+        .vhost_get_config  = vhost_vdpa_get_config,
+        .vhost_set_config = vhost_vdpa_set_config,
+        .vhost_requires_shm_log = NULL,
+        .vhost_migration_done = NULL,
+        .vhost_backend_can_merge = NULL,
+        .vhost_net_set_mtu = NULL,
+        .vhost_set_iotlb_callback = NULL,
+        .vhost_send_device_iotlb_msg = NULL,
+        .vhost_set_state = vhost_vdpa_set_state,
+        .vhost_get_device_id = vhost_vdpa_get_device_id,
+};
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 4da0d5a6c5..6d2aa461d6 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -746,6 +746,12 @@ static int vhost_virtqueue_set_addr(struct vhost_dev *dev,
         .log_guest_addr = vq->used_phys,
         .flags = enable_log ? (1 << VHOST_VRING_F_LOG) : 0,
     };
+    /* vDPA needs to use the physical addresses here to program the HW */
+    if (dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
+        addr.desc_user_addr = (uint64_t)(unsigned long)vq->desc_phys;
+        addr.avail_user_addr = (uint64_t)(unsigned long)vq->avail_phys;
+        addr.used_user_addr = (uint64_t)(unsigned long)vq->used_phys;
+    }
     int r = dev->vhost_ops->vhost_set_vring_addr(dev, &addr);
     if (r < 0) {
         VHOST_OPS_DEBUG("vhost_set_vring_addr failed");
@@ -1493,6 +1499,14 @@ int vhost_dev_set_config(struct vhost_dev *hdev, const uint8_t *data,
     return -1;
 }
 
+int vhost_dev_get_device_id(struct vhost_dev *hdev, uint32_t *device_id)
+{
+    assert(hdev->vhost_ops);
+    if (hdev->vhost_ops->vhost_get_device_id) {
+        return hdev->vhost_ops->vhost_get_device_id(hdev, device_id);
+    }
+    return -1;
+}
+
 void vhost_dev_set_config_notifier(struct vhost_dev *hdev,
                                    const VhostDevConfigOps *ops)
 {
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index f823055167..8ac898a987 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -17,7 +17,8 @@ typedef enum VhostBackendType {
     VHOST_BACKEND_TYPE_NONE = 0,
     VHOST_BACKEND_TYPE_KERNEL = 1,
     VHOST_BACKEND_TYPE_USER = 2,
-    VHOST_BACKEND_TYPE_MAX = 3,
+    VHOST_BACKEND_TYPE_VDPA = 3,
+    VHOST_BACKEND_TYPE_MAX = 4,
 } VhostBackendType;
 
 typedef enum VhostSetConfigType {
@@ -77,6 +78,7 @@ typedef int (*vhost_reset_device_op)(struct vhost_dev *dev);
 typedef int (*vhost_get_vq_index_op)(struct vhost_dev *dev, int idx);
 typedef int (*vhost_set_vring_enable_op)(struct vhost_dev *dev,
                                          int enable);
+typedef int (*vhost_set_vring_ready_op)(struct vhost_dev *dev);
 typedef bool (*vhost_requires_shm_log_op)(struct vhost_dev *dev);
 typedef int (*vhost_migration_done_op)(struct vhost_dev *dev,
                                        char *mac_addr);
@@ -113,6 +115,7 @@ typedef int (*vhost_set_inflight_fd_op)(struct vhost_dev *dev,
                                         struct vhost_inflight *inflight);
 
 typedef int (*vhost_set_state_op)(struct vhost_dev *dev, uint8_t state);
+typedef int (*vhost_get_device_id_op)(struct vhost_dev *dev, uint32_t *dev_id);
 typedef struct VhostOps {
     VhostBackendType backend_type;
     vhost_backend_init vhost_backend_init;
@@ -139,6 +142,7 @@ typedef struct VhostOps {
     vhost_reset_device_op vhost_reset_device;
     vhost_get_vq_index_op vhost_get_vq_index;
     vhost_set_vring_enable_op vhost_set_vring_enable;
+    vhost_set_vring_ready_op vhost_set_vring_ready;
     vhost_requires_shm_log_op vhost_requires_shm_log;
     vhost_migration_done_op vhost_migration_done;
     vhost_backend_can_merge_op vhost_backend_can_merge;
@@ -154,9 +158,11 @@ typedef struct VhostOps {
     vhost_get_inflight_fd_op vhost_get_inflight_fd;
     vhost_set_inflight_fd_op vhost_set_inflight_fd;
     vhost_set_state_op vhost_set_state;
+    vhost_get_device_id_op vhost_get_device_id;
 } VhostOps;
 
 extern const VhostOps user_ops;
+extern const VhostOps vdpa_ops;
 
 int vhost_set_backend_type(struct vhost_dev *dev,
                            VhostBackendType backend_type);
diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
new file mode 100644
index 0000000000..9d64fbfea9
--- /dev/null
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -0,0 +1,25 @@
+/*
+ * vhost-vdpa.h
+ *
+ * Copyright(c) 2017-2018 Intel Corporation.
+ * Copyright(c) 2020 Red Hat, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef HW_VIRTIO_VHOST_VDPA_H
+#define HW_VIRTIO_VHOST_VDPA_H
+
+#include "hw/virtio/virtio.h"
+
+typedef struct vhost_vdpa {
+    int device_fd;
+    uint32_t msg_type;
+    MemoryListener listener;
+} VhostVDPA;
+
+extern AddressSpace address_space_memory;
+
+#endif
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 085450c6f8..020956d5a9 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -124,6 +124,7 @@ int vhost_dev_get_config(struct vhost_dev *dev, uint8_t *config,
                          uint32_t config_len);
 int vhost_dev_set_config(struct vhost_dev *dev, const uint8_t *data,
                          uint32_t offset, uint32_t size, uint32_t flags);
+int vhost_dev_get_device_id(struct vhost_dev *hdev, uint32_t *device_id);
 /* notifier callback in case vhost device config space changed
  */
 void vhost_dev_set_config_notifier(struct vhost_dev *dev,
-- 
2.21.1




* [RFC v2 6/9] virtio-bus: introduce queue_enabled method
  2020-05-08 16:32 [RFC v2 0/9] vDPA support in qemu Cindy Lu
                   ` (4 preceding siblings ...)
  2020-05-08 16:32 ` [RFC v2 5/9] vhost-vdpa: implement vhost-vdpa backend Cindy Lu
@ 2020-05-08 16:32 ` Cindy Lu
  2020-05-09  3:01   ` Jason Wang
  2020-05-08 16:32 ` [RFC v2 7/9] virtio-pci: implement " Cindy Lu
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 33+ messages in thread
From: Cindy Lu @ 2020-05-08 16:32 UTC (permalink / raw)
  To: mst, armbru, eblake, cohuck, jasowang
  Cc: mhabets, qemu-devel, rob.miller, saugatm, lulu, hanand, hch,
	eperezma, jgg, shahafs, kevin.tian, parav, vmireyno,
	cunming.liang, gdawar, jiri, xiao.w.wang, stefanha, zhihong.wang,
	aadam, rdunlap, maxime.coquelin, lingshan.zhu

From: Jason Wang <jasowang@redhat.com>

This patch introduces a queue_enabled() method, which allows the
transport to implement its own way to report whether or not a queue is
enabled.
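
For reference, a transport implements the hook along these lines (this
is the shape of the virtio-pci implementation added in patch 7):

    static bool virtio_pci_queue_enabled(DeviceState *d, int n)
    {
        VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
        VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);

        if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
            return proxy->vqs[vdev->queue_sel].enabled;
        }
        return virtio_queue_get_desc_addr(vdev, n) != 0;
    }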

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/virtio.c             | 6 ++++++
 include/hw/virtio/virtio-bus.h | 4 ++++
 2 files changed, 10 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 04716b5f6c..09732a8836 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3169,6 +3169,12 @@ hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n)
 
 bool virtio_queue_enabled(VirtIODevice *vdev, int n)
 {
+    BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
+    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+
+    if (k->queue_enabled) {
+        return k->queue_enabled(qbus->parent, n);
+    }
+
     return virtio_queue_get_desc_addr(vdev, n) != 0;
 }
 
diff --git a/include/hw/virtio/virtio-bus.h b/include/hw/virtio/virtio-bus.h
index 38c9399cd4..0f6f215925 100644
--- a/include/hw/virtio/virtio-bus.h
+++ b/include/hw/virtio/virtio-bus.h
@@ -83,6 +83,10 @@ typedef struct VirtioBusClass {
      */
     int (*ioeventfd_assign)(DeviceState *d, EventNotifier *notifier,
                             int n, bool assign);
+    /*
+     * Whether queue number n is enabled.
+     */
+    bool (*queue_enabled)(DeviceState *d, int n);
     /*
      * Does the transport have variable vring alignment?
      * (ie can it ever call virtio_queue_set_align()?)
-- 
2.21.1




* [RFC v2 7/9] virtio-pci: implement queue_enabled method
  2020-05-08 16:32 [RFC v2 0/9] vDPA support in qemu Cindy Lu
                   ` (5 preceding siblings ...)
  2020-05-08 16:32 ` [RFC v2 6/9] virtio-bus: introduce queue_enabled method Cindy Lu
@ 2020-05-08 16:32 ` Cindy Lu
  2020-05-09  3:02   ` Jason Wang
  2020-05-09 12:01   ` Philippe Mathieu-Daudé
  2020-05-08 16:32 ` [RFC v2 8/9] vhost_net: set vq ready during start if necessary Cindy Lu
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 33+ messages in thread
From: Cindy Lu @ 2020-05-08 16:32 UTC (permalink / raw)
  To: mst, armbru, eblake, cohuck, jasowang
  Cc: mhabets, qemu-devel, rob.miller, saugatm, lulu, hanand, hch,
	eperezma, jgg, shahafs, kevin.tian, parav, vmireyno,
	cunming.liang, gdawar, jiri, xiao.w.wang, stefanha, zhihong.wang,
	aadam, rdunlap, maxime.coquelin, lingshan.zhu

From: Jason Wang <jasowang@redhat.com>

With version 1, we can detect whether a queue is enabled via
queue_enabled.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/virtio-pci.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index c6b47a9c73..4aaf5d953e 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1103,6 +1103,18 @@ static AddressSpace *virtio_pci_get_dma_as(DeviceState *d)
     return pci_get_address_space(dev);
 }
 
+static bool virtio_pci_queue_enabled(DeviceState *d, int n)
+{
+    VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
+    VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
+
+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
+        return proxy->vqs[vdev->queue_sel].enabled;
+    }
+
+    return virtio_queue_get_desc_addr(vdev, n) != 0;
+}
+
 static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
                                    struct virtio_pci_cap *cap)
 {
@@ -2053,6 +2065,7 @@ static void virtio_pci_bus_class_init(ObjectClass *klass, void *data)
     k->ioeventfd_enabled = virtio_pci_ioeventfd_enabled;
     k->ioeventfd_assign = virtio_pci_ioeventfd_assign;
     k->get_dma_as = virtio_pci_get_dma_as;
+    k->queue_enabled = virtio_pci_queue_enabled;
 }
 
 static const TypeInfo virtio_pci_bus_info = {
-- 
2.21.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFC v2 8/9] vhost_net: set vq ready during start if necessary
  2020-05-08 16:32 [RFC v2 0/9] vDPA support in qemu Cindy Lu
                   ` (6 preceding siblings ...)
  2020-05-08 16:32 ` [RFC v2 7/9] virito-pci: implement " Cindy Lu
@ 2020-05-08 16:32 ` Cindy Lu
  2020-05-09  3:03   ` Jason Wang
  2020-05-08 16:32 ` [RFC v2 9/9] vhost: introduce vhost_set_vring_ready method Cindy Lu
  2020-05-09  3:10 ` [RFC v2 0/9] vDPA support in qemu Jason Wang
  9 siblings, 1 reply; 33+ messages in thread
From: Cindy Lu @ 2020-05-08 16:32 UTC (permalink / raw)
  To: mst, armbru, eblake, cohuck, jasowang
  Cc: mhabets, qemu-devel, rob.miller, saugatm, lulu, hanand, hch,
	eperezma, jgg, shahafs, kevin.tian, parav, vmireyno,
	cunming.liang, gdawar, jiri, xiao.w.wang, stefanha, zhihong.wang,
	aadam, rdunlap, maxime.coquelin, lingshan.zhu

From: Jason Wang <jasowang@redhat.com>

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/vhost_net.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 1af39abaf3..eff9ec9177 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -383,6 +383,10 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
                 goto err_start;
             }
         }
+
+        if (virtio_queue_enabled(dev, i)) {
+            vhost_set_vring_ready(peer);
+        }
     }
 
     return 0;
-- 
2.21.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFC v2 9/9] vhost: introduce vhost_set_vring_ready method
  2020-05-08 16:32 [RFC v2 0/9] vDPA support in qemu Cindy Lu
                   ` (7 preceding siblings ...)
  2020-05-08 16:32 ` [RFC v2 8/9] vhost_net: set vq ready during start if necessary Cindy Lu
@ 2020-05-08 16:32 ` Cindy Lu
  2020-05-09  3:05   ` Jason Wang
  2020-05-09  3:10 ` [RFC v2 0/9] vDPA support in qemu Jason Wang
  9 siblings, 1 reply; 33+ messages in thread
From: Cindy Lu @ 2020-05-08 16:32 UTC (permalink / raw)
  To: mst, armbru, eblake, cohuck, jasowang
  Cc: mhabets, qemu-devel, rob.miller, saugatm, lulu, hanand, hch,
	eperezma, jgg, shahafs, kevin.tian, parav, vmireyno,
	cunming.liang, gdawar, jiri, xiao.w.wang, stefanha, zhihong.wang,
	aadam, rdunlap, maxime.coquelin, lingshan.zhu

From: Jason Wang <jasowang@redhat.com>

Vhost-vdpa introduces VHOST_VDPA_SET_VRING_ENABLE, which complies with
the semantics of queue_enable as defined in the virtio spec. This
method can be used to prevent the device from executing requests for a
specific virtqueue. This patch introduces the vhost_ops for this.

Note that we already have vhost_set_vring_enable, which has different
semantics: it allows enabling or disabling a specific virtqueue for
some kinds of vhost backends. E.g. vhost-user uses this to change the
number of active queue pairs.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/vhost_net-stub.c |  4 ++++
 hw/net/vhost_net.c      | 11 ++++++++++-
 include/net/vhost_net.h |  1 +
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/net/vhost_net-stub.c b/hw/net/vhost_net-stub.c
index aac0e98228..43e93e1a9a 100644
--- a/hw/net/vhost_net-stub.c
+++ b/hw/net/vhost_net-stub.c
@@ -86,6 +86,10 @@ int vhost_set_vring_enable(NetClientState *nc, int enable)
     return 0;
 }
 
+int vhost_set_vring_ready(NetClientState *nc)
+{
+    return 0;
+}
 int vhost_net_set_mtu(struct vhost_net *net, uint16_t mtu)
 {
     return 0;
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index eff9ec9177..6911282a0a 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -375,7 +375,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
             goto err_start;
         }
 
-        if (ncs[i].peer->vring_enable) {
+        if (peer->vring_enable) {
             /* restore vring enable state */
             r = vhost_set_vring_enable(peer, peer->vring_enable);
 
@@ -496,6 +496,15 @@ int vhost_set_vring_enable(NetClientState *nc, int enable)
     return 0;
 }
 
+int vhost_set_vring_ready(NetClientState *nc)
+{
+    VHostNetState *net = get_vhost_net(nc);
+    const VhostOps *vhost_ops = net->dev.vhost_ops;
+    if (vhost_ops && vhost_ops->vhost_set_vring_ready) {
+        return vhost_ops->vhost_set_vring_ready(&net->dev);
+    }
+    return 0;
+}
 int vhost_net_set_mtu(struct vhost_net *net, uint16_t mtu)
 {
     const VhostOps *vhost_ops = net->dev.vhost_ops;
diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
index b47844bf29..247432a3b2 100644
--- a/include/net/vhost_net.h
+++ b/include/net/vhost_net.h
@@ -35,6 +35,7 @@ int vhost_net_notify_migration_done(VHostNetState *net, char* mac_addr);
 VHostNetState *get_vhost_net(NetClientState *nc);
 
 int vhost_set_vring_enable(NetClientState * nc, int enable);
+int vhost_set_vring_ready(NetClientState *nc);
 
 uint64_t vhost_net_get_acked_features(VHostNetState *net);
 
-- 
2.21.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [RFC v2 4/9] vhost-vdpa: introduce vhost-vdpa net client
  2020-05-08 16:32 ` [RFC v2 4/9] vhost-vdpa: introduce vhost-vdpa net client Cindy Lu
@ 2020-05-08 16:41   ` Eric Blake
  2020-05-09  7:17     ` Cindy Lu
  2020-05-09  2:40   ` Jason Wang
  1 sibling, 1 reply; 33+ messages in thread
From: Eric Blake @ 2020-05-08 16:41 UTC (permalink / raw)
  To: Cindy Lu, mst, armbru, cohuck, jasowang
  Cc: mhabets, qemu-devel, rob.miller, saugatm, hanand, hch, eperezma,
	jgg, shahafs, kevin.tian, parav, vmireyno, cunming.liang, gdawar,
	jiri, xiao.w.wang, stefanha, zhihong.wang, Tiwei Bie, aadam,
	rdunlap, maxime.coquelin, lingshan.zhu

On 5/8/20 11:32 AM, Cindy Lu wrote:
> From: Tiwei Bie <tiwei.bie@intel.com>
> 
> This patch set introduces a new net client type: vhost-vdpa.
> vhost-vdpa net client will set up a vDPA device which is specified
> by a "vhostdev" parameter.
> 
> Co-authored-by: Lingshan Zhu <lingshan.zhu@intel.com>
> Signed-off-by: Cindy Lu <lulu@redhat.com>
> ---

Just looking at UI:


> +++ b/qapi/net.json
> @@ -441,6 +441,23 @@
>       '*queues':        'int' } }
>   
>   ##
> +# @NetdevVhostVDPAOptions:
> +#
> +# Vhost-vdpa network backend
> +#
> +# @vhostdev: name of a vdpa dev path in sysfs
> +#
> +# @queues: number of queues to be created for multiqueue vhost-vdpa
> +#          (default: 1) (Since 5.1)

No need to mark a 'since' tag on a member introduced at the same time as 
the overall struct itself.

> +#
> +# Since: 5.1
> +##
> +{ 'struct': 'NetdevVhostVDPAOptions',
> +  'data': {
> +    '*vhostdev':     'str',

What does this default to if omitted?

> +    '*fd':           'str',

Not documented above.

> +    '*queues':       'int' } }
> +##

Missing blank line separator.

>   # @NetClientDriver:
>   #
>   # Available netdev drivers.
> @@ -451,7 +468,7 @@
>   ##
>   { 'enum': 'NetClientDriver',
>     'data': [ 'none', 'nic', 'user', 'tap', 'l2tpv3', 'socket', 'vde',
> -            'bridge', 'hubport', 'netmap', 'vhost-user' ] }
> +            'bridge', 'hubport', 'netmap', 'vhost-user', 'vhost-vdpa' ] }

Missing a line above noting that 'vhost-vdpa' is new in 5.1.
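
Putting those together, the doc comment could end up roughly like the
following sketch (the @vhostdev default still needs deciding, and the
@fd description is just my guess at its intent):

    ##
    # @NetdevVhostVDPAOptions:
    #
    # Vhost-vdpa network backend
    #
    # @vhostdev: name of a vdpa dev path in sysfs
    #
    # @fd: file descriptor of an already opened vhost-vdpa device
    #
    # @queues: number of queues to be created for multiqueue vhost-vdpa
    #          (default: 1)
    #
    # Since: 5.1
    ##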


> @@ -2749,6 +2756,18 @@ qemu -m 512 -object memory-backend-file,id=mem,size=512M,mem-path=/hugetlbfs,sha
>        -device virtio-net-pci,netdev=net0
>   @end example
>   
> +@item -netdev vhost-vdpa,vhostdev=/path/to/dev
> +Establish a vhost-vdpa netdev, backed by a vhostdev. The chardev should
> +be a unix domain socket backed one. The vhost-vdpa uses a specifically defined
> +protocol to pass vhost ioctl replacement messages to an application on the other
> +end of the socket.
> +Example:
> +@example
> +qemu -m 512 -object memory-backend-file,id=mem,size=512M,mem-path=/hugetlbfs,share=on \
> +     -numa node,memdev=mem \
> +     -netdev type=vhost-vdpa,id=net0,vhostdev=/path/to/dev \
> +     -device virtio-net-pci,netdev=net0
> +@end example
>   @item -netdev hubport,id=@var{id},hubid=@var{hubid}[,netdev=@var{nd}]
>   
>   Create a hub port on the emulated hub with ID @var{hubid}.
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC v2 2/9] net: use the function qemu_get_peer
  2020-05-08 16:32 ` [RFC v2 2/9] net: use the function qemu_get_peer Cindy Lu
@ 2020-05-09  2:19   ` Jason Wang
  2020-05-09  6:49     ` Cindy Lu
  0 siblings, 1 reply; 33+ messages in thread
From: Jason Wang @ 2020-05-09  2:19 UTC (permalink / raw)
  To: Cindy Lu, mst, armbru, eblake, cohuck
  Cc: mhabets, qemu-devel, rob.miller, saugatm, hanand, hch, eperezma,
	jgg, shahafs, kevin.tian, parav, vmireyno, cunming.liang, gdawar,
	jiri, xiao.w.wang, stefanha, zhihong.wang, aadam, rdunlap,
	maxime.coquelin, lingshan.zhu


On 2020/5/9 12:32 AM, Cindy Lu wrote:
> use the function qemu_get_peer to replace the old code


The title should be "vhost_net: use the function qemu_get_peer".

Thanks


>
> Signed-off-by: Cindy Lu <lulu@redhat.com>
> ---
>   hw/net/vhost_net.c | 14 +++++++++-----
>   1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 6b82803fa7..d1d421e3d9 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -306,7 +306,9 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>       BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
>       VirtioBusState *vbus = VIRTIO_BUS(qbus);
>       VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
> +    struct vhost_net *net;
>       int r, e, i;
> +    NetClientState *peer;
>   
>       if (!k->set_guest_notifiers) {
>           error_report("binding does not support guest notifiers");
> @@ -314,9 +316,9 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>       }
>   
>       for (i = 0; i < total_queues; i++) {
> -        struct vhost_net *net;
>   
> -        net = get_vhost_net(ncs[i].peer);
> +        peer = qemu_get_peer(ncs, i);
> +        net = get_vhost_net(peer);
>           vhost_net_set_vq_index(net, i * 2);
>   
>           /* Suppress the masking guest notifiers on vhost user
> @@ -335,7 +337,8 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>       }
>   
>       for (i = 0; i < total_queues; i++) {
> -        r = vhost_net_start_one(get_vhost_net(ncs[i].peer), dev);
> +        peer = qemu_get_peer(ncs, i);
> +        r = vhost_net_start_one(get_vhost_net(peer), dev);
>   
>           if (r < 0) {
>               goto err_start;
> @@ -343,7 +346,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>   
>           if (ncs[i].peer->vring_enable) {
>               /* restore vring enable state */
> -            r = vhost_set_vring_enable(ncs[i].peer, ncs[i].peer->vring_enable);
> +            r = vhost_set_vring_enable(peer, peer->vring_enable);
>   
>               if (r < 0) {
>                   goto err_start;
> @@ -355,7 +358,8 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>   
>   err_start:
>       while (--i >= 0) {
> -        vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
> +        peer = qemu_get_peer(ncs, i);
> +        vhost_net_stop_one(get_vhost_net(peer), dev);
>       }
>       e = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
>       if (e < 0) {



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC v2 3/9] virtio_net: introduce vhost_set_state
  2020-05-08 16:32 ` [RFC v2 3/9] virtio_net: introduce vhost_set_state Cindy Lu
@ 2020-05-09  2:25   ` Jason Wang
  2020-05-09  7:08     ` Cindy Lu
  0 siblings, 1 reply; 33+ messages in thread
From: Jason Wang @ 2020-05-09  2:25 UTC (permalink / raw)
  To: Cindy Lu, mst, armbru, eblake, cohuck
  Cc: mhabets, qemu-devel, rob.miller, saugatm, hanand, hch, eperezma,
	jgg, shahafs, kevin.tian, parav, vmireyno, cunming.liang, gdawar,
	jiri, xiao.w.wang, stefanha, zhihong.wang, aadam, rdunlap,
	maxime.coquelin, lingshan.zhu


On 2020/5/9 12:32 AM, Cindy Lu wrote:
> Introduce a function to set the state of the vhost driver.
> vDPA needs to sync the driver's state to the NIC.


Let's split this patch into two.

1) introduce vhost_set_state
2) make virtio-net use of vhost_set_state


>
> Signed-off-by: Cindy Lu <lulu@redhat.com>
> ---
>   hw/net/vhost_net.c                | 9 +++++++++
>   hw/net/virtio-net.c               | 9 +++++++++
>   include/hw/virtio/vhost-backend.h | 2 ++
>   include/net/vhost_net.h           | 2 +-
>   4 files changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index d1d421e3d9..63b2a85d6e 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -465,3 +465,12 @@ int vhost_net_set_mtu(struct vhost_net *net, uint16_t mtu)
>   
>       return vhost_ops->vhost_net_set_mtu(&net->dev, mtu);
>   }
> +int vhost_set_state(NetClientState *nc, uint8_t state)
> +{
> +    struct vhost_net *net = get_vhost_net(nc);
> +    struct vhost_dev *hdev = &net->dev;
> +        if (hdev->vhost_ops->vhost_set_state) {


Indentation looks wrong.


> +                return hdev->vhost_ops->vhost_set_state(hdev, state);
> +        }
> +    return 0;
> +}
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index db3d7c38e6..1bddb4b4af 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -206,6 +206,9 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>       VirtIODevice *vdev = VIRTIO_DEVICE(n);
>       NetClientState *nc = qemu_get_queue(n->nic);
>       int queues = n->multiqueue ? n->max_queues : 1;
> +    NetClientState *peer = qemu_get_peer(nc, 0);
> +    uint8_t status_set  = vdev->status ;
> +    uint8_t vhost_started_pre = n->vhost_started;
>   
>       if (!get_vhost_net(nc->peer)) {
>           return;
> @@ -245,6 +248,7 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>                   return;
>               }
>           }
> +        status_set = status_set | VIRTIO_CONFIG_S_DRIVER_OK;
>   
>           n->vhost_started = 1;
>           r = vhost_net_start(vdev, n->nic->ncs, queues);
> @@ -252,11 +256,16 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>               error_report("unable to start vhost net: %d: "
>                            "falling back on userspace virtio", -r);
>               n->vhost_started = 0;
> +            status_set = status_set & ~VIRTIO_CONFIG_S_DRIVER_OK;
>           }
>       } else {
>           vhost_net_stop(vdev, n->nic->ncs, queues);
> +        status_set = status_set & ~VIRTIO_CONFIG_S_DRIVER_OK;
>           n->vhost_started = 0;
>       }
> +    if (vhost_started_pre != n->vhost_started) {
> +            vhost_set_state(peer, status_set);


Any reason for not just passing the virtio device status to vhost-vdpa?
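
I.e. just something like (a sketch):

    vhost_set_state(peer, vdev->status);

instead of reconstructing status_set by hand.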


> +    }
>   }
>   
>   static int virtio_net_set_vnet_endian_one(VirtIODevice *vdev,
> diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> index 6f6670783f..f823055167 100644
> --- a/include/hw/virtio/vhost-backend.h
> +++ b/include/hw/virtio/vhost-backend.h
> @@ -112,6 +112,7 @@ typedef int (*vhost_get_inflight_fd_op)(struct vhost_dev *dev,
>   typedef int (*vhost_set_inflight_fd_op)(struct vhost_dev *dev,
>                                           struct vhost_inflight *inflight);
>   
> +typedef int (*vhost_set_state_op)(struct vhost_dev *dev, uint8_t state);


We need to document what 'state' means here: is it e.g. the virtio
device status? If yes, is it better to rename it to vhost_set_status()?
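
If it is the device status, one possible shape (just a sketch):

    /*
     * Set the virtio device status (VIRTIO_CONFIG_S_* bits as defined
     * in the virtio spec) on the vhost backend.
     */
    typedef int (*vhost_set_status_op)(struct vhost_dev *dev,
                                       uint8_t status);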

Thanks


>   typedef struct VhostOps {
>       VhostBackendType backend_type;
>       vhost_backend_init vhost_backend_init;
> @@ -152,6 +153,7 @@ typedef struct VhostOps {
>       vhost_backend_mem_section_filter_op vhost_backend_mem_section_filter;
>       vhost_get_inflight_fd_op vhost_get_inflight_fd;
>       vhost_set_inflight_fd_op vhost_set_inflight_fd;
> +    vhost_set_state_op vhost_set_state;
>   } VhostOps;
>   
>   extern const VhostOps user_ops;
> diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
> index 77e47398c4..6548a5a105 100644
> --- a/include/net/vhost_net.h
> +++ b/include/net/vhost_net.h
> @@ -39,5 +39,5 @@ int vhost_set_vring_enable(NetClientState * nc, int enable);
>   uint64_t vhost_net_get_acked_features(VHostNetState *net);
>   
>   int vhost_net_set_mtu(struct vhost_net *net, uint16_t mtu);
> -
> +int vhost_set_state(NetClientState *nc, uint8_t state);
>   #endif



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC v2 4/9] vhost-vdpa: introduce vhost-vdpa net client
  2020-05-08 16:32 ` [RFC v2 4/9] vhost-vdpa: introduce vhost-vdpa net client Cindy Lu
  2020-05-08 16:41   ` Eric Blake
@ 2020-05-09  2:40   ` Jason Wang
  2020-05-09  7:31     ` Cindy Lu
  1 sibling, 1 reply; 33+ messages in thread
From: Jason Wang @ 2020-05-09  2:40 UTC (permalink / raw)
  To: Cindy Lu, mst, armbru, eblake, cohuck
  Cc: mhabets, qemu-devel, rob.miller, saugatm, hanand, hch, eperezma,
	jgg, shahafs, kevin.tian, parav, vmireyno, cunming.liang, gdawar,
	jiri, xiao.w.wang, stefanha, zhihong.wang, Tiwei Bie, aadam,
	rdunlap, maxime.coquelin, lingshan.zhu


On 2020/5/9 12:32 AM, Cindy Lu wrote:
> From: Tiwei Bie <tiwei.bie@intel.com>


If you think you've done a huge refactor of the code, you can change
the author, but you need to keep Tiwei's Signed-off-by.


>
> This patch set introduces a new net client type: vhost-vdpa.
> vhost-vdpa net client will set up a vDPA device which is specified
> by a "vhostdev" parameter.
>
> Co-authored-by: Lingshan Zhu <lingshan.zhu@intel.com>
> Signed-off-by: Cindy Lu <lulu@redhat.com>
> ---
>   configure                |  21 ++++
>   include/net/vhost-vdpa.h |  19 ++++
>   include/net/vhost_net.h  |   1 +


Patch 5, which is the infrastructure of vhost-vdpa, should come first.
Please re-order the patches in the next version.


>   net/Makefile.objs        |   2 +-
>   net/clients.h            |   2 +
>   net/net.c                |   3 +
>   net/vhost-vdpa.c         | 227 +++++++++++++++++++++++++++++++++++++++
>   qapi/net.json            |  22 +++-
>   qemu-options.hx          |  19 ++++
>   9 files changed, 313 insertions(+), 3 deletions(-)
>   create mode 100644 include/net/vhost-vdpa.h
>   create mode 100644 net/vhost-vdpa.c
>
> diff --git a/configure b/configure
> index 6099be1d84..bdd732e3bb 100755
> --- a/configure
> +++ b/configure
> @@ -1505,6 +1505,10 @@ for opt do
>     ;;
>     --enable-vhost-user) vhost_user="yes"
>     ;;
> +  --disable-vhost-vdpa) vhost_vdpa="no"
> +  ;;
> +  --enable-vhost-vdpa) vhost_vdpa="yes"
> +  ;;
>     --disable-vhost-kernel) vhost_kernel="no"
>     ;;
>     --enable-vhost-kernel) vhost_kernel="yes"
> @@ -1780,6 +1784,7 @@ disabled with --disable-FEATURE, default is enabled if available:
>     vhost-crypto    vhost-user-crypto backend support
>     vhost-kernel    vhost kernel backend support
>     vhost-user      vhost-user backend support
> +  vhost-vdpa      vhost-vdpa backend support


Maybe "vhost-vdpa kernel backend support" is better.


>     spice           spice
>     rbd             rados block device (rbd)
>     libiscsi        iscsi support
> @@ -2241,6 +2246,10 @@ test "$vhost_user" = "" && vhost_user=yes
>   if test "$vhost_user" = "yes" && test "$mingw32" = "yes"; then
>     error_exit "vhost-user isn't available on win32"
>   fi
> +test "$vhost_vdpa" = "" && vhost_vdpa=yes
> +if test "$vhost_vdpa" = "yes" && test "$mingw32" = "yes"; then
> +  error_exit "vhost-vdpa isn't available on win32"
> +fi


Let's add a check for Linux like the vhost-kernel one below.
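
A minimal sketch, mirroring the vhost-kernel check quoted below:

    test "$vhost_vdpa" = "" && vhost_vdpa=$linux
    if test "$vhost_vdpa" = "yes" && test "$linux" != "yes"; then
      error_exit "vhost-vdpa is only available on Linux"
    fi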


>   test "$vhost_kernel" = "" && vhost_kernel=$linux
>   if test "$vhost_kernel" = "yes" && test "$linux" != "yes"; then
>     error_exit "vhost-kernel is only available on Linux"
> @@ -2269,6 +2278,11 @@ test "$vhost_user_fs" = "" && vhost_user_fs=$vhost_user
>   if test "$vhost_user_fs" = "yes" && test "$vhost_user" = "no"; then
>     error_exit "--enable-vhost-user-fs requires --enable-vhost-user"
>   fi
> +#vhost-vdpa backends
> +test "$vhost_net_vdpa" = "" && vhost_net_vdpa=$vhost_vdpa
> +if test "$vhost_net_vdpa" = "yes" && test "$vhost_vdpa" = "no"; then
> +  error_exit "--enable-vhost-net-vdpa requires --enable-vhost-vdpa"
> +fi
>   
>   # OR the vhost-kernel and vhost-user values for simplicity
>   if test "$vhost_net" = ""; then
> @@ -6543,6 +6557,7 @@ echo "vhost-scsi support $vhost_scsi"
>   echo "vhost-vsock support $vhost_vsock"
>   echo "vhost-user support $vhost_user"
>   echo "vhost-user-fs support $vhost_user_fs"
> +echo "vhost-vdpa support $vhost_vdpa"
>   echo "Trace backends    $trace_backends"
>   if have_backend "simple"; then
>   echo "Trace output file $trace_file-<pid>"
> @@ -7031,6 +7046,9 @@ fi
>   if test "$vhost_net_user" = "yes" ; then
>     echo "CONFIG_VHOST_NET_USER=y" >> $config_host_mak
>   fi
> +if test "$vhost_net_vdpa" = "yes" ; then
> +  echo "CONFIG_VHOST_NET_VDPA=y" >> $config_host_mak
> +fi
>   if test "$vhost_crypto" = "yes" ; then
>     echo "CONFIG_VHOST_CRYPTO=y" >> $config_host_mak
>   fi
> @@ -7043,6 +7061,9 @@ fi
>   if test "$vhost_user" = "yes" ; then
>     echo "CONFIG_VHOST_USER=y" >> $config_host_mak
>   fi
> +if test "$vhost_vdpa" = "yes" ; then
> +  echo "CONFIG_VHOST_VDPA=y" >> $config_host_mak
> +fi
>   if test "$vhost_user_fs" = "yes" ; then
>     echo "CONFIG_VHOST_USER_FS=y" >> $config_host_mak
>   fi
> diff --git a/include/net/vhost-vdpa.h b/include/net/vhost-vdpa.h
> new file mode 100644
> index 0000000000..6ce0d04f72
> --- /dev/null
> +++ b/include/net/vhost-vdpa.h
> @@ -0,0 +1,19 @@
> +/*
> + * vhost-vdpa.h
> + *
> + * Copyright(c) 2017-2018 Intel Corporation.
> + * Copyright(c) 2020 Red Hat, Inc.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef VHOST_VDPA_H
> +#define VHOST_VDPA_H
> +
> +struct vhost_net;
> +struct vhost_net *vhost_vdpa_get_vhost_net(NetClientState *nc);
> +uint64_t vhost_vdpa_get_acked_features(NetClientState *nc);
> +
> +#endif /* VHOST_VDPA_H */
> diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
> index 6548a5a105..b47844bf29 100644
> --- a/include/net/vhost_net.h
> +++ b/include/net/vhost_net.h
> @@ -40,4 +40,5 @@ uint64_t vhost_net_get_acked_features(VHostNetState *net);
>   
>   int vhost_net_set_mtu(struct vhost_net *net, uint16_t mtu);
>   int vhost_set_state(NetClientState *nc, uint8_t state);
> +int vhost_net_get_device_id(struct vhost_net *net, uint32_t *device_id);


Let's move this function to a generic vhost-vdpa header instead of the
net one.


>   #endif
> diff --git a/net/Makefile.objs b/net/Makefile.objs
> index c5d076d19c..5ab45545db 100644
> --- a/net/Makefile.objs
> +++ b/net/Makefile.objs
> @@ -26,7 +26,7 @@ tap-obj-$(CONFIG_SOLARIS) = tap-solaris.o
>   tap-obj-y ?= tap-stub.o
>   common-obj-$(CONFIG_POSIX) += tap.o $(tap-obj-y)
>   common-obj-$(CONFIG_WIN32) += tap-win32.o
> -
> +common-obj-$(CONFIG_VHOST_NET_VDPA) += vhost-vdpa.o
>   vde.o-libs = $(VDE_LIBS)
>   
>   common-obj-$(CONFIG_CAN_BUS) += can/
> diff --git a/net/clients.h b/net/clients.h
> index a6ef267e19..92f9b59aed 100644
> --- a/net/clients.h
> +++ b/net/clients.h
> @@ -61,4 +61,6 @@ int net_init_netmap(const Netdev *netdev, const char *name,
>   int net_init_vhost_user(const Netdev *netdev, const char *name,
>                           NetClientState *peer, Error **errp);
>   
> +int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> +                        NetClientState *peer, Error **errp);
>   #endif /* QEMU_NET_CLIENTS_H */
> diff --git a/net/net.c b/net/net.c
> index b3192deaf1..9eff1ae982 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -965,6 +965,9 @@ static int (* const net_client_init_fun[NET_CLIENT_DRIVER__MAX])(
>   #ifdef CONFIG_VHOST_NET_USER
>           [NET_CLIENT_DRIVER_VHOST_USER] = net_init_vhost_user,
>   #endif
> +#ifdef CONFIG_VHOST_NET_VDPA
> +        [NET_CLIENT_DRIVER_VHOST_VDPA] = net_init_vhost_vdpa,
> +#endif
>   #ifdef CONFIG_L2TPV3
>           [NET_CLIENT_DRIVER_L2TPV3]    = net_init_l2tpv3,
>   #endif
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> new file mode 100644
> index 0000000000..c29678fdf1
> --- /dev/null
> +++ b/net/vhost-vdpa.c
> @@ -0,0 +1,227 @@
> +/*
> + * vhost-vdpa.c
> + *
> + * Copyright(c) 2017-2018 Intel Corporation.
> + * Copyright(c) 2020 Red Hat, Inc.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "clients.h"
> +#include "net/vhost_net.h"
> +#include "net/vhost-vdpa.h"
> +#include "hw/virtio/vhost-vdpa.h"
> +#include "qemu/config-file.h"
> +#include "qemu/error-report.h"
> +#include "qemu/option.h"
> +#include "qapi/error.h"
> +#include <linux/vfio.h>


No need any more.


> +#include <sys/ioctl.h>
> +#include <err.h>
> +#include <linux/virtio_net.h>


That's not the way we include standard Linux headers; qemu maintains a
copy of them.

Use #include "standard-headers/linux/xxx.h" instead.


> +#include "monitor/monitor.h"
> +#include "qemu/sockets.h"


Do we really need this?


> +#include "hw/virtio/vhost.h"
> +
> +/* TODO: need to add the multiqueue support here */
> +typedef struct VhostVDPAState {
> +    NetClientState nc;
> +    struct vhost_vdpa vhost_vdpa;
> +    VHostNetState *vhost_net;
> +    uint64_t acked_features;
> +    bool started;
> +} VhostVDPAState;
> +
> +VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc)
> +{
> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> +    return s->vhost_net;
> +}
> +
> +uint64_t vhost_vdpa_get_acked_features(NetClientState *nc)
> +{
> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> +    return s->acked_features;
> +}
> +
> +static void vhost_vdpa_del(NetClientState *ncs)
> +{
> +    VhostVDPAState *s;
> +
> +    assert(ncs->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> +
> +    s = DO_UPCAST(VhostVDPAState, nc, ncs);
> +
> +    if (s->vhost_net) {
> +        /* save acked features */
> +        uint64_t features = vhost_net_get_acked_features(s->vhost_net);
> +        if (features) {
> +            s->acked_features = features;
> +         }
> +        vhost_net_cleanup(s->vhost_net);
> +    }
> +}
> +
> +static int vhost_vdpa_add(NetClientState *ncs, void *be)
> +{
> +    VhostNetOptions options;
> +    struct vhost_net *net = NULL;
> +    VhostVDPAState *s;
> +
> +    options.backend_type = VHOST_BACKEND_TYPE_VDPA;
> +
> +    assert(ncs->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> +
> +    s = DO_UPCAST(VhostVDPAState, nc, ncs);
> +
> +    options.net_backend = ncs;
> +    options.opaque      = be;
> +    options.busyloop_timeout = 0;
> +    net = vhost_net_init(&options);
> +    if (!net) {
> +        error_report("failed to init vhost_net for queue");
> +        goto err;
> +     }
> +
> +     if (s->vhost_net) {
> +        vhost_net_cleanup(s->vhost_net);
> +        g_free(s->vhost_net);
> +     }
> +     s->vhost_net = net;
> +
> +    return 0;
> +
> +err:
> +    if (net) {
> +        vhost_net_cleanup(net);
> +    }
> +    vhost_vdpa_del(ncs);
> +    return -1;
> +}
> +static void vhost_vdpa_cleanup(NetClientState *nc)
> +{
> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> +
> +    if (s->vhost_net) {
> +        vhost_net_cleanup(s->vhost_net);
> +        g_free(s->vhost_net);
> +        s->vhost_net = NULL;
> +    }
> +
> +    qemu_purge_queued_packets(nc);
> +}
> +
> +static bool vhost_vdpa_has_vnet_hdr(NetClientState *nc)
> +{
> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> +
> +    return true;
> +}
> +
> +static bool vhost_vdpa_has_ufo(NetClientState *nc)
> +{
> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> +    uint64_t  features = 0;
> +
> +    features |= (1ULL << VIRTIO_NET_F_HOST_UFO);
> +    features = vhost_net_get_features(s->vhost_net, features);
> +    return !!(features & (1ULL << VIRTIO_NET_F_HOST_UFO));
> +
> +}
> +
> +static int vhost_vdpa_check_device_id(NetClientState *nc)
> +{
> +    uint32_t device_id;
> +    int ret;
> +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> +    /* Get the device id from hw*/
> +    ret = vhost_net_get_device_id(s->vhost_net, &device_id);
> +    if (device_id != VIRTIO_ID_NET) {
> +        return -ENOTSUP;
> +    }
> +    return ret;
> +}
> +
> +static NetClientInfo net_vhost_vdpa_info = {
> +        .type = NET_CLIENT_DRIVER_VHOST_VDPA,
> +        .size = sizeof(VhostVDPAState),
> +        .cleanup = vhost_vdpa_cleanup,
> +        .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
> +        .has_ufo = vhost_vdpa_has_ufo,
> +};
> +
> +static int net_vhost_vdpa_init(NetClientState *peer, const char *device,
> +                               const char *name, const char *vhostdev,
> +                               bool has_fd, char *fd)
> +{
> +    NetClientState *nc = NULL;
> +    VhostVDPAState *s;
> +    int vdpa_device_fd = -1;
> +    Error *err = NULL;
> +
> +    assert(name);
> +
> +    nc = qemu_new_net_client(&net_vhost_vdpa_info, peer, device, name);
> +    snprintf(nc->info_str, sizeof(nc->info_str), "vhost-vdpa");
> +    nc->queue_index = 0;
> +
> +    s = DO_UPCAST(VhostVDPAState, nc, nc);
> +
> +    if (has_fd) {
> +        vdpa_device_fd = monitor_fd_param(cur_mon, fd, &err);
> +    } else{
> +        vdpa_device_fd = open(vhostdev, O_RDWR);
> +    }
> +
> +    if (vdpa_device_fd == -1) {
> +        return -errno;
> +     }
> +    s->vhost_vdpa.device_fd = vdpa_device_fd;
> +    vhost_vdpa_add(nc, (void *)&s->vhost_vdpa);
> +    assert(s->vhost_net);
> +    /* check the device id for vdpa */
> +    return vhost_vdpa_check_device_id(nc);


We probably need to do the check earlier. If we do things like this, we
will probably leak vdpa_device_fd.
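
E.g. something like this (untested) to avoid the leak on failure:

    int ret = vhost_vdpa_check_device_id(nc);
    if (ret) {
        /* not a vdpa net device: don't leak the fd we just opened */
        qemu_close(vdpa_device_fd);
    }
    return ret;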


> +}
> +
> +static int net_vhost_check_net(void *opaque, QemuOpts *opts, Error **errp)
> +{
> +    const char *name = opaque;
> +    const char *driver, *netdev;
> +
> +    driver = qemu_opt_get(opts, "driver");
> +    netdev = qemu_opt_get(opts, "netdev");
> +    if (!driver || !netdev) {
> +        return 0;
> +    }
> +
> +    if (strcmp(netdev, name) == 0 &&
> +        !g_str_has_prefix(driver, "virtio-net-")) {
> +        error_setg(errp, "vhost-vdpa requires frontend driver virtio-net-*");
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> +                        NetClientState *peer, Error **errp)
> +{
> +    const NetdevVhostVDPAOptions *opts;
> +
> +    assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> +    opts = &netdev->u.vhost_vdpa;
> +    /* verify net frontend */
> +    if (qemu_opts_foreach(qemu_find_opts("device"), net_vhost_check_net,
> +                          (char *)name, errp)) {
> +        return -1;
> +    }
> +    return net_vhost_vdpa_init(peer, "vhost_vdpa", name, opts->vhostdev,
> +                    opts->has_fd, opts->fd);
> +}
> diff --git a/qapi/net.json b/qapi/net.json
> index 335295be50..0f4fa6fffc 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -441,6 +441,23 @@
>       '*queues':        'int' } }
>   
>   ##
> +# @NetdevVhostVDPAOptions:
> +#
> +# Vhost-vdpa network backend
> +#
> +# @vhostdev: name of a vdpa dev path in sysfs
> +#
> +# @queues: number of queues to be created for multiqueue vhost-vdpa
> +#          (default: 1) (Since 5.1)
> +#
> +# Since: 5.1
> +##
> +{ 'struct': 'NetdevVhostVDPAOptions',
> +  'data': {
> +    '*vhostdev':     'str',
> +    '*fd':           'str',
> +    '*queues':       'int' } }
> +##
>   # @NetClientDriver:
>   #
>   # Available netdev drivers.
> @@ -451,7 +468,7 @@
>   ##
>   { 'enum': 'NetClientDriver',
>     'data': [ 'none', 'nic', 'user', 'tap', 'l2tpv3', 'socket', 'vde',
> -            'bridge', 'hubport', 'netmap', 'vhost-user' ] }
> +            'bridge', 'hubport', 'netmap', 'vhost-user', 'vhost-vdpa' ] }
>   
>   ##
>   # @Netdev:
> @@ -479,7 +496,8 @@
>       'bridge':   'NetdevBridgeOptions',
>       'hubport':  'NetdevHubPortOptions',
>       'netmap':   'NetdevNetmapOptions',
> -    'vhost-user': 'NetdevVhostUserOptions' } }
> +    'vhost-user': 'NetdevVhostUserOptions',
> +    'vhost-vdpa': 'NetdevVhostVDPAOptions' } }
>   
>   ##
>   # @NetLegacy:
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 65c9473b73..08256d9d4e 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -2291,6 +2291,10 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
>   #ifdef CONFIG_POSIX
>       "-netdev vhost-user,id=str,chardev=dev[,vhostforce=on|off]\n"
>       "                configure a vhost-user network, backed by a chardev 'dev'\n"
> +#endif
> +#ifdef CONFIG_POSIX
> +    "-netdev vhost-vdpa,id=str,vhostdev=/path/to/dev\n"
> +    "                configure a vhost-vdpa network, backed by a vhostdev 'dev'\n"
>   #endif
>       "-netdev hubport,id=str,hubid=n[,netdev=nd]\n"
>       "                configure a hub port on the hub with ID 'n'\n", QEMU_ARCH_ALL)
> @@ -2310,6 +2314,9 @@ DEF("nic", HAS_ARG, QEMU_OPTION_nic,
>   #endif
>   #ifdef CONFIG_POSIX
>       "vhost-user|"
> +#endif
> +#ifdef CONFIG_POSIX
> +    "vhost-vdpa|"
>   #endif
>       "socket][,option][,...][mac=macaddr]\n"
>       "                initialize an on-board / default host NIC (using MAC address\n"
> @@ -2749,6 +2756,18 @@ qemu -m 512 -object memory-backend-file,id=mem,size=512M,mem-path=/hugetlbfs,sha
>        -device virtio-net-pci,netdev=net0
>   @end example
>   
> +@item -netdev vhost-vdpa,vhostdev=/path/to/dev
> +Establish a vhost-vdpa netdev, backed by a vhostdev. The chardev should
> +be a unix domain socket backed one.


This seems wrong; we don't use a unix domain socket.

Thanks


>   The vhost-vdpa uses a specifically defined
> +protocol to pass vhost ioctl replacement messages to an application on the other
> +end of the socket.
> +Example:
> +@example
> +qemu -m 512 -object memory-backend-file,id=mem,size=512M,mem-path=/hugetlbfs,share=on \
> +     -numa node,memdev=mem \
> +     -netdev type=vhost-vdpa,id=net0,vhostdev=/path/to/dev \
> +     -device virtio-net-pci,netdev=net0
> +@end example
>   @item -netdev hubport,id=@var{id},hubid=@var{hubid}[,netdev=@var{nd}]
>   
>   Create a hub port on the emulated hub with ID @var{hubid}.



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC v2 5/9] vhost-vdpa: implement vhost-vdpa backend
  2020-05-08 16:32 ` [RFC v2 5/9] vhost-vdpa: implement vhost-vdpa backend Cindy Lu
@ 2020-05-09  3:00   ` Jason Wang
  2020-05-09  3:07     ` Jason Wang
  2020-05-09  8:14     ` Cindy Lu
  2020-05-21 12:40   ` Stefan Hajnoczi
  1 sibling, 2 replies; 33+ messages in thread
From: Jason Wang @ 2020-05-09  3:00 UTC (permalink / raw)
  To: Cindy Lu, mst, armbru, eblake, cohuck
  Cc: mhabets, qemu-devel, rob.miller, saugatm, hanand, hch, eperezma,
	jgg, shahafs, kevin.tian, parav, vmireyno, cunming.liang, gdawar,
	jiri, xiao.w.wang, stefanha, zhihong.wang, Tiwei Bie, aadam,
	rdunlap, maxime.coquelin, lingshan.zhu


On 2020/5/9 12:32 AM, Cindy Lu wrote:
> From: Tiwei Bie <tiwei.bie@intel.com>
>
> Currently we have 2 types of vhost backends in QEMU: vhost kernel and
> vhost-user. The vDPA kernel framework provides a generic device for
> vDPA purposes; this vDPA device exposes a non-vendor-specific
> configuration interface to user space for setting up a vhost HW
> accelerator. This patch set introduces a third vhost backend called
> vhost-vdpa, based on the vDPA interface.
>
> Vhost-vdpa usage:
>
>    qemu-system-x86_64 -cpu host -enable-kvm \
>      ......
>    -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-id,id=vhost-vdpa0 \
>    -device virtio-net-pci,netdev=vhost-vdpa0,page-per-vq=on \
>
> Co-Authored-By: Lingshan zhu <lingshan.zhu@intel.com>
> Signed-off-by: Cindy Lu <lulu@redhat.com>


Signed-off-by: Jason Wang <jasowang@redhat.com>


> ---
>   hw/net/vhost_net.c                |  39 ++-
>   hw/virtio/Makefile.objs           |   1 +
>   hw/virtio/vhost-backend.c         |   5 +
>   hw/virtio/vhost-vdpa.c            | 447 ++++++++++++++++++++++++++++++
>   hw/virtio/vhost.c                 |  14 +
>   include/hw/virtio/vhost-backend.h |   8 +-
>   include/hw/virtio/vhost-vdpa.h    |  25 ++
>   include/hw/virtio/vhost.h         |   1 +
>   8 files changed, 538 insertions(+), 2 deletions(-)
>   create mode 100644 hw/virtio/vhost-vdpa.c
>   create mode 100644 include/hw/virtio/vhost-vdpa.h
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 63b2a85d6e..1af39abaf3 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -17,8 +17,10 @@
>   #include "net/net.h"
>   #include "net/tap.h"
>   #include "net/vhost-user.h"
> +#include "net/vhost-vdpa.h"
>   
>   #include "standard-headers/linux/vhost_types.h"
> +#include "linux-headers/linux/vhost.h"
>   #include "hw/virtio/virtio-net.h"
>   #include "net/vhost_net.h"
>   #include "qemu/error-report.h"
> @@ -85,6 +87,29 @@ static const int user_feature_bits[] = {
>       VHOST_INVALID_FEATURE_BIT
>   };
>   
> +static const int vdpa_feature_bits[] = {
> +    VIRTIO_F_NOTIFY_ON_EMPTY,
> +    VIRTIO_RING_F_INDIRECT_DESC,
> +    VIRTIO_RING_F_EVENT_IDX,
> +    VIRTIO_F_ANY_LAYOUT,
> +    VIRTIO_F_VERSION_1,
> +    VIRTIO_NET_F_CSUM,
> +    VIRTIO_NET_F_GUEST_CSUM,
> +    VIRTIO_NET_F_GSO,
> +    VIRTIO_NET_F_GUEST_TSO4,
> +    VIRTIO_NET_F_GUEST_TSO6,
> +    VIRTIO_NET_F_GUEST_ECN,
> +    VIRTIO_NET_F_GUEST_UFO,
> +    VIRTIO_NET_F_HOST_TSO4,
> +    VIRTIO_NET_F_HOST_TSO6,
> +    VIRTIO_NET_F_HOST_ECN,
> +    VIRTIO_NET_F_HOST_UFO,
> +    VIRTIO_NET_F_MRG_RXBUF,
> +    VIRTIO_NET_F_MTU,
> +    VIRTIO_F_IOMMU_PLATFORM,
> +    VIRTIO_NET_F_GUEST_ANNOUNCE,
> +    VHOST_INVALID_FEATURE_BIT
> +};


The framework should be ready for packed ring; let's add that.
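
I.e. add the bit to vdpa_feature_bits:

    VIRTIO_F_RING_PACKED,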


>   static const int *vhost_net_get_feature_bits(struct vhost_net *net)
>   {
>       const int *feature_bits = 0;
> @@ -96,6 +121,9 @@ static const int *vhost_net_get_feature_bits(struct vhost_net *net)
>       case NET_CLIENT_DRIVER_VHOST_USER:
>           feature_bits = user_feature_bits;
>           break;
> +    case NET_CLIENT_DRIVER_VHOST_VDPA:
> +        feature_bits = vdpa_feature_bits;
> +        break;
>       default:
>           error_report("Feature bits not defined for this type: %d",
>                   net->nc->info->type);
> @@ -110,7 +138,10 @@ uint64_t vhost_net_get_features(struct vhost_net *net, uint64_t features)
>       return vhost_get_features(&net->dev, vhost_net_get_feature_bits(net),
>               features);
>   }
> -
> +int vhost_net_get_device_id(struct vhost_net *net, uint32_t * device_id)
> +{
> +    return vhost_dev_get_device_id(&net->dev, device_id);
> +}
>   void vhost_net_ack_features(struct vhost_net *net, uint64_t features)
>   {
>       net->dev.acked_features = net->dev.backend_features;
> @@ -433,6 +464,12 @@ VHostNetState *get_vhost_net(NetClientState *nc)
>           vhost_net = vhost_user_get_vhost_net(nc);
>           assert(vhost_net);
>           break;
> +#endif
> +#ifdef CONFIG_VHOST_NET_VDPA
> +    case NET_CLIENT_DRIVER_VHOST_VDPA:
> +        vhost_net = vhost_vdpa_get_vhost_net(nc);
> +        assert(vhost_net);
> +        break;
>   #endif
>       default:
>           break;
> diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
> index e2f70fbb89..e7c5d4a862 100644
> --- a/hw/virtio/Makefile.objs
> +++ b/hw/virtio/Makefile.objs
> @@ -5,6 +5,7 @@ obj-y += virtio.o
>   obj-$(call lor,$(CONFIG_VHOST_USER),$(CONFIG_VHOST_KERNEL)) += vhost.o vhost-backend.o
>   common-obj-$(call lnot,$(call lor,$(CONFIG_VHOST_USER),$(CONFIG_VHOST_KERNEL))) += vhost-stub.o
>   obj-$(CONFIG_VHOST_USER) += vhost-user.o
> +obj-$(CONFIG_VHOST_VDPA) += vhost-vdpa.o
>   
>   common-obj-$(CONFIG_VIRTIO_RNG) += virtio-rng.o
>   common-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
> diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> index 48905383f8..069ddb423d 100644
> --- a/hw/virtio/vhost-backend.c
> +++ b/hw/virtio/vhost-backend.c
> @@ -285,6 +285,11 @@ int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type)
>       case VHOST_BACKEND_TYPE_USER:
>           dev->vhost_ops = &user_ops;
>           break;
> +#endif
> +#ifdef CONFIG_VHOST_VDPA
> +    case VHOST_BACKEND_TYPE_VDPA:
> +        dev->vhost_ops = &vdpa_ops;
> +        break;
>   #endif
>       default:
>           error_report("Unknown vhost backend type");
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> new file mode 100644
> index 0000000000..bac8a8aa2a
> --- /dev/null
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -0,0 +1,447 @@
> +/*
> + * vhost-vdpa
> + *
> + *  Copyright(c) 2017-2018 Intel Corporation.
> + *  Copyright(c) 2020 Red Hat, Inc.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include <linux/vhost.h>
> +#include <linux/vfio.h>
> +#include <sys/eventfd.h>
> +#include <sys/ioctl.h>
> +#include "hw/virtio/vhost.h"
> +#include "hw/virtio/vhost-backend.h"
> +#include "hw/virtio/virtio-net.h"
> +#include "hw/virtio/vhost-vdpa.h"
> +#include "qemu/main-loop.h"
> +#include <linux/kvm.h>
> +#include "sysemu/kvm.h"
> +
> +
> +static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section)
> +{
> +    return (!memory_region_is_ram(section->mr) &&
> +            !memory_region_is_iommu(section->mr)) ||
> +           /*
> +            * Sizing an enabled 64-bit BAR can cause spurious mappings to
> +            * addresses in the upper part of the 64-bit address space.  These
> +            * are never accessed by the CPU and beyond the address width of
> +            * some IOMMU hardware.  TODO: VDPA should tell us the IOMMU width.
> +            */
> +           section->offset_within_address_space & (1ULL << 63);
> +}
> +
> +static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
> +                              void *vaddr, bool readonly)
> +{
> +    struct vhost_msg_v2 msg;
> +    int fd = v->device_fd;
> +    int ret = 0;
> +
> +    msg.type =  v->msg_type;
> +    msg.iotlb.iova = iova;
> +    msg.iotlb.size = size;
> +    msg.iotlb.uaddr = (uint64_t)vaddr;
> +    msg.iotlb.perm = readonly ? VHOST_ACCESS_RO : VHOST_ACCESS_RW;
> +    msg.iotlb.type = VHOST_IOTLB_UPDATE;
> +
> +    if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
> +        error_report("failed to write, fd=%d, errno=%d (%s)",
> +            fd, errno, strerror(errno));
> +        return -EIO ;
> +    }
> +
> +    return ret;
> +}
> +
> +static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova,
> +                                hwaddr size)
> +{
> +    struct vhost_msg_v2 msg;
> +    int fd = v->device_fd;
> +    int ret = 0;
> +
> +    msg.type =  v->msg_type;
> +    msg.iotlb.iova = iova;
> +    msg.iotlb.size = size;
> +    msg.iotlb.type = VHOST_IOTLB_INVALIDATE;
> +
> +    if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
> +        error_report("failed to write, fd=%d, errno=%d (%s)",
> +            fd, errno, strerror(errno));
> +        return -EIO ;
> +    }
> +
> +    return ret;
> +}
> +
> +static void vhost_vdpa_listener_region_add(MemoryListener *listener,
> +                                           MemoryRegionSection *section)
> +{
> +    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> +    hwaddr iova;
> +    Int128 llend, llsize;
> +    void *vaddr;
> +    int ret;
> +
> +    if (vhost_vdpa_listener_skipped_section(section)) {
> +        return;
> +    }
> +
> +    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
> +                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
> +        error_report("%s received unaligned region", __func__);
> +        return;
> +    }
> +
> +    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
> +    llend = int128_make64(section->offset_within_address_space);
> +    llend = int128_add(llend, section->size);
> +    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
> +
> +    if (int128_ge(int128_make64(iova), llend)) {
> +        return;
> +    }
> +
> +    memory_region_ref(section->mr);
> +
> +    /* Here we assume that memory_region_is_ram(section->mr)==true */
> +
> +    vaddr = memory_region_get_ram_ptr(section->mr) +
> +            section->offset_within_region +
> +            (iova - section->offset_within_address_space);
> +
> +    llsize = int128_sub(llend, int128_make64(iova));
> +
> +    ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
> +                             vaddr, section->readonly);
> +    if (ret) {
> +        error_report("vhost vdpa map fail!");
> +        if (memory_region_is_ram_device(section->mr)) {
> +            /* Allow unexpected mappings not to be fatal for RAM devices */
> +            error_report("map ram fail!");
> +          return ;
> +        }
> +        goto fail;
> +    }
> +
> +    return;
> +
> +fail:
> +    if (memory_region_is_ram_device(section->mr)) {
> +        error_report("failed to vdpa_dma_map. pci p2p may not work");
> +        return;
> +
> +    }
> +    /*
> +     * On the initfn path, store the first error in the container so we
> +     * can gracefully fail.  Runtime, there's not much we can do other
> +     * than throw a hardware error.
> +     */
> +    error_report("vhost-vdpa: DMA mapping failed, unable to continue");
> +    return;
> +
> +}
> +
> +static void vhost_vdpa_listener_region_del(MemoryListener *listener,
> +                                           MemoryRegionSection *section)
> +{
> +    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> +    hwaddr iova;
> +    Int128 llend, llsize;
> +    int ret;
> +    bool try_unmap = true;
> +
> +    if (vhost_vdpa_listener_skipped_section(section)) {
> +        return;
> +    }
> +
> +    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
> +                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
> +        error_report("%s received unaligned region", __func__);
> +        return;
> +    }
> +
> +    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
> +    llend = int128_make64(section->offset_within_address_space);
> +    llend = int128_add(llend, section->size);
> +    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
> +
> +    if (int128_ge(int128_make64(iova), llend)) {
> +        return;
> +    }
> +
> +    llsize = int128_sub(llend, int128_make64(iova));
> +
> +    if (try_unmap) {
> +        ret = vhost_vdpa_dma_unmap(v, iova, int128_get64(llsize));
> +        if (ret) {
> +            error_report("vhost_vdpa dma unmap error!");
> +        }
> +    }
> +
> +    memory_region_unref(section->mr);
> +}
> +
> +static const MemoryListener vhost_vdpa_memory_listener = {
> +    .region_add = vhost_vdpa_listener_region_add,
> +    .region_del = vhost_vdpa_listener_region_del,
> +};
> +


A comment is needed to explain why vhost-vdpa requires a specific
memory listener rather than just reusing the one provided by vhost.
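
Roughly something like this (my understanding, please double-check):

    /*
     * vhost-vdpa can't reuse the vhost memory table: instead of a
     * one-shot VHOST_SET_MEM_TABLE, each guest memory region has to be
     * mapped and unmapped through IOTLB update/invalidate messages as
     * regions are added and deleted, hence the dedicated listener.
     */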


> +


This newline is unnecessary.


> +static int vhost_vdpa_call(struct vhost_dev *dev, unsigned long int request,
> +                             void *arg)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +    int fd = v->device_fd;
> +
> +    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
> +
> +    return ioctl(fd, request, arg);
> +}
> +
> +static int vhost_vdpa_get_backend_features(struct vhost_dev *dev,
> +                                   uint64_t *features)
> +{
> +    return vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, features);
> +}
> +
> +static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque)
> +{
> +    struct vhost_vdpa *v;
> +    uint64_t backend_features = 0;
> +    int ret = 0;
> +
> +    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
> +
> +    v = opaque;
> +    dev->opaque =  opaque ;
> +
> +    v->listener = vhost_vdpa_memory_listener;
> +    memory_listener_register(&v->listener, &address_space_memory);
> +
> +    ret = vhost_vdpa_get_backend_features(dev, &backend_features);
> +   /*
> +    * In order to be compatible with older kernels:
> +    * if the kernel does not support this ioctl,
> +    * set the msg_type to v2 by default.
> +    */
> +    if (ret) {
> +        v->msg_type = VHOST_IOTLB_MSG_V2;
> +        return 0;
> +     }
> +    if (backend_features &  (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)) {
> +        v->msg_type = VHOST_IOTLB_MSG_V2;
> +     } else {
> +        v->msg_type = VHOST_IOTLB_MSG;
> +     }


But in vhost_vdpa_dma_unmap/vhost_vdpa_dma_map, v2 is still being used.

Considering vdpa implies msg_v2 and the kernel doesn't implement
get_backend_features yet, let's use v2 unconditionally and leave this
check for the future.
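
I.e. just:

    v->msg_type = VHOST_IOTLB_MSG_V2;
    return 0;

and drop the VHOST_GET_BACKEND_FEATURES call for now.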


> +    return 0;
> +}
> +
> +static int vhost_vdpa_cleanup(struct vhost_dev *dev)
> +{
> +    struct vhost_vdpa *v;
> +    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
> +
> +    v = dev->opaque;
> +    memory_listener_unregister(&v->listener);
> +
> +    dev->opaque = NULL;
> +    return 0;
> +}
> +
> +static int vhost_vdpa_memslots_limit(struct vhost_dev *dev)
> +{
> +    /*Using IOTLB API here,NOTE:please set the ulimit before using*/


The "note" seems unrelated to the memslot here.


> +    return INT_MAX;
> +}
> +
> +static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
> +                                   struct vhost_log *log)
> +{
> +    return -1;
> +}
> +
> +static int vhost_vdpa_set_mem_table(struct vhost_dev *dev,
> +                                    struct vhost_memory *mem)
> +{
> +
> +    if (mem->padding) {
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int vhost_vdpa_set_vring_addr(struct vhost_dev *dev,
> +                                     struct vhost_vring_addr *addr)
> +{
> +    return vhost_vdpa_call(dev, VHOST_SET_VRING_ADDR, addr);
> +}
> +
> +static int vhost_vdpa_set_vring_num(struct vhost_dev *dev,
> +                                    struct vhost_vring_state *ring)
> +{
> +    return vhost_vdpa_call(dev, VHOST_SET_VRING_NUM, ring);
> +}
> +
> +static int vhost_vdpa_set_vring_base(struct vhost_dev *dev,
> +                                     struct vhost_vring_state *ring)
> +{
> +    return vhost_vdpa_call(dev, VHOST_SET_VRING_BASE, ring);
> +}
> +
> +static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
> +                                     struct vhost_vring_state *ring)
> +{
> +
> +    return vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
> +}
> +
> +static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
> +                                     struct vhost_vring_file *file)
> +{
> +    return vhost_vdpa_call(dev, VHOST_SET_VRING_KICK, file);
> +}
> +
> +static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> +                                     struct vhost_vring_file *file)
> +{
> +    return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
> +}
> +
> +static int vhost_vdpa_set_features(struct vhost_dev *dev,
> +                                   uint64_t features)
> +{
> +    return vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
> +}
> +
> +static int vhost_vdpa_get_features(struct vhost_dev *dev,
> +                                   uint64_t *features)
> +{
> +    return vhost_vdpa_call(dev, VHOST_GET_FEATURES, features);
> +}
> +
> +static int vhost_vdpa_get_device_id(struct vhost_dev *dev,
> +                                   uint32_t *device_id)
> +{
> +    return vhost_vdpa_call(dev, VHOST_VDPA_GET_DEVICE_ID, device_id);
> +}
> +
> +static int vhost_vdpa_set_owner(struct vhost_dev *dev)
> +{
> +    return vhost_vdpa_call(dev, VHOST_SET_OWNER, NULL);
> +}
> +
> +static int vhost_vdpa_reset_device(struct vhost_dev *dev)
> +{
> +    return vhost_vdpa_call(dev, VHOST_RESET_OWNER, NULL);
> +}


I think we may want to reuse the helpers of the kernel backends instead
of duplicating functions here.


> +
> +static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
> +{
> +    assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs);
> +
> +    return idx - dev->vq_index;
> +}
> +
> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
> +{
> +    int i;
> +    for (i = 0; i < dev->nvqs; ++i) {
> +        struct vhost_vring_state state = {
> +            .index = dev->vq_index + i,
> +            .num = 1,
> +        };
> +        vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
> +    }
> +    return 0;
> +}
> +static int  vhost_vdpa_set_config(struct vhost_dev *dev, const uint8_t *data,


One unnecessary space between int and "vhost_vdpa_set_config"


> +                                   uint32_t offset, uint32_t size,
> +                                   uint32_t flags)
> +{
> +
> +    struct vhost_vdpa_config *config;
> +    int ret;
> +    if ((size > VHOST_VDPA_MAX_CONFIG_SIZE) || (data == NULL)) {
> +        return -1;
> +    }
> +    config = g_new0(struct vhost_vdpa_config, 1);


It's better to just use a temporary variable instead of allocating it
dynamically?
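
E.g. the following sketch, assuming struct vhost_vdpa_config carries a
fixed-size buf[VHOST_VDPA_MAX_CONFIG_SIZE] (if buf were a flexible
array member, the current memcpy() into a g_new0'd struct would
overflow as well):

    struct vhost_vdpa_config config = {
        .off = 0,
        .len = size,
    };

    memcpy(config.buf, data, size);
    return vhost_vdpa_call(dev, VHOST_VDPA_SET_CONFIG, &config);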


> +    if (config == NULL) {
> +        return -EINVAL;
> +    }
> +    config->off = 0;
> +    config->len = size;
> +    memcpy(config->buf, data, size);
> +    ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_CONFIG, config);
> +     g_free(config);
> +     return ret;
> +}
> +
> + static int  vhost_vdpa_get_config(struct vhost_dev *dev, uint8_t *config,
> +                                   uint32_t config_len)
> +{
> +    struct vhost_vdpa_config *v_config = NULL;
> +    int ret;
> +
> +     if ((config_len > VHOST_VDPA_MAX_CONFIG_SIZE) || (config == NULL)) {
> +        return -1;
> +    }
> +    v_config = g_new0(struct vhost_vdpa_config, 1);


Let's use a variable on the stack.


> +
> +    if (v_config == NULL) {
> +        return -EINVAL;
> +    }
> +   ret =  vhost_vdpa_call(dev, VHOST_VDPA_GET_CONFIG, v_config);


One extra space after '='.


> +   if ((v_config->len > config_len - v_config->off) || (v_config->len == 0)) {
> +        g_free(v_config);
> +       return -EINVAL;
> +    }


Indentation is weird here, and why do we need such a check?


> +
> +    memcpy(config, v_config->buf + v_config->off, config_len);
> +    g_free(v_config);
> +    return ret;
> + }


Newline is needed here.


> +static int vhost_vdpa_set_state(struct vhost_dev *dev, uint8_t state)
> +{
> +    return vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &state);
> +}
> +
> +


This new line is not needed.


> +const VhostOps vdpa_ops = {
> +        .backend_type = VHOST_BACKEND_TYPE_VDPA,
> +        .vhost_backend_init = vhost_vdpa_init,
> +        .vhost_backend_cleanup = vhost_vdpa_cleanup,
> +        .vhost_backend_memslots_limit = vhost_vdpa_memslots_limit,
> +        .vhost_set_log_base = vhost_vdpa_set_log_base,
> +        .vhost_set_mem_table = vhost_vdpa_set_mem_table,
> +        .vhost_set_vring_addr = vhost_vdpa_set_vring_addr,
> +        .vhost_set_vring_endian = NULL,
> +        .vhost_set_vring_num = vhost_vdpa_set_vring_num,
> +        .vhost_set_vring_base = vhost_vdpa_set_vring_base,
> +        .vhost_get_vring_base = vhost_vdpa_get_vring_base,
> +        .vhost_set_vring_kick = vhost_vdpa_set_vring_kick,
> +        .vhost_set_vring_call = vhost_vdpa_set_vring_call,
> +        .vhost_set_features = vhost_vdpa_set_features,
> +        .vhost_get_features = vhost_vdpa_get_features,
> +        .vhost_set_owner = vhost_vdpa_set_owner,
> +        .vhost_reset_device = vhost_vdpa_reset_device,
> +        .vhost_get_vq_index = vhost_vdpa_get_vq_index,
> +        .vhost_set_vring_ready = vhost_vdpa_set_vring_ready,
> +        .vhost_get_config  = vhost_vdpa_get_config,
> +        .vhost_set_config = vhost_vdpa_set_config,
> +        .vhost_requires_shm_log = NULL,
> +        .vhost_migration_done = NULL,
> +        .vhost_backend_can_merge = NULL,
> +        .vhost_net_set_mtu = NULL,
> +        .vhost_set_iotlb_callback = NULL,
> +        .vhost_send_device_iotlb_msg = NULL,
> +        .vhost_set_state = vhost_vdpa_set_state,
> +        .vhost_get_device_id = vhost_vdpa_get_device_id,
> +};


As said previously, please consider reusing the vhost kernel helpers.


> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 4da0d5a6c5..6d2aa461d6 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -746,6 +746,12 @@ static int vhost_virtqueue_set_addr(struct vhost_dev *dev,
>           .log_guest_addr = vq->used_phys,
>           .flags = enable_log ? (1 << VHOST_VRING_F_LOG) : 0,
>       };
> +    /*vDPA need to use the phys address here to set to hardware*/
> +    if (dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
> +        addr.desc_user_addr = (uint64_t)(unsigned long)vq->desc_phys;
> +        addr.avail_user_addr = (uint64_t)(unsigned long)vq->avail_phys;
> +        addr.used_user_addr = (uint64_t)(unsigned long)vq->used_phys;
> +    }
>       int r = dev->vhost_ops->vhost_set_vring_addr(dev, &addr);
>       if (r < 0) {
>           VHOST_OPS_DEBUG("vhost_set_vring_addr failed");
> @@ -1493,6 +1499,14 @@ int vhost_dev_set_config(struct vhost_dev *hdev, const uint8_t *data,
>       return -1;
>   }
>   
> +int vhost_dev_get_device_id(struct vhost_dev *hdev, uint32_t *device_id)
> +{
> +    assert(hdev->vhost_ops);
> +    if (hdev->vhost_ops->vhost_get_device_id) {


If you introduce a new vhost_ops callback, please do it in a separate patch.


> +        return hdev->vhost_ops->vhost_get_device_id(hdev, device_id);
> +    }
> +    return -1;
> +}
>   void vhost_dev_set_config_notifier(struct vhost_dev *hdev,
>                                      const VhostDevConfigOps *ops)
>   {
> diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> index f823055167..8ac898a987 100644
> --- a/include/hw/virtio/vhost-backend.h
> +++ b/include/hw/virtio/vhost-backend.h
> @@ -17,7 +17,8 @@ typedef enum VhostBackendType {
>       VHOST_BACKEND_TYPE_NONE = 0,
>       VHOST_BACKEND_TYPE_KERNEL = 1,
>       VHOST_BACKEND_TYPE_USER = 2,
> -    VHOST_BACKEND_TYPE_MAX = 3,
> +    VHOST_BACKEND_TYPE_VDPA = 3,
> +    VHOST_BACKEND_TYPE_MAX = 4,
>   } VhostBackendType;
>   
>   typedef enum VhostSetConfigType {
> @@ -77,6 +78,7 @@ typedef int (*vhost_reset_device_op)(struct vhost_dev *dev);
>   typedef int (*vhost_get_vq_index_op)(struct vhost_dev *dev, int idx);
>   typedef int (*vhost_set_vring_enable_op)(struct vhost_dev *dev,
>                                            int enable);
> +typedef int (*vhost_set_vring_ready_op)(struct vhost_dev *dev);


Please introduce this new callback in a separate patch.

Thanks


>   typedef bool (*vhost_requires_shm_log_op)(struct vhost_dev *dev);
>   typedef int (*vhost_migration_done_op)(struct vhost_dev *dev,
>                                          char *mac_addr);
> @@ -113,6 +115,7 @@ typedef int (*vhost_set_inflight_fd_op)(struct vhost_dev *dev,
>                                           struct vhost_inflight *inflight);
>   
>   typedef int (*vhost_set_state_op)(struct vhost_dev *dev, uint8_t state);
> +typedef int (*vhost_get_device_id_op)(struct vhost_dev *dev, uint32_t *dev_id);
>   typedef struct VhostOps {
>       VhostBackendType backend_type;
>       vhost_backend_init vhost_backend_init;
> @@ -139,6 +142,7 @@ typedef struct VhostOps {
>       vhost_reset_device_op vhost_reset_device;
>       vhost_get_vq_index_op vhost_get_vq_index;
>       vhost_set_vring_enable_op vhost_set_vring_enable;
> +    vhost_set_vring_ready_op vhost_set_vring_ready;
>       vhost_requires_shm_log_op vhost_requires_shm_log;
>       vhost_migration_done_op vhost_migration_done;
>       vhost_backend_can_merge_op vhost_backend_can_merge;
> @@ -154,9 +158,11 @@ typedef struct VhostOps {
>       vhost_get_inflight_fd_op vhost_get_inflight_fd;
>       vhost_set_inflight_fd_op vhost_set_inflight_fd;
>       vhost_set_state_op vhost_set_state;
> +    vhost_get_device_id_op vhost_get_device_id;
>   } VhostOps;
>   
>   extern const VhostOps user_ops;
> +extern const VhostOps vdpa_ops;
>   
>   int vhost_set_backend_type(struct vhost_dev *dev,
>                              VhostBackendType backend_type);
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> new file mode 100644
> index 0000000000..9d64fbfea9
> --- /dev/null
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -0,0 +1,25 @@
> +/*
> + * vhost-vdpa.h
> + *
> + * Copyright(c) 2017-2018 Intel Corporation.
> + * Copyright(c) 2020 Red Hat, Inc.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef HW_VIRTIO_VHOST_VDPA_H
> +#define HW_VIRTIO_VHOST_VDPA_H
> +
> +#include "hw/virtio/virtio.h"
> +
> +typedef struct vhost_vdpa {
> +    int device_fd;
> +    uint32_t msg_type;
> +    MemoryListener listener;
> +} VhostVDPA;
> +
> +extern AddressSpace address_space_memory;
> +
> +#endif
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index 085450c6f8..020956d5a9 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -124,6 +124,7 @@ int vhost_dev_get_config(struct vhost_dev *dev, uint8_t *config,
>                            uint32_t config_len);
>   int vhost_dev_set_config(struct vhost_dev *dev, const uint8_t *data,
>                            uint32_t offset, uint32_t size, uint32_t flags);
> +int vhost_dev_get_device_id(struct vhost_dev *hdev, uint32_t *device_id);
>   /* notifier callback in case vhost device config space changed
>    */
>   void vhost_dev_set_config_notifier(struct vhost_dev *dev,



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC v2 6/9] virtio-bus: introduce queue_enabled method
  2020-05-08 16:32 ` [RFC v2 6/9] virtio-bus: introduce queue_enabled method Cindy Lu
@ 2020-05-09  3:01   ` Jason Wang
  2020-05-09  6:50     ` Cindy Lu
  0 siblings, 1 reply; 33+ messages in thread
From: Jason Wang @ 2020-05-09  3:01 UTC (permalink / raw)
  To: Cindy Lu, mst, armbru, eblake, cohuck
  Cc: mhabets, qemu-devel, rob.miller, saugatm, hanand, hch, eperezma,
	jgg, shahafs, kevin.tian, parav, vmireyno, cunming.liang, gdawar,
	jiri, xiao.w.wang, stefanha, zhihong.wang, aadam, rdunlap,
	maxime.coquelin, lingshan.zhu


On 2020/5/9 12:32 AM, Cindy Lu wrote:
> From: Jason Wang <jasowang@redhat.com>
>
> This patch introduces queue_enabled() method which allows the
> transport to implement its own way to report whether or not a queue is
> enabled.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>


This patch should come before any of the vhost-vdpa patches.

Thanks


> ---
>   hw/virtio/virtio.c             | 6 ++++++
>   include/hw/virtio/virtio-bus.h | 4 ++++
>   2 files changed, 10 insertions(+)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 04716b5f6c..09732a8836 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -3169,6 +3169,12 @@ hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n)
>   
>   bool virtio_queue_enabled(VirtIODevice *vdev, int n)
>   {
> +    BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
> +    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> +
> +    if (k->queue_enabled)
> +        return k->queue_enabled(qbus->parent, n);
> +
>       return virtio_queue_get_desc_addr(vdev, n) != 0;
>   }
>   
> diff --git a/include/hw/virtio/virtio-bus.h b/include/hw/virtio/virtio-bus.h
> index 38c9399cd4..0f6f215925 100644
> --- a/include/hw/virtio/virtio-bus.h
> +++ b/include/hw/virtio/virtio-bus.h
> @@ -83,6 +83,10 @@ typedef struct VirtioBusClass {
>        */
>       int (*ioeventfd_assign)(DeviceState *d, EventNotifier *notifier,
>                               int n, bool assign);
> +    /*
> +     * Whether queue number n is enabled.
> +     */
> +    bool (*queue_enabled)(DeviceState *d, int n);
>       /*
>        * Does the transport have variable vring alignment?
>        * (ie can it ever call virtio_queue_set_align()?)



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC v2 7/9] virito-pci: implement queue_enabled method
  2020-05-08 16:32 ` [RFC v2 7/9] virito-pci: implement " Cindy Lu
@ 2020-05-09  3:02   ` Jason Wang
  2020-05-09 12:01   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 33+ messages in thread
From: Jason Wang @ 2020-05-09  3:02 UTC (permalink / raw)
  To: Cindy Lu, mst, armbru, eblake, cohuck
  Cc: mhabets, qemu-devel, rob.miller, saugatm, hanand, hch, eperezma,
	jgg, shahafs, kevin.tian, parav, vmireyno, cunming.liang, gdawar,
	jiri, xiao.w.wang, stefanha, zhihong.wang, aadam, rdunlap,
	maxime.coquelin, lingshan.zhu


On 2020/5/9 12:32 AM, Cindy Lu wrote:
> From: Jason Wang <jasowang@redhat.com>
>
> With version 1, we can detect whether a queue is enabled via
> queue_enabled.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>


This patch should come before any vhost-vdpa patch.

Thanks


> ---
>   hw/virtio/virtio-pci.c | 13 +++++++++++++
>   1 file changed, 13 insertions(+)
>
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index c6b47a9c73..4aaf5d953e 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -1103,6 +1103,18 @@ static AddressSpace *virtio_pci_get_dma_as(DeviceState *d)
>       return pci_get_address_space(dev);
>   }
>   
> +static bool virtio_pci_queue_enabled(DeviceState *d, int n)
> +{
> +    VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
> +    VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
> +
> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> +        return proxy->vqs[vdev->queue_sel].enabled;
> +    }
> +
> +    return virtio_queue_get_desc_addr(vdev, n) != 0;
> +}
> +
>   static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
>                                      struct virtio_pci_cap *cap)
>   {
> @@ -2053,6 +2065,7 @@ static void virtio_pci_bus_class_init(ObjectClass *klass, void *data)
>       k->ioeventfd_enabled = virtio_pci_ioeventfd_enabled;
>       k->ioeventfd_assign = virtio_pci_ioeventfd_assign;
>       k->get_dma_as = virtio_pci_get_dma_as;
> +    k->queue_enabled = virtio_pci_queue_enabled;
>   }
>   
>   static const TypeInfo virtio_pci_bus_info = {



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC v2 8/9] vhost_net: set vq ready during start if necessary
  2020-05-08 16:32 ` [RFC v2 8/9] vhost_net: set vq ready during start if necessary Cindy Lu
@ 2020-05-09  3:03   ` Jason Wang
  2020-05-09  6:51     ` Cindy Lu
  0 siblings, 1 reply; 33+ messages in thread
From: Jason Wang @ 2020-05-09  3:03 UTC (permalink / raw)
  To: Cindy Lu, mst, armbru, eblake, cohuck
  Cc: mhabets, qemu-devel, rob.miller, saugatm, hanand, hch, eperezma,
	jgg, shahafs, kevin.tian, parav, vmireyno, cunming.liang, gdawar,
	jiri, xiao.w.wang, stefanha, zhihong.wang, aadam, rdunlap,
	maxime.coquelin, lingshan.zhu


On 2020/5/9 12:32 AM, Cindy Lu wrote:
> From: Jason Wang <jasowang@redhat.com>
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>   hw/net/vhost_net.c | 4 ++++
>   1 file changed, 4 insertions(+)
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 1af39abaf3..eff9ec9177 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -383,6 +383,10 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>                   goto err_start;
>               }
>           }
> +
> +        if (virtio_queue_enabled(dev, i)) {
> +            vhost_set_vring_ready(peer);
> +        }
>       }
>   
>       return 0;


Please place the patch before vhost-vdpa.

Thanks



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC v2 9/9] vhost: introduce vhost_set_vring_ready method
  2020-05-08 16:32 ` [RFC v2 9/9] vhost: introduce vhost_set_vring_ready method Cindy Lu
@ 2020-05-09  3:05   ` Jason Wang
  0 siblings, 0 replies; 33+ messages in thread
From: Jason Wang @ 2020-05-09  3:05 UTC (permalink / raw)
  To: Cindy Lu, mst, armbru, eblake, cohuck
  Cc: mhabets, qemu-devel, rob.miller, saugatm, maxime.coquelin, hch,
	eperezma, jgg, shahafs, kevin.tian, parav, vmireyno,
	cunming.liang, gdawar, jiri, xiao.w.wang, stefanha, zhihong.wang,
	aadam, rdunlap, hanand, lingshan.zhu


On 2020/5/9 12:32 AM, Cindy Lu wrote:
> From: Jason Wang <jasowang@redhat.com>
>
> Vhost-vdpa introduces VHOST_VDPA_SET_VRING_ENABLE, which complies with
> the semantics of queue_enable defined in the virtio spec. This method
> can be used to prevent the device from executing requests for a
> specific virtqueue. This patch introduces the vhost_ops for this.
>
> Note that we already have vhost_set_vring_enable, which has a different
> semantic: it allows enabling or disabling a specific virtqueue for some
> kinds of vhost backends. E.g. vhost-user uses this to change the number
> of active queue pairs.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>


Similarly, please place this patch before vhost-vdpa patches.

Thanks


> ---
>   hw/net/vhost_net-stub.c |  4 ++++
>   hw/net/vhost_net.c      | 11 ++++++++++-
>   include/net/vhost_net.h |  1 +
>   3 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/hw/net/vhost_net-stub.c b/hw/net/vhost_net-stub.c
> index aac0e98228..43e93e1a9a 100644
> --- a/hw/net/vhost_net-stub.c
> +++ b/hw/net/vhost_net-stub.c
> @@ -86,6 +86,10 @@ int vhost_set_vring_enable(NetClientState *nc, int enable)
>       return 0;
>   }
>   
> +int vhost_set_vring_ready(NetClientState *nc)
> +{
> +    return 0;
> +}
>   int vhost_net_set_mtu(struct vhost_net *net, uint16_t mtu)
>   {
>       return 0;
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index eff9ec9177..6911282a0a 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -375,7 +375,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>               goto err_start;
>           }
>   
> -        if (ncs[i].peer->vring_enable) {
> +        if (peer->vring_enable) {
>               /* restore vring enable state */
>               r = vhost_set_vring_enable(peer, peer->vring_enable);
>   
> @@ -496,6 +496,15 @@ int vhost_set_vring_enable(NetClientState *nc, int enable)
>       return 0;
>   }
>   
> +int vhost_set_vring_ready(NetClientState *nc)
> +{
> +    VHostNetState *net = get_vhost_net(nc);
> +    const VhostOps *vhost_ops = net->dev.vhost_ops;
> +    if (vhost_ops && vhost_ops->vhost_set_vring_ready) {
> +        return vhost_ops->vhost_set_vring_ready(&net->dev);
> +    }
> +    return 0;
> +}
>   int vhost_net_set_mtu(struct vhost_net *net, uint16_t mtu)
>   {
>       const VhostOps *vhost_ops = net->dev.vhost_ops;
> diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
> index b47844bf29..247432a3b2 100644
> --- a/include/net/vhost_net.h
> +++ b/include/net/vhost_net.h
> @@ -35,6 +35,7 @@ int vhost_net_notify_migration_done(VHostNetState *net, char* mac_addr);
>   VHostNetState *get_vhost_net(NetClientState *nc);
>   
>   int vhost_set_vring_enable(NetClientState * nc, int enable);
> +int vhost_set_vring_ready(NetClientState *nc);
>   
>   uint64_t vhost_net_get_acked_features(VHostNetState *net);
>   



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC v2 5/9] vhost-vdpa: implement vhost-vdpa backend
  2020-05-09  3:00   ` Jason Wang
@ 2020-05-09  3:07     ` Jason Wang
  2020-05-09  8:14     ` Cindy Lu
  1 sibling, 0 replies; 33+ messages in thread
From: Jason Wang @ 2020-05-09  3:07 UTC (permalink / raw)
  To: Cindy Lu, mst, armbru, eblake, cohuck
  Cc: mhabets, qemu-devel, rob.miller, saugatm, hanand, hch, eperezma,
	jgg, shahafs, kevin.tian, parav, vmireyno, cunming.liang, gdawar,
	jiri, xiao.w.wang, stefanha, zhihong.wang, Tiwei Bie, aadam,
	rdunlap, maxime.coquelin, lingshan.zhu


On 2020/5/9 11:00 AM, Jason Wang wrote:
>
> +        .vhost_get_config  = vhost_vdpa_get_config,
> +        .vhost_set_config = vhost_vdpa_set_config, 


Btw, I don't see any actual user of those two helpers?
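
One obvious candidate would be reading the virtio-net config space from
the device, e.g. (sketch only; the wiring below is assumed, not part of
this series):

    /* in virtio_net_get_config(), when a vhost-vdpa peer is attached */
    if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
        vhost_dev_get_config(&get_vhost_net(nc->peer)->dev,
                             (uint8_t *)&netcfg, n->config_size);
    }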

Thanks



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC v2 0/9] vDPA support in qemu
  2020-05-08 16:32 [RFC v2 0/9] vDPA support in qemu Cindy Lu
                   ` (8 preceding siblings ...)
  2020-05-08 16:32 ` [RFC v2 9/9] vhost: introduce vhost_set_vring_ready method Cindy Lu
@ 2020-05-09  3:10 ` Jason Wang
  9 siblings, 0 replies; 33+ messages in thread
From: Jason Wang @ 2020-05-09  3:10 UTC (permalink / raw)
  To: Cindy Lu, mst, armbru, eblake, cohuck
  Cc: mhabets, qemu-devel, rob.miller, saugatm, hanand, hch, eperezma,
	jgg, shahafs, kevin.tian, parav, vmireyno, cunming.liang, gdawar,
	jiri, xiao.w.wang, stefanha, zhihong.wang, aadam, rdunlap,
	maxime.coquelin, lingshan.zhu


On 2020/5/9 12:32 AM, Cindy Lu wrote:
> vDPA device is a device that uses a datapath which complies with the
> virtio specifications with vendor specific control path. vDPA devices
> can be both physically located on the hardware or emulated by software.
> This RFC introduce the vDPA support in qemu
>
> change from v1
> separe the patch of introduce vhost_set_vring_ready method


I think you meant "separate"?

It may be worth mentioning the TODOs:

1) vIOMMU support
2) live migration support

And it might be helpful if you can publish a github repo for people to try.

Thanks


> separe the patch of qemu_get_peer
> separe the patch  of vhost_set_state
> intorduce the new macro specific for vDPA in configure
> intorduce the fuction to pass the fd from cmdline
> introduce the docmation in qemu-options.hx
> the other comments form last version
>
>
> Cindy Lu (3):
>    net: introduce qemu_get_peer
>    net: use the function qemu_get_peer
>    virtio_net: introduce vhost_set_state
>
> Jason Wang (4):
>    virtio-bus: introduce queue_enabled method
>    virito-pci: implement queue_enabled method
>    vhost_net: set vq ready during start if necessary
>    vhost: introduce vhost_set_vring_ready method
>
> Tiwei Bie (2):
>    vhost-vdpa: introduce vhost-vdpa net client
>    vhost-vdpa: implement vhost-vdpa backend
>
>   configure                         |  21 ++
>   hw/net/vhost_net-stub.c           |   4 +
>   hw/net/vhost_net.c                |  77 ++++-
>   hw/net/virtio-net.c               |   9 +
>   hw/virtio/Makefile.objs           |   1 +
>   hw/virtio/vhost-backend.c         |   5 +
>   hw/virtio/vhost-vdpa.c            | 447 ++++++++++++++++++++++++++++++
>   hw/virtio/vhost.c                 |  14 +
>   hw/virtio/virtio-pci.c            |  13 +
>   hw/virtio/virtio.c                |   6 +
>   include/hw/virtio/vhost-backend.h |  10 +-
>   include/hw/virtio/vhost-vdpa.h    |  25 ++
>   include/hw/virtio/vhost.h         |   1 +
>   include/hw/virtio/virtio-bus.h    |   4 +
>   include/net/net.h                 |   1 +
>   include/net/vhost-vdpa.h          |  19 ++
>   include/net/vhost_net.h           |   4 +-
>   net/Makefile.objs                 |   2 +-
>   net/clients.h                     |   2 +
>   net/net.c                         |   9 +
>   net/vhost-vdpa.c                  | 227 +++++++++++++++
>   qapi/net.json                     |  22 +-
>   qemu-options.hx                   |  19 ++
>   23 files changed, 930 insertions(+), 12 deletions(-)
>   create mode 100644 hw/virtio/vhost-vdpa.c
>   create mode 100644 include/hw/virtio/vhost-vdpa.h
>   create mode 100644 include/net/vhost-vdpa.h
>   create mode 100644 net/vhost-vdpa.c
>



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC v2 2/9] net: use the function qemu_get_peer
  2020-05-09  2:19   ` Jason Wang
@ 2020-05-09  6:49     ` Cindy Lu
  0 siblings, 0 replies; 33+ messages in thread
From: Cindy Lu @ 2020-05-09  6:49 UTC (permalink / raw)
  To: Jason Wang
  Cc: Cornelia Huck, Michael Tsirkin, mhabets, qemu-devel, hanand,
	rob.miller, saugatm, Markus Armbruster, hch,
	Eugenio Perez Martin, jgg, shahafs, kevin.tian, parav, vmireyno,
	Liang, Cunming, gdawar, jiri, xiao.w.wang, Stefan Hajnoczi, Wang,
	Zhihong, Ariel Adam, rdunlap, Maxime Coquelin, Zhu, Lingshan

On Sat, May 9, 2020 at 10:20 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2020/5/9 12:32 AM, Cindy Lu wrote:
> > use the qemu_get_peer to replace the old process
>
>
> The title should be "vhost_net: use the function qemu_get_peer".
>
> Thanks
>
Sure, I will fix this
>
> >
> > Signed-off-by: Cindy Lu <lulu@redhat.com>
> > ---
> >   hw/net/vhost_net.c | 14 +++++++++-----
> >   1 file changed, 9 insertions(+), 5 deletions(-)
> >
> > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > index 6b82803fa7..d1d421e3d9 100644
> > --- a/hw/net/vhost_net.c
> > +++ b/hw/net/vhost_net.c
> > @@ -306,7 +306,9 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >       BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
> >       VirtioBusState *vbus = VIRTIO_BUS(qbus);
> >       VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
> > +    struct vhost_net *net;
> >       int r, e, i;
> > +    NetClientState *peer;
> >
> >       if (!k->set_guest_notifiers) {
> >           error_report("binding does not support guest notifiers");
> > @@ -314,9 +316,9 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >       }
> >
> >       for (i = 0; i < total_queues; i++) {
> > -        struct vhost_net *net;
> >
> > -        net = get_vhost_net(ncs[i].peer);
> > +        peer = qemu_get_peer(ncs, i);
> > +        net = get_vhost_net(peer);
> >           vhost_net_set_vq_index(net, i * 2);
> >
> >           /* Suppress the masking guest notifiers on vhost user
> > @@ -335,7 +337,8 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >       }
> >
> >       for (i = 0; i < total_queues; i++) {
> > -        r = vhost_net_start_one(get_vhost_net(ncs[i].peer), dev);
> > +        peer = qemu_get_peer(ncs, i);
> > +        r = vhost_net_start_one(get_vhost_net(peer), dev);
> >
> >           if (r < 0) {
> >               goto err_start;
> > @@ -343,7 +346,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >
> >           if (ncs[i].peer->vring_enable) {
> >               /* restore vring enable state */
> > -            r = vhost_set_vring_enable(ncs[i].peer, ncs[i].peer->vring_enable);
> > +            r = vhost_set_vring_enable(peer, peer->vring_enable);
> >
> >               if (r < 0) {
> >                   goto err_start;
> > @@ -355,7 +358,8 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >
> >   err_start:
> >       while (--i >= 0) {
> > -        vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
> > +        peer = qemu_get_peer(ncs , i);
> > +        vhost_net_stop_one(get_vhost_net(peer), dev);
> >       }
> >       e = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
> >       if (e < 0) {
>



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC v2 6/9] virtio-bus: introduce queue_enabled method
  2020-05-09  3:01   ` Jason Wang
@ 2020-05-09  6:50     ` Cindy Lu
  0 siblings, 0 replies; 33+ messages in thread
From: Cindy Lu @ 2020-05-09  6:50 UTC (permalink / raw)
  To: Jason Wang
  Cc: Cornelia Huck, Michael Tsirkin, mhabets, qemu-devel, hanand,
	rob.miller, saugatm, Markus Armbruster, hch,
	Eugenio Perez Martin, jgg, shahafs, kevin.tian, parav, vmireyno,
	Liang, Cunming, gdawar, jiri, xiao.w.wang, Stefan Hajnoczi, Wang,
	Zhihong, Ariel Adam, rdunlap, Maxime Coquelin, Zhu, Lingshan

On Sat, May 9, 2020 at 11:02 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2020/5/9 12:32 AM, Cindy Lu wrote:
> > From: Jason Wang <jasowang@redhat.com>
> >
> > This patch introduces queue_enabled() method which allows the
> > transport to implement its own way to report whether or not a queue is
> > enabled.
> >
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
>
>
> This patch should come before any of the vhost-vpda patch.
>
> Thanks
>
Sure, will fix this
>
> > ---
> >   hw/virtio/virtio.c             | 6 ++++++
> >   include/hw/virtio/virtio-bus.h | 4 ++++
> >   2 files changed, 10 insertions(+)
> >
> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > index 04716b5f6c..09732a8836 100644
> > --- a/hw/virtio/virtio.c
> > +++ b/hw/virtio/virtio.c
> > @@ -3169,6 +3169,12 @@ hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n)
> >
> >   bool virtio_queue_enabled(VirtIODevice *vdev, int n)
> >   {
> > +    BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
> > +    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> > +
> > +    if (k->queue_enabled)
> > +        return k->queue_enabled(qbus->parent, n);
> > +
> >       return virtio_queue_get_desc_addr(vdev, n) != 0;
> >   }
> >
> > diff --git a/include/hw/virtio/virtio-bus.h b/include/hw/virtio/virtio-bus.h
> > index 38c9399cd4..0f6f215925 100644
> > --- a/include/hw/virtio/virtio-bus.h
> > +++ b/include/hw/virtio/virtio-bus.h
> > @@ -83,6 +83,10 @@ typedef struct VirtioBusClass {
> >        */
> >       int (*ioeventfd_assign)(DeviceState *d, EventNotifier *notifier,
> >                               int n, bool assign);
> > +    /*
> > +     * Whether queue number n is enabled.
> > +     */
> > +    bool (*queue_enabled)(DeviceState *d, int n);
> >       /*
> >        * Does the transport have variable vring alignment?
> >        * (ie can it ever call virtio_queue_set_align()?)
>



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC v2 8/9] vhost_net: set vq ready during start if necessary
  2020-05-09  3:03   ` Jason Wang
@ 2020-05-09  6:51     ` Cindy Lu
  0 siblings, 0 replies; 33+ messages in thread
From: Cindy Lu @ 2020-05-09  6:51 UTC (permalink / raw)
  To: Jason Wang
  Cc: Cornelia Huck, Michael Tsirkin, mhabets, qemu-devel, hanand,
	rob.miller, saugatm, Markus Armbruster, hch,
	Eugenio Perez Martin, jgg, shahafs, kevin.tian, parav, vmireyno,
	Liang, Cunming, gdawar, jiri, xiao.w.wang, Stefan Hajnoczi, Wang,
	Zhihong, Ariel Adam, rdunlap, Maxime Coquelin, Zhu, Lingshan

On Sat, May 9, 2020 at 11:04 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2020/5/9 12:32 AM, Cindy Lu wrote:
> > From: Jason Wang <jasowang@redhat.com>
> >
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> >   hw/net/vhost_net.c | 4 ++++
> >   1 file changed, 4 insertions(+)
> >
> > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > index 1af39abaf3..eff9ec9177 100644
> > --- a/hw/net/vhost_net.c
> > +++ b/hw/net/vhost_net.c
> > @@ -383,6 +383,10 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >                   goto err_start;
> >               }
> >           }
> > +
> > +        if (virtio_queue_enabled(dev, i)) {
> > +            vhost_set_vring_ready(peer);
> > +        }
> >       }
> >
> >       return 0;
>
>
> Please place the patch before vhost-vdpa.
>
> Thanks
>
Sure, will fix this



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC v2 3/9] virtio_net: introduce vhost_set_state
  2020-05-09  2:25   ` Jason Wang
@ 2020-05-09  7:08     ` Cindy Lu
  0 siblings, 0 replies; 33+ messages in thread
From: Cindy Lu @ 2020-05-09  7:08 UTC (permalink / raw)
  To: Jason Wang
  Cc: Cornelia Huck, Michael Tsirkin, mhabets, qemu-devel, hanand,
	rob.miller, saugatm, Markus Armbruster, hch,
	Eugenio Perez Martin, jgg, shahafs, kevin.tian, parav, vmireyno,
	Liang, Cunming, gdawar, jiri, xiao.w.wang, Stefan Hajnoczi, Wang,
	Zhihong, Ariel Adam, rdunlap, Maxime Coquelin, Zhu, Lingshan

On Sat, May 9, 2020 at 10:25 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2020/5/9 12:32 AM, Cindy Lu wrote:
> > Introduce a function to set the state to the vhost driver.
> > vDPA needs to sync the driver's state to the NIC
>
>
> Let's split this patch into two.
>
> 1) introduce vhost_set_state
> 2) make virtio-net use of vhost_set_state
>
Sure, will fix this
> >
> > Signed-off-by: Cindy Lu <lulu@redhat.com>
> > ---
> >   hw/net/vhost_net.c                | 9 +++++++++
> >   hw/net/virtio-net.c               | 9 +++++++++
> >   include/hw/virtio/vhost-backend.h | 2 ++
> >   include/net/vhost_net.h           | 2 +-
> >   4 files changed, 21 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > index d1d421e3d9..63b2a85d6e 100644
> > --- a/hw/net/vhost_net.c
> > +++ b/hw/net/vhost_net.c
> > @@ -465,3 +465,12 @@ int vhost_net_set_mtu(struct vhost_net *net, uint16_t mtu)
> >
> >       return vhost_ops->vhost_net_set_mtu(&net->dev, mtu);
> >   }
> > +int vhost_set_state(NetClientState *nc, uint8_t state)
> > +{
> > +    struct vhost_net *net = get_vhost_net(nc);
> > +    struct vhost_dev *hdev = &net->dev;
> > +        if (hdev->vhost_ops->vhost_set_state) {
>
>
> Indentation looks wrong.
>
Will fix this
>
> > +                return hdev->vhost_ops->vhost_set_state(hdev, state);
> > +        }
> > +    return 0;
> > +}
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index db3d7c38e6..1bddb4b4af 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -206,6 +206,9 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
> >       VirtIODevice *vdev = VIRTIO_DEVICE(n);
> >       NetClientState *nc = qemu_get_queue(n->nic);
> >       int queues = n->multiqueue ? n->max_queues : 1;
> > +    NetClientState *peer = qemu_get_peer(nc, 0);
> > +    uint8_t status_set  = vdev->status ;
> > +    uint8_t vhost_started_pre = n->vhost_started;
> >
> >       if (!get_vhost_net(nc->peer)) {
> >           return;
> > @@ -245,6 +248,7 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
> >                   return;
> >               }
> >           }
> > +        status_set = status_set | VIRTIO_CONFIG_S_DRIVER_OK;
> >
> >           n->vhost_started = 1;
> >           r = vhost_net_start(vdev, n->nic->ncs, queues);
> > @@ -252,11 +256,16 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
> >               error_report("unable to start vhost net: %d: "
> >                            "falling back on userspace virtio", -r);
> >               n->vhost_started = 0;
> > +            status_set = status_set & ~VIRTIO_CONFIG_S_DRIVER_OK;
> >           }
> >       } else {
> >           vhost_net_stop(vdev, n->nic->ncs, queues);
> > +        status_set = status_set & ~VIRTIO_CONFIG_S_DRIVER_OK;
> >           n->vhost_started = 0;
> >       }
> > +    if (vhost_started_pre != n->vhost_started) {
> > +            vhost_set_state(peer, status_set);
>
>
> Any reason why not just pass the virtio device status to vhost-vdpa?
>
I will double check and fix this
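
i.e. dropping the local status_set bookkeeping and doing just (sketch):

    if (vhost_started_pre != n->vhost_started) {
        vhost_set_state(peer, vdev->status);
    }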
>
> > +    }
> >   }
> >
> >   static int virtio_net_set_vnet_endian_one(VirtIODevice *vdev,
> > diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> > index 6f6670783f..f823055167 100644
> > --- a/include/hw/virtio/vhost-backend.h
> > +++ b/include/hw/virtio/vhost-backend.h
> > @@ -112,6 +112,7 @@ typedef int (*vhost_get_inflight_fd_op)(struct vhost_dev *dev,
> >   typedef int (*vhost_set_inflight_fd_op)(struct vhost_dev *dev,
> >                                           struct vhost_inflight *inflight);
> >
> > +typedef int (*vhost_set_state_op)(struct vhost_dev *dev, uint8_t state);
>
>
> Need to document what the meaning of "state" is here; is it e.g. the
> virtio device status? If yes, is it better to rename it to
> vhost_set_status()?
>
> Thanks
>
Sure, will fix this
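
e.g. (sketch of the rename plus documentation):

    /* @status: the virtio device status byte (VIRTIO_CONFIG_S_* bits) */
    typedef int (*vhost_set_status_op)(struct vhost_dev *dev, uint8_t status);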
>
> >   typedef struct VhostOps {
> >       VhostBackendType backend_type;
> >       vhost_backend_init vhost_backend_init;
> > @@ -152,6 +153,7 @@ typedef struct VhostOps {
> >       vhost_backend_mem_section_filter_op vhost_backend_mem_section_filter;
> >       vhost_get_inflight_fd_op vhost_get_inflight_fd;
> >       vhost_set_inflight_fd_op vhost_set_inflight_fd;
> > +    vhost_set_state_op vhost_set_state;
> >   } VhostOps;
> >
> >   extern const VhostOps user_ops;
> > diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
> > index 77e47398c4..6548a5a105 100644
> > --- a/include/net/vhost_net.h
> > +++ b/include/net/vhost_net.h
> > @@ -39,5 +39,5 @@ int vhost_set_vring_enable(NetClientState * nc, int enable);
> >   uint64_t vhost_net_get_acked_features(VHostNetState *net);
> >
> >   int vhost_net_set_mtu(struct vhost_net *net, uint16_t mtu);
> > -
> > +int vhost_set_state(NetClientState *nc, uint8_t state);
> >   #endif
>



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC v2 4/9] vhost-vdpa: introduce vhost-vdpa net client
  2020-05-08 16:41   ` Eric Blake
@ 2020-05-09  7:17     ` Cindy Lu
  0 siblings, 0 replies; 33+ messages in thread
From: Cindy Lu @ 2020-05-09  7:17 UTC (permalink / raw)
  To: Eric Blake
  Cc: Cornelia Huck, Michael Tsirkin, Jason Wang, qemu-devel, hanand,
	rob.miller, saugatm, Markus Armbruster, hch,
	Eugenio Perez Martin, jgg, mhabets, shahafs, kevin.tian, parav,
	vmireyno, Liang, Cunming, gdawar, jiri, xiao.w.wang,
	Stefan Hajnoczi, Wang, Zhihong, Tiwei Bie, Ariel Adam, rdunlap,
	Maxime Coquelin, Zhu, Lingshan

On Sat, May 9, 2020 at 12:42 AM Eric Blake <eblake@redhat.com> wrote:
>
> On 5/8/20 11:32 AM, Cindy Lu wrote:
> > From: Tiwei Bie <tiwei.bie@intel.com>
> >
> > This patch set introduces a new net client type: vhost-vdpa.
> > vhost-vdpa net client will set up a vDPA device which is specified
> > by a "vhostdev" parameter.
> >
> > Co-authored-by: Lingshan Zhu <lingshan.zhu@intel.com>
> > Signed-off-by: Cindy Lu <lulu@redhat.com>
> > ---
>
> Just looking at UI:
>
>
> > +++ b/qapi/net.json
> > @@ -441,6 +441,23 @@
> >       '*queues':        'int' } }
> >
> >   ##
> > +# @NetdevVhostVDPAOptions:
> > +#
> > +# Vhost-vdpa network backend
> > +#
> > +# @vhostdev: name of a vdpa dev path in sysfs
> > +#
> > +# @queues: number of queues to be created for multiqueue vhost-vdpa
> > +#          (default: 1) (Since 5.1)
>
> No need to mark a 'since' tag on a member introduced at the same time as
> the overall struct itself.
>
Will fix this
> > +#
> > +# Since: 5.1
> > +##
> > +{ 'struct': 'NetdevVhostVDPAOptions',
> > +  'data': {
> > +    '*vhostdev':     'str',
>
> What does this default to if omitted?
>
> > +    '*fd':           'str',
>
> Not documented above.
>
> > +    '*queues':       'int' } }
> > +##
>
> Missing blank line separator.
>
Thanks Eric, will fix all of these
> >   # @NetClientDriver:
> >   #
> >   # Available netdev drivers.
> > @@ -451,7 +468,7 @@
> >   ##
> >   { 'enum': 'NetClientDriver',
> >     'data': [ 'none', 'nic', 'user', 'tap', 'l2tpv3', 'socket', 'vde',
> > -            'bridge', 'hubport', 'netmap', 'vhost-user' ] }
> > +            'bridge', 'hubport', 'netmap', 'vhost-user', 'vhost-vdpa' ] }
>
> Missing a line above that 'vhost-vdpa' is new to 5.1.
>
>
Will fix this
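
e.g. something like this in the NetClientDriver doc comment (sketch; the
exact wording of the existing Since line is assumed):

    # Since: 2.7
    #
    #        @vhost-vdpa since 5.1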
> > @@ -2749,6 +2756,18 @@ qemu -m 512 -object memory-backend-file,id=mem,size=512M,mem-path=/hugetlbfs,sha
> >        -device virtio-net-pci,netdev=net0
> >   @end example
> >
> > +@item -netdev vhost-vdpa,vhostdev=/path/to/dev
> > +Establish a vhost-vdpa netdev, backed by a vhostdev. The chardev should
> > +be a unix domain socket backed one. The vhost-vdpa uses a specifically defined
> > +protocol to pass vhost ioctl replacement messages to an application on the other
> > +end of the socket.
> > +Example:
> > +@example
> > +qemu -m 512 -object memory-backend-file,id=mem,size=512M,mem-path=/hugetlbfs,share=on \
> > +     -numa node,memdev=mem \
> > +     -netdev type=vhost-vdpa,id=net0,vhostdev=/path/to/dev \
> > +     -device virtio-net-pci,netdev=net0
> > +@end example
> >   @item -netdev hubport,id=@var{id},hubid=@var{hubid}[,netdev=@var{nd}]
> >
> >   Create a hub port on the emulated hub with ID @var{hubid}.
> >
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
>



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC v2 4/9] vhost-vdpa: introduce vhost-vdpa net client
  2020-05-09  2:40   ` Jason Wang
@ 2020-05-09  7:31     ` Cindy Lu
  0 siblings, 0 replies; 33+ messages in thread
From: Cindy Lu @ 2020-05-09  7:31 UTC (permalink / raw)
  To: Jason Wang
  Cc: Cornelia Huck, Michael Tsirkin, mhabets, qemu-devel, hanand,
	rob.miller, saugatm, Markus Armbruster, hch,
	Eugenio Perez Martin, jgg, shahafs, kevin.tian, parav, vmireyno,
	Liang, Cunming, gdawar, jiri, xiao.w.wang, Stefan Hajnoczi, Wang,
	Zhihong, Tiwei Bie, Ariel Adam, rdunlap, Maxime Coquelin, Zhu,
	Lingshan

On Sat, May 9, 2020 at 10:40 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2020/5/9 12:32 AM, Cindy Lu wrote:
> > From: Tiwei Bie <tiwei.bie@intel.com>
>
>
> If you think you've done a huge refactor of the code, you can change
> the author, but you need to keep Tiwei's Signed-off-by.
>
>
> >
> > This patch set introduces a new net client type: vhost-vdpa.
> > vhost-vdpa net client will set up a vDPA device which is specified
> > by a "vhostdev" parameter.
> >
> > Co-authored-by: Lingshan Zhu <lingshan.zhu@intel.com>
> > Signed-off-by: Cindy Lu <lulu@redhat.com>
> > ---
> >   configure                |  21 ++++
> >   include/net/vhost-vdpa.h |  19 ++++
> >   include/net/vhost_net.h  |   1 +
>
>
> Patch 5, which is the infrastructure of vhost-vdpa, should come first.
> Please re-order the patches in the next version.
>
Sure, will fix this
>
> >   net/Makefile.objs        |   2 +-
> >   net/clients.h            |   2 +
> >   net/net.c                |   3 +
> >   net/vhost-vdpa.c         | 227 +++++++++++++++++++++++++++++++++++++++
> >   qapi/net.json            |  22 +++-
> >   qemu-options.hx          |  19 ++++
> >   9 files changed, 313 insertions(+), 3 deletions(-)
> >   create mode 100644 include/net/vhost-vdpa.h
> >   create mode 100644 net/vhost-vdpa.c
> >
> > diff --git a/configure b/configure
> > index 6099be1d84..bdd732e3bb 100755
> > --- a/configure
> > +++ b/configure
> > @@ -1505,6 +1505,10 @@ for opt do
> >     ;;
> >     --enable-vhost-user) vhost_user="yes"
> >     ;;
> > +  --disable-vhost-vdpa) vhost_vdpa="no"
> > +  ;;
> > +  --enable-vhost-vdpa) vhost_vdpa="yes"
> > +  ;;
> >     --disable-vhost-kernel) vhost_kernel="no"
> >     ;;
> >     --enable-vhost-kernel) vhost_kernel="yes"
> > @@ -1780,6 +1784,7 @@ disabled with --disable-FEATURE, default is enabled if available:
> >     vhost-crypto    vhost-user-crypto backend support
> >     vhost-kernel    vhost kernel backend support
> >     vhost-user      vhost-user backend support
> > +  vhost-vdpa      vhost-vdpa backend support
>
>
> Maybe "vhost-vdpa kernel backend support" is better.
>
>
Will fix this
> >     spice           spice
> >     rbd             rados block device (rbd)
> >     libiscsi        iscsi support
> > @@ -2241,6 +2246,10 @@ test "$vhost_user" = "" && vhost_user=yes
> >   if test "$vhost_user" = "yes" && test "$mingw32" = "yes"; then
> >     error_exit "vhost-user isn't available on win32"
> >   fi
> > +test "$vhost_vdpa" = "" && vhost_vdpa=yes
> > +if test "$vhost_vdpa" = "yes" && test "$mingw32" = "yes"; then
> > +  error_exit "vhost-vdpa isn't available on win32"
> > +fi
>
>
> Let's add a check for Linux like the vhost-kernel one below.
>
Will fix this
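
e.g. mirroring the vhost-kernel check quoted below (sketch):

    test "$vhost_vdpa" = "" && vhost_vdpa=$linux
    if test "$vhost_vdpa" = "yes" && test "$linux" != "yes"; then
      error_exit "vhost-vdpa is only available on Linux"
    fi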
>
> >   test "$vhost_kernel" = "" && vhost_kernel=$linux
> >   if test "$vhost_kernel" = "yes" && test "$linux" != "yes"; then
> >     error_exit "vhost-kernel is only available on Linux"
> > @@ -2269,6 +2278,11 @@ test "$vhost_user_fs" = "" && vhost_user_fs=$vhost_user
> >   if test "$vhost_user_fs" = "yes" && test "$vhost_user" = "no"; then
> >     error_exit "--enable-vhost-user-fs requires --enable-vhost-user"
> >   fi
> > +#vhost-vdpa backends
> > +test "$vhost_net_vdpa" = "" && vhost_net_vdpa=$vhost_vdpa
> > +if test "$vhost_net_vdpa" = "yes" && test "$vhost_vdpa" = "no"; then
> > +  error_exit "--enable-vhost-net-vdpa requires --enable-vhost-vdpa"
> > +fi
> >
> >   # OR the vhost-kernel and vhost-user values for simplicity
> >   if test "$vhost_net" = ""; then
> > @@ -6543,6 +6557,7 @@ echo "vhost-scsi support $vhost_scsi"
> >   echo "vhost-vsock support $vhost_vsock"
> >   echo "vhost-user support $vhost_user"
> >   echo "vhost-user-fs support $vhost_user_fs"
> > +echo "vhost-vdpa support $vhost_vdpa"
> >   echo "Trace backends    $trace_backends"
> >   if have_backend "simple"; then
> >   echo "Trace output file $trace_file-<pid>"
> > @@ -7031,6 +7046,9 @@ fi
> >   if test "$vhost_net_user" = "yes" ; then
> >     echo "CONFIG_VHOST_NET_USER=y" >> $config_host_mak
> >   fi
> > +if test "$vhost_net_vdpa" = "yes" ; then
> > +  echo "CONFIG_VHOST_NET_VDPA=y" >> $config_host_mak
> > +fi
> >   if test "$vhost_crypto" = "yes" ; then
> >     echo "CONFIG_VHOST_CRYPTO=y" >> $config_host_mak
> >   fi
> > @@ -7043,6 +7061,9 @@ fi
> >   if test "$vhost_user" = "yes" ; then
> >     echo "CONFIG_VHOST_USER=y" >> $config_host_mak
> >   fi
> > +if test "$vhost_vdpa" = "yes" ; then
> > +  echo "CONFIG_VHOST_VDPA=y" >> $config_host_mak
> > +fi
> >   if test "$vhost_user_fs" = "yes" ; then
> >     echo "CONFIG_VHOST_USER_FS=y" >> $config_host_mak
> >   fi
> > diff --git a/include/net/vhost-vdpa.h b/include/net/vhost-vdpa.h
> > new file mode 100644
> > index 0000000000..6ce0d04f72
> > --- /dev/null
> > +++ b/include/net/vhost-vdpa.h
> > @@ -0,0 +1,19 @@
> > +/*
> > + * vhost-vdpa.h
> > + *
> > + * Copyright(c) 2017-2018 Intel Corporation.
> > + * Copyright(c) 2020 Red Hat, Inc.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#ifndef VHOST_VDPA_H
> > +#define VHOST_VDPA_H
> > +
> > +struct vhost_net;
> > +struct vhost_net *vhost_vdpa_get_vhost_net(NetClientState *nc);
> > +uint64_t vhost_vdpa_get_acked_features(NetClientState *nc);
> > +
> > +#endif /* VHOST_VDPA_H */
> > diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
> > index 6548a5a105..b47844bf29 100644
> > --- a/include/net/vhost_net.h
> > +++ b/include/net/vhost_net.h
> > @@ -40,4 +40,5 @@ uint64_t vhost_net_get_acked_features(VHostNetState *net);
> >
> >   int vhost_net_set_mtu(struct vhost_net *net, uint16_t mtu);
> >   int vhost_set_state(NetClientState *nc, uint8_t state);
> > +int vhost_net_get_device_id(struct vhost_net *net, uint32_t *device_id);
>
>
> Let's move this function to the generic vhost-vdpa header instead of
> the net one.
>
Will fix this
>
> >   #endif
> > diff --git a/net/Makefile.objs b/net/Makefile.objs
> > index c5d076d19c..5ab45545db 100644
> > --- a/net/Makefile.objs
> > +++ b/net/Makefile.objs
> > @@ -26,7 +26,7 @@ tap-obj-$(CONFIG_SOLARIS) = tap-solaris.o
> >   tap-obj-y ?= tap-stub.o
> >   common-obj-$(CONFIG_POSIX) += tap.o $(tap-obj-y)
> >   common-obj-$(CONFIG_WIN32) += tap-win32.o
> > -
> > +common-obj-$(CONFIG_VHOST_NET_VDPA) += vhost-vdpa.o
> >   vde.o-libs = $(VDE_LIBS)
> >
> >   common-obj-$(CONFIG_CAN_BUS) += can/
> > diff --git a/net/clients.h b/net/clients.h
> > index a6ef267e19..92f9b59aed 100644
> > --- a/net/clients.h
> > +++ b/net/clients.h
> > @@ -61,4 +61,6 @@ int net_init_netmap(const Netdev *netdev, const char *name,
> >   int net_init_vhost_user(const Netdev *netdev, const char *name,
> >                           NetClientState *peer, Error **errp);
> >
> > +int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> > +                        NetClientState *peer, Error **errp);
> >   #endif /* QEMU_NET_CLIENTS_H */
> > diff --git a/net/net.c b/net/net.c
> > index b3192deaf1..9eff1ae982 100644
> > --- a/net/net.c
> > +++ b/net/net.c
> > @@ -965,6 +965,9 @@ static int (* const net_client_init_fun[NET_CLIENT_DRIVER__MAX])(
> >   #ifdef CONFIG_VHOST_NET_USER
> >           [NET_CLIENT_DRIVER_VHOST_USER] = net_init_vhost_user,
> >   #endif
> > +#ifdef CONFIG_VHOST_NET_VDPA
> > +        [NET_CLIENT_DRIVER_VHOST_VDPA] = net_init_vhost_vdpa,
> > +#endif
> >   #ifdef CONFIG_L2TPV3
> >           [NET_CLIENT_DRIVER_L2TPV3]    = net_init_l2tpv3,
> >   #endif
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > new file mode 100644
> > index 0000000000..c29678fdf1
> > --- /dev/null
> > +++ b/net/vhost-vdpa.c
> > @@ -0,0 +1,227 @@
> > +/*
> > + * vhost-vdpa.c
> > + *
> > + * Copyright(c) 2017-2018 Intel Corporation.
> > + * Copyright(c) 2020 Red Hat, Inc.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "clients.h"
> > +#include "net/vhost_net.h"
> > +#include "net/vhost-vdpa.h"
> > +#include "hw/virtio/vhost-vdpa.h"
> > +#include "qemu/config-file.h"
> > +#include "qemu/error-report.h"
> > +#include "qemu/option.h"
> > +#include "qapi/error.h"
> > +#include <linux/vfio.h>
>
>
> No need any more.
>
>
> > +#include <sys/ioctl.h>
> > +#include <err.h>
> > +#include <linux/virtio_net.h>
>
>
> That's not the way we include standard Linux headers; QEMU maintains
> its own copy of them.
>
> Use #include "standard-headers/linux/xxx.h" instead.
>
Will fix this
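
i.e. (sketch):

    #include "standard-headers/linux/virtio_net.h"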
>
> > +#include "monitor/monitor.h"
> > +#include "qemu/sockets.h"
>
>
> Do we really need this?
>
Will fix this
>
> > +#include "hw/virtio/vhost.h"
> > +
> > +/* Todo:need to add the multiqueue support here */
> > +typedef struct VhostVDPAState {
> > +    NetClientState nc;
> > +    struct vhost_vdpa vhost_vdpa;
> > +    VHostNetState *vhost_net;
> > +    uint64_t acked_features;
> > +    bool started;
> > +} VhostVDPAState;
> > +
> > +VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc)
> > +{
> > +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > +    return s->vhost_net;
> > +}
> > +
> > +uint64_t vhost_vdpa_get_acked_features(NetClientState *nc)
> > +{
> > +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > +    return s->acked_features;
> > +}
> > +
> > +static void vhost_vdpa_del(NetClientState *ncs)
> > +{
> > +    VhostVDPAState *s;
> > +
> > +    assert(ncs->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > +
> > +    s = DO_UPCAST(VhostVDPAState, nc, ncs);
> > +
> > +    if (s->vhost_net) {
> > +        /* save acked features */
> > +        uint64_t features = vhost_net_get_acked_features(s->vhost_net);
> > +        if (features) {
> > +            s->acked_features = features;
> > +         }
> > +        vhost_net_cleanup(s->vhost_net);
> > +    }
> > +}
> > +
> > +static int vhost_vdpa_add(NetClientState *ncs, void *be)
> > +{
> > +    VhostNetOptions options;
> > +    struct vhost_net *net = NULL;
> > +    VhostVDPAState *s;
> > +
> > +    options.backend_type = VHOST_BACKEND_TYPE_VDPA;
> > +
> > +    assert(ncs->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > +
> > +    s = DO_UPCAST(VhostVDPAState, nc, ncs);
> > +
> > +    options.net_backend = ncs;
> > +    options.opaque      = be;
> > +    options.busyloop_timeout = 0;
> > +    net = vhost_net_init(&options);
> > +    if (!net) {
> > +        error_report("failed to init vhost_net for queue");
> > +        goto err;
> > +     }
> > +
> > +     if (s->vhost_net) {
> > +        vhost_net_cleanup(s->vhost_net);
> > +        g_free(s->vhost_net);
> > +     }
> > +     s->vhost_net = net;
> > +
> > +    return 0;
> > +
> > +err:
> > +    if (net) {
> > +        vhost_net_cleanup(net);
> > +    }
> > +    vhost_vdpa_del(ncs);
> > +    return -1;
> > +}
> > +static void vhost_vdpa_cleanup(NetClientState *nc)
> > +{
> > +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > +
> > +    if (s->vhost_net) {
> > +        vhost_net_cleanup(s->vhost_net);
> > +        g_free(s->vhost_net);
> > +        s->vhost_net = NULL;
> > +    }
> > +
> > +    qemu_purge_queued_packets(nc);
> > +}
> > +
> > +static bool vhost_vdpa_has_vnet_hdr(NetClientState *nc)
> > +{
> > +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > +
> > +    return true;
> > +}
> > +
> > +static bool vhost_vdpa_has_ufo(NetClientState *nc)
> > +{
> > +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > +    uint64_t  features = 0;
> > +
> > +    features |= (1ULL << VIRTIO_NET_F_HOST_UFO);
> > +    features = vhost_net_get_features(s->vhost_net, features);
> > +    return !!(features & (1ULL << VIRTIO_NET_F_HOST_UFO));
> > +
> > +}
> > +
> > +static int vhost_vdpa_check_device_id(NetClientState *nc)
> > +{
> > +    uint32_t device_id;
> > +    int ret;
> > +    assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > +    VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > +    /* Get the device id from hw*/
> > +    ret = vhost_net_get_device_id(s->vhost_net, &device_id);
> > +    if (device_id != VIRTIO_ID_NET) {
> > +        return -ENOTSUP;
> > +    }
> > +    return ret;
> > +}
> > +
> > +static NetClientInfo net_vhost_vdpa_info = {
> > +        .type = NET_CLIENT_DRIVER_VHOST_VDPA,
> > +        .size = sizeof(VhostVDPAState),
> > +        .cleanup = vhost_vdpa_cleanup,
> > +        .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
> > +        .has_ufo = vhost_vdpa_has_ufo,
> > +};
> > +
> > +static int net_vhost_vdpa_init(NetClientState *peer, const char *device,
> > +                               const char *name, const char *vhostdev,
> > +                               bool has_fd, char *fd)
> > +{
> > +    NetClientState *nc = NULL;
> > +    VhostVDPAState *s;
> > +    int vdpa_device_fd = -1;
> > +    Error *err = NULL;
> > +
> > +    assert(name);
> > +
> > +    nc = qemu_new_net_client(&net_vhost_vdpa_info, peer, device, name);
> > +    snprintf(nc->info_str, sizeof(nc->info_str), "vhost-vdpa");
> > +    nc->queue_index = 0;
> > +
> > +    s = DO_UPCAST(VhostVDPAState, nc, nc);
> > +
> > +    if (has_fd) {
> > +        vdpa_device_fd = monitor_fd_param(cur_mon, fd, &err);
> > +    } else{
> > +        vdpa_device_fd = open(vhostdev, O_RDWR);
> > +    }
> > +
> > +    if (vdpa_device_fd == -1) {
> > +        return -errno;
> > +     }
> > +    s->vhost_vdpa.device_fd = vdpa_device_fd;
> > +    vhost_vdpa_add(nc, (void *)&s->vhost_vdpa);
> > +    assert(s->vhost_net);
> > +    /* check the device id for vdpa */
> > +    return vhost_vdpa_check_device_id(nc);
>
>
> We probably need to do the check earlier. If we do things like this, we
> will probably leak vdpa_device_fd.
>
There may be some problem with getting the device id before
vhost_vdpa_add(); I will double check this and try to find a solution.
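
Something like this right after obtaining the fd might work (untested
sketch), so nothing is leaked on the error path:

    uint32_t device_id;

    if (ioctl(vdpa_device_fd, VHOST_VDPA_GET_DEVICE_ID, &device_id) < 0 ||
        device_id != VIRTIO_ID_NET) {
        close(vdpa_device_fd);
        return -ENOTSUP;
    }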
>
> > +}
> > +
> > +static int net_vhost_check_net(void *opaque, QemuOpts *opts, Error **errp)
> > +{
> > +    const char *name = opaque;
> > +    const char *driver, *netdev;
> > +
> > +    driver = qemu_opt_get(opts, "driver");
> > +    netdev = qemu_opt_get(opts, "netdev");
> > +    if (!driver || !netdev) {
> > +        return 0;
> > +    }
> > +
> > +    if (strcmp(netdev, name) == 0 &&
> > +        !g_str_has_prefix(driver, "virtio-net-")) {
> > +        error_setg(errp, "vhost-vdpa requires frontend driver virtio-net-*");
> > +        return -1;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
> > +                        NetClientState *peer, Error **errp)
> > +{
> > +    const NetdevVhostVDPAOptions *opts;
> > +
> > +    assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > +    opts = &netdev->u.vhost_vdpa;
> > +    /* verify net frontend */
> > +    if (qemu_opts_foreach(qemu_find_opts("device"), net_vhost_check_net,
> > +                          (char *)name, errp)) {
> > +        return -1;
> > +    }
> > +    return net_vhost_vdpa_init(peer, "vhost_vdpa", name, opts->vhostdev,
> > +                    opts->has_fd, opts->fd);
> > +}
> > diff --git a/qapi/net.json b/qapi/net.json
> > index 335295be50..0f4fa6fffc 100644
> > --- a/qapi/net.json
> > +++ b/qapi/net.json
> > @@ -441,6 +441,23 @@
> >       '*queues':        'int' } }
> >
> >   ##
> > +# @NetdevVhostVDPAOptions:
> > +#
> > +# Vhost-vdpa network backend
> > +#
> > +# @vhostdev: name of a vdpa dev path in sysfs
> > +#
> > +# @queues: number of queues to be created for multiqueue vhost-vdpa
> > +#          (default: 1) (Since 5.1)
> > +#
> > +# Since: 5.1
> > +##
> > +{ 'struct': 'NetdevVhostVDPAOptions',
> > +  'data': {
> > +    '*vhostdev':     'str',
> > +    '*fd':           'str',
> > +    '*queues':       'int' } }
> > +##
> >   # @NetClientDriver:
> >   #
> >   # Available netdev drivers.
> > @@ -451,7 +468,7 @@
> >   ##
> >   { 'enum': 'NetClientDriver',
> >     'data': [ 'none', 'nic', 'user', 'tap', 'l2tpv3', 'socket', 'vde',
> > -            'bridge', 'hubport', 'netmap', 'vhost-user' ] }
> > +            'bridge', 'hubport', 'netmap', 'vhost-user', 'vhost-vdpa' ] }
> >
> >   ##
> >   # @Netdev:
> > @@ -479,7 +496,8 @@
> >       'bridge':   'NetdevBridgeOptions',
> >       'hubport':  'NetdevHubPortOptions',
> >       'netmap':   'NetdevNetmapOptions',
> > -    'vhost-user': 'NetdevVhostUserOptions' } }
> > +    'vhost-user': 'NetdevVhostUserOptions',
> > +    'vhost-vdpa': 'NetdevVhostVDPAOptions' } }
> >
> >   ##
> >   # @NetLegacy:
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 65c9473b73..08256d9d4e 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -2291,6 +2291,10 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
> >   #ifdef CONFIG_POSIX
> >       "-netdev vhost-user,id=str,chardev=dev[,vhostforce=on|off]\n"
> >       "                configure a vhost-user network, backed by a chardev 'dev'\n"
> > +#endif
> > +#ifdef CONFIG_POSIX
> > +    "-netdev vhost-vdpa,id=str,vhostdev=/path/to/dev\n"
> > +    "                configure a vhost-vdpa network, backed by a vhostdev 'dev'\n"
> >   #endif
> >       "-netdev hubport,id=str,hubid=n[,netdev=nd]\n"
> >       "                configure a hub port on the hub with ID 'n'\n", QEMU_ARCH_ALL)
> > @@ -2310,6 +2314,9 @@ DEF("nic", HAS_ARG, QEMU_OPTION_nic,
> >   #endif
> >   #ifdef CONFIG_POSIX
> >       "vhost-user|"
> > +#endif
> > +#ifdef CONFIG_POSIX
> > +    "vhost-vdpa|"
> >   #endif
> >       "socket][,option][,...][mac=macaddr]\n"
> >       "                initialize an on-board / default host NIC (using MAC address\n"
> > @@ -2749,6 +2756,18 @@ qemu -m 512 -object memory-backend-file,id=mem,size=512M,mem-path=/hugetlbfs,sha
> >        -device virtio-net-pci,netdev=net0
> >   @end example
> >
> > +@item -netdev vhost-vdpa,vhostdev=/path/to/dev
> > +Establish a vhost-vdpa netdev, backed by a vhostdev. The chardev should
> > +be a unix domain socket backed one.
>
>
> This seems wrong; we don't use a unix domain socket here.
>
> Thanks
>
Thanks Jason, will fix this
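
Maybe something along these lines instead (sketch):

    @item -netdev vhost-vdpa,vhostdev=/path/to/dev
    Establish a vhost-vdpa netdev, backed by a vhost-vdpa character
    device node (e.g. /dev/vhost-vdpa-0). The vhost ioctls are issued
    directly against that device and relayed by the kernel vDPA
    framework to the vendor driver.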
>
> >   The vhost-vdpa uses a specifically defined
> > +protocol to pass vhost ioctl replacement messages to an application on the other
> > +end of the socket.
> > +Example:
> > +@example
> > +qemu -m 512 -object memory-backend-file,id=mem,size=512M,mem-path=/hugetlbfs,share=on \
> > +     -numa node,memdev=mem \
> > +     -netdev type=vhost-vdpa,id=net0,vhostdev=/path/to/dev \
> > +     -device virtio-net-pci,netdev=net0
> > +@end example
> >   @item -netdev hubport,id=@var{id},hubid=@var{hubid}[,netdev=@var{nd}]
> >
> >   Create a hub port on the emulated hub with ID @var{hubid}.
>



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC v2 5/9] vhost-vdpa: implement vhost-vdpa backend
  2020-05-09  3:00   ` Jason Wang
  2020-05-09  3:07     ` Jason Wang
@ 2020-05-09  8:14     ` Cindy Lu
  1 sibling, 0 replies; 33+ messages in thread
From: Cindy Lu @ 2020-05-09  8:14 UTC (permalink / raw)
  To: Jason Wang
  Cc: Cornelia Huck, Michael Tsirkin, mhabets, qemu-devel, hanand,
	rob.miller, saugatm, Markus Armbruster, hch,
	Eugenio Perez Martin, jgg, shahafs, kevin.tian, parav, vmireyno,
	Liang, Cunming, gdawar, jiri, xiao.w.wang, Stefan Hajnoczi, Wang,
	Zhihong, Tiwei Bie, Ariel Adam, rdunlap, Maxime Coquelin, Zhu,
	Lingshan

On Sat, May 9, 2020 at 11:00 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2020/5/9 12:32 AM, Cindy Lu wrote:
> > From: Tiwei Bie <tiwei.bie@intel.com>
> >
> > Currently we have 2 types of vhost backends in QEMU: vhost kernel and
> > vhost-user. The above patch provides a generic device for vDPA purpose,
> > this vDPA device exposes to user space a non-vendor-specific configuration
> > interface for setting up a vhost HW accelerator, this patch set introduces
> > a third vhost backend called vhost-vdpa based on the vDPA interface.
> >
> > Vhost-vdpa usage:
> >
> >    qemu-system-x86_64 -cpu host -enable-kvm \
> >      ......
> >    -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-id,id=vhost-vdpa0 \
> >    -device virtio-net-pci,netdev=vhost-vdpa0,page-per-vq=on \
> >
> > Co-Authored-By: Lingshan zhu <lingshan.zhu@intel.com>
> > Signed-off-by: Cindy Lu <lulu@redhat.com>
>
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
>
>
> > ---
> >   hw/net/vhost_net.c                |  39 ++-
> >   hw/virtio/Makefile.objs           |   1 +
> >   hw/virtio/vhost-backend.c         |   5 +
> >   hw/virtio/vhost-vdpa.c            | 447 ++++++++++++++++++++++++++++++
> >   hw/virtio/vhost.c                 |  14 +
> >   include/hw/virtio/vhost-backend.h |   8 +-
> >   include/hw/virtio/vhost-vdpa.h    |  25 ++
> >   include/hw/virtio/vhost.h         |   1 +
> >   8 files changed, 538 insertions(+), 2 deletions(-)
> >   create mode 100644 hw/virtio/vhost-vdpa.c
> >   create mode 100644 include/hw/virtio/vhost-vdpa.h
> >
> > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > index 63b2a85d6e..1af39abaf3 100644
> > --- a/hw/net/vhost_net.c
> > +++ b/hw/net/vhost_net.c
> > @@ -17,8 +17,10 @@
> >   #include "net/net.h"
> >   #include "net/tap.h"
> >   #include "net/vhost-user.h"
> > +#include "net/vhost-vdpa.h"
> >
> >   #include "standard-headers/linux/vhost_types.h"
> > +#include "linux-headers/linux/vhost.h"
> >   #include "hw/virtio/virtio-net.h"
> >   #include "net/vhost_net.h"
> >   #include "qemu/error-report.h"
> > @@ -85,6 +87,29 @@ static const int user_feature_bits[] = {
> >       VHOST_INVALID_FEATURE_BIT
> >   };
> >
> > +static const int vdpa_feature_bits[] = {
> > +    VIRTIO_F_NOTIFY_ON_EMPTY,
> > +    VIRTIO_RING_F_INDIRECT_DESC,
> > +    VIRTIO_RING_F_EVENT_IDX,
> > +    VIRTIO_F_ANY_LAYOUT,
> > +    VIRTIO_F_VERSION_1,
> > +    VIRTIO_NET_F_CSUM,
> > +    VIRTIO_NET_F_GUEST_CSUM,
> > +    VIRTIO_NET_F_GSO,
> > +    VIRTIO_NET_F_GUEST_TSO4,
> > +    VIRTIO_NET_F_GUEST_TSO6,
> > +    VIRTIO_NET_F_GUEST_ECN,
> > +    VIRTIO_NET_F_GUEST_UFO,
> > +    VIRTIO_NET_F_HOST_TSO4,
> > +    VIRTIO_NET_F_HOST_TSO6,
> > +    VIRTIO_NET_F_HOST_ECN,
> > +    VIRTIO_NET_F_HOST_UFO,
> > +    VIRTIO_NET_F_MRG_RXBUF,
> > +    VIRTIO_NET_F_MTU,
> > +    VIRTIO_F_IOMMU_PLATFORM,
> > +    VIRTIO_NET_F_GUEST_ANNOUNCE,
> > +    VHOST_INVALID_FEATURE_BIT
> > +};
>
>
> The framework should be ready for packed ring, let's add that.
>
Will fix this
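
For reference, the change could be as small as adding the packed ring
bit to the array (a sketch; VIRTIO_F_RING_PACKED is defined in the
standard virtio headers):

    static const int vdpa_feature_bits[] = {
        VIRTIO_F_NOTIFY_ON_EMPTY,
        VIRTIO_RING_F_INDIRECT_DESC,
        VIRTIO_RING_F_EVENT_IDX,
        VIRTIO_F_RING_PACKED,  /* new: packed virtqueue layout */
        /* ... remaining bits unchanged ... */
        VHOST_INVALID_FEATURE_BIT
    };
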
>
> >   static const int *vhost_net_get_feature_bits(struct vhost_net *net)
> >   {
> >       const int *feature_bits = 0;
> > @@ -96,6 +121,9 @@ static const int *vhost_net_get_feature_bits(struct vhost_net *net)
> >       case NET_CLIENT_DRIVER_VHOST_USER:
> >           feature_bits = user_feature_bits;
> >           break;
> > +    case NET_CLIENT_DRIVER_VHOST_VDPA:
> > +        feature_bits = vdpa_feature_bits;
> > +        break;
> >       default:
> >           error_report("Feature bits not defined for this type: %d",
> >                   net->nc->info->type);
> > @@ -110,7 +138,10 @@ uint64_t vhost_net_get_features(struct vhost_net *net, uint64_t features)
> >       return vhost_get_features(&net->dev, vhost_net_get_feature_bits(net),
> >               features);
> >   }
> > -
> > +int vhost_net_get_device_id(struct vhost_net *net, uint32_t * device_id)
> > +{
> > +    return vhost_dev_get_device_id(&net->dev, device_id);
> > +}
> >   void vhost_net_ack_features(struct vhost_net *net, uint64_t features)
> >   {
> >       net->dev.acked_features = net->dev.backend_features;
> > @@ -433,6 +464,12 @@ VHostNetState *get_vhost_net(NetClientState *nc)
> >           vhost_net = vhost_user_get_vhost_net(nc);
> >           assert(vhost_net);
> >           break;
> > +#endif
> > +#ifdef CONFIG_VHOST_NET_VDPA
> > +    case NET_CLIENT_DRIVER_VHOST_VDPA:
> > +        vhost_net = vhost_vdpa_get_vhost_net(nc);
> > +        assert(vhost_net);
> > +        break;
> >   #endif
> >       default:
> >           break;
> > diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
> > index e2f70fbb89..e7c5d4a862 100644
> > --- a/hw/virtio/Makefile.objs
> > +++ b/hw/virtio/Makefile.objs
> > @@ -5,6 +5,7 @@ obj-y += virtio.o
> >   obj-$(call lor,$(CONFIG_VHOST_USER),$(CONFIG_VHOST_KERNEL)) += vhost.o vhost-backend.o
> >   common-obj-$(call lnot,$(call lor,$(CONFIG_VHOST_USER),$(CONFIG_VHOST_KERNEL))) += vhost-stub.o
> >   obj-$(CONFIG_VHOST_USER) += vhost-user.o
> > +obj-$(CONFIG_VHOST_VDPA) += vhost-vdpa.o
> >
> >   common-obj-$(CONFIG_VIRTIO_RNG) += virtio-rng.o
> >   common-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
> > diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> > index 48905383f8..069ddb423d 100644
> > --- a/hw/virtio/vhost-backend.c
> > +++ b/hw/virtio/vhost-backend.c
> > @@ -285,6 +285,11 @@ int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type)
> >       case VHOST_BACKEND_TYPE_USER:
> >           dev->vhost_ops = &user_ops;
> >           break;
> > +#endif
> > +#ifdef CONFIG_VHOST_VDPA
> > +    case VHOST_BACKEND_TYPE_VDPA:
> > +        dev->vhost_ops = &vdpa_ops;
> > +        break;
> >   #endif
> >       default:
> >           error_report("Unknown vhost backend type");
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > new file mode 100644
> > index 0000000000..bac8a8aa2a
> > --- /dev/null
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -0,0 +1,447 @@
> > +/*
> > + * vhost-vdpa
> > + *
> > + *  Copyright(c) 2017-2018 Intel Corporation.
> > + *  Copyright(c) 2020 Red Hat, Inc.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include <linux/vhost.h>
> > +#include <linux/vfio.h>
> > +#include <sys/eventfd.h>
> > +#include <sys/ioctl.h>
> > +#include "hw/virtio/vhost.h"
> > +#include "hw/virtio/vhost-backend.h"
> > +#include "hw/virtio/virtio-net.h"
> > +#include "hw/virtio/vhost-vdpa.h"
> > +#include "qemu/main-loop.h"
> > +#include <linux/kvm.h>
> > +#include "sysemu/kvm.h"
> > +
> > +
> > +static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section)
> > +{
> > +    return (!memory_region_is_ram(section->mr) &&
> > +            !memory_region_is_iommu(section->mr)) ||
> > +           /*
> > +            * Sizing an enabled 64-bit BAR can cause spurious mappings to
> > +            * addresses in the upper part of the 64-bit address space.  These
> > +            * are never accessed by the CPU and beyond the address width of
> > +            * some IOMMU hardware.  TODO: VDPA should tell us the IOMMU width.
> > +            */
> > +           section->offset_within_address_space & (1ULL << 63);
> > +}
> > +
> > +static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
> > +                              void *vaddr, bool readonly)
> > +{
> > +    struct vhost_msg_v2 msg;
> > +    int fd = v->device_fd;
> > +    int ret = 0;
> > +
> > +    msg.type =  v->msg_type;
> > +    msg.iotlb.iova = iova;
> > +    msg.iotlb.size = size;
> > +    msg.iotlb.uaddr = (uint64_t)(uintptr_t)vaddr;
> > +    msg.iotlb.perm = readonly ? VHOST_ACCESS_RO : VHOST_ACCESS_RW;
> > +    msg.iotlb.type = VHOST_IOTLB_UPDATE;
> > +
> > +    if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
> > +        error_report("failed to write, fd=%d, errno=%d (%s)",
> > +            fd, errno, strerror(errno));
> > +        return -EIO ;
> > +    }
> > +
> > +    return ret;
> > +}
> > +
> > +static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova,
> > +                                hwaddr size)
> > +{
> > +    struct vhost_msg_v2 msg;
> > +    int fd = v->device_fd;
> > +    int ret = 0;
> > +
> > +    msg.type =  v->msg_type;
> > +    msg.iotlb.iova = iova;
> > +    msg.iotlb.size = size;
> > +    msg.iotlb.type = VHOST_IOTLB_INVALIDATE;
> > +
> > +    if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
> > +        error_report("failed to write, fd=%d, errno=%d (%s)",
> > +            fd, errno, strerror(errno));
> > +        return -EIO ;
> > +    }
> > +
> > +    return ret;
> > +}
> > +
> > +static void vhost_vdpa_listener_region_add(MemoryListener *listener,
> > +                                           MemoryRegionSection *section)
> > +{
> > +    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> > +    hwaddr iova;
> > +    Int128 llend, llsize;
> > +    void *vaddr;
> > +    int ret;
> > +
> > +    if (vhost_vdpa_listener_skipped_section(section)) {
> > +        return;
> > +    }
> > +
> > +    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
> > +                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
> > +        error_report("%s received unaligned region", __func__);
> > +        return;
> > +    }
> > +
> > +    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
> > +    llend = int128_make64(section->offset_within_address_space);
> > +    llend = int128_add(llend, section->size);
> > +    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
> > +
> > +    if (int128_ge(int128_make64(iova), llend)) {
> > +        return;
> > +    }
> > +
> > +    memory_region_ref(section->mr);
> > +
> > +    /* Here we assume that memory_region_is_ram(section->mr)==true */
> > +
> > +    vaddr = memory_region_get_ram_ptr(section->mr) +
> > +            section->offset_within_region +
> > +            (iova - section->offset_within_address_space);
> > +
> > +    llsize = int128_sub(llend, int128_make64(iova));
> > +
> > +    ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
> > +                             vaddr, section->readonly);
> > +    if (ret) {
> > +        error_report("vhost vdpa map fail!");
> > +        if (memory_region_is_ram_device(section->mr)) {
> > +            /* Allow unexpected mappings not to be fatal for RAM devices */
> > +            error_report("map ram fail!");
> > +          return ;
> > +        }
> > +        goto fail;
> > +    }
> > +
> > +    return;
> > +
> > +fail:
> > +    if (memory_region_is_ram_device(section->mr)) {
> > +        error_report("failed to vdpa_dma_map. pci p2p may not work");
> > +        return;
> > +
> > +    }
> > +    /*
> > +     * On the initfn path, store the first error in the container so we
> > +     * can gracefully fail.  Runtime, there's not much we can do other
> > +     * than throw a hardware error.
> > +     */
> > +    error_report("vhost-vdpa: DMA mapping failed, unable to continue");
> > +    return;
> > +
> > +}
> > +
> > +static void vhost_vdpa_listener_region_del(MemoryListener *listener,
> > +                                           MemoryRegionSection *section)
> > +{
> > +    struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
> > +    hwaddr iova;
> > +    Int128 llend, llsize;
> > +    int ret;
> > +    bool try_unmap = true;
> > +
> > +    if (vhost_vdpa_listener_skipped_section(section)) {
> > +        return;
> > +    }
> > +
> > +    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
> > +                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
> > +        error_report("%s received unaligned region", __func__);
> > +        return;
> > +    }
> > +
> > +    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
> > +    llend = int128_make64(section->offset_within_address_space);
> > +    llend = int128_add(llend, section->size);
> > +    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
> > +
> > +    if (int128_ge(int128_make64(iova), llend)) {
> > +        return;
> > +    }
> > +
> > +    llsize = int128_sub(llend, int128_make64(iova));
> > +
> > +    if (try_unmap) {
> > +        ret = vhost_vdpa_dma_unmap(v, iova, int128_get64(llsize));
> > +        if (ret) {
> > +            error_report("vhost_vdpa dma unmap error!");
> > +        }
> > +    }
> > +
> > +    memory_region_unref(section->mr);
> > +}
> > +
> > +static const MemoryListener vhost_vdpa_memory_listener = {
> > +    .region_add = vhost_vdpa_listener_region_add,
> > +    .region_del = vhost_vdpa_listener_region_del,
> > +};
> > +
>
>
> A comment is needed to explain why vhost-vdpa requires its own memory
> listener rather than just reusing the one provided by vhost.
>
>
will fix this
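
A possible shape for that comment (the wording is a sketch, not final):

    /*
     * vhost-vdpa cannot reuse the generic vhost memory listener: the
     * generic listener collects regions into a vhost_memory table for
     * VHOST_SET_MEM_TABLE, while vhost-vdpa must map and unmap guest
     * memory through IOTLB update/invalidate messages written to the
     * device fd, so the add/del callbacks differ.
     */
    static const MemoryListener vhost_vdpa_memory_listener = {
        .region_add = vhost_vdpa_listener_region_add,
        .region_del = vhost_vdpa_listener_region_del,
    };
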
> > +
>
>
> This newline is unnecessary.
>
will fix this
>
> > +static int vhost_vdpa_call(struct vhost_dev *dev, unsigned long int request,
> > +                             void *arg)
> > +{
> > +    struct vhost_vdpa *v = dev->opaque;
> > +    int fd = v->device_fd;
> > +
> > +    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
> > +
> > +    return ioctl(fd, request, arg);
> > +}
> > +
> > +static int vhost_vdpa_get_backend_features(struct vhost_dev *dev,
> > +                                   uint64_t *features)
> > +{
> > +    return vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, features);
> > +}
> > +
> > +static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque)
> > +{
> > +    struct vhost_vdpa *v;
> > +    uint64_t backend_features = 0;
> > +    int ret = 0;
> > +
> > +    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
> > +
> > +    v = opaque;
> > +    dev->opaque =  opaque ;
> > +
> > +    v->listener = vhost_vdpa_memory_listener;
> > +    memory_listener_register(&v->listener, &address_space_memory);
> > +
> > +    ret = vhost_vdpa_get_backend_features(dev, &backend_features);
> > +    /*
> > +     * In order to be compatible with older kernels:
> > +     * if the kernel does not support this ioctl,
> > +     * set the msg_type to v2 by default.
> > +     */
> > +    if (ret) {
> > +        v->msg_type = VHOST_IOTLB_MSG_V2;
> > +        return 0;
> > +     }
> > +    if (backend_features &  (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)) {
> > +        v->msg_type = VHOST_IOTLB_MSG_V2;
> > +     } else {
> > +        v->msg_type = VHOST_IOTLB_MSG;
> > +     }
>
>
> But in vhost_vdpa_dma_unmap/vhost_vdpa_dma_map, v2 is still being used.
>
> Considering vDPA implies msg_v2 and doesn't implement
> get_backend_features, let's use v2 unconditionally and leave this
> check for the future.
>
>
will fix this
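
A sketch of the simplified init once v2 is assumed unconditionally
(names taken from the patch above):

    static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque)
    {
        struct vhost_vdpa *v = opaque;

        assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);

        dev->opaque = opaque;
        /* vDPA implies IOTLB message v2; no backend-feature probe needed */
        v->msg_type = VHOST_IOTLB_MSG_V2;

        v->listener = vhost_vdpa_memory_listener;
        memory_listener_register(&v->listener, &address_space_memory);

        return 0;
    }
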
> > +    return 0;
> > +}
> > +
> > +static int vhost_vdpa_cleanup(struct vhost_dev *dev)
> > +{
> > +    struct vhost_vdpa *v;
> > +    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
> > +
> > +    v = dev->opaque;
> > +    memory_listener_unregister(&v->listener);
> > +
> > +    dev->opaque = NULL;
> > +    return 0;
> > +}
> > +
> > +static int vhost_vdpa_memslots_limit(struct vhost_dev *dev)
> > +{
> > +    /* Using the IOTLB API here. NOTE: please set the ulimit before using. */
>
>
> The "note" seems unrelated to the memslot here.
>
>
> > +    return INT_MAX;
> > +}
> > +
> > +static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
> > +                                   struct vhost_log *log)
> > +{
> > +    return -1;
> > +}
> > +
> > +static int vhost_vdpa_set_mem_table(struct vhost_dev *dev,
> > +                                    struct vhost_memory *mem)
> > +{
> > +
> > +    if (mem->padding) {
> > +        return -1;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +static int vhost_vdpa_set_vring_addr(struct vhost_dev *dev,
> > +                                     struct vhost_vring_addr *addr)
> > +{
> > +    return vhost_vdpa_call(dev, VHOST_SET_VRING_ADDR, addr);
> > +}
> > +
> > +static int vhost_vdpa_set_vring_num(struct vhost_dev *dev,
> > +                                    struct vhost_vring_state *ring)
> > +{
> > +    return vhost_vdpa_call(dev, VHOST_SET_VRING_NUM, ring);
> > +}
> > +
> > +static int vhost_vdpa_set_vring_base(struct vhost_dev *dev,
> > +                                     struct vhost_vring_state *ring)
> > +{
> > +    return vhost_vdpa_call(dev, VHOST_SET_VRING_BASE, ring);
> > +}
> > +
> > +static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
> > +                                     struct vhost_vring_state *ring)
> > +{
> > +
> > +    return vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
> > +}
> > +
> > +static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
> > +                                     struct vhost_vring_file *file)
> > +{
> > +    return vhost_vdpa_call(dev, VHOST_SET_VRING_KICK, file);
> > +}
> > +
> > +static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> > +                                     struct vhost_vring_file *file)
> > +{
> > +    return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
> > +}
> > +
> > +static int vhost_vdpa_set_features(struct vhost_dev *dev,
> > +                                   uint64_t features)
> > +{
> > +    return vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
> > +}
> > +
> > +static int vhost_vdpa_get_features(struct vhost_dev *dev,
> > +                                   uint64_t *features)
> > +{
> > +    return vhost_vdpa_call(dev, VHOST_GET_FEATURES, features);
> > +}
> > +
> > +static int vhost_vdpa_get_device_id(struct vhost_dev *dev,
> > +                                   uint32_t *device_id)
> > +{
> > +    return vhost_vdpa_call(dev, VHOST_VDPA_GET_DEVICE_ID, device_id);
> > +}
> > +
> > +static int vhost_vdpa_set_owner(struct vhost_dev *dev)
> > +{
> > +    return vhost_vdpa_call(dev, VHOST_SET_OWNER, NULL);
> > +}
> > +
> > +static int vhost_vdpa_reset_device(struct vhost_dev *dev)
> > +{
> > +    return vhost_vdpa_call(dev, VHOST_RESET_OWNER, NULL);
> > +}
>
>
> I think we may want to reuse the vhost kernel backend's helpers
> instead of duplicating functions here.
>
will fix this
>
> > +
> > +static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
> > +{
> > +    assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs);
> > +
> > +    return idx - dev->vq_index;
> > +}
> > +
> > +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
> > +{
> > +    int i;
> > +    for (i = 0; i < dev->nvqs; ++i) {
> > +        struct vhost_vring_state state = {
> > +            .index = dev->vq_index + i,
> > +            .num = 1,
> > +        };
> > +        vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
> > +    }
> > +    return 0;
> > +}
> > +static int  vhost_vdpa_set_config(struct vhost_dev *dev, const uint8_t *data,
>
>
> One unnecessary space between int and "vhost_vdpa_set_config"
>
will fix this
>
> > +                                   uint32_t offset, uint32_t size,
> > +                                   uint32_t flags)
> > +{
> > +
> > +    struct vhost_vdpa_config *config;
> > +    int ret;
> > +    if ((size > VHOST_VDPA_MAX_CONFIG_SIZE) || (data == NULL)) {
> > +        return -1;
> > +    }
> > +    config = g_new0(struct vhost_vdpa_config, 1);
>
>
> Wouldn't it be better to just use a temporary variable instead of
> allocating it dynamically?
>
sure, will fix this
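
A sketch of the stack-based variant, assuming struct vhost_vdpa_config
ends in a flexible array member (if so, the g_new0() above would also
under-allocate the buffer):

    static int vhost_vdpa_set_config(struct vhost_dev *dev, const uint8_t *data,
                                     uint32_t offset, uint32_t size,
                                     uint32_t flags)
    {
        /* header plus the largest allowed payload; uint64_t for alignment */
        uint64_t buf[(sizeof(struct vhost_vdpa_config) +
                      VHOST_VDPA_MAX_CONFIG_SIZE + 7) / 8];
        struct vhost_vdpa_config *config = (struct vhost_vdpa_config *)buf;

        if (size > VHOST_VDPA_MAX_CONFIG_SIZE || data == NULL) {
            return -1;
        }
        config->off = offset;
        config->len = size;
        memcpy(config->buf, data, size);
        return vhost_vdpa_call(dev, VHOST_VDPA_SET_CONFIG, config);
    }
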
>
> > +    if (config == NULL) {
> > +        return -EINVAL;
> > +    }
> > +    config->off = 0;
> > +    config->len = size;
> > +    memcpy(config->buf, data, size);
> > +    ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_CONFIG, config);
> > +     g_free(config);
> > +     return ret;
> > +}
> > +
> > + static int  vhost_vdpa_get_config(struct vhost_dev *dev, uint8_t *config,
> > +                                   uint32_t config_len)
> > +{
> > +    struct vhost_vdpa_config *v_config = NULL;
> > +    int ret;
> > +
> > +     if ((config_len > VHOST_VDPA_MAX_CONFIG_SIZE) || (config == NULL)) {
> > +        return -1;
> > +    }
> > +    v_config = g_new0(struct vhost_vdpa_config, 1);
>
>
> Let's use a variable on the stack.
>
will fix this
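
vhost_vdpa_get_config could follow the same pattern (sketch, under the
same flexible-array assumption):

    int ret;
    uint64_t buf[(sizeof(struct vhost_vdpa_config) +
                  VHOST_VDPA_MAX_CONFIG_SIZE + 7) / 8];
    struct vhost_vdpa_config *v_config = (struct vhost_vdpa_config *)buf;

    v_config->off = 0;
    v_config->len = config_len;
    ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_CONFIG, v_config);
    memcpy(config, v_config->buf, config_len);
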
>
> > +
> > +    if (v_config == NULL) {
> > +        return -EINVAL;
> > +    }
> > +   ret =  vhost_vdpa_call(dev, VHOST_VDPA_GET_CONFIG, v_config);
>
>
> One extra space after '='.
>
>
> > +   if ((v_config->len > config_len - v_config->off) || (v_config->len == 0)) {
> > +        g_free(v_config);
> > +       return -EINVAL;
> > +    }
>
>
> The indentation is off here, and why is this check needed?
>
>
> > +
> > +    memcpy(config, v_config->buf + v_config->off, config_len);
> > +    g_free(v_config);
> > +    return ret;
> > + }
>
>
> Newline is needed here.
>
>
> > +static int vhost_vdpa_set_state(struct vhost_dev *dev, uint8_t state)
> > +{
> > +    return vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &state);
> > +}
> > +
> > +
>
>
> This new line is not needed.
>
Will fix all of these
>
> > +const VhostOps vdpa_ops = {
> > +        .backend_type = VHOST_BACKEND_TYPE_VDPA,
> > +        .vhost_backend_init = vhost_vdpa_init,
> > +        .vhost_backend_cleanup = vhost_vdpa_cleanup,
> > +        .vhost_backend_memslots_limit = vhost_vdpa_memslots_limit,
> > +        .vhost_set_log_base = vhost_vdpa_set_log_base,
> > +        .vhost_set_mem_table = vhost_vdpa_set_mem_table,
> > +        .vhost_set_vring_addr = vhost_vdpa_set_vring_addr,
> > +        .vhost_set_vring_endian = NULL,
> > +        .vhost_set_vring_num = vhost_vdpa_set_vring_num,
> > +        .vhost_set_vring_base = vhost_vdpa_set_vring_base,
> > +        .vhost_get_vring_base = vhost_vdpa_get_vring_base,
> > +        .vhost_set_vring_kick = vhost_vdpa_set_vring_kick,
> > +        .vhost_set_vring_call = vhost_vdpa_set_vring_call,
> > +        .vhost_set_features = vhost_vdpa_set_features,
> > +        .vhost_get_features = vhost_vdpa_get_features,
> > +        .vhost_set_owner = vhost_vdpa_set_owner,
> > +        .vhost_reset_device = vhost_vdpa_reset_device,
> > +        .vhost_get_vq_index = vhost_vdpa_get_vq_index,
> > +        .vhost_set_vring_ready = vhost_vdpa_set_vring_ready,
> > +        .vhost_get_config  = vhost_vdpa_get_config,
> > +        .vhost_set_config = vhost_vdpa_set_config,
> > +        .vhost_requires_shm_log = NULL,
> > +        .vhost_migration_done = NULL,
> > +        .vhost_backend_can_merge = NULL,
> > +        .vhost_net_set_mtu = NULL,
> > +        .vhost_set_iotlb_callback = NULL,
> > +        .vhost_send_device_iotlb_msg = NULL,
> > +        .vhost_set_state = vhost_vdpa_set_state,
> > +        .vhost_get_device_id = vhost_vdpa_get_device_id,
> > +};
>
>
> As said previously, we need to consider reusing the vhost kernel helpers.
>
I will double check this
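
One complication worth noting while checking: vhost-kernel stores the
backend fd directly in dev->opaque, while this patch stores a struct
vhost_vdpa pointer there, so the existing static helpers in
vhost-backend.c can't be pointed at as-is. A minimal sketch of a
sharable low-level helper, assuming the fd is passed explicitly:

    /* shared ioctl wrapper; each backend resolves its own fd first */
    static int vhost_fd_call(int fd, unsigned long int request, void *arg)
    {
        return ioctl(fd, request, arg);
    }

    static int vhost_vdpa_call(struct vhost_dev *dev, unsigned long int request,
                               void *arg)
    {
        struct vhost_vdpa *v = dev->opaque;

        return vhost_fd_call(v->device_fd, request, arg);
    }
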
>
> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > index 4da0d5a6c5..6d2aa461d6 100644
> > --- a/hw/virtio/vhost.c
> > +++ b/hw/virtio/vhost.c
> > @@ -746,6 +746,12 @@ static int vhost_virtqueue_set_addr(struct vhost_dev *dev,
> >           .log_guest_addr = vq->used_phys,
> >           .flags = enable_log ? (1 << VHOST_VRING_F_LOG) : 0,
> >       };
> > +    /*vDPA need to use the phys address here to set to hardware*/
> > +    if (dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA) {
> > +        addr.desc_user_addr = (uint64_t)(unsigned long)vq->desc_phys;
> > +        addr.avail_user_addr = (uint64_t)(unsigned long)vq->avail_phys;
> > +        addr.used_user_addr = (uint64_t)(unsigned long)vq->used_phys;
> > +    }
> >       int r = dev->vhost_ops->vhost_set_vring_addr(dev, &addr);
> >       if (r < 0) {
> >           VHOST_OPS_DEBUG("vhost_set_vring_addr failed");
> > @@ -1493,6 +1499,14 @@ int vhost_dev_set_config(struct vhost_dev *hdev, const uint8_t *data,
> >       return -1;
> >   }
> >
> > +int vhost_dev_get_device_id(struct vhost_dev *hdev, uint32_t *device_id)
> > +{
> > +    assert(hdev->vhost_ops);
> > +    if (hdev->vhost_ops->vhost_get_device_id) {
>
>
> If you introduce a new vhost_ops callback, please do it in a separate patch.
>
>
> > +        return hdev->vhost_ops->vhost_get_device_id(hdev, device_id);
> > +    }
> > +    return -1;
> > +}
> >   void vhost_dev_set_config_notifier(struct vhost_dev *hdev,
> >                                      const VhostDevConfigOps *ops)
> >   {
> > diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> > index f823055167..8ac898a987 100644
> > --- a/include/hw/virtio/vhost-backend.h
> > +++ b/include/hw/virtio/vhost-backend.h
> > @@ -17,7 +17,8 @@ typedef enum VhostBackendType {
> >       VHOST_BACKEND_TYPE_NONE = 0,
> >       VHOST_BACKEND_TYPE_KERNEL = 1,
> >       VHOST_BACKEND_TYPE_USER = 2,
> > -    VHOST_BACKEND_TYPE_MAX = 3,
> > +    VHOST_BACKEND_TYPE_VDPA = 3,
> > +    VHOST_BACKEND_TYPE_MAX = 4,
> >   } VhostBackendType;
> >
> >   typedef enum VhostSetConfigType {
> > @@ -77,6 +78,7 @@ typedef int (*vhost_reset_device_op)(struct vhost_dev *dev);
> >   typedef int (*vhost_get_vq_index_op)(struct vhost_dev *dev, int idx);
> >   typedef int (*vhost_set_vring_enable_op)(struct vhost_dev *dev,
> >                                            int enable);
> > +typedef int (*vhost_set_vring_ready_op)(struct vhost_dev *dev);
>
>
> This change should go in a separate patch.
>
> Thanks
>
Sure, will fix this
>
> >   typedef bool (*vhost_requires_shm_log_op)(struct vhost_dev *dev);
> >   typedef int (*vhost_migration_done_op)(struct vhost_dev *dev,
> >                                          char *mac_addr);
> > @@ -113,6 +115,7 @@ typedef int (*vhost_set_inflight_fd_op)(struct vhost_dev *dev,
> >                                           struct vhost_inflight *inflight);
> >
> >   typedef int (*vhost_set_state_op)(struct vhost_dev *dev, uint8_t state);
> > +typedef int (*vhost_get_device_id_op)(struct vhost_dev *dev, uint32_t *dev_id);
> >   typedef struct VhostOps {
> >       VhostBackendType backend_type;
> >       vhost_backend_init vhost_backend_init;
> > @@ -139,6 +142,7 @@ typedef struct VhostOps {
> >       vhost_reset_device_op vhost_reset_device;
> >       vhost_get_vq_index_op vhost_get_vq_index;
> >       vhost_set_vring_enable_op vhost_set_vring_enable;
> > +    vhost_set_vring_ready_op vhost_set_vring_ready;
> >       vhost_requires_shm_log_op vhost_requires_shm_log;
> >       vhost_migration_done_op vhost_migration_done;
> >       vhost_backend_can_merge_op vhost_backend_can_merge;
> > @@ -154,9 +158,11 @@ typedef struct VhostOps {
> >       vhost_get_inflight_fd_op vhost_get_inflight_fd;
> >       vhost_set_inflight_fd_op vhost_set_inflight_fd;
> >       vhost_set_state_op vhost_set_state;
> > +    vhost_get_device_id_op vhost_get_device_id;
> >   } VhostOps;
> >
> >   extern const VhostOps user_ops;
> > +extern const VhostOps vdpa_ops;
> >
> >   int vhost_set_backend_type(struct vhost_dev *dev,
> >                              VhostBackendType backend_type);
> > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> > new file mode 100644
> > index 0000000000..9d64fbfea9
> > --- /dev/null
> > +++ b/include/hw/virtio/vhost-vdpa.h
> > @@ -0,0 +1,25 @@
> > +/*
> > + * vhost-vdpa.h
> > + *
> > + * Copyright(c) 2017-2018 Intel Corporation.
> > + * Copyright(c) 2020 Red Hat, Inc.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#ifndef HW_VIRTIO_VHOST_VDPA_H
> > +#define HW_VIRTIO_VHOST_VDPA_H
> > +
> > +#include "hw/virtio/virtio.h"
> > +
> > +typedef struct vhost_vdpa {
> > +    int device_fd;
> > +    uint32_t msg_type;
> > +    MemoryListener listener;
> > +} VhostVDPA;
> > +
> > +extern AddressSpace address_space_memory;
> > +
> > +#endif
> > diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> > index 085450c6f8..020956d5a9 100644
> > --- a/include/hw/virtio/vhost.h
> > +++ b/include/hw/virtio/vhost.h
> > @@ -124,6 +124,7 @@ int vhost_dev_get_config(struct vhost_dev *dev, uint8_t *config,
> >                            uint32_t config_len);
> >   int vhost_dev_set_config(struct vhost_dev *dev, const uint8_t *data,
> >                            uint32_t offset, uint32_t size, uint32_t flags);
> > +int vhost_dev_get_device_id(struct vhost_dev *hdev, uint32_t *device_id);
> >   /* notifier callback in case vhost device config space changed
> >    */
> >   void vhost_dev_set_config_notifier(struct vhost_dev *dev,
>



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC v2 7/9] virito-pci: implement queue_enabled method
  2020-05-08 16:32 ` [RFC v2 7/9] virito-pci: implement " Cindy Lu
  2020-05-09  3:02   ` Jason Wang
@ 2020-05-09 12:01   ` Philippe Mathieu-Daudé
  2020-05-11  5:08     ` Cindy Lu
  1 sibling, 1 reply; 33+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-05-09 12:01 UTC (permalink / raw)
  To: Cindy Lu, mst, armbru, eblake, cohuck, jasowang
  Cc: mhabets, qemu-devel, rob.miller, saugatm, maxime.coquelin, hch,
	eperezma, jgg, shahafs, kevin.tian, parav, vmireyno,
	cunming.liang, gdawar, jiri, xiao.w.wang, stefanha, zhihong.wang,
	aadam, rdunlap, hanand, lingshan.zhu

Typo "virtio-pci" in patch subject.

On 5/8/20 6:32 PM, Cindy Lu wrote:
> From: Jason Wang <jasowang@redhat.com>
> 
> With version 1, we can detect whether a queue is enabled via
> queue_enabled.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>   hw/virtio/virtio-pci.c | 13 +++++++++++++
>   1 file changed, 13 insertions(+)
> 
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index c6b47a9c73..4aaf5d953e 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -1103,6 +1103,18 @@ static AddressSpace *virtio_pci_get_dma_as(DeviceState *d)
>       return pci_get_address_space(dev);
>   }
>   
> +static bool virtio_pci_queue_enabled(DeviceState *d, int n)
> +{
> +    VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
> +    VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
> +
> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> +        return proxy->vqs[vdev->queue_sel].enabled;
> +    }
> +
> +    return virtio_queue_get_desc_addr(vdev, n) != 0;
> +}
> +
>   static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
>                                      struct virtio_pci_cap *cap)
>   {
> @@ -2053,6 +2065,7 @@ static void virtio_pci_bus_class_init(ObjectClass *klass, void *data)
>       k->ioeventfd_enabled = virtio_pci_ioeventfd_enabled;
>       k->ioeventfd_assign = virtio_pci_ioeventfd_assign;
>       k->get_dma_as = virtio_pci_get_dma_as;
> +    k->queue_enabled = virtio_pci_queue_enabled;
>   }
>   
>   static const TypeInfo virtio_pci_bus_info = {
> 



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC v2 7/9] virito-pci: implement queue_enabled method
  2020-05-09 12:01   ` Philippe Mathieu-Daudé
@ 2020-05-11  5:08     ` Cindy Lu
  0 siblings, 0 replies; 33+ messages in thread
From: Cindy Lu @ 2020-05-11  5:08 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: rdunlap, Michael Tsirkin, mhabets, qemu-devel, rob.miller,
	saugatm, Markus Armbruster, hch, Eugenio Perez Martin, jgg,
	Jason Wang, shahafs, kevin.tian, parav, vmireyno, Liang, Cunming,
	gdawar, jiri, xiao.w.wang, Stefan Hajnoczi, Wang, Zhihong,
	Maxime Coquelin, Ariel Adam, Cornelia Huck, hanand, Zhu,
	Lingshan

On Sat, May 9, 2020 at 8:02 PM Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
>
> Typo "virtio-pci" in patch subject.
>
Thanks Philippe, I will fix this
> On 5/8/20 6:32 PM, Cindy Lu wrote:
> > From: Jason Wang <jasowang@redhat.com>
> >
> > With version 1, we can detect whether a queue is enabled via
> > queue_enabled.
> >
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> >   hw/virtio/virtio-pci.c | 13 +++++++++++++
> >   1 file changed, 13 insertions(+)
> >
> > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> > index c6b47a9c73..4aaf5d953e 100644
> > --- a/hw/virtio/virtio-pci.c
> > +++ b/hw/virtio/virtio-pci.c
> > @@ -1103,6 +1103,18 @@ static AddressSpace *virtio_pci_get_dma_as(DeviceState *d)
> >       return pci_get_address_space(dev);
> >   }
> >
> > +static bool virtio_pci_queue_enabled(DeviceState *d, int n)
> > +{
> > +    VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
> > +    VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
> > +
> > +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> > +        return proxy->vqs[vdev->queue_sel].enabled;
> > +    }
> > +
> > +    return virtio_queue_get_desc_addr(vdev, n) != 0;
> > +}
> > +
> >   static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
> >                                      struct virtio_pci_cap *cap)
> >   {
> > @@ -2053,6 +2065,7 @@ static void virtio_pci_bus_class_init(ObjectClass *klass, void *data)
> >       k->ioeventfd_enabled = virtio_pci_ioeventfd_enabled;
> >       k->ioeventfd_assign = virtio_pci_ioeventfd_assign;
> >       k->get_dma_as = virtio_pci_get_dma_as;
> > +    k->queue_enabled = virtio_pci_queue_enabled;
> >   }
> >
> >   static const TypeInfo virtio_pci_bus_info = {
> >
>



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC v2 5/9] vhost-vdpa: implement vhost-vdpa backend
  2020-05-08 16:32 ` [RFC v2 5/9] vhost-vdpa: implement vhost-vdpa backend Cindy Lu
  2020-05-09  3:00   ` Jason Wang
@ 2020-05-21 12:40   ` Stefan Hajnoczi
  2020-05-25 15:33     ` Cindy Lu
  1 sibling, 1 reply; 33+ messages in thread
From: Stefan Hajnoczi @ 2020-05-21 12:40 UTC (permalink / raw)
  To: Cindy Lu
  Cc: rdunlap, mst, mhabets, qemu-devel, rob.miller, saugatm, armbru,
	hch, eperezma, jgg, jasowang, shahafs, kevin.tian, parav,
	vmireyno, cunming.liang, gdawar, jiri, xiao.w.wang, stefanha,
	zhihong.wang, maxime.coquelin, Tiwei Bie, aadam, cohuck, hanand,
	lingshan.zhu


On Sat, May 09, 2020 at 12:32:14AM +0800, Cindy Lu wrote:
> From: Tiwei Bie <tiwei.bie@intel.com>
> 
> Currently we have 2 types of vhost backends in QEMU: vhost kernel and
> vhost-user. The above patch provides a generic device for vDPA purpose,
> this vDPA device exposes to user space a non-vendor-specific configuration
> interface for setting up a vhost HW accelerator, this patch set introduces
> a third vhost backend called vhost-vdpa based on the vDPA interface.
> 
> Vhost-vdpa usage:
> 
>   qemu-system-x86_64 -cpu host -enable-kvm \
>     ......
>   -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-id,id=vhost-vdpa0 \
>   -device virtio-net-pci,netdev=vhost-vdpa0,page-per-vq=on \

I haven't looked at vDPA in depth. What is different here compared to
the existing vhost-backend.c kernel backend?

It seems to be making the same ioctl calls, so I wonder if it makes
sense to share the vhost-backend.c kernel code?

Stefan


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC v2 5/9] vhost-vdpa: implement vhost-vdpa backend
  2020-05-21 12:40   ` Stefan Hajnoczi
@ 2020-05-25 15:33     ` Cindy Lu
  2020-05-25 16:15       ` Stefan Hajnoczi
  0 siblings, 1 reply; 33+ messages in thread
From: Cindy Lu @ 2020-05-25 15:33 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: rdunlap, Michael Tsirkin, mhabets, qemu-devel, rob.miller,
	saugatm, Markus Armbruster, hch, Eugenio Perez Martin, jgg,
	Jason Wang, Shahaf Shuler, kevin.tian, parav, vmireyno, Liang,
	Cunming, gdawar, jiri, xiao.w.wang, Stefan Hajnoczi, Wang,
	Zhihong, Maxime Coquelin, Tiwei Bie, Ariel Adam, Cornelia Huck,
	hanand, Zhu, Lingshan

On Thu, May 21, 2020 at 8:40 PM Stefan Hajnoczi <stefanha@gmail.com> wrote:
>
> On Sat, May 09, 2020 at 12:32:14AM +0800, Cindy Lu wrote:
> > From: Tiwei Bie <tiwei.bie@intel.com>
> >
> > Currently we have 2 types of vhost backends in QEMU: vhost kernel and
> > vhost-user. The above patch provides a generic device for vDPA purpose,
> > this vDPA device exposes to user space a non-vendor-specific configuration
> > interface for setting up a vhost HW accelerator, this patch set introduces
> > a third vhost backend called vhost-vdpa based on the vDPA interface.
> >
> > Vhost-vdpa usage:
> >
> >   qemu-system-x86_64 -cpu host -enable-kvm \
> >     ......
> >   -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-id,id=vhost-vdpa0 \
> >   -device virtio-net-pci,netdev=vhost-vdpa0,page-per-vq=on \
>
> I haven't looked at vDPA in depth. What is different here compared to
> the existing vhost-backend.c kernel backend?
>
> It seems to be making the same ioctl calls, so I wonder if it makes
> sense to share the vhost-backend.c kernel code?
>
> Stefan
Hi Stefan,
Sorry for the late reply, and thanks for these suggestions.
I think the main difference between vhost kernel and vDPA is that vDPA
depends on real hardware.
The point is that vDPA devices work as virtio devices, but vhost-vdpa
in qemu must present a vhost-like device in the qemu vhost layer.
The ioctl calls are similar to vhost-backend.c now, but as more and
more NICs support vDPA, the difference between vhost-backend.c and
vDPA will keep growing, and sharing the code with the kernel backend
would make the code complicated.

Thanks
Cindy



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC v2 5/9] vhost-vdpa: implement vhost-vdpa backend
  2020-05-25 15:33     ` Cindy Lu
@ 2020-05-25 16:15       ` Stefan Hajnoczi
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Hajnoczi @ 2020-05-25 16:15 UTC (permalink / raw)
  To: Cindy Lu
  Cc: rdunlap, Michael Tsirkin, mhabets, qemu-devel, rob.miller,
	saugatm, Markus Armbruster, Christoph Hellwig,
	Eugenio Perez Martin, jgg, Jason Wang, Shahaf Shuler, Tian,
	Kevin, parav, vmireyno, Liang, Cunming, gdawar, jiri, Wang,
	Xiao W, Stefan Hajnoczi, Wang, Zhihong, Maxime Coquelin,
	Tiwei Bie, Ariel Adam, Cornelia Huck, hanand, Zhu, Lingshan

On Mon, May 25, 2020 at 4:34 PM Cindy Lu <lulu@redhat.com> wrote:
> On Thu, May 21, 2020 at 8:40 PM Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > On Sat, May 09, 2020 at 12:32:14AM +0800, Cindy Lu wrote:
> > > From: Tiwei Bie <tiwei.bie@intel.com>
> > >
> > > Currently we have 2 types of vhost backends in QEMU: vhost kernel and
> > > vhost-user. The above patch provides a generic device for vDPA purpose,
> > > this vDPA device exposes to user space a non-vendor-specific configuration
> > > interface for setting up a vhost HW accelerator, this patch set introduces
> > > a third vhost backend called vhost-vdpa based on the vDPA interface.
> > >
> > > Vhost-vdpa usage:
> > >
> > >   qemu-system-x86_64 -cpu host -enable-kvm \
> > >     ......
> > >   -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-id,id=vhost-vdpa0 \
> > >   -device virtio-net-pci,netdev=vhost-vdpa0,page-per-vq=on \
> >
> > I haven't looked at vDPA in depth. What is different here compared to
> > the existing vhost-backend.c kernel backend?
> >
> > It seems to be making the same ioctl calls, so I wonder if it makes
> > sense to share the vhost-backend.c kernel code?
> >
> > Stefan
> Hi Stefan,
> Sorry for the late reply, and thanks for these suggestions.
> I think the main difference between vhost kernel and vDPA is that vDPA
> depends on real hardware.
> The point is that vDPA devices work as virtio devices, but vhost-vdpa
> in qemu must present a vhost-like device in the qemu vhost layer.
> The ioctl calls are similar to vhost-backend.c now, but as more and
> more NICs support vDPA, the difference between vhost-backend.c and
> vDPA will keep growing, and sharing the code with the kernel backend
> would make the code complicated.

Okay, thanks.

Stefan


^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread

Thread overview: 33+ messages
2020-05-08 16:32 [RFC v2 0/9] vDPA support in qemu Cindy Lu
2020-05-08 16:32 ` [RFC v2 1/9] net: introduce qemu_get_peer Cindy Lu
2020-05-08 16:32 ` [RFC v2 2/9] net: use the function qemu_get_peer Cindy Lu
2020-05-09  2:19   ` Jason Wang
2020-05-09  6:49     ` Cindy Lu
2020-05-08 16:32 ` [RFC v2 3/9] virtio_net: introduce vhost_set_state Cindy Lu
2020-05-09  2:25   ` Jason Wang
2020-05-09  7:08     ` Cindy Lu
2020-05-08 16:32 ` [RFC v2 4/9] vhost-vdpa: introduce vhost-vdpa net client Cindy Lu
2020-05-08 16:41   ` Eric Blake
2020-05-09  7:17     ` Cindy Lu
2020-05-09  2:40   ` Jason Wang
2020-05-09  7:31     ` Cindy Lu
2020-05-08 16:32 ` [RFC v2 5/9] vhost-vdpa: implement vhost-vdpa backend Cindy Lu
2020-05-09  3:00   ` Jason Wang
2020-05-09  3:07     ` Jason Wang
2020-05-09  8:14     ` Cindy Lu
2020-05-21 12:40   ` Stefan Hajnoczi
2020-05-25 15:33     ` Cindy Lu
2020-05-25 16:15       ` Stefan Hajnoczi
2020-05-08 16:32 ` [RFC v2 6/9] virtio-bus: introduce queue_enabled method Cindy Lu
2020-05-09  3:01   ` Jason Wang
2020-05-09  6:50     ` Cindy Lu
2020-05-08 16:32 ` [RFC v2 7/9] virito-pci: implement " Cindy Lu
2020-05-09  3:02   ` Jason Wang
2020-05-09 12:01   ` Philippe Mathieu-Daudé
2020-05-11  5:08     ` Cindy Lu
2020-05-08 16:32 ` [RFC v2 8/9] vhost_net: set vq ready during start if necessary Cindy Lu
2020-05-09  3:03   ` Jason Wang
2020-05-09  6:51     ` Cindy Lu
2020-05-08 16:32 ` [RFC v2 9/9] vhost: introduce vhost_set_vring_ready method Cindy Lu
2020-05-09  3:05   ` Jason Wang
2020-05-09  3:10 ` [RFC v2 0/9] vDPA support in qemu Jason Wang
