* [RFC PATCH v2] Add VHOST PMD
@ 2015-08-31  3:55 Tetsuya Mukawa
  2015-08-31  3:55 ` [RFC PATCH v2] vhost: " Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-08-31  3:55 UTC (permalink / raw)
  To: dev; +Cc: ann.zhuangyanying

This patch introduces a new PMD implemented as a thin wrapper around
librte_vhost. For the PMD to work correctly, the patches below are needed.

 - [PATCH 1/3] vhost: Fix return value of GET_VRING_BASE message
 - [PATCH 2/3] vhost: Fix RESET_OWNER handling not to close callfd
 - [PATCH 3/3] vhost: Fix RESET_OWNER handling not to free virtqueue


PATCH v2 changes:
 - Fix issues reported by checkpatch.pl
   (Thanks to Stephen Hemminger)

Tetsuya Mukawa (1):
  vhost: Add VHOST PMD

 config/common_linuxapp                      |   6 +
 drivers/net/Makefile                        |   4 +
 drivers/net/vhost/Makefile                  |  61 +++
 drivers/net/vhost/rte_eth_vhost.c           | 640 ++++++++++++++++++++++++++++
 drivers/net/vhost/rte_pmd_vhost_version.map |   4 +
 mk/rte.app.mk                               |   8 +-
 6 files changed, 722 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

-- 
2.1.4


* [RFC PATCH v2] vhost: Add VHOST PMD
  2015-08-31  3:55 [RFC PATCH v2] Add VHOST PMD Tetsuya Mukawa
@ 2015-08-31  3:55 ` Tetsuya Mukawa
  2015-09-23 17:47   ` Loftus, Ciara
                     ` (2 more replies)
  0 siblings, 3 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-08-31  3:55 UTC (permalink / raw)
  To: dev; +Cc: ann.zhuangyanying

This patch introduces a new PMD implemented as a thin wrapper around
librte_vhost, which means librte_vhost is also needed to compile the PMD.
The PMD takes an 'iface' parameter, as shown below, to specify the path of
the socket used to connect to a virtio-net device.

$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i

To connect to the above testpmd instance, here is an example qemu command:

$ qemu-system-x86_64 \
        <snip>
        -chardev socket,id=chr0,path=/tmp/sock0 \
        -netdev vhost-user,id=net0,chardev=chr0,vhostforce \
        -device virtio-net-pci,netdev=net0
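
One practical note: a vhost-user backend can only reach the guest's packet
buffers if guest RAM is mapped from a shareable file, so QEMU must be given
a shared hugepage memory backend. A fuller invocation might look like the
following (option names as in QEMU 2.x; memory size and hugepage path are
placeholders to adjust for your setup):

```
$ qemu-system-x86_64 \
        -m 1024 \
        -object memory-backend-file,id=mem0,size=1024M,mem-path=/dev/hugepages,share=on \
        -numa node,memdev=mem0 \
        -chardev socket,id=chr0,path=/tmp/sock0 \
        -netdev vhost-user,id=net0,chardev=chr0,vhostforce \
        -device virtio-net-pci,netdev=net0
```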

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
 config/common_linuxapp                      |   6 +
 drivers/net/Makefile                        |   4 +
 drivers/net/vhost/Makefile                  |  61 +++
 drivers/net/vhost/rte_eth_vhost.c           | 640 ++++++++++++++++++++++++++++
 drivers/net/vhost/rte_pmd_vhost_version.map |   4 +
 mk/rte.app.mk                               |   8 +-
 6 files changed, 722 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 0de43d5..7310240 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -446,6 +446,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
 CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 
 #
+# Compile vhost PMD
+# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
+#
+CONFIG_RTE_LIBRTE_PMD_VHOST=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 5ebf963..e46a38e 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -49,5 +49,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
 
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
 include $(RTE_SDK)/mk/rte.sharelib.mk
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
new file mode 100644
index 0000000..018edde
--- /dev/null
+++ b/drivers/net/vhost/Makefile
@@ -0,0 +1,61 @@
+#   BSD LICENSE
+#
+#   Copyright (c) 2010-2015 Intel Corporation.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_vhost.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_vhost_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
+
+#
+# Export include files
+#
+SYMLINK-y-include +=
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
new file mode 100644
index 0000000..679e893
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -0,0 +1,640 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) 2010-2015 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <unistd.h>
+#include <errno.h>
+#include <pthread.h>
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_dev.h>
+#include <rte_kvargs.h>
+#include <rte_virtio_net.h>
+
+#define ETH_VHOST_IFACE_ARG		"iface"
+
+static const char *drivername = "VHOST PMD";
+
+static const char *valid_arguments[] = {
+	ETH_VHOST_IFACE_ARG,
+	NULL
+};
+
+static struct ether_addr base_eth_addr = {
+	.addr_bytes = {
+		0x56 /* V */,
+		0x48 /* H */,
+		0x4F /* O */,
+		0x53 /* S */,
+		0x54 /* T */,
+		0x00
+	}
+};
+
+struct vhost_queue {
+	struct virtio_net *device;
+	struct pmd_internal *internal;
+	struct rte_mempool *mb_pool;
+	rte_atomic64_t rx_pkts;
+	rte_atomic64_t tx_pkts;
+	rte_atomic64_t err_pkts;
+	rte_atomic16_t rx_executing;
+	rte_atomic16_t tx_executing;
+};
+
+struct pmd_internal {
+	TAILQ_ENTRY(pmd_internal) next;
+	char *dev_name;
+	char *iface_name;
+	unsigned nb_rx_queues;
+	unsigned nb_tx_queues;
+	rte_atomic16_t xfer;
+
+	struct vhost_queue rx_vhost_queues[RTE_PMD_RING_MAX_RX_RINGS];
+	struct vhost_queue tx_vhost_queues[RTE_PMD_RING_MAX_TX_RINGS];
+};
+
+TAILQ_HEAD(pmd_internal_head, pmd_internal);
+static struct pmd_internal_head internals_list =
+	TAILQ_HEAD_INITIALIZER(internals_list);
+
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static struct rte_eth_link pmd_link = {
+		.link_speed = 10000,
+		.link_duplex = ETH_LINK_FULL_DUPLEX,
+		.link_status = 0
+};
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t nb_rx = 0;
+
+	if (unlikely(r->internal == NULL))
+		return 0;
+
+	if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
+		return 0;
+
+	rte_atomic16_set(&r->rx_executing, 1);
+
+	if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
+		goto out;
+
+	nb_rx = (uint16_t)rte_vhost_dequeue_burst(r->device,
+			VIRTIO_TXQ, r->mb_pool, bufs, nb_bufs);
+
+	rte_atomic64_add(&(r->rx_pkts), nb_rx);
+
+out:
+	rte_atomic16_set(&r->rx_executing, 0);
+
+	return nb_rx;
+}
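+
The xfer/rx_executing pair above implements a small quiescence protocol: the
burst function announces itself before touching the vhost device and re-checks
xfer after announcing, while destroy_device() clears xfer and then spins until
no burst is inside. A standalone sketch of that handshake, using plain C11
atomics in place of rte_atomic16_t (function names here are illustrative, not
part of the patch):

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int xfer;         /* device usable? set by the new_device() path   */
static atomic_int rx_executing; /* an rx burst is inside the critical section    */

/* Mirrors eth_vhost_rx(): returns true iff it was safe to touch the device */
static bool rx_burst(void)
{
	bool did_work = false;

	if (atomic_load(&xfer) == 0)
		return false;                /* fast path: no device attached */
	atomic_store(&rx_executing, 1);      /* announce we are inside        */
	if (atomic_load(&xfer) != 0)         /* re-check after announcing     */
		did_work = true;             /* ...dequeue would happen here  */
	atomic_store(&rx_executing, 0);
	return did_work;
}

/* Mirrors destroy_device(): forbid new entries, then wait for stragglers */
static void quiesce(void)
{
	atomic_store(&xfer, 0);
	while (atomic_load(&rx_executing))
		;                            /* rte_pause() in the PMD */
}
```

The re-check of xfer after setting rx_executing is what closes the window in
which destroy_device() could have cleared xfer between the first check and the
announcement.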
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t i, nb_tx = 0;
+
+	if (unlikely(r->internal == NULL))
+		return 0;
+
+	if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
+		return 0;
+
+	rte_atomic16_set(&r->tx_executing, 1);
+
+	if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
+		goto out;
+
+	nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
+			VIRTIO_RXQ, bufs, nb_bufs);
+
+	rte_atomic64_add(&(r->tx_pkts), nb_tx);
+	rte_atomic64_add(&(r->err_pkts), nb_bufs - nb_tx);
+
+	for (i = 0; likely(i < nb_tx); i++)
+		rte_pktmbuf_free(bufs[i]);
+
+out:
+	rte_atomic16_set(&r->tx_executing, 0);
+
+	return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused) { return 0; }
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	return rte_vhost_driver_register(internal->iface_name);
+}
+
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	rte_vhost_driver_unregister(internal->iface_name);
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc __rte_unused,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	internal->rx_vhost_queues[rx_queue_id].mb_pool = mb_pool;
+	dev->data->rx_queues[rx_queue_id] = &internal->rx_vhost_queues[rx_queue_id];
+	return 0;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	dev->data->tx_queues[tx_queue_id] = &internal->tx_vhost_queues[tx_queue_id];
+	return 0;
+}
+
+
+static void
+eth_dev_info(struct rte_eth_dev *dev,
+	     struct rte_eth_dev_info *dev_info)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	dev_info->driver_name = drivername;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = (uint32_t)-1;
+	dev_info->max_rx_queues = (uint16_t)internal->nb_rx_queues;
+	dev_info->max_tx_queues = (uint16_t)internal->nb_tx_queues;
+	dev_info->min_rx_bufsize = 0;
+	dev_info->pci_dev = dev->pci_dev;
+}
+
+static void
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
+{
+	unsigned i;
+	unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
+	const struct pmd_internal *internal = dev->data->dev_private;
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+	     i < internal->nb_rx_queues; i++) {
+		igb_stats->q_ipackets[i] = internal->rx_vhost_queues[i].rx_pkts.cnt;
+		rx_total += igb_stats->q_ipackets[i];
+	}
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+	     i < internal->nb_tx_queues; i++) {
+		igb_stats->q_opackets[i] = internal->tx_vhost_queues[i].tx_pkts.cnt;
+		igb_stats->q_errors[i] = internal->tx_vhost_queues[i].err_pkts.cnt;
+		tx_total += igb_stats->q_opackets[i];
+		tx_err_total += igb_stats->q_errors[i];
+	}
+
+	igb_stats->ipackets = rx_total;
+	igb_stats->opackets = tx_total;
+	igb_stats->oerrors = tx_err_total;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	unsigned i;
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	for (i = 0; i < internal->nb_rx_queues; i++)
+		internal->rx_vhost_queues[i].rx_pkts.cnt = 0;
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		internal->tx_vhost_queues[i].tx_pkts.cnt = 0;
+		internal->tx_vhost_queues[i].err_pkts.cnt = 0;
+	}
+}
+
+static void
+eth_queue_release(void *q __rte_unused) { ; }
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused) { return 0; }
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+static struct eth_driver rte_vhost_pmd = {
+	.pci_drv = {
+		.name = "rte_vhost_pmd",
+		.drv_flags = RTE_PCI_DRV_DETACHABLE,
+	},
+};
+
+static struct rte_pci_id id_table;
+
+static inline struct pmd_internal *
+find_internal_resource(char *ifname)
+{
+	int found = 0;
+	struct pmd_internal *internal;
+
+	if (ifname == NULL)
+		return NULL;
+
+	pthread_mutex_lock(&internal_list_lock);
+
+	TAILQ_FOREACH(internal, &internals_list, next) {
+		if (!strcmp(internal->iface_name, ifname)) {
+			found = 1;
+			break;
+		}
+	}
+
+	pthread_mutex_unlock(&internal_list_lock);
+
+	if (!found)
+		return NULL;
+
+	return internal;
+}
+
+static int
+new_device(struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct pmd_internal *internal;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "invalid argument\n");
+		return -1;
+	}
+
+	internal = find_internal_resource(dev->ifname);
+	if (internal == NULL) {
+		RTE_LOG(INFO, PMD, "invalid device name\n");
+		return -1;
+	}
+
+	eth_dev = rte_eth_dev_allocated(internal->dev_name);
+	if (eth_dev == NULL) {
+		RTE_LOG(INFO, PMD, "failed to find ethdev\n");
+		return -1;
+	}
+
+	internal = eth_dev->data->dev_private;
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = &internal->rx_vhost_queues[i];
+		vq->device = dev;
+		vq->internal = internal;
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = &internal->tx_vhost_queues[i];
+		vq->device = dev;
+		vq->internal = internal;
+	}
+
+	dev->flags |= VIRTIO_DEV_RUNNING;
+	dev->priv = eth_dev;
+
+	eth_dev->data->dev_link.link_status = 1;
+	rte_atomic16_set(&internal->xfer, 1);
+
+	RTE_LOG(INFO, PMD, "New connection established\n");
+
+	return 0;
+}
+
+static void
+destroy_device(volatile struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct pmd_internal *internal;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "invalid argument\n");
+		return;
+	}
+
+	eth_dev = (struct rte_eth_dev *)dev->priv;
+	if (eth_dev == NULL) {
+		RTE_LOG(INFO, PMD, "failed to find an ethdev\n");
+		return;
+	}
+
+	internal = eth_dev->data->dev_private;
+
+	/* Wait until rx/tx_pkt_burst stops accessing vhost device */
+	rte_atomic16_set(&internal->xfer, 0);
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = &internal->rx_vhost_queues[i];
+		while (rte_atomic16_read(&vq->rx_executing))
+			rte_pause();
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = &internal->tx_vhost_queues[i];
+		while (rte_atomic16_read(&vq->tx_executing))
+			rte_pause();
+	}
+
+	eth_dev->data->dev_link.link_status = 0;
+
+	dev->priv = NULL;
+	dev->flags &= ~VIRTIO_DEV_RUNNING;
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = &internal->rx_vhost_queues[i];
+		vq->device = NULL;
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = &internal->tx_vhost_queues[i];
+		vq->device = NULL;
+	}
+
+	RTE_LOG(INFO, PMD, "Connection closed\n");
+}
+
+static void *vhost_driver_session(void *param __rte_unused)
+{
+	static struct virtio_net_device_ops *vhost_ops;
+
+	vhost_ops = rte_zmalloc(NULL, sizeof(*vhost_ops), 0);
+	if (vhost_ops == NULL)
+		rte_panic("Can't allocate memory\n");
+
+	/* set vhost arguments */
+	vhost_ops->new_device = new_device;
+	vhost_ops->destroy_device = destroy_device;
+	if (rte_vhost_driver_callback_register(vhost_ops) < 0)
+		rte_panic("Can't register callbacks\n");
+
+	/* start event handling */
+	rte_vhost_driver_session_start();
+
+	rte_free(vhost_ops);
+	pthread_exit(0);
+}
+
+static pthread_once_t once_cont = PTHREAD_ONCE_INIT;
+static pthread_t session_th;
+
+static void vhost_driver_session_start(void)
+{
+	int ret;
+
+	ret = pthread_create(&session_th, NULL, vhost_driver_session, NULL);
+	if (ret)
+		rte_panic("Can't create a thread\n");
+}
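+
vhost_driver_session_start() is invoked through pthread_once() so that exactly
one session thread is created no matter how many vhost vdevs are instantiated.
A minimal sketch of that guarantee, where start_session() stands in for the
pthread_create() call (names and the counter are illustrative only):

```c
#include <pthread.h>

static pthread_once_t once = PTHREAD_ONCE_INIT;
static int session_started;  /* counts how many times the session launched */

/* Stand-in for vhost_driver_session_start(): would pthread_create() here */
static void start_session(void)
{
	session_started++;
}

/* Called once per created vdev; the session is still started only once */
static void ensure_session(void)
{
	pthread_once(&once, start_session);
}
```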
+
+static int
+eth_dev_vhost_create(const char *name, int index,
+		     char *iface_name,
+		     const unsigned numa_node)
+{
+	struct rte_eth_dev_data *data = NULL;
+	struct rte_pci_device *pci_dev = NULL;
+	struct pmd_internal *internal = NULL;
+	struct rte_eth_dev *eth_dev = NULL;
+	struct ether_addr *eth_addr = NULL;
+	uint16_t nb_rx_queues = 1;
+	uint16_t nb_tx_queues = 1;
+
+	RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
+		numa_node);
+
+	/* now do all data allocation - for eth_dev structure, dummy pci driver
+	 * and internal (private) data
+	 */
+	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+	if (data == NULL)
+		goto error;
+
+	pci_dev = rte_zmalloc_socket(name, sizeof(*pci_dev), 0, numa_node);
+	if (pci_dev == NULL)
+		goto error;
+
+	internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
+	if (internal == NULL)
+		goto error;
+
+	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
+	if (eth_addr == NULL)
+		goto error;
+	*eth_addr = base_eth_addr;
+	eth_addr->addr_bytes[5] = index;
+
+	/* reserve an ethdev entry */
+	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+	if (eth_dev == NULL)
+		goto error;
+
+	/* now put it all together
+	 * - store queue data in internal,
+	 * - store numa_node info in pci_driver
+	 * - point eth_dev_data to internal and pci_driver
+	 * - and point eth_dev structure to new eth_dev_data structure
+	 */
+	internal->nb_rx_queues = nb_rx_queues;
+	internal->nb_tx_queues = nb_tx_queues;
+	internal->dev_name = strdup(name);
+	if (internal->dev_name == NULL)
+		goto error;
+	internal->iface_name = strdup(iface_name);
+	if (internal->iface_name == NULL)
+		goto error;
+
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_INSERT_TAIL(&internals_list, internal, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
+	rte_vhost_pmd.pci_drv.name = drivername;
+	rte_vhost_pmd.pci_drv.id_table = &id_table;
+
+	pci_dev->numa_node = numa_node;
+	pci_dev->driver = &rte_vhost_pmd.pci_drv;
+
+	data->dev_private = internal;
+	data->port_id = eth_dev->data->port_id;
+	memmove(data->name, eth_dev->data->name, sizeof(data->name));
+	data->nb_rx_queues = (uint16_t)nb_rx_queues;
+	data->nb_tx_queues = (uint16_t)nb_tx_queues;
+	data->dev_link = pmd_link;
+	data->mac_addrs = eth_addr;
+
+	/* We'll replace the 'data' originally allocated by eth_dev. So the
+	 * vhost PMD resources won't be shared between multi processes.
+	 */
+	eth_dev->data = data;
+	eth_dev->driver = &rte_vhost_pmd;
+	eth_dev->dev_ops = &ops;
+	eth_dev->pci_dev = pci_dev;
+
+	/* finally assign rx and tx ops */
+	eth_dev->rx_pkt_burst = eth_vhost_rx;
+	eth_dev->tx_pkt_burst = eth_vhost_tx;
+
+	/* start vhost driver session. It should be called only once */
+	pthread_once(&once_cont, vhost_driver_session_start);
+
+	return data->port_id;
+
+error:
+	rte_free(data);
+	rte_free(pci_dev);
+	rte_free(internal);
+	rte_free(eth_addr);
+
+	return -1;
+}
+
+static inline int
+open_iface(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	const char **iface_name = extra_args;
+
+	if (value == NULL)
+		return -1;
+
+	*iface_name = value;
+
+	return 0;
+}
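+
open_iface() follows the rte_kvargs callback contract: rte_kvargs_process()
walks the parsed key/value pairs and invokes the handler with the value of
each matching key plus an opaque extra_args pointer. A self-contained sketch
of that contract, where process_kv() is a hypothetical, simplified stand-in
for rte_kvargs_process() (it handles a single "key=value" token):

```c
#include <string.h>

typedef int (*arg_handler_t)(const char *key, const char *value, void *extra);

/* Hypothetical stand-in for rte_kvargs_process(): if the token matches
 * "key=", hand the value part to the callback, as the PMD does for iface. */
static int process_kv(const char *token, const char *key,
		      arg_handler_t handler, void *extra)
{
	size_t klen = strlen(key);

	if (strncmp(token, key, klen) != 0 || token[klen] != '=')
		return -1;                   /* not the argument we want */
	return handler(key, token + klen + 1, extra);
}

/* Same shape as the PMD's open_iface() callback */
static int open_iface(const char *key, const char *value, void *extra)
{
	const char **iface_name = extra;

	(void)key;
	if (value == NULL)
		return -1;
	*iface_name = value;                 /* export the socket path */
	return 0;
}
```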
+
+static int
+rte_pmd_vhost_devinit(const char *name, const char *params)
+{
+	struct rte_kvargs *kvlist = NULL;
+	int ret = 0;
+	int index;
+	char *iface_name;
+
+	RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
+
+	kvlist = rte_kvargs_parse(params, valid_arguments);
+	if (kvlist == NULL)
+		return -1;
+
+	if (strlen(name) < strlen("eth_vhost")) {
+		ret = -1;
+		goto out_free;
+	}
+
+	errno = 0;
+	index = strtol(name + strlen("eth_vhost"), NULL, 0);
+	if (errno == ERANGE) {
+		ret = -1;
+		goto out_free;
+	}
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
+					 &open_iface, &iface_name);
+		if (ret < 0)
+			goto out_free;
+
+		eth_dev_vhost_create(name, index, iface_name, rte_socket_id());
+	}
+
+out_free:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
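+
Parsing the vdev index with strtol() is easy to get subtly wrong (a stale
errno value, or a name with no digit suffix at all, both parse "successfully").
Below is a hedged sketch of a stricter parser; parse_vdev_index() is a
hypothetical helper written for illustration, not part of the patch:

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Extract the numeric suffix of a vdev name such as "eth_vhost0".
 * Clears errno before strtol() and rejects names without digits. */
static int parse_vdev_index(const char *name, long *index)
{
	const char *prefix = "eth_vhost";
	const char *suffix = name + strlen(prefix);
	char *end;

	if (strncmp(name, prefix, strlen(prefix)) != 0)
		return -1;                   /* wrong driver prefix */
	errno = 0;
	*index = strtol(suffix, &end, 0);
	if (errno == ERANGE || end == suffix)
		return -1;                   /* overflow, or no digits */
	return 0;
}
```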
+
+static int
+rte_pmd_vhost_devuninit(const char *name)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internal *internal;
+
+	RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
+
+	if (name == NULL)
+		return -EINVAL;
+
+	/* find an ethdev entry */
+	eth_dev = rte_eth_dev_allocated(name);
+	if (eth_dev == NULL)
+		return -ENODEV;
+
+	internal = eth_dev->data->dev_private;
+
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_REMOVE(&internals_list, internal, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
+	eth_dev_stop(eth_dev);
+
+	if ((internal) && (internal->dev_name))
+		free(internal->dev_name);
+	if ((internal) && (internal->iface_name))
+		free(internal->iface_name);
+	rte_free(eth_dev->data->dev_private);
+	rte_free(eth_dev->data);
+	rte_free(eth_dev->pci_dev);
+
+	rte_eth_dev_release_port(eth_dev);
+	return 0;
+}
+
+static struct rte_driver pmd_vhost_drv = {
+	.name = "eth_vhost",
+	.type = PMD_VDEV,
+	.init = rte_pmd_vhost_devinit,
+	.uninit = rte_pmd_vhost_devuninit,
+};
+
+PMD_REGISTER_DRIVER(pmd_vhost_drv);
diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
new file mode 100644
index 0000000..5151684
--- /dev/null
+++ b/drivers/net/vhost/rte_pmd_vhost_version.map
@@ -0,0 +1,4 @@
+DPDK_2.2 {
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 3871205..1c42fb1 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -144,7 +144,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP)       += -lrte_pmd_pcap
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL)       += -lrte_pmd_null
 
-endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
+
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
+endif # $(CONFIG_RTE_BUILD_SHARED_LIB)
 
 endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS
 
-- 
2.1.4


* Re: [RFC PATCH v2] vhost: Add VHOST PMD
  2015-08-31  3:55 ` [RFC PATCH v2] vhost: " Tetsuya Mukawa
@ 2015-09-23 17:47   ` Loftus, Ciara
  2015-10-16  8:40     ` Tetsuya Mukawa
  2015-10-16 12:52   ` Bruce Richardson
  2015-10-22  9:45   ` [RFC PATCH v3 0/2] " Tetsuya Mukawa
  2 siblings, 1 reply; 200+ messages in thread
From: Loftus, Ciara @ 2015-09-23 17:47 UTC (permalink / raw)
  To: Tetsuya Mukawa, dev; +Cc: ann.zhuangyanying

> <snip>
> 
> +struct pmd_internal {
> +	TAILQ_ENTRY(pmd_internal) next;
> +	char *dev_name;
> +	char *iface_name;
> +	unsigned nb_rx_queues;
> +	unsigned nb_tx_queues;
> +	rte_atomic16_t xfer;
Is this flag just used to indicate the state of the virtio_net device?
I.e. if =0 then virtio_net == NULL, and if =1 then virtio_net != NULL and the VIRTIO_DEV_RUNNING flag is set?

> +
> +	struct vhost_queue rx_vhost_queues[RTE_PMD_RING_MAX_RX_RINGS];
> +	struct vhost_queue tx_vhost_queues[RTE_PMD_RING_MAX_TX_RINGS];
> +};
> +
> +TAILQ_HEAD(pmd_internal_head, pmd_internal);
> +static struct pmd_internal_head internals_list =
> +	TAILQ_HEAD_INITIALIZER(internals_list);
> +
> +static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
> +
> +static struct rte_eth_link pmd_link = {
> +		.link_speed = 10000,
> +		.link_duplex = ETH_LINK_FULL_DUPLEX,
> +		.link_status = 0
> +};
> +
> +static uint16_t
> +eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> +{
> +	struct vhost_queue *r = q;
> +	uint16_t nb_rx = 0;
> +
> +	if (unlikely(r->internal == NULL))
> +		return 0;
> +
> +	if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
> +		return 0;
> +
> +	rte_atomic16_set(&r->rx_executing, 1);
> +
> +	if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
> +		goto out;
> +
> +	nb_rx = (uint16_t)rte_vhost_dequeue_burst(r->device,
> +			VIRTIO_TXQ, r->mb_pool, bufs, nb_bufs);
> +
> +	rte_atomic64_add(&(r->rx_pkts), nb_rx);
> +
> +out:
> +	rte_atomic16_set(&r->rx_executing, 0);
> +
> +	return nb_rx;
> +}
> +
> +static uint16_t
> +eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> +{
> +	struct vhost_queue *r = q;
> +	uint16_t i, nb_tx = 0;
> +
> +	if (unlikely(r->internal == NULL))
> +		return 0;
> +
> +	if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
> +		return 0;
> +
> +	rte_atomic16_set(&r->tx_executing, 1);
> +
> +	if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
> +		goto out;
> +
> +	nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
> +			VIRTIO_RXQ, bufs, nb_bufs);
> +
> +	rte_atomic64_add(&(r->tx_pkts), nb_tx);
> +	rte_atomic64_add(&(r->err_pkts), nb_bufs - nb_tx);
> +
> +	for (i = 0; likely(i < nb_tx); i++)
> +		rte_pktmbuf_free(bufs[i]);

We may not always want to free these mbufs. For example, if a call is made to rte_eth_tx_burst with buffers from another (non-DPDK) source, they may not be ours to free.

> +
> +out:
> +	rte_atomic16_set(&r->tx_executing, 0);
> +
> +	return nb_tx;
> +}
> +
> +static int
> +eth_dev_configure(struct rte_eth_dev *dev __rte_unused) { return 0; }
> +
> +static int
> +eth_dev_start(struct rte_eth_dev *dev)
> +{
> +	struct pmd_internal *internal = dev->data->dev_private;
> +
> +	return rte_vhost_driver_register(internal->iface_name);
> +}
> +
> +static void
> +eth_dev_stop(struct rte_eth_dev *dev)
> +{
> +	struct pmd_internal *internal = dev->data->dev_private;
> +
> +	rte_vhost_driver_unregister(internal->iface_name);
> +}
> +
> +static int
> +eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
> +		   uint16_t nb_rx_desc __rte_unused,
> +		   unsigned int socket_id __rte_unused,
> +		   const struct rte_eth_rxconf *rx_conf __rte_unused,
> +		   struct rte_mempool *mb_pool)
> +{
> +	struct pmd_internal *internal = dev->data->dev_private;
> +
> +	internal->rx_vhost_queues[rx_queue_id].mb_pool = mb_pool;
> +	dev->data->rx_queues[rx_queue_id] = &internal->rx_vhost_queues[rx_queue_id];
> +	return 0;
> +}
> +
> +static int
> +eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
> +		   uint16_t nb_tx_desc __rte_unused,
> +		   unsigned int socket_id __rte_unused,
> +		   const struct rte_eth_txconf *tx_conf __rte_unused)
> +{
> +	struct pmd_internal *internal = dev->data->dev_private;
> +
> +	dev->data->tx_queues[tx_queue_id] = &internal->tx_vhost_queues[tx_queue_id];
> +	return 0;
> +}
> +
> +
> +static void
> +eth_dev_info(struct rte_eth_dev *dev,
> +	     struct rte_eth_dev_info *dev_info)
> +{
> +	struct pmd_internal *internal = dev->data->dev_private;
> +
> +	dev_info->driver_name = drivername;
> +	dev_info->max_mac_addrs = 1;
> +	dev_info->max_rx_pktlen = (uint32_t)-1;
> +	dev_info->max_rx_queues = (uint16_t)internal->nb_rx_queues;
> +	dev_info->max_tx_queues = (uint16_t)internal->nb_tx_queues;
> +	dev_info->min_rx_bufsize = 0;
> +	dev_info->pci_dev = dev->pci_dev;
> +}
> +
> +static void
> +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
> +{
> +	unsigned i;
> +	unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
> +	const struct pmd_internal *internal = dev->data->dev_private;
> +
> +	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
> +	     i < internal->nb_rx_queues; i++) {
> +		igb_stats->q_ipackets[i] = internal->rx_vhost_queues[i].rx_pkts.cnt;
> +		rx_total += igb_stats->q_ipackets[i];
> +	}
> +
> +	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
> +	     i < internal->nb_tx_queues; i++) {
> +		igb_stats->q_opackets[i] = internal->tx_vhost_queues[i].tx_pkts.cnt;
> +		igb_stats->q_errors[i] = internal->tx_vhost_queues[i].err_pkts.cnt;
> +		tx_total += igb_stats->q_opackets[i];
> +		tx_err_total += igb_stats->q_errors[i];
> +	}
> +
> +	igb_stats->ipackets = rx_total;
> +	igb_stats->opackets = tx_total;
> +	igb_stats->oerrors = tx_err_total;
> +}
> +
> +static void
> +eth_stats_reset(struct rte_eth_dev *dev)
> +{
> +	unsigned i;
> +	struct pmd_internal *internal = dev->data->dev_private;
> +
> +	for (i = 0; i < internal->nb_rx_queues; i++)
> +		internal->rx_vhost_queues[i].rx_pkts.cnt = 0;
> +	for (i = 0; i < internal->nb_tx_queues; i++) {
> +		internal->tx_vhost_queues[i].tx_pkts.cnt = 0;
> +		internal->tx_vhost_queues[i].err_pkts.cnt = 0;
> +	}
> +}
> +
> +static void
> +eth_queue_release(void *q __rte_unused) { ; }
> +static int
> +eth_link_update(struct rte_eth_dev *dev __rte_unused,
> +		int wait_to_complete __rte_unused) { return 0; }
> +
> +static const struct eth_dev_ops ops = {
> +	.dev_start = eth_dev_start,
> +	.dev_stop = eth_dev_stop,
> +	.dev_configure = eth_dev_configure,
> +	.dev_infos_get = eth_dev_info,
> +	.rx_queue_setup = eth_rx_queue_setup,
> +	.tx_queue_setup = eth_tx_queue_setup,
> +	.rx_queue_release = eth_queue_release,
> +	.tx_queue_release = eth_queue_release,
> +	.link_update = eth_link_update,
> +	.stats_get = eth_stats_get,
> +	.stats_reset = eth_stats_reset,
> +};
> +
> +static struct eth_driver rte_vhost_pmd = {
> +	.pci_drv = {
> +		.name = "rte_vhost_pmd",
> +		.drv_flags = RTE_PCI_DRV_DETACHABLE,
> +	},
> +};
> +
> +static struct rte_pci_id id_table;
> +
> +static inline struct pmd_internal *
> +find_internal_resource(char *ifname)
> +{
> +	int found = 0;
> +	struct pmd_internal *internal;
> +
> +	if (ifname == NULL)
> +		return NULL;
> +
> +	pthread_mutex_lock(&internal_list_lock);
> +
> +	TAILQ_FOREACH(internal, &internals_list, next) {
> +		if (!strcmp(internal->iface_name, ifname)) {
> +			found = 1;
> +			break;
> +		}
> +	}
> +
> +	pthread_mutex_unlock(&internal_list_lock);
> +
> +	if (!found)
> +		return NULL;
> +
> +	return internal;
> +}
> +
> +static int
> +new_device(struct virtio_net *dev)
> +{
> +	struct rte_eth_dev *eth_dev;
> +	struct pmd_internal *internal;
> +	struct vhost_queue *vq;
> +	unsigned i;
> +
> +	if (dev == NULL) {
> +		RTE_LOG(INFO, PMD, "invalid argument\n");
> +		return -1;
> +	}
> +
> +	internal = find_internal_resource(dev->ifname);
> +	if (internal == NULL) {
> +		RTE_LOG(INFO, PMD, "invalid device name\n");
> +		return -1;
> +	}
> +
> +	eth_dev = rte_eth_dev_allocated(internal->dev_name);
> +	if (eth_dev == NULL) {
> +		RTE_LOG(INFO, PMD, "failuer to find ethdev\n");
Typo: Failure. Same for the destroy_device function

> +		return -1;
> +	}
> +
> +	internal = eth_dev->data->dev_private;
> +
> +	for (i = 0; i < internal->nb_rx_queues; i++) {
> +		vq = &internal->rx_vhost_queues[i];
> +		vq->device = dev;
> +		vq->internal = internal;
> +	}
> +	for (i = 0; i < internal->nb_tx_queues; i++) {
> +		vq = &internal->tx_vhost_queues[i];
> +		vq->device = dev;
> +		vq->internal = internal;
> +	}
> +
> +	dev->flags |= VIRTIO_DEV_RUNNING;
> +	dev->priv = eth_dev;
> +
> +	eth_dev->data->dev_link.link_status = 1;
> +	rte_atomic16_set(&internal->xfer, 1);
> +
> +	RTE_LOG(INFO, PMD, "New connection established\n");
> +
> +	return 0;

Some freedom is taken away if the new_device and destroy_device callbacks are implemented in the driver.
For example, if one wishes to call the rte_vhost_enable_guest_notification function when a new device is brought up, they cannot now, as there is no scope to modify these callbacks as is done in, for example, the vHost sample app. Is this correct?

> +}
> +
> +static void
> +destroy_device(volatile struct virtio_net *dev)
> +{
> +	struct rte_eth_dev *eth_dev;
> +	struct pmd_internal *internal;
> +	struct vhost_queue *vq;
> +	unsigned i;
> +
> +	if (dev == NULL) {
> +		RTE_LOG(INFO, PMD, "invalid argument\n");
> +		return;
> +	}
> +
> +	eth_dev = (struct rte_eth_dev *)dev->priv;
> +	if (eth_dev == NULL) {
> +		RTE_LOG(INFO, PMD, "failuer to find a ethdev\n");
> +		return;
> +	}
> +
> +	internal = eth_dev->data->dev_private;
> +
> +	/* Wait until rx/tx_pkt_burst stops accesing vhost device */
> +	rte_atomic16_set(&internal->xfer, 0);
> +	for (i = 0; i < internal->nb_rx_queues; i++) {
> +		vq = &internal->rx_vhost_queues[i];
> +		while (rte_atomic16_read(&vq->rx_executing))
> +			rte_pause();
> +	}
> +	for (i = 0; i < internal->nb_tx_queues; i++) {
> +		vq = &internal->tx_vhost_queues[i];
> +		while (rte_atomic16_read(&vq->tx_executing))
> +			rte_pause();
> +	}
> +
> +	eth_dev->data->dev_link.link_status = 0;
> +
> +	dev->priv = NULL;
> +	dev->flags &= ~VIRTIO_DEV_RUNNING;
> +
> +	for (i = 0; i < internal->nb_rx_queues; i++) {
> +		vq = &internal->rx_vhost_queues[i];
> +		vq->device = NULL;
> +	}
> +	for (i = 0; i < internal->nb_tx_queues; i++) {
> +		vq = &internal->tx_vhost_queues[i];
> +		vq->device = NULL;
> +	}
> +
> +	RTE_LOG(INFO, PMD, "Connection closed\n");
> +}
> +
> +static void *vhost_driver_session(void *param __rte_unused)
> +{
> +	static struct virtio_net_device_ops *vhost_ops;
> +
> +	vhost_ops = rte_zmalloc(NULL, sizeof(*vhost_ops), 0);
> +	if (vhost_ops == NULL)
> +		rte_panic("Can't allocate memory\n");
> +
> +	/* set vhost arguments */
> +	vhost_ops->new_device = new_device;
> +	vhost_ops->destroy_device = destroy_device;
> +	if (rte_vhost_driver_callback_register(vhost_ops) < 0)
> +		rte_panic("Can't register callbacks\n");
> +
> +	/* start event handling */
> +	rte_vhost_driver_session_start();
> +
> +	rte_free(vhost_ops);
> +	pthread_exit(0);
> +}
> +
> +static pthread_once_t once_cont = PTHREAD_ONCE_INIT;
> +static pthread_t session_th;
> +
> +static void vhost_driver_session_start(void)
> +{
> +	int ret;
> +
> +	ret = pthread_create(&session_th, NULL, vhost_driver_session, NULL);
> +	if (ret)
> +		rte_panic("Can't create a thread\n");
> +}
> +
> +static int
> +eth_dev_vhost_create(const char *name, int index,
> +		     char *iface_name,
> +		     const unsigned numa_node)
> +{
> +	struct rte_eth_dev_data *data = NULL;
> +	struct rte_pci_device *pci_dev = NULL;
> +	struct pmd_internal *internal = NULL;
> +	struct rte_eth_dev *eth_dev = NULL;
> +	struct ether_addr *eth_addr = NULL;
> +	uint16_t nb_rx_queues = 1;
> +	uint16_t nb_tx_queues = 1;
> +
> +	RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
> +		numa_node);
> +
> +	/* now do all data allocation - for eth_dev structure, dummy pci driver
> +	 * and internal (private) data
> +	 */
> +	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
> +	if (data == NULL)
> +		goto error;
> +
> +	pci_dev = rte_zmalloc_socket(name, sizeof(*pci_dev), 0, numa_node);
> +	if (pci_dev == NULL)
> +		goto error;
> +
> +	internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
> +	if (internal == NULL)
> +		goto error;
> +
> +	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
> +	if (eth_addr == NULL)
> +		goto error;
> +	*eth_addr = base_eth_addr;
> +	eth_addr->addr_bytes[5] = index;
> +
> +	/* reserve an ethdev entry */
> +	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
> +	if (eth_dev == NULL)
> +		goto error;
> +
> +	/* now put it all together
> +	 * - store queue data in internal,
> +	 * - store numa_node info in pci_driver
> +	 * - point eth_dev_data to internal and pci_driver
> +	 * - and point eth_dev structure to new eth_dev_data structure
> +	 */
> +	internal->nb_rx_queues = nb_rx_queues;
> +	internal->nb_tx_queues = nb_tx_queues;
> +	internal->dev_name = strdup(name);
> +	if (internal->dev_name == NULL)
> +		goto error;
> +	internal->iface_name = strdup(iface_name);
> +	if (internal->iface_name == NULL)
> +		goto error;
> +
> +	pthread_mutex_lock(&internal_list_lock);
> +	TAILQ_INSERT_TAIL(&internals_list, internal, next);
> +	pthread_mutex_unlock(&internal_list_lock);
> +
> +	rte_vhost_pmd.pci_drv.name = drivername;
> +	rte_vhost_pmd.pci_drv.id_table = &id_table;
> +
> +	pci_dev->numa_node = numa_node;
> +	pci_dev->driver = &rte_vhost_pmd.pci_drv;
> +
> +	data->dev_private = internal;
> +	data->port_id = eth_dev->data->port_id;
> +	memmove(data->name, eth_dev->data->name, sizeof(data->name));
> +	data->nb_rx_queues = (uint16_t)nb_rx_queues;
> +	data->nb_tx_queues = (uint16_t)nb_tx_queues;
> +	data->dev_link = pmd_link;
> +	data->mac_addrs = eth_addr;
> +
> +	/* We'll replace the 'data' originally allocated by eth_dev. So the
> +	 * vhost PMD resources won't be shared between multiple processes.
> +	 */
> +	eth_dev->data = data;
> +	eth_dev->driver = &rte_vhost_pmd;
> +	eth_dev->dev_ops = &ops;
> +	eth_dev->pci_dev = pci_dev;
> +
> +	/* finally assign rx and tx ops */
> +	eth_dev->rx_pkt_burst = eth_vhost_rx;
> +	eth_dev->tx_pkt_burst = eth_vhost_tx;
> +
> +	/* start vhost driver session. It should be called only once */
> +	pthread_once(&once_cont, vhost_driver_session_start);
> +
> +	return data->port_id;
> +
> +error:
> +	rte_free(data);
> +	rte_free(pci_dev);
> +	rte_free(internal);
> +	rte_free(eth_addr);
> +
> +	return -1;
> +}
> +
> +static inline int
> +open_iface(const char *key __rte_unused, const char *value, void *extra_args)
> +{
> +	const char **iface_name = extra_args;
> +
> +	if (value == NULL)
> +		return -1;
> +
> +	*iface_name = value;
> +
> +	return 0;
> +}
> +
> +static int
> +rte_pmd_vhost_devinit(const char *name, const char *params)
> +{
> +	struct rte_kvargs *kvlist = NULL;
> +	int ret = 0;
> +	int index;
> +	char *iface_name;
> +
> +	RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
> +
> +	kvlist = rte_kvargs_parse(params, valid_arguments);
> +	if (kvlist == NULL)
> +		return -1;
> +
> +	if (strlen(name) < strlen("eth_vhost"))
> +		return -1;
> +
> +	index = strtol(name + strlen("eth_vhost"), NULL, 0);
> +	if (errno == ERANGE)
> +		return -1;
> +
> +	if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
> +		ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
> +					 &open_iface, &iface_name);
> +		if (ret < 0)
> +			goto out_free;
> +
> +		eth_dev_vhost_create(name, index, iface_name, rte_socket_id());
> +	}
> +
> +out_free:
> +	rte_kvargs_free(kvlist);
> +	return ret;
> +}
> +
> +static int
> +rte_pmd_vhost_devuninit(const char *name)
> +{
> +	struct rte_eth_dev *eth_dev = NULL;
> +	struct pmd_internal *internal;
> +
> +	RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
> +
> +	if (name == NULL)
> +		return -EINVAL;
> +
> +	/* find an ethdev entry */
> +	eth_dev = rte_eth_dev_allocated(name);
> +	if (eth_dev == NULL)
> +		return -ENODEV;
> +
> +	internal = eth_dev->data->dev_private;
> +
> +	pthread_mutex_lock(&internal_list_lock);
> +	TAILQ_REMOVE(&internals_list, internal, next);
> +	pthread_mutex_unlock(&internal_list_lock);
> +
> +	eth_dev_stop(eth_dev);
> +
> +	if ((internal) && (internal->dev_name))
> +		free(internal->dev_name);
> +	if ((internal) && (internal->iface_name))
> +		free(internal->iface_name);
> +	rte_free(eth_dev->data->dev_private);
> +	rte_free(eth_dev->data);
> +	rte_free(eth_dev->pci_dev);
> +
> +	rte_eth_dev_release_port(eth_dev);
> +	return 0;
> +}
> +
> +static struct rte_driver pmd_vhost_drv = {
> +	.name = "eth_vhost",
> +	.type = PMD_VDEV,
> +	.init = rte_pmd_vhost_devinit,
> +	.uninit = rte_pmd_vhost_devuninit,
> +};
> +
> +PMD_REGISTER_DRIVER(pmd_vhost_drv);
> diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
> new file mode 100644
> index 0000000..5151684
> --- /dev/null
> +++ b/drivers/net/vhost/rte_pmd_vhost_version.map
> @@ -0,0 +1,4 @@
> +DPDK_2.2 {
> +
> +	local: *;
> +};
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index 3871205..1c42fb1 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -144,7 +144,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP)       += -lrte_pmd_pcap
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL)       += -lrte_pmd_null
> 
> -endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
> +ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
> +
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
> +
> +endif # ! $(CONFIG_RTE_LIBRTE_VHOST)
> +
> +endif # $(CONFIG_RTE_BUILD_SHARED_LIB)
> 
>  endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS
> 
> --
> 2.1.4

* Re: [RFC PATCH v2] vhost: Add VHOST PMD
  2015-09-23 17:47   ` Loftus, Ciara
@ 2015-10-16  8:40     ` Tetsuya Mukawa
  2015-10-20 14:13       ` Loftus, Ciara
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-10-16  8:40 UTC (permalink / raw)
  To: Loftus, Ciara, dev; +Cc: ann.zhuangyanying

On 2015/09/24 2:47, Loftus, Ciara wrote:
>> The patch introduces a new PMD. This PMD is implemented as thin wrapper
>> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
>> The PMD can have 'iface' parameter like below to specify a path to connect
>> to a virtio-net device.
>>
>> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
>>
>> To connect above testpmd, here is qemu command example.
>>
>> $ qemu-system-x86_64 \
>>         <snip>
>>         -chardev socket,id=chr0,path=/tmp/sock0 \
>>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce \
>>         -device virtio-net-pci,netdev=net0
>>
>> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
>> ---
>>  config/common_linuxapp                      |   6 +
>>  drivers/net/Makefile                        |   4 +
>>  drivers/net/vhost/Makefile                  |  61 +++
>>  drivers/net/vhost/rte_eth_vhost.c           | 640 ++++++++++++++++++++++++++++
>>  drivers/net/vhost/rte_pmd_vhost_version.map |   4 +
>>  mk/rte.app.mk                               |   8 +-
>>  6 files changed, 722 insertions(+), 1 deletion(-)
>>  create mode 100644 drivers/net/vhost/Makefile
>>  create mode 100644 drivers/net/vhost/rte_eth_vhost.c
>>  create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
>>
>> +struct pmd_internal {
>> +	TAILQ_ENTRY(pmd_internal) next;
>> +	char *dev_name;
>> +	char *iface_name;
>> +	unsigned nb_rx_queues;
>> +	unsigned nb_tx_queues;
>> +	rte_atomic16_t xfer;
> Is this flag just used to indicate the state of the virtio_net device?
> Ie. if =0 then virtio_dev=NULL and if =1 then virtio_net !=NULL & the VIRTIO_DEV_RUNNING flag is set?

Hi Ciara,

I am sorry for the very late reply.

Yes, it is. We can probably optimize it further.
I will change this implementation a bit in the next patches.
Could you please check it?

>> +
>> +static uint16_t
>> +eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
>> +{
>> +	struct vhost_queue *r = q;
>> +	uint16_t i, nb_tx = 0;
>> +
>> +	if (unlikely(r->internal == NULL))
>> +		return 0;
>> +
>> +	if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
>> +		return 0;
>> +
>> +	rte_atomic16_set(&r->tx_executing, 1);
>> +
>> +	if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
>> +		goto out;
>> +
>> +	nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
>> +			VIRTIO_RXQ, bufs, nb_bufs);
>> +
>> +	rte_atomic64_add(&(r->tx_pkts), nb_tx);
>> +	rte_atomic64_add(&(r->err_pkts), nb_bufs - nb_tx);
>> +
>> +	for (i = 0; likely(i < nb_tx); i++)
>> +		rte_pktmbuf_free(bufs[i]);
> We may not always want to free these mbufs. For example, if a call is made to rte_eth_tx_burst with buffers from another (non DPDK) source, they may not be ours to free.

Sorry, I am not sure what type of buffers you want to transfer.

This is a PMD that wraps librte_vhost.
And I guess other PMDs cannot handle buffers from a non-DPDK source either.
Should we take care of such buffers?

I have also checked the af_packet PMD.
It seems the tx function of the af_packet PMD just frees the mbufs.

>> +
>> +
>> +	eth_dev = rte_eth_dev_allocated(internal->dev_name);
>> +	if (eth_dev == NULL) {
>> +		RTE_LOG(INFO, PMD, "failuer to find ethdev\n");
> Typo: Failure. Same for the destroy_device function

Thanks, I will fix it in next patches.

>> +		return -1;
>> +	}
>> +
>> +	internal = eth_dev->data->dev_private;
>> +
>> +	for (i = 0; i < internal->nb_rx_queues; i++) {
>> +		vq = &internal->rx_vhost_queues[i];
>> +		vq->device = dev;
>> +		vq->internal = internal;
>> +	}
>> +	for (i = 0; i < internal->nb_tx_queues; i++) {
>> +		vq = &internal->tx_vhost_queues[i];
>> +		vq->device = dev;
>> +		vq->internal = internal;
>> +	}
>> +
>> +	dev->flags |= VIRTIO_DEV_RUNNING;
>> +	dev->priv = eth_dev;
>> +
>> +	eth_dev->data->dev_link.link_status = 1;
>> +	rte_atomic16_set(&internal->xfer, 1);
>> +
>> +	RTE_LOG(INFO, PMD, "New connection established\n");
>> +
>> +	return 0;
> Some freedom is taken away if the new_device and destroy_device callbacks are implemented in the driver.
> For example if one wishes to  call the rte_vhost_enable_guest_notification function when a new device is brought up. They cannot now as there is no scope to modify these callbacks, as is done in for example the vHost sample app. Is this correct?

So how about adding one more parameter to make the guest notification
behavior selectable?

ex)
./testpmd --vdev 'eth_vhost0,iface=/tmp/sock0,guest_notification=0'

In the above case, all queues in this device will have VRING_USED_F_NO_NOTIFY.

Thanks,
Tetsuya

* Re: [RFC PATCH v2] vhost: Add VHOST PMD
  2015-08-31  3:55 ` [RFC PATCH v2] vhost: " Tetsuya Mukawa
  2015-09-23 17:47   ` Loftus, Ciara
@ 2015-10-16 12:52   ` Bruce Richardson
  2015-10-19  1:51     ` Tetsuya Mukawa
  2015-10-22  9:45   ` [RFC PATCH v3 0/2] " Tetsuya Mukawa
  2 siblings, 1 reply; 200+ messages in thread
From: Bruce Richardson @ 2015-10-16 12:52 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Mon, Aug 31, 2015 at 12:55:26PM +0900, Tetsuya Mukawa wrote:
> The patch introduces a new PMD. This PMD is implemented as thin wrapper
> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
> The PMD can have 'iface' parameter like below to specify a path to connect
> to a virtio-net device.
> 
> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
> 
> To connect above testpmd, here is qemu command example.
> 
> $ qemu-system-x86_64 \
>         <snip>
>         -chardev socket,id=chr0,path=/tmp/sock0 \
>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce \
>         -device virtio-net-pci,netdev=net0
> 
> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>

With this PMD in place, is there any need to keep the existing vhost library
around as a separate entity? Can the existing library be subsumed/converted into
a standard PMD?

/Bruce

* Re: [RFC PATCH v2] vhost: Add VHOST PMD
  2015-10-16 12:52   ` Bruce Richardson
@ 2015-10-19  1:51     ` Tetsuya Mukawa
  2015-10-19  9:32       ` Loftus, Ciara
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-10-19  1:51 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, ann.zhuangyanying

On 2015/10/16 21:52, Bruce Richardson wrote:
> On Mon, Aug 31, 2015 at 12:55:26PM +0900, Tetsuya Mukawa wrote:
>> The patch introduces a new PMD. This PMD is implemented as thin wrapper
>> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
>> The PMD can have 'iface' parameter like below to specify a path to connect
>> to a virtio-net device.
>>
>> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
>>
>> To connect above testpmd, here is qemu command example.
>>
>> $ qemu-system-x86_64 \
>>         <snip>
>>         -chardev socket,id=chr0,path=/tmp/sock0 \
>>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce \
>>         -device virtio-net-pci,netdev=net0
>>
>> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> With this PMD in place, is there any need to keep the existing vhost library
> around as a separate entity? Can the existing library be subsumed/converted into
> a standard PMD?
>
> /Bruce

Hi Bruce,

My concern is whether the PMD can provide all the features of librte_vhost,
because librte_vhost provides more features and freedom than the ethdev API
does.
In some cases, a user needs to choose a limited implementation without
librte_vhost.
I am going to eliminate such cases while implementing the PMD.
But I don't have a strong belief that we can remove librte_vhost now.

So how about keeping the current separation in the next DPDK release?
I guess people will try to replace librte_vhost with the vhost PMD, because
apparently using the ethdev APIs will be useful in many cases.
And we will get feedback like "the vhost PMD needs to support this usage".
(Or we will not get feedback, but that's also OK.)
Then, we will be able to merge librte_vhost and the vhost PMD.

Thanks,
Tetsuya

* Re: [RFC PATCH v2] vhost: Add VHOST PMD
  2015-10-19  1:51     ` Tetsuya Mukawa
@ 2015-10-19  9:32       ` Loftus, Ciara
  2015-10-19  9:45         ` Bruce Richardson
  0 siblings, 1 reply; 200+ messages in thread
From: Loftus, Ciara @ 2015-10-19  9:32 UTC (permalink / raw)
  To: Tetsuya Mukawa, Richardson, Bruce; +Cc: dev, ann.zhuangyanying

> On 2015/10/16 21:52, Bruce Richardson wrote:
> > On Mon, Aug 31, 2015 at 12:55:26PM +0900, Tetsuya Mukawa wrote:
> >> The patch introduces a new PMD. This PMD is implemented as thin wrapper
> >> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
> >> The PMD can have 'iface' parameter like below to specify a path to connect
> >> to a virtio-net device.
> >>
> >> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
> >>
> >> To connect above testpmd, here is qemu command example.
> >>
> >> $ qemu-system-x86_64 \
> >>         <snip>
> >>         -chardev socket,id=chr0,path=/tmp/sock0 \
> >>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce \
> >>         -device virtio-net-pci,netdev=net0
> >>
> >> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> > With this PMD in place, is there any need to keep the existing vhost library
> > around as a separate entity? Can the existing library be subsumed/converted into
> > a standard PMD?
> >
> > /Bruce
> 
> Hi Bruce,
> 
> I concern about whether the PMD has all features of librte_vhost,
> because librte_vhost provides more features and freedom than ethdev API
> provides.
> In some cases, user needs to choose limited implementation without
> librte_vhost.
> I am going to eliminate such cases while implementing the PMD.
> But I don't have strong belief that we can remove librte_vhost now.
> 
> So how about keeping current separation in next DPDK?
> I guess people will try to replace librte_vhost to vhost PMD, because
> apparently using ethdev APIs will be useful in many cases.
> And we will get feedbacks like "vhost PMD needs to support like this usage".
> (Or we will not have feedbacks, but it's also OK.)
> Then, we will be able to merge librte_vhost and vhost PMD.

I agree with the above. One of the concerns I had when reviewing the patch was that the PMD removes some freedom that is available with the library, e.g. the ability to implement the new_device and destroy_device callbacks. If using the PMD you are constrained to the implementations of these in the PMD driver, but if using librte_vhost, you can implement your own with whatever functionality you like - a good example of this can be seen in the vHost sample app.
On the other hand, the PMD is useful in that it removes a lot of complexity for the user and may work for some more general use cases. So I would be in favour of having both options available too.

Ciara

> 
> Thanks,
> Tetsuya

* Re: [RFC PATCH v2] vhost: Add VHOST PMD
  2015-10-19  9:32       ` Loftus, Ciara
@ 2015-10-19  9:45         ` Bruce Richardson
  2015-10-19 10:50           ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Bruce Richardson @ 2015-10-19  9:45 UTC (permalink / raw)
  To: Loftus, Ciara; +Cc: dev, ann.zhuangyanying

On Mon, Oct 19, 2015 at 10:32:50AM +0100, Loftus, Ciara wrote:
> > On 2015/10/16 21:52, Bruce Richardson wrote:
> > > On Mon, Aug 31, 2015 at 12:55:26PM +0900, Tetsuya Mukawa wrote:
> > >> The patch introduces a new PMD. This PMD is implemented as thin wrapper
> > >> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
> > >> The PMD can have 'iface' parameter like below to specify a path to connect
> > >> to a virtio-net device.
> > >>
> > >> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
> > >>
> > >> To connect above testpmd, here is qemu command example.
> > >>
> > >> $ qemu-system-x86_64 \
> > >>         <snip>
> > >>         -chardev socket,id=chr0,path=/tmp/sock0 \
> > >>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce \
> > >>         -device virtio-net-pci,netdev=net0
> > >>
> > >> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> > > With this PMD in place, is there any need to keep the existing vhost library
> > > around as a separate entity? Can the existing library be subsumed/converted into
> > > a standard PMD?
> > >
> > > /Bruce
> > 
> > Hi Bruce,
> > 
> > I concern about whether the PMD has all features of librte_vhost,
> > because librte_vhost provides more features and freedom than ethdev API
> > provides.
> > In some cases, user needs to choose limited implementation without
> > librte_vhost.
> > I am going to eliminate such cases while implementing the PMD.
> > But I don't have strong belief that we can remove librte_vhost now.
> > 
> > So how about keeping current separation in next DPDK?
> > I guess people will try to replace librte_vhost to vhost PMD, because
> > apparently using ethdev APIs will be useful in many cases.
> > And we will get feedbacks like "vhost PMD needs to support like this usage".
> > (Or we will not have feedbacks, but it's also OK.)
> > Then, we will be able to merge librte_vhost and vhost PMD.
> 
> I agree with the above. One the concerns I had when reviewing the patch was that the PMD removes some freedom that is available with the library. Eg. Ability to implement the new_device and destroy_device callbacks. If using the PMD you are constrained to the implementations of these in the PMD driver, but if using librte_vhost, you can implement your own with whatever functionality you like - a good example of this can be seen in the vhost sample app.
> On the other hand, the PMD is useful in that it removes a lot of complexity for the user and may work for some more general use cases. So I would be in favour of having both options available too.
> 
> Ciara
>

Thanks.
However, just because the libraries are merged does not mean that you need
to be limited by PMD functionality. Many PMDs provide additional library-specific
functions over and above their PMD capabilities. The bonded PMD is a good example
here, as it has a whole set of extra functions to create and manipulate bonded
devices - things that are obviously not part of the general ethdev API. Other
vPMDs similarly include functions to allow them to be created on the fly too.

regards,
/Bruce

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH v2] vhost: Add VHOST PMD
  2015-10-19  9:45         ` Bruce Richardson
@ 2015-10-19 10:50           ` Tetsuya Mukawa
  2015-10-19 13:26             ` Panu Matilainen
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-10-19 10:50 UTC (permalink / raw)
  To: Bruce Richardson, Loftus, Ciara; +Cc: dev, ann.zhuangyanying

On 2015/10/19 18:45, Bruce Richardson wrote:
> On Mon, Oct 19, 2015 at 10:32:50AM +0100, Loftus, Ciara wrote:
>>> On 2015/10/16 21:52, Bruce Richardson wrote:
>>>> On Mon, Aug 31, 2015 at 12:55:26PM +0900, Tetsuya Mukawa wrote:
>>>>> The patch introduces a new PMD. This PMD is implemented as thin
>>> wrapper
>>>>> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
>>>>> The PMD can have 'iface' parameter like below to specify a path to
>>> connect
>>>>> to a virtio-net device.
>>>>>
>>>>> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
>>>>>
>>>>> To connect above testpmd, here is qemu command example.
>>>>>
>>>>> $ qemu-system-x86_64 \
>>>>>         <snip>
>>>>>         -chardev socket,id=chr0,path=/tmp/sock0 \
>>>>>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce \
>>>>>         -device virtio-net-pci,netdev=net0
>>>>>
>>>>> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
>>>> With this PMD in place, is there any need to keep the existing vhost library
>>>> around as a separate entity? Can the existing library be
>>> subsumed/converted into
>>>> a standard PMD?
>>>>
>>>> /Bruce
>>> Hi Bruce,
>>>
>>> I concern about whether the PMD has all features of librte_vhost,
>>> because librte_vhost provides more features and freedom than ethdev API
>>> provides.
>>> In some cases, user needs to choose limited implementation without
>>> librte_vhost.
>>> I am going to eliminate such cases while implementing the PMD.
>>> But I don't have strong belief that we can remove librte_vhost now.
>>>
>>> So how about keeping current separation in next DPDK?
>>> I guess people will try to replace librte_vhost to vhost PMD, because
>>> apparently using ethdev APIs will be useful in many cases.
>>> And we will get feedbacks like "vhost PMD needs to support like this usage".
>>> (Or we will not have feedbacks, but it's also OK.)
>>> Then, we will be able to merge librte_vhost and vhost PMD.
>> I agree with the above. One the concerns I had when reviewing the patch was that the PMD removes some freedom that is available with the library. Eg. Ability to implement the new_device and destroy_device callbacks. If using the PMD you are constrained to the implementations of these in the PMD driver, but if using librte_vhost, you can implement your own with whatever functionality you like - a good example of this can be seen in the vhost sample app.
>> On the other hand, the PMD is useful in that it removes a lot of complexity for the user and may work for some more general use cases. So I would be in favour of having both options available too.
>>
>> Ciara
>>
> Thanks.
> However, just because the libraries are merged does not mean that you need
> be limited by PMD functionality. Many PMDs provide additional library-specific
> functions over and above their PMD capabilities. The bonded PMD is a good example
> here, as it has a whole set of extra functions to create and manipulate bonded
> devices - things that are obviously not part of the general ethdev API. Other
> vPMDs similarly include functions to allow them to be created on the fly too.
>
> regards,
> /Bruce

Hi Bruce,

I appreciate you showing a good example; I hadn't noticed that PMD.
I will check the bonding PMD, and try to remove librte_vhost without
losing the freedom and features of the library.

Regards,
Tetsuya


* Re: [RFC PATCH v2] vhost: Add VHOST PMD
  2015-10-19 10:50           ` Tetsuya Mukawa
@ 2015-10-19 13:26             ` Panu Matilainen
  2015-10-19 13:27               ` Richardson, Bruce
  0 siblings, 1 reply; 200+ messages in thread
From: Panu Matilainen @ 2015-10-19 13:26 UTC (permalink / raw)
  To: Tetsuya Mukawa, Bruce Richardson, Loftus, Ciara; +Cc: dev, ann.zhuangyanying

On 10/19/2015 01:50 PM, Tetsuya Mukawa wrote:
> On 2015/10/19 18:45, Bruce Richardson wrote:
>> On Mon, Oct 19, 2015 at 10:32:50AM +0100, Loftus, Ciara wrote:
>>>> On 2015/10/16 21:52, Bruce Richardson wrote:
>>>>> On Mon, Aug 31, 2015 at 12:55:26PM +0900, Tetsuya Mukawa wrote:
>>>>>> The patch introduces a new PMD. This PMD is implemented as thin
>>>> wrapper
>>>>>> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
>>>>>> The PMD can have 'iface' parameter like below to specify a path to
>>>> connect
>>>>>> to a virtio-net device.
>>>>>>
>>>>>> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
>>>>>>
>>>>>> To connect above testpmd, here is qemu command example.
>>>>>>
>>>>>> $ qemu-system-x86_64 \
>>>>>>          <snip>
>>>>>>          -chardev socket,id=chr0,path=/tmp/sock0 \
>>>>>>          -netdev vhost-user,id=net0,chardev=chr0,vhostforce \
>>>>>>          -device virtio-net-pci,netdev=net0
>>>>>>
>>>>>> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
>>>>> With this PMD in place, is there any need to keep the existing vhost library
>>>>> around as a separate entity? Can the existing library be
>>>> subsumed/converted into
>>>>> a standard PMD?
>>>>>
>>>>> /Bruce
>>>> Hi Bruce,
>>>>
>>>> I concern about whether the PMD has all features of librte_vhost,
>>>> because librte_vhost provides more features and freedom than ethdev API
>>>> provides.
>>>> In some cases, user needs to choose limited implementation without
>>>> librte_vhost.
>>>> I am going to eliminate such cases while implementing the PMD.
>>>> But I don't have strong belief that we can remove librte_vhost now.
>>>>
>>>> So how about keeping current separation in next DPDK?
>>>> I guess people will try to replace librte_vhost to vhost PMD, because
>>>> apparently using ethdev APIs will be useful in many cases.
>>>> And we will get feedbacks like "vhost PMD needs to support like this usage".
>>>> (Or we will not have feedbacks, but it's also OK.)
>>>> Then, we will be able to merge librte_vhost and vhost PMD.
>>> I agree with the above. One the concerns I had when reviewing the patch was that the PMD removes some freedom that is available with the library. Eg. Ability to implement the new_device and destroy_device callbacks. If using the PMD you are constrained to the implementations of these in the PMD driver, but if using librte_vhost, you can implement your own with whatever functionality you like - a good example of this can be seen in the vhost sample app.
>>> On the other hand, the PMD is useful in that it removes a lot of complexity for the user and may work for some more general use cases. So I would be in favour of having both options available too.
>>>
>>> Ciara
>>>
>> Thanks.
>> However, just because the libraries are merged does not mean that you need
>> be limited by PMD functionality. Many PMDs provide additional library-specific
>> functions over and above their PMD capabilities. The bonded PMD is a good example
>> here, as it has a whole set of extra functions to create and manipulate bonded
>> devices - things that are obviously not part of the general ethdev API. Other
>> vPMDs similarly include functions to allow them to be created on the fly too.
>>
>> regards,
>> /Bruce
>
> Hi Bruce,
>
> I appreciate for showing a good example. I haven't noticed the PMD.
> I will check the bonding PMD, and try to remove librte_vhost without
> losing freedom and features of the library.

Hi,

Just a gentle reminder - if you consider removing (even if by just 
replacing/renaming) an entire library, it needs to go through the ABI 
deprecation process.

It seems obvious enough. But for all the ABI policing here, somehow we 
all failed to notice the two compatibility breaking rename-elephants in 
the room during 2.1 development:
- libintel_dpdk was renamed to libdpdk
- librte_pmd_virtio_uio was renamed to librte_pmd_virtio

Of course these cases are easy to work around with symlinks, and are 
unrelated to the matter at hand. Just wanting to make sure such things 
don't happen again.

	- Panu -


* Re: [RFC PATCH v2] vhost: Add VHOST PMD
  2015-10-19 13:26             ` Panu Matilainen
@ 2015-10-19 13:27               ` Richardson, Bruce
  2015-10-21  4:35                 ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Richardson, Bruce @ 2015-10-19 13:27 UTC (permalink / raw)
  To: Panu Matilainen, Tetsuya Mukawa, Loftus, Ciara; +Cc: dev, ann.zhuangyanying



> -----Original Message-----
> From: Panu Matilainen [mailto:pmatilai@redhat.com]
> Sent: Monday, October 19, 2015 2:26 PM
> To: Tetsuya Mukawa <mukawa@igel.co.jp>; Richardson, Bruce
> <bruce.richardson@intel.com>; Loftus, Ciara <ciara.loftus@intel.com>
> Cc: dev@dpdk.org; ann.zhuangyanying@huawei.com
> Subject: Re: [dpdk-dev] [RFC PATCH v2] vhost: Add VHOST PMD
> 
> On 10/19/2015 01:50 PM, Tetsuya Mukawa wrote:
> > On 2015/10/19 18:45, Bruce Richardson wrote:
> >> On Mon, Oct 19, 2015 at 10:32:50AM +0100, Loftus, Ciara wrote:
> >>>> On 2015/10/16 21:52, Bruce Richardson wrote:
> >>>>> On Mon, Aug 31, 2015 at 12:55:26PM +0900, Tetsuya Mukawa wrote:
> >>>>>> The patch introduces a new PMD. This PMD is implemented as thin
> >>>> wrapper
> >>>>>> of librte_vhost. It means librte_vhost is also needed to compile
> the PMD.
> >>>>>> The PMD can have 'iface' parameter like below to specify a path
> >>>>>> to
> >>>> connect
> >>>>>> to a virtio-net device.
> >>>>>>
> >>>>>> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
> >>>>>>
> >>>>>> To connect above testpmd, here is qemu command example.
> >>>>>>
> >>>>>> $ qemu-system-x86_64 \
> >>>>>>          <snip>
> >>>>>>          -chardev socket,id=chr0,path=/tmp/sock0 \
> >>>>>>          -netdev vhost-user,id=net0,chardev=chr0,vhostforce \
> >>>>>>          -device virtio-net-pci,netdev=net0
> >>>>>>
> >>>>>> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> >>>>> With this PMD in place, is there any need to keep the existing
> >>>>> vhost library around as a separate entity? Can the existing
> >>>>> library be
> >>>> subsumed/converted into
> >>>>> a standard PMD?
> >>>>>
> >>>>> /Bruce
> >>>> Hi Bruce,
> >>>>
> >>>> I concern about whether the PMD has all features of librte_vhost,
> >>>> because librte_vhost provides more features and freedom than ethdev
> >>>> API provides.
> >>>> In some cases, user needs to choose limited implementation without
> >>>> librte_vhost.
> >>>> I am going to eliminate such cases while implementing the PMD.
> >>>> But I don't have strong belief that we can remove librte_vhost now.
> >>>>
> >>>> So how about keeping current separation in next DPDK?
> >>>> I guess people will try to replace librte_vhost to vhost PMD,
> >>>> because apparently using ethdev APIs will be useful in many cases.
> >>>> And we will get feedbacks like "vhost PMD needs to support like this
> usage".
> >>>> (Or we will not have feedbacks, but it's also OK.) Then, we will be
> >>>> able to merge librte_vhost and vhost PMD.
> >>> I agree with the above. One the concerns I had when reviewing the
> patch was that the PMD removes some freedom that is available with the
> library. Eg. Ability to implement the new_device and destroy_device
> callbacks. If using the PMD you are constrained to the implementations of
> these in the PMD driver, but if using librte_vhost, you can implement your
> own with whatever functionality you like - a good example of this can be
> seen in the vhost sample app.
> >>> On the other hand, the PMD is useful in that it removes a lot of
> complexity for the user and may work for some more general use cases. So I
> would be in favour of having both options available too.
> >>>
> >>> Ciara
> >>>
> >> Thanks.
> >> However, just because the libraries are merged does not mean that you
> >> need be limited by PMD functionality. Many PMDs provide additional
> >> library-specific functions over and above their PMD capabilities. The
> >> bonded PMD is a good example here, as it has a whole set of extra
> >> functions to create and manipulate bonded devices - things that are
> >> obviously not part of the general ethdev API. Other vPMDs similarly
> include functions to allow them to be created on the fly too.
> >>
> >> regards,
> >> /Bruce
> >
> > Hi Bruce,
> >
> > I appreciate for showing a good example. I haven't noticed the PMD.
> > I will check the bonding PMD, and try to remove librte_vhost without
> > losing freedom and features of the library.
> 
> Hi,
> 
> Just a gentle reminder - if you consider removing (even if by just
> replacing/renaming) an entire library, it needs to happen the ABI
> deprecation process.
> 
> It seems obvious enough. But for all the ABI policing here, somehow we all
> failed to notice the two compatibility breaking rename-elephants in the
> room during 2.1 development:
> - libintel_dpdk was renamed to libdpdk
> - librte_pmd_virtio_uio was renamed to librte_pmd_virtio
> 
> Of course these cases are easy to work around with symlinks, and are
> unrelated to the matter at hand. Just wanting to make sure such things
> dont happen again.
> 
> 	- Panu -

Still doesn't hurt to remind us, Panu! Thanks. :-)


* Re: [RFC PATCH v2] vhost: Add VHOST PMD
  2015-10-16  8:40     ` Tetsuya Mukawa
@ 2015-10-20 14:13       ` Loftus, Ciara
  2015-10-21  4:30         ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Loftus, Ciara @ 2015-10-20 14:13 UTC (permalink / raw)
  To: Tetsuya Mukawa, dev; +Cc: ann.zhuangyanying

> 
> On 2015/09/24 2:47, Loftus, Ciara wrote:
> >> The patch introduces a new PMD. This PMD is implemented as thin
> wrapper
> >> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
> >> The PMD can have 'iface' parameter like below to specify a path to
> connect
> >> to a virtio-net device.
> >>
> >> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
> >>
> >> To connect above testpmd, here is qemu command example.
> >>
> >> $ qemu-system-x86_64 \
> >>         <snip>
> >>         -chardev socket,id=chr0,path=/tmp/sock0 \
> >>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce \
> >>         -device virtio-net-pci,netdev=net0
> >>
> >> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> >> ---
> >>  config/common_linuxapp                      |   6 +
> >>  drivers/net/Makefile                        |   4 +
> >>  drivers/net/vhost/Makefile                  |  61 +++
> >>  drivers/net/vhost/rte_eth_vhost.c           | 640
> >> ++++++++++++++++++++++++++++
> >>  drivers/net/vhost/rte_pmd_vhost_version.map |   4 +
> >>  mk/rte.app.mk                               |   8 +-
> >>  6 files changed, 722 insertions(+), 1 deletion(-)
> >>  create mode 100644 drivers/net/vhost/Makefile
> >>  create mode 100644 drivers/net/vhost/rte_eth_vhost.c
> >>  create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
> >>
> >> +struct pmd_internal {
> >> +	TAILQ_ENTRY(pmd_internal) next;
> >> +	char *dev_name;
> >> +	char *iface_name;
> >> +	unsigned nb_rx_queues;
> >> +	unsigned nb_tx_queues;
> >> +	rte_atomic16_t xfer;
> > Is this flag just used to indicate the state of the virtio_net device?
> > Ie. if =0 then virtio_dev=NULL and if =1 then virtio_net !=NULL & the
> VIRTIO_DEV_RUNNING flag is set?
> 
> Hi Clara,
> 
> I am sorry for very late reply.
> 
> Yes, it is. Probably we can optimize it more.
> I will change this implementation a bit in next patches.
> Could you please check it?
Of course, thanks.

> 
> >> +
> >> +static uint16_t
> >> +eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> >> +{
> >> +	struct vhost_queue *r = q;
> >> +	uint16_t i, nb_tx = 0;
> >> +
> >> +	if (unlikely(r->internal == NULL))
> >> +		return 0;
> >> +
> >> +	if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
> >> +		return 0;
> >> +
> >> +	rte_atomic16_set(&r->tx_executing, 1);
> >> +
> >> +	if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
> >> +		goto out;
> >> +
> >> +	nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
> >> +			VIRTIO_RXQ, bufs, nb_bufs);
> >> +
> >> +	rte_atomic64_add(&(r->tx_pkts), nb_tx);
> >> +	rte_atomic64_add(&(r->err_pkts), nb_bufs - nb_tx);
> >> +
> >> +	for (i = 0; likely(i < nb_tx); i++)
> >> +		rte_pktmbuf_free(bufs[i]);
> > We may not always want to free these mbufs. For example, if a call is made
> to rte_eth_tx_burst with buffers from another (non DPDK) source, they may
> not be ours to free.
> 
> Sorry, I am not sure what type of buffer you want to transfer.
> 
> This is a PMD that wraps librte_vhost.
> And I guess other PMDs cannot handle buffers from another non DPDK
> source.
> Should we take care such buffers?
> 
> I have also checked af_packet PMD.
> It seems the tx function of af_packet PMD just frees mbuf.

For example, if the PMD is used with an application that receives buffers from another source, e.g. a virtual switch receiving packets from an interface using the kernel driver.
I see that af_packet also frees the mbuf. I've checked the ixgbe and ring PMDs though, and they don't seem to free the buffers, although I may have missed something; the code for these is rather large and I am unfamiliar with most of it. If I am correct, though, I wonder whether this behaviour should vary from PMD to PMD?
> 
> >> +
> >> +
> >> +	eth_dev = rte_eth_dev_allocated(internal->dev_name);
> >> +	if (eth_dev == NULL) {
> >> +		RTE_LOG(INFO, PMD, "failuer to find ethdev\n");
> > Typo: Failure. Same for the destroy_device function
> 
> Thanks, I will fix it in next patches.
> 
> >> +		return -1;
> >> +	}
> >> +
> >> +	internal = eth_dev->data->dev_private;
> >> +
> >> +	for (i = 0; i < internal->nb_rx_queues; i++) {
> >> +		vq = &internal->rx_vhost_queues[i];
> >> +		vq->device = dev;
> >> +		vq->internal = internal;
> >> +	}
> >> +	for (i = 0; i < internal->nb_tx_queues; i++) {
> >> +		vq = &internal->tx_vhost_queues[i];
> >> +		vq->device = dev;
> >> +		vq->internal = internal;
> >> +	}
> >> +
> >> +	dev->flags |= VIRTIO_DEV_RUNNING;
> >> +	dev->priv = eth_dev;
> >> +
> >> +	eth_dev->data->dev_link.link_status = 1;
> >> +	rte_atomic16_set(&internal->xfer, 1);
> >> +
> >> +	RTE_LOG(INFO, PMD, "New connection established\n");
> >> +
> >> +	return 0;
> > Some freedom is taken away if the new_device and destroy_device
> callbacks are implemented in the driver.
> > For example if one wishes to  call the rte_vhost_enable_guest_notification
> function when a new device is brought up. They cannot now as there is no
> scope to modify these callbacks, as is done in for example the vHost sample
> app. Is this correct?
> 
> So how about adding one more parameter to be able to choose guest
> notification behavior?
> 
> ex)
> ./testpmd --vdev 'eth_vhost0,iface=/tmp/sock0,guest_notification=0'
> 
> In above case, all queues in this device will have
> VRING_USED_F_NO_NOTIFY.

I'm not too concerned about this particular function; I was just giving an example. The main concern I was expressing, here and in the other thread with Bruce, is the risk that we will lose some functionality that is available in the library but not in the PMD. This function is an example of that. If we could find some way to retain the functionality available in the library, that would be ideal.

Thanks for the response! I will review and test further patches if they become available.

Ciara

> 
> Thanks,
> Tetsuya


* Re: [RFC PATCH v2] vhost: Add VHOST PMD
  2015-10-20 14:13       ` Loftus, Ciara
@ 2015-10-21  4:30         ` Tetsuya Mukawa
  2015-10-21 10:09           ` Bruce Richardson
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-10-21  4:30 UTC (permalink / raw)
  To: Loftus, Ciara, dev; +Cc: ann.zhuangyanying

On 2015/10/20 23:13, Loftus, Ciara wrote:
>
>>>> +
>>>> +static uint16_t
>>>> +eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
>>>> +{
>>>> +	struct vhost_queue *r = q;
>>>> +	uint16_t i, nb_tx = 0;
>>>> +
>>>> +	if (unlikely(r->internal == NULL))
>>>> +		return 0;
>>>> +
>>>> +	if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
>>>> +		return 0;
>>>> +
>>>> +	rte_atomic16_set(&r->tx_executing, 1);
>>>> +
>>>> +	if (unlikely(rte_atomic16_read(&r->internal->xfer) == 0))
>>>> +		goto out;
>>>> +
>>>> +	nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
>>>> +			VIRTIO_RXQ, bufs, nb_bufs);
>>>> +
>>>> +	rte_atomic64_add(&(r->tx_pkts), nb_tx);
>>>> +	rte_atomic64_add(&(r->err_pkts), nb_bufs - nb_tx);
>>>> +
>>>> +	for (i = 0; likely(i < nb_tx); i++)
>>>> +		rte_pktmbuf_free(bufs[i]);
>>> We may not always want to free these mbufs. For example, if a call is made
>> to rte_eth_tx_burst with buffers from another (non DPDK) source, they may
>> not be ours to free.
>>
>> Sorry, I am not sure what type of buffer you want to transfer.
>>
>> This is a PMD that wraps librte_vhost.
>> And I guess other PMDs cannot handle buffers from another non DPDK
>> source.
>> Should we take care such buffers?
>>
>> I have also checked af_packet PMD.
>> It seems the tx function of af_packet PMD just frees mbuf.
> For example if using the PMD with an application that receives buffers from another source. Eg. a virtual switch receiving packets from an interface using the kernel driver.

For example, suppose a software switch on the host tries to send data to
a DPDK application on a guest using the vhost PMD and the virtio-net PMD,
and assume the data on the software switch comes from a kernel driver.
In this case, the data is copied into the virtqueue when it is
transferred to the virtio-net PMD, so we can free the mbufs after sending.
Could you please also check the API documentation for rte_eth_tx_burst?
(Freeing the buffers is the default behavior.)

> I see that af_packet also frees the mbuf. I've checked the ixgbe and ring pmds though and they don't seem to free the buffers, although I may have missed something, the code for these is rather large and I am unfamiliar with most of it. If I am correct though, should this behaviour vary from PMD to PMD I wonder?

I think the ring PMD is a special case. Because we don't want to copy
data with that PMD, its RX function doesn't allocate buffers and its TX
function doesn't free them.
But other, normal PMDs allocate buffers when RX is called, and free
buffers when TX is called.

>>>> +
>>>> +
>>>> +	eth_dev = rte_eth_dev_allocated(internal->dev_name);
>>>> +	if (eth_dev == NULL) {
>>>> +		RTE_LOG(INFO, PMD, "failuer to find ethdev\n");
>>> Typo: Failure. Same for the destroy_device function
>> Thanks, I will fix it in next patches.
>>
>>>> +		return -1;
>>>> +	}
>>>> +
>>>> +	internal = eth_dev->data->dev_private;
>>>> +
>>>> +	for (i = 0; i < internal->nb_rx_queues; i++) {
>>>> +		vq = &internal->rx_vhost_queues[i];
>>>> +		vq->device = dev;
>>>> +		vq->internal = internal;
>>>> +	}
>>>> +	for (i = 0; i < internal->nb_tx_queues; i++) {
>>>> +		vq = &internal->tx_vhost_queues[i];
>>>> +		vq->device = dev;
>>>> +		vq->internal = internal;
>>>> +	}
>>>> +
>>>> +	dev->flags |= VIRTIO_DEV_RUNNING;
>>>> +	dev->priv = eth_dev;
>>>> +
>>>> +	eth_dev->data->dev_link.link_status = 1;
>>>> +	rte_atomic16_set(&internal->xfer, 1);
>>>> +
>>>> +	RTE_LOG(INFO, PMD, "New connection established\n");
>>>> +
>>>> +	return 0;
>>> Some freedom is taken away if the new_device and destroy_device
>> callbacks are implemented in the driver.
>>> For example if one wishes to  call the rte_vhost_enable_guest_notification
>> function when a new device is brought up. They cannot now as there is no
>> scope to modify these callbacks, as is done in for example the vHost sample
>> app. Is this correct?
>>
>> So how about adding one more parameter to be able to choose guest
>> notification behavior?
>>
>> ex)
>> ./testpmd --vdev 'eth_vhost0,iface=/tmp/sock0,guest_notification=0'
>>
>> In above case, all queues in this device will have
>> VRING_USED_F_NO_NOTIFY.
> I'm not too concerned about this particular function, I was just making an example. The main concern I was expressing here and in the other thread with Bruce, is the risk that we will lose some functionality available in the library but not in the PMD. This function is an example of that. If we could find some way to retain the functionality available in the library, it would be ideal.

I will reply in the other thread.
In any case, I am going to keep the current vhost library APIs.

Thanks,
Tetsuya


* Re: [RFC PATCH v2] vhost: Add VHOST PMD
  2015-10-19 13:27               ` Richardson, Bruce
@ 2015-10-21  4:35                 ` Tetsuya Mukawa
  2015-10-21  6:25                   ` Panu Matilainen
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-10-21  4:35 UTC (permalink / raw)
  To: Richardson, Bruce, Panu Matilainen, Loftus, Ciara; +Cc: dev, ann.zhuangyanying

On 2015/10/19 22:27, Richardson, Bruce wrote:
>> -----Original Message-----
>> From: Panu Matilainen [mailto:pmatilai@redhat.com]
>> Sent: Monday, October 19, 2015 2:26 PM
>> To: Tetsuya Mukawa <mukawa@igel.co.jp>; Richardson, Bruce
>> <bruce.richardson@intel.com>; Loftus, Ciara <ciara.loftus@intel.com>
>> Cc: dev@dpdk.org; ann.zhuangyanying@huawei.com
>> Subject: Re: [dpdk-dev] [RFC PATCH v2] vhost: Add VHOST PMD
>>
>> On 10/19/2015 01:50 PM, Tetsuya Mukawa wrote:
>>> On 2015/10/19 18:45, Bruce Richardson wrote:
>>>> On Mon, Oct 19, 2015 at 10:32:50AM +0100, Loftus, Ciara wrote:
>>>>>> On 2015/10/16 21:52, Bruce Richardson wrote:
>>>>>>> On Mon, Aug 31, 2015 at 12:55:26PM +0900, Tetsuya Mukawa wrote:
>>>>>>>> The patch introduces a new PMD. This PMD is implemented as thin
>>>>>> wrapper
>>>>>>>> of librte_vhost. It means librte_vhost is also needed to compile
>> the PMD.
>>>>>>>> The PMD can have 'iface' parameter like below to specify a path
>>>>>>>> to
>>>>>> connect
>>>>>>>> to a virtio-net device.
>>>>>>>>
>>>>>>>> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
>>>>>>>>
>>>>>>>> To connect above testpmd, here is qemu command example.
>>>>>>>>
>>>>>>>> $ qemu-system-x86_64 \
>>>>>>>>          <snip>
>>>>>>>>          -chardev socket,id=chr0,path=/tmp/sock0 \
>>>>>>>>          -netdev vhost-user,id=net0,chardev=chr0,vhostforce \
>>>>>>>>          -device virtio-net-pci,netdev=net0
>>>>>>>>
>>>>>>>> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
>>>>>>> With this PMD in place, is there any need to keep the existing
>>>>>>> vhost library around as a separate entity? Can the existing
>>>>>>> library be
>>>>>> subsumed/converted into
>>>>>>> a standard PMD?
>>>>>>>
>>>>>>> /Bruce
>>>>>> Hi Bruce,
>>>>>>
>>>>>> I concern about whether the PMD has all features of librte_vhost,
>>>>>> because librte_vhost provides more features and freedom than ethdev
>>>>>> API provides.
>>>>>> In some cases, user needs to choose limited implementation without
>>>>>> librte_vhost.
>>>>>> I am going to eliminate such cases while implementing the PMD.
>>>>>> But I don't have strong belief that we can remove librte_vhost now.
>>>>>>
>>>>>> So how about keeping current separation in next DPDK?
>>>>>> I guess people will try to replace librte_vhost to vhost PMD,
>>>>>> because apparently using ethdev APIs will be useful in many cases.
>>>>>> And we will get feedbacks like "vhost PMD needs to support like this
>> usage".
>>>>>> (Or we will not have feedbacks, but it's also OK.) Then, we will be
>>>>>> able to merge librte_vhost and vhost PMD.
>>>>> I agree with the above. One the concerns I had when reviewing the
>> patch was that the PMD removes some freedom that is available with the
>> library. Eg. Ability to implement the new_device and destroy_device
>> callbacks. If using the PMD you are constrained to the implementations of
>> these in the PMD driver, but if using librte_vhost, you can implement your
>> own with whatever functionality you like - a good example of this can be
>> seen in the vhost sample app.
>>>>> On the other hand, the PMD is useful in that it removes a lot of
>> complexity for the user and may work for some more general use cases. So I
>> would be in favour of having both options available too.
>>>>> Ciara
>>>>>
>>>> Thanks.
>>>> However, just because the libraries are merged does not mean that you
>>>> need be limited by PMD functionality. Many PMDs provide additional
>>>> library-specific functions over and above their PMD capabilities. The
>>>> bonded PMD is a good example here, as it has a whole set of extra
>>>> functions to create and manipulate bonded devices - things that are
>>>> obviously not part of the general ethdev API. Other vPMDs similarly
>> include functions to allow them to be created on the fly too.
>>>> regards,
>>>> /Bruce
>>> Hi Bruce,
>>>
>>> I appreciate for showing a good example. I haven't noticed the PMD.
>>> I will check the bonding PMD, and try to remove librte_vhost without
>>> losing freedom and features of the library.
>> Hi,
>>
>> Just a gentle reminder - if you consider removing (even if by just
>> replacing/renaming) an entire library, it needs to happen the ABI
>> deprecation process.
>>
>> It seems obvious enough. But for all the ABI policing here, somehow we all
>> failed to notice the two compatibility breaking rename-elephants in the
>> room during 2.1 development:
>> - libintel_dpdk was renamed to libdpdk
>> - librte_pmd_virtio_uio was renamed to librte_pmd_virtio
>>
>> Of course these cases are easy to work around with symlinks, and are
>> unrelated to the matter at hand. Just wanting to make sure such things
>> dont happen again.
>>
>> 	- Panu -
> Still doesn't hurt to remind us, Panu! Thanks. :-)

Hi,

Thanks for the reminder. I've checked the DPDK documentation.
I will submit a deprecation notice to follow the DPDK deprecation process.
(Probably we will be able to remove the vhost library in DPDK 2.3 or later.)

BTW, I will merge the vhost library and the PMD like below.
Step 1. Move the vhost library under the vhost PMD.
Step 2. Rename the current APIs.
Step 3. Add a function to get a pointer to a "struct virtio_net" device
by port number.

The last step allows us to convert a port number to the pointer of the
corresponding virtio_net device, so we can still use the features and
freedom the vhost library APIs provide.

Thanks,
Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH v2] vhost: Add VHOST PMD
  2015-10-21  4:35                 ` Tetsuya Mukawa
@ 2015-10-21  6:25                   ` Panu Matilainen
  2015-10-21 10:22                     ` Bruce Richardson
  0 siblings, 1 reply; 200+ messages in thread
From: Panu Matilainen @ 2015-10-21  6:25 UTC (permalink / raw)
  To: Tetsuya Mukawa, Richardson, Bruce, Loftus, Ciara; +Cc: dev, ann.zhuangyanying

On 10/21/2015 07:35 AM, Tetsuya Mukawa wrote:
> On 2015/10/19 22:27, Richardson, Bruce wrote:
>>> -----Original Message-----
>>> From: Panu Matilainen [mailto:pmatilai@redhat.com]
>>> Sent: Monday, October 19, 2015 2:26 PM
>>> To: Tetsuya Mukawa <mukawa@igel.co.jp>; Richardson, Bruce
>>> <bruce.richardson@intel.com>; Loftus, Ciara <ciara.loftus@intel.com>
>>> Cc: dev@dpdk.org; ann.zhuangyanying@huawei.com
>>> Subject: Re: [dpdk-dev] [RFC PATCH v2] vhost: Add VHOST PMD
>>>
>>> On 10/19/2015 01:50 PM, Tetsuya Mukawa wrote:
>>>> On 2015/10/19 18:45, Bruce Richardson wrote:
>>>>> On Mon, Oct 19, 2015 at 10:32:50AM +0100, Loftus, Ciara wrote:
>>>>>>> On 2015/10/16 21:52, Bruce Richardson wrote:
>>>>>>>> On Mon, Aug 31, 2015 at 12:55:26PM +0900, Tetsuya Mukawa wrote:
>>>>>>>>> The patch introduces a new PMD. This PMD is implemented as thin
>>>>>>> wrapper
>>>>>>>>> of librte_vhost. It means librte_vhost is also needed to compile
>>> the PMD.
>>>>>>>>> The PMD can have 'iface' parameter like below to specify a path
>>>>>>>>> to
>>>>>>> connect
>>>>>>>>> to a virtio-net device.
>>>>>>>>>
>>>>>>>>> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
>>>>>>>>>
>>>>>>>>> To connect above testpmd, here is qemu command example.
>>>>>>>>>
>>>>>>>>> $ qemu-system-x86_64 \
>>>>>>>>>           <snip>
>>>>>>>>>           -chardev socket,id=chr0,path=/tmp/sock0 \
>>>>>>>>>           -netdev vhost-user,id=net0,chardev=chr0,vhostforce \
>>>>>>>>>           -device virtio-net-pci,netdev=net0
>>>>>>>>>
>>>>>>>>> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
>>>>>>>> With this PMD in place, is there any need to keep the existing
>>>>>>>> vhost library around as a separate entity? Can the existing
>>>>>>>> library be
>>>>>>> subsumed/converted into
>>>>>>>> a standard PMD?
>>>>>>>>
>>>>>>>> /Bruce
>>>>>>> Hi Bruce,
>>>>>>>
>>>>>>> I concern about whether the PMD has all features of librte_vhost,
>>>>>>> because librte_vhost provides more features and freedom than ethdev
>>>>>>> API provides.
>>>>>>> In some cases, user needs to choose limited implementation without
>>>>>>> librte_vhost.
>>>>>>> I am going to eliminate such cases while implementing the PMD.
>>>>>>> But I don't have strong belief that we can remove librte_vhost now.
>>>>>>>
>>>>>>> So how about keeping current separation in next DPDK?
>>>>>>> I guess people will try to replace librte_vhost to vhost PMD,
>>>>>>> because apparently using ethdev APIs will be useful in many cases.
>>>>>>> And we will get feedbacks like "vhost PMD needs to support like this
>>> usage".
>>>>>>> (Or we will not have feedbacks, but it's also OK.) Then, we will be
>>>>>>> able to merge librte_vhost and vhost PMD.
>>>>>> I agree with the above. One the concerns I had when reviewing the
>>> patch was that the PMD removes some freedom that is available with the
>>> library. Eg. Ability to implement the new_device and destroy_device
>>> callbacks. If using the PMD you are constrained to the implementations of
>>> these in the PMD driver, but if using librte_vhost, you can implement your
>>> own with whatever functionality you like - a good example of this can be
>>> seen in the vhost sample app.
>>>>>> On the other hand, the PMD is useful in that it removes a lot of
>>> complexity for the user and may work for some more general use cases. So I
>>> would be in favour of having both options available too.
>>>>>> Ciara
>>>>>>
>>>>> Thanks.
>>>>> However, just because the libraries are merged does not mean that you
>>>>> need be limited by PMD functionality. Many PMDs provide additional
>>>>> library-specific functions over and above their PMD capabilities. The
>>>>> bonded PMD is a good example here, as it has a whole set of extra
>>>>> functions to create and manipulate bonded devices - things that are
>>>>> obviously not part of the general ethdev API. Other vPMDs similarly
>>> include functions to allow them to be created on the fly too.
>>>>> regards,
>>>>> /Bruce
>>>> Hi Bruce,
>>>>
>>>> I appreciate for showing a good example. I haven't noticed the PMD.
>>>> I will check the bonding PMD, and try to remove librte_vhost without
>>>> losing freedom and features of the library.
>>> Hi,
>>>
>>> Just a gentle reminder - if you consider removing (even if by just
>>> replacing/renaming) an entire library, it needs to happen the ABI
>>> deprecation process.
>>>
>>> It seems obvious enough. But for all the ABI policing here, somehow we all
>>> failed to notice the two compatibility breaking rename-elephants in the
>>> room during 2.1 development:
>>> - libintel_dpdk was renamed to libdpdk
>>> - librte_pmd_virtio_uio was renamed to librte_pmd_virtio
>>>
>>> Of course these cases are easy to work around with symlinks, and are
>>> unrelated to the matter at hand. Just wanting to make sure such things
>>> dont happen again.
>>>
>>> 	- Panu -
>> Still doesn't hurt to remind us, Panu! Thanks. :-)
>
> Hi,
>
> Thanks for reminder. I've checked the DPDK documentation.
> I will submit deprecation notice to follow DPDK deprecation process.
> (Probably we will be able to remove vhost library in DPDK-2.3 or later.)
>
> BTW, I will merge vhost library and PMD like below.
> Step1. Move vhost library under vhost PMD.
> Step2. Rename current APIs.
> Step3. Add a function to get a pointer of "struct virtio_net device" by
> a portno.
>
> Last steps allows us to be able to convert a portno to the pointer of
> corresponding vrtio_net device.
> And we can still use features and freedom vhost library APIs provided.

Just wondering, is that *really* worth the price of breaking every
single vhost library user out there?

I mean, this is not about removing some bitrotten function or two that
nobody cares about anymore, but removing (by renaming) one of the more
widely (AFAICS) used libraries and its entire API.

If the current APIs are kept then compatibility is largely a matter of
planting a strategic symlink or two, but it might make the API look
inconsistent.

But I'm just wondering about the benefit of this merge, compared to
just adding a vhost PMD and leaving the library be. The ABI process is
not there to make life miserable for DPDK developers, it's there to help
make DPDK nicer for *other* developers. And the first and foremost rule
is simply: don't break backwards compatibility. Not unless there's a
damn good reason for doing so, and I fail to see that reason here.

	- Panu -

> Thanks,
> Tetsuya
>


* Re: [RFC PATCH v2] vhost: Add VHOST PMD
  2015-10-21  4:30         ` Tetsuya Mukawa
@ 2015-10-21 10:09           ` Bruce Richardson
  0 siblings, 0 replies; 200+ messages in thread
From: Bruce Richardson @ 2015-10-21 10:09 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Wed, Oct 21, 2015 at 01:30:54PM +0900, Tetsuya Mukawa wrote:
> On 2015/10/20 23:13, Loftus, Ciara wrote:
> >
> > I see that af_packet also frees the mbuf. I've checked the ixgbe and ring pmds though and they don't seem to free the buffers, although I may have missed something, the code for these is rather large and I am unfamiliar with most of it. If I am correct though, should this behaviour vary from PMD to PMD I wonder?
> 
> I guess ring PMD is something special.
> Because we don't want to copy data with this PMD, RX function doesn't
> allocate buffers, also TX function doesn't free buffers.
> But other normal PMD will allocate buffers when RX is called, and free
> buffers when TX is called.
> 

Yes, this is correct. The ring PMD is the exception, since it automatically
recycles buffers and so does not need to alloc/free mbufs. (ixgbe frees the
buffers post-TX as part of the TX ring cleanup.)

/Bruce


* Re: [RFC PATCH v2] vhost: Add VHOST PMD
  2015-10-21  6:25                   ` Panu Matilainen
@ 2015-10-21 10:22                     ` Bruce Richardson
  2015-10-22  9:50                       ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Bruce Richardson @ 2015-10-21 10:22 UTC (permalink / raw)
  To: Panu Matilainen; +Cc: dev, ann.zhuangyanying

On Wed, Oct 21, 2015 at 09:25:12AM +0300, Panu Matilainen wrote:
> On 10/21/2015 07:35 AM, Tetsuya Mukawa wrote:
> >On 2015/10/19 22:27, Richardson, Bruce wrote:
> >>>-----Original Message-----
> >>>From: Panu Matilainen [mailto:pmatilai@redhat.com]
> >>>Sent: Monday, October 19, 2015 2:26 PM
> >>>To: Tetsuya Mukawa <mukawa@igel.co.jp>; Richardson, Bruce
> >>><bruce.richardson@intel.com>; Loftus, Ciara <ciara.loftus@intel.com>
> >>>Cc: dev@dpdk.org; ann.zhuangyanying@huawei.com
> >>>Subject: Re: [dpdk-dev] [RFC PATCH v2] vhost: Add VHOST PMD
> >>>
> >>>On 10/19/2015 01:50 PM, Tetsuya Mukawa wrote:
> >>>>On 2015/10/19 18:45, Bruce Richardson wrote:
> >>>>>On Mon, Oct 19, 2015 at 10:32:50AM +0100, Loftus, Ciara wrote:
> >>>>>>>On 2015/10/16 21:52, Bruce Richardson wrote:
> >>>>>>>>On Mon, Aug 31, 2015 at 12:55:26PM +0900, Tetsuya Mukawa wrote:
> >>>>>>>>>The patch introduces a new PMD. This PMD is implemented as thin
> >>>>>>>wrapper
> >>>>>>>>>of librte_vhost. It means librte_vhost is also needed to compile
> >>>the PMD.
> >>>>>>>>>The PMD can have 'iface' parameter like below to specify a path
> >>>>>>>>>to
> >>>>>>>connect
> >>>>>>>>>to a virtio-net device.
> >>>>>>>>>
> >>>>>>>>>$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
> >>>>>>>>>
> >>>>>>>>>To connect above testpmd, here is qemu command example.
> >>>>>>>>>
> >>>>>>>>>$ qemu-system-x86_64 \
> >>>>>>>>>          <snip>
> >>>>>>>>>          -chardev socket,id=chr0,path=/tmp/sock0 \
> >>>>>>>>>          -netdev vhost-user,id=net0,chardev=chr0,vhostforce \
> >>>>>>>>>          -device virtio-net-pci,netdev=net0
> >>>>>>>>>
> >>>>>>>>>Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> >>>>>>>>With this PMD in place, is there any need to keep the existing
> >>>>>>>>vhost library around as a separate entity? Can the existing
> >>>>>>>>library be
> >>>>>>>subsumed/converted into
> >>>>>>>>a standard PMD?
> >>>>>>>>
> >>>>>>>>/Bruce
> >>>>>>>Hi Bruce,
> >>>>>>>
> >>>>>>>I concern about whether the PMD has all features of librte_vhost,
> >>>>>>>because librte_vhost provides more features and freedom than ethdev
> >>>>>>>API provides.
> >>>>>>>In some cases, user needs to choose limited implementation without
> >>>>>>>librte_vhost.
> >>>>>>>I am going to eliminate such cases while implementing the PMD.
> >>>>>>>But I don't have strong belief that we can remove librte_vhost now.
> >>>>>>>
> >>>>>>>So how about keeping current separation in next DPDK?
> >>>>>>>I guess people will try to replace librte_vhost to vhost PMD,
> >>>>>>>because apparently using ethdev APIs will be useful in many cases.
> >>>>>>>And we will get feedbacks like "vhost PMD needs to support like this
> >>>usage".
> >>>>>>>(Or we will not have feedbacks, but it's also OK.) Then, we will be
> >>>>>>>able to merge librte_vhost and vhost PMD.
> >>>>>>I agree with the above. One the concerns I had when reviewing the
> >>>patch was that the PMD removes some freedom that is available with the
> >>>library. Eg. Ability to implement the new_device and destroy_device
> >>>callbacks. If using the PMD you are constrained to the implementations of
> >>>these in the PMD driver, but if using librte_vhost, you can implement your
> >>>own with whatever functionality you like - a good example of this can be
> >>>seen in the vhost sample app.
> >>>>>>On the other hand, the PMD is useful in that it removes a lot of
> >>>complexity for the user and may work for some more general use cases. So I
> >>>would be in favour of having both options available too.
> >>>>>>Ciara
> >>>>>>
> >>>>>Thanks.
> >>>>>However, just because the libraries are merged does not mean that you
> >>>>>need be limited by PMD functionality. Many PMDs provide additional
> >>>>>library-specific functions over and above their PMD capabilities. The
> >>>>>bonded PMD is a good example here, as it has a whole set of extra
> >>>>>functions to create and manipulate bonded devices - things that are
> >>>>>obviously not part of the general ethdev API. Other vPMDs similarly
> >>>include functions to allow them to be created on the fly too.
> >>>>>regards,
> >>>>>/Bruce
> >>>>Hi Bruce,
> >>>>
> >>>>I appreciate for showing a good example. I haven't noticed the PMD.
> >>>>I will check the bonding PMD, and try to remove librte_vhost without
> >>>>losing freedom and features of the library.
> >>>Hi,
> >>>
> >>>Just a gentle reminder - if you consider removing (even if by just
> >>>replacing/renaming) an entire library, it needs to happen the ABI
> >>>deprecation process.
> >>>
> >>>It seems obvious enough. But for all the ABI policing here, somehow we all
> >>>failed to notice the two compatibility breaking rename-elephants in the
> >>>room during 2.1 development:
> >>>- libintel_dpdk was renamed to libdpdk
> >>>- librte_pmd_virtio_uio was renamed to librte_pmd_virtio
> >>>
> >>>Of course these cases are easy to work around with symlinks, and are
> >>>unrelated to the matter at hand. Just wanting to make sure such things
> >>>dont happen again.
> >>>
> >>>	- Panu -
> >>Still doesn't hurt to remind us, Panu! Thanks. :-)
> >
> >Hi,
> >
> >Thanks for reminder. I've checked the DPDK documentation.
> >I will submit deprecation notice to follow DPDK deprecation process.
> >(Probably we will be able to remove vhost library in DPDK-2.3 or later.)
> >
> >BTW, I will merge vhost library and PMD like below.
> >Step1. Move vhost library under vhost PMD.
> >Step2. Rename current APIs.
> >Step3. Add a function to get a pointer of "struct virtio_net device" by
> >a portno.
> >
> >Last steps allows us to be able to convert a portno to the pointer of
> >corresponding vrtio_net device.
> >And we can still use features and freedom vhost library APIs provided.
> 
> Just wondering, is that *really* worth the price of breaking every single
> vhost library user out there?
> 
> I mean, this is not about removing some bitrotten function or two which
> nobody cares about anymore but removing (by renaming) one of the more widely
> (AFAICS) used libraries and its entire API.
> 
> If current APIs are kept then compatibility is largely a matter of planting
> a strategic symlink or two, but it might make the API look inconsistent.
> 
> But just wondering about the benefit of this merge thing, compared to just
> adding a vhost pmd and leaving the library be. The ABI process is not there
> to make life miserable for DPDK developers, its there to help make DPDK
> nicer for *other* developers. And the first and the foremost rule is simply:
> dont break backwards compatibility. Not unless there's a damn good reason to
> doing so, and I fail to see that reason here.
> 
> 	- Panu -
>
Good question, and I'll accept that maybe it's not worth doing. I'm not that
much of an expert on the internals and APIs of the vhost library.

However, the merge I was looking for was more from a code locality point
of view, to have all the vhost code in one directory (under drivers/net),
rather than spread across multiple ones. What APIs need to be deprecated
as part of that work is a separate question, and so in theory we could
create a combined vhost library that does not deprecate anything (though to
avoid a build-up of technical debt, we'll probably want to deprecate some
functions).

I'll leave it up to the vhost experts to decide what's best, but for me, any
library that handles transmission and reception of packets outside of a DPDK
app should be a PMD library using ethdev rx/tx burst routines, and located
under drivers/net. (KNI is another obvious target for such a move and conversion.)

Regards,
/Bruce


* [RFC PATCH v3 0/2] Add VHOST PMD
  2015-08-31  3:55 ` [RFC PATCH v2] vhost: " Tetsuya Mukawa
  2015-09-23 17:47   ` Loftus, Ciara
  2015-10-16 12:52   ` Bruce Richardson
@ 2015-10-22  9:45   ` Tetsuya Mukawa
  2015-10-22  9:45     ` [RFC PATCH v3 1/2] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
  2015-10-22  9:45     ` [RFC PATCH v3 2/2] vhost: " Tetsuya Mukawa
  2 siblings, 2 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-10-22  9:45 UTC (permalink / raw)
  To: dev, ciara.loftus; +Cc: ann.zhuangyanying

The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost.

I've submitted the below patches in former patch sets, but it seems some
of the issues were fixed already.

 - [PATCH 1/3] vhost: Fix return value of GET_VRING_BASE message
 - [PATCH 2/3] vhost: Fix RESET_OWNER handling not to close callfd
 - [PATCH 3/3] vhost: Fix RESET_OWNER handling not to free virtqueue

I've still seen some resource leaks in the vhost library, but in this RFC
I focused on the vhost PMD.
After I get agreement, I will submit a patch for the leak issue as a
separate patch. So please check the direction of the vhost PMD.

PATCH v3 changes:
 - Optimize performance.
   In the RX/TX functions, change the code to access only per-core data.
 - Add the below API to allow users to use vhost library APIs for a port
   managed by the vhost PMD. There are some limitations. See "rte_eth_vhost.h".
    - rte_eth_vhost_portid2vdev()
   To support this functionality, the vhost library is also changed.
   Anyway, if users don't use the vhost PMD, they can fully use the vhost
   library APIs.
 - Add code to support vhost multiple queues.
   Actually, the multiple queues functionality is not enabled so far.

Tetsuya Mukawa (2):
  vhost: Add callback and private data for vhost PMD
  vhost: Add VHOST PMD

 config/common_linuxapp                        |   6 +
 drivers/net/Makefile                          |   4 +
 drivers/net/vhost/Makefile                    |  62 +++
 drivers/net/vhost/rte_eth_vhost.c             | 735 ++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h             |  65 +++
 drivers/net/vhost/rte_pmd_vhost_version.map   |   8 +
 lib/librte_vhost/rte_virtio_net.h             |   3 +
 lib/librte_vhost/vhost_user/virtio-net-user.c |   8 +-
 lib/librte_vhost/virtio-net.c                 |  40 +-
 lib/librte_vhost/virtio-net.h                 |   3 +-
 mk/rte.app.mk                                 |   8 +-
 11 files changed, 934 insertions(+), 8 deletions(-)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

-- 
2.1.4


* [RFC PATCH v3 1/2] vhost: Add callback and private data for vhost PMD
  2015-10-22  9:45   ` [RFC PATCH v3 0/2] " Tetsuya Mukawa
@ 2015-10-22  9:45     ` Tetsuya Mukawa
  2015-10-27  6:12       ` [PATCH 0/3] Add VHOST PMD Tetsuya Mukawa
  2015-10-22  9:45     ` [RFC PATCH v3 2/2] vhost: " Tetsuya Mukawa
  1 sibling, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-10-22  9:45 UTC (permalink / raw)
  To: dev, ciara.loftus; +Cc: ann.zhuangyanying

These variables are needed to manage one of the virtio devices using both
the vhost library APIs and the vhost PMD.
For example, if the vhost PMD uses the current callback handler and private
data provided by the vhost library, a DPDK application that links the vhost
library cannot use some of the vhost library APIs. To avoid this, a callback
and private data for the vhost PMD are needed.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
 lib/librte_vhost/rte_virtio_net.h             |  3 ++
 lib/librte_vhost/vhost_user/virtio-net-user.c |  8 +++---
 lib/librte_vhost/virtio-net.c                 | 40 +++++++++++++++++++++++++--
 lib/librte_vhost/virtio-net.h                 |  3 +-
 4 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 93d3e27..ec84c9b 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -108,6 +108,7 @@ struct virtio_net {
 	uint32_t		virt_qp_nb;
 	uint32_t		mem_idx;	/** Used in set memory layout, unique for each queue within virtio device. */
 	void			*priv;		/**< private context */
+	void			*pmd_priv;	/**< private context for vhost PMD */
 } __rte_cache_aligned;
 
 /**
@@ -198,6 +199,8 @@ int rte_vhost_driver_unregister(const char *dev_name);
 
 /* Register callbacks. */
 int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const);
+/* Register callbacks for vhost PMD (Only for internal). */
+int rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const * const);
 /* Start vhost driver session blocking loop. */
 int rte_vhost_driver_session_start(void);
 
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index 6a12d96..a75697f 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -288,7 +288,7 @@ user_set_vring_kick(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
 
 	if (virtio_is_ready(dev) &&
 		!(dev->flags & VIRTIO_DEV_RUNNING))
-			notify_ops->new_device(dev);
+			notify_new_device(dev);
 }
 
 /*
@@ -302,7 +302,7 @@ user_get_vring_base(struct vhost_device_ctx ctx,
 
 	/* We have to stop the queue (virtio) if it is running. */
 	if (dev->flags & VIRTIO_DEV_RUNNING)
-		notify_ops->destroy_device(dev);
+		notify_destroy_device(dev);
 
 	/* Here we are safe to get the last used index */
 	ops->get_vring_base(ctx, state->index, state);
@@ -333,7 +333,7 @@ user_reset_owner(struct vhost_device_ctx ctx,
 
 	/* We have to stop the queue (virtio) if it is running. */
 	if (dev->flags & VIRTIO_DEV_RUNNING)
-		notify_ops->destroy_device(dev);
+		notify_destroy_device(dev);
 
 	RTE_LOG(INFO, VHOST_CONFIG,
 		"reset owner --- state idx:%d state num:%d\n", state->index, state->num);
@@ -379,7 +379,7 @@ user_destroy_device(struct vhost_device_ctx ctx)
 	uint32_t i;
 
 	if (dev && (dev->flags & VIRTIO_DEV_RUNNING))
-		notify_ops->destroy_device(dev);
+		notify_destroy_device(dev);
 
 	for (i = 0; i < dev->virt_qp_nb; i++)
 		if (dev && dev->mem_arr[i]) {
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 3131719..eec3c22 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -64,6 +64,8 @@ struct virtio_net_config_ll {
 
 /* device ops to add/remove device to/from data core. */
 struct virtio_net_device_ops const *notify_ops;
+/* device ops for vhost PMD to add/remove device to/from data core. */
+struct virtio_net_device_ops const *pmd_notify_ops;
 /* root address of the linked list of managed virtio devices */
 static struct virtio_net_config_ll *ll_root;
 
@@ -84,6 +86,29 @@ static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
 
 static uint64_t VHOST_PROTOCOL_FEATURES = VHOST_SUPPORTED_PROTOCOL_FEATURES;
 
+int
+notify_new_device(struct virtio_net *dev)
+{
+	if ((pmd_notify_ops != NULL) && (pmd_notify_ops->new_device != NULL)) {
+		int ret = pmd_notify_ops->new_device(dev);
+		if (ret != 0)
+			return ret;
+	}
+	if ((notify_ops != NULL) && (notify_ops->new_device != NULL))
+		return notify_ops->new_device(dev);
+
+	return 0;
+}
+
+void
+notify_destroy_device(volatile struct virtio_net *dev)
+{
+	if ((pmd_notify_ops != NULL) && (pmd_notify_ops->destroy_device != NULL))
+		pmd_notify_ops->destroy_device(dev);
+	if ((notify_ops != NULL) && (notify_ops->destroy_device != NULL))
+		notify_ops->destroy_device(dev);
+}
+
 /*
  * Converts QEMU virtual address to Vhost virtual address. This function is
  * used to convert the ring addresses to our address space.
@@ -421,7 +446,7 @@ destroy_device(struct vhost_device_ctx ctx)
 			 * the function to remove it from the data core.
 			 */
 			if ((ll_dev_cur->dev.flags & VIRTIO_DEV_RUNNING))
-				notify_ops->destroy_device(&(ll_dev_cur->dev));
+				notify_destroy_device(&(ll_dev_cur->dev));
 			ll_dev_cur = rm_config_ll_entry(ll_dev_cur,
 					ll_dev_last);
 		} else {
@@ -884,7 +909,7 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 				dev->virtqueue[VIRTIO_RXQ]->enabled = 1;
 				dev->virtqueue[VIRTIO_TXQ]->enabled = 1;
 			}
-			return notify_ops->new_device(dev);
+			return notify_new_device(dev);
 		}
 	/* Otherwise we remove it. */
 	} else
@@ -1006,3 +1031,14 @@ rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const op
 
 	return 0;
 }
+
+/*
+ * Register ops so that we can add/remove device to data core.
+ */
+int
+rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const * const ops)
+{
+	pmd_notify_ops = ops;
+
+	return 0;
+}
diff --git a/lib/librte_vhost/virtio-net.h b/lib/librte_vhost/virtio-net.h
index ef6efae..f92ed73 100644
--- a/lib/librte_vhost/virtio-net.h
+++ b/lib/librte_vhost/virtio-net.h
@@ -39,7 +39,8 @@
 
 #define VHOST_USER_PROTOCOL_F_VRING_FLAG 2
 
-struct virtio_net_device_ops const *notify_ops;
 struct virtio_net *get_device(struct vhost_device_ctx ctx);
 
+int notify_new_device(struct virtio_net *dev);
+void notify_destroy_device(volatile struct virtio_net *dev);
 #endif
-- 
2.1.4


* [RFC PATCH v3 2/2] vhost: Add VHOST PMD
  2015-10-22  9:45   ` [RFC PATCH v3 0/2] " Tetsuya Mukawa
  2015-10-22  9:45     ` [RFC PATCH v3 1/2] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
@ 2015-10-22  9:45     ` Tetsuya Mukawa
  2015-10-22 12:49       ` Bruce Richardson
  2015-10-29 14:25       ` Xie, Huawei
  1 sibling, 2 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-10-22  9:45 UTC (permalink / raw)
  To: dev, ciara.loftus; +Cc: ann.zhuangyanying

The patch introduces a new PMD. This PMD is implemented as a thin wrapper
of librte_vhost, which means librte_vhost is also needed to compile the PMD.
The vhost messages will be handled only when a port is started, so start
a port first, then invoke QEMU.

The PMD has 2 parameters.
 - iface:  The parameter is used to specify a path to connect to a
           virtio-net device.
 - queues: The parameter is used to specify the number of queues the
           virtio-net device has.
           (Default: 1)

Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=3' -- -i

To connect to the above testpmd, here is a QEMU command example.

$ qemu-system-x86_64 \
        <snip>
        -chardev socket,id=chr0,path=/tmp/sock0 \
        -netdev vhost-user,id=net0,chardev=chr0,vhostforce \
        -device virtio-net-pci,netdev=net0

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
 config/common_linuxapp                      |   6 +
 drivers/net/Makefile                        |   4 +
 drivers/net/vhost/Makefile                  |  62 +++
 drivers/net/vhost/rte_eth_vhost.c           | 735 ++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h           |  65 +++
 drivers/net/vhost/rte_pmd_vhost_version.map |   8 +
 mk/rte.app.mk                               |   8 +-
 7 files changed, 887 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 0de43d5..7310240 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -446,6 +446,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
 CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 
 #
+# Compile vhost PMD
+# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
+#
+CONFIG_RTE_LIBRTE_PMD_VHOST=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 5ebf963..e46a38e 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -49,5 +49,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
 
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
 include $(RTE_SDK)/mk/rte.sharelib.mk
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
new file mode 100644
index 0000000..8186a80
--- /dev/null
+++ b/drivers/net/vhost/Makefile
@@ -0,0 +1,62 @@
+#   BSD LICENSE
+#
+#   Copyright (c) 2010-2015 Intel Corporation.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_vhost.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_vhost_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
+
+#
+# Export include files
+#
+SYMLINK-y-include += rte_eth_vhost.h
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_vhost
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
new file mode 100644
index 0000000..66bfc2b
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -0,0 +1,735 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) 2010-2015 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <unistd.h>
+#include <errno.h>
+#include <limits.h>
+#include <pthread.h>
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_dev.h>
+#include <rte_kvargs.h>
+#include <rte_virtio_net.h>
+
+#include "rte_eth_vhost.h"
+
+#define ETH_VHOST_IFACE_ARG		"iface"
+#define ETH_VHOST_QUEUES_ARG		"queues"
+
+static const char *drivername = "VHOST PMD";
+
+static const char *valid_arguments[] = {
+	ETH_VHOST_IFACE_ARG,
+	ETH_VHOST_QUEUES_ARG,
+	NULL
+};
+
+static struct ether_addr base_eth_addr = {
+	.addr_bytes = {
+		0x56 /* V */,
+		0x48 /* H */,
+		0x4F /* O */,
+		0x53 /* S */,
+		0x54 /* T */,
+		0x00
+	}
+};
+
+struct vhost_queue {
+	struct virtio_net *device;
+	struct pmd_internal *internal;
+	struct rte_mempool *mb_pool;
+	rte_atomic32_t allow_queuing;
+	rte_atomic32_t while_queuing;
+	rte_atomic64_t rx_pkts;
+	rte_atomic64_t tx_pkts;
+	rte_atomic64_t err_pkts;
+};
+
+struct pmd_internal {
+	TAILQ_ENTRY(pmd_internal) next;
+	char *dev_name;
+	char *iface_name;
+	unsigned nb_rx_queues;
+	unsigned nb_tx_queues;
+
+	struct vhost_queue rx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+	struct vhost_queue tx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+
+	volatile uint16_t once;
+	pthread_t session_th;
+};
+
+TAILQ_HEAD(pmd_internal_head, pmd_internal);
+static struct pmd_internal_head internals_list =
+	TAILQ_HEAD_INITIALIZER(internals_list);
+
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static struct rte_eth_link pmd_link = {
+		.link_speed = 10000,
+		.link_duplex = ETH_LINK_FULL_DUPLEX,
+		.link_status = 0
+};
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t nb_rx = 0;
+
+	if (unlikely(r->internal == NULL))
+		return 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Dequeue packets from guest TX queue */
+	nb_rx = (uint16_t)rte_vhost_dequeue_burst(r->device,
+			VIRTIO_TXQ, r->mb_pool, bufs, nb_bufs);
+
+	rte_atomic64_add(&(r->rx_pkts), nb_rx);
+
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t i, nb_tx = 0;
+
+	if (unlikely(r->internal == NULL))
+		return 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Enqueue packets to guest RX queue */
+	nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
+			VIRTIO_RXQ, bufs, nb_bufs);
+
+	rte_atomic64_add(&(r->tx_pkts), nb_tx);
+	rte_atomic64_add(&(r->err_pkts), nb_bufs - nb_tx);
+
+	for (i = 0; likely(i < nb_tx); i++)
+		rte_pktmbuf_free(bufs[i]);
+
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused) { return 0; }
+
+static inline struct pmd_internal *
+find_internal_resource(char *ifname)
+{
+	int found = 0;
+	struct pmd_internal *internal;
+
+	if (ifname == NULL)
+		return NULL;
+
+	pthread_mutex_lock(&internal_list_lock);
+
+	TAILQ_FOREACH(internal, &internals_list, next) {
+		if (!strcmp(internal->iface_name, ifname)) {
+			found = 1;
+			break;
+		}
+	}
+
+	pthread_mutex_unlock(&internal_list_lock);
+
+	if (!found)
+		return NULL;
+
+	return internal;
+}
+
+static int
+new_device(struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct pmd_internal *internal;
+	struct vhost_queue *vq;
+	uint16_t queues;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "invalid argument\n");
+		return -1;
+	}
+
+	internal = find_internal_resource(dev->ifname);
+	if (internal == NULL) {
+		RTE_LOG(INFO, PMD, "invalid device name\n");
+		return -1;
+	}
+
+	/*
+	 * Todo: To support multi queue, get the number of queues here.
+	 * So far, vhost provides only one queue.
+	 */
+	queues = 1;
+
+	if ((queues < internal->nb_rx_queues) ||
+			(queues < internal->nb_tx_queues)) {
+		RTE_LOG(INFO, PMD, "Not enough queues\n");
+		return -1;
+	}
+
+	eth_dev = rte_eth_dev_allocated(internal->dev_name);
+	if (eth_dev == NULL) {
+		RTE_LOG(INFO, PMD, "failure to find ethdev\n");
+		return -1;
+	}
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = &internal->rx_vhost_queues[i];
+		vq->device = dev;
+		vq->internal = internal;
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = &internal->tx_vhost_queues[i];
+		vq->device = dev;
+		vq->internal = internal;
+	}
+
+	dev->flags |= VIRTIO_DEV_RUNNING;
+	dev->pmd_priv = eth_dev;
+	eth_dev->data->dev_link.link_status = 1;
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = &internal->rx_vhost_queues[i];
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = &internal->tx_vhost_queues[i];
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+	RTE_LOG(INFO, PMD, "New connection established\n");
+
+	return 0;
+}
+
+static void
+destroy_device(volatile struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct pmd_internal *internal;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "invalid argument\n");
+		return;
+	}
+
+	eth_dev = (struct rte_eth_dev *)dev->pmd_priv;
+	if (eth_dev == NULL) {
+		RTE_LOG(INFO, PMD, "failure to find an ethdev\n");
+		return;
+	}
+
+	internal = eth_dev->data->dev_private;
+
+	/* Wait until rx/tx_pkt_burst stops accessing the vhost device */
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = &internal->rx_vhost_queues[i];
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = &internal->tx_vhost_queues[i];
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+
+	eth_dev->data->dev_link.link_status = 0;
+
+	dev->pmd_priv = NULL;
+	dev->flags &= ~VIRTIO_DEV_RUNNING;
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = &internal->rx_vhost_queues[i];
+		vq->device = NULL;
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = &internal->tx_vhost_queues[i];
+		vq->device = NULL;
+	}
+
+	RTE_LOG(INFO, PMD, "Connection closed\n");
+}
+
+static void *vhost_driver_session(void *param __rte_unused)
+{
+	static struct virtio_net_device_ops *vhost_ops;
+
+	vhost_ops = rte_zmalloc(NULL, sizeof(*vhost_ops), 0);
+	if (vhost_ops == NULL)
+		rte_panic("Can't allocate memory\n");
+
+	/* set vhost arguments */
+	vhost_ops->new_device = new_device;
+	vhost_ops->destroy_device = destroy_device;
+	if (rte_vhost_driver_pmd_callback_register(vhost_ops) < 0)
+		rte_panic("Can't register callbacks\n");
+
+	/* start event handling */
+	rte_vhost_driver_session_start();
+
+	rte_free(vhost_ops);
+	pthread_exit(0);
+}
+
+static void vhost_driver_session_start(struct pmd_internal *internal)
+{
+	int ret;
+
+	ret = pthread_create(&internal->session_th,
+			NULL, vhost_driver_session, NULL);
+	if (ret)
+		rte_panic("Can't create a thread\n");
+}
+
+static void vhost_driver_session_stop(struct pmd_internal *internal)
+{
+	int ret;
+
+	ret = pthread_cancel(internal->session_th);
+	if (ret)
+		rte_panic("Can't cancel the thread\n");
+
+	ret = pthread_join(internal->session_th, NULL);
+	if (ret)
+		rte_panic("Can't join the thread\n");
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	int ret;
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
+		ret = rte_vhost_driver_register(internal->iface_name);
+		if (ret)
+			return ret;
+
+		vhost_driver_session_start(internal);
+	}
+	return 0;
+}
+
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	if (rte_atomic16_cmpset(&internal->once, 1, 0)) {
+		rte_vhost_driver_unregister(internal->iface_name);
+		vhost_driver_session_stop(internal);
+	}
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc __rte_unused,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	internal->rx_vhost_queues[rx_queue_id].mb_pool = mb_pool;
+	dev->data->rx_queues[rx_queue_id] = &internal->rx_vhost_queues[rx_queue_id];
+	return 0;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	dev->data->tx_queues[tx_queue_id] = &internal->tx_vhost_queues[tx_queue_id];
+	return 0;
+}
+
+
+static void
+eth_dev_info(struct rte_eth_dev *dev,
+	     struct rte_eth_dev_info *dev_info)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	dev_info->driver_name = drivername;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = (uint32_t)-1;
+	dev_info->max_rx_queues = (uint16_t)internal->nb_rx_queues;
+	dev_info->max_tx_queues = (uint16_t)internal->nb_tx_queues;
+	dev_info->min_rx_bufsize = 0;
+	dev_info->pci_dev = dev->pci_dev;
+}
+
+static void
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
+{
+	unsigned i;
+	unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
+	const struct pmd_internal *internal = dev->data->dev_private;
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+	     i < internal->nb_rx_queues; i++) {
+		igb_stats->q_ipackets[i] = internal->rx_vhost_queues[i].rx_pkts.cnt;
+		rx_total += igb_stats->q_ipackets[i];
+	}
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+	     i < internal->nb_tx_queues; i++) {
+		igb_stats->q_opackets[i] = internal->tx_vhost_queues[i].tx_pkts.cnt;
+		igb_stats->q_errors[i] = internal->tx_vhost_queues[i].err_pkts.cnt;
+		tx_total += igb_stats->q_opackets[i];
+		tx_err_total += igb_stats->q_errors[i];
+	}
+
+	igb_stats->ipackets = rx_total;
+	igb_stats->opackets = tx_total;
+	igb_stats->oerrors = tx_err_total;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	unsigned i;
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	for (i = 0; i < internal->nb_rx_queues; i++)
+		internal->rx_vhost_queues[i].rx_pkts.cnt = 0;
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		internal->tx_vhost_queues[i].tx_pkts.cnt = 0;
+		internal->tx_vhost_queues[i].err_pkts.cnt = 0;
+	}
+}
+
+static void
+eth_queue_release(void *q __rte_unused) { ; }
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused) { return 0; }
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+static struct eth_driver rte_vhost_pmd = {
+	.pci_drv = {
+		.name = "rte_vhost_pmd",
+		.drv_flags = RTE_PCI_DRV_DETACHABLE,
+	},
+};
+
+static struct rte_pci_id id_table;
+
+static int
+eth_dev_vhost_create(const char *name, int index,
+		     char *iface_name,
+		     int16_t queues,
+		     const unsigned numa_node)
+{
+	struct rte_eth_dev_data *data = NULL;
+	struct rte_pci_device *pci_dev = NULL;
+	struct pmd_internal *internal = NULL;
+	struct rte_eth_dev *eth_dev = NULL;
+	struct ether_addr *eth_addr = NULL;
+
+	RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
+		numa_node);
+
+	/* now do all data allocation - for eth_dev structure, dummy pci driver
+	 * and internal (private) data
+	 */
+	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+	if (data == NULL)
+		goto error;
+
+	pci_dev = rte_zmalloc_socket(name, sizeof(*pci_dev), 0, numa_node);
+	if (pci_dev == NULL)
+		goto error;
+
+	internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
+	if (internal == NULL)
+		goto error;
+
+	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
+	if (eth_addr == NULL)
+		goto error;
+	*eth_addr = base_eth_addr;
+	eth_addr->addr_bytes[5] = index;
+
+	/* reserve an ethdev entry */
+	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+	if (eth_dev == NULL)
+		goto error;
+
+	/* now put it all together
+	 * - store queue data in internal,
+	 * - store numa_node info in pci_driver
+	 * - point eth_dev_data to internal and pci_driver
+	 * - and point eth_dev structure to new eth_dev_data structure
+	 */
+	internal->nb_rx_queues = queues;
+	internal->nb_tx_queues = queues;
+	internal->dev_name = strdup(name);
+	if (internal->dev_name == NULL)
+		goto error;
+	internal->iface_name = strdup(iface_name);
+	if (internal->iface_name == NULL)
+		goto error;
+
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_INSERT_TAIL(&internals_list, internal, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
+	rte_vhost_pmd.pci_drv.name = drivername;
+	rte_vhost_pmd.pci_drv.id_table = &id_table;
+
+	pci_dev->numa_node = numa_node;
+	pci_dev->driver = &rte_vhost_pmd.pci_drv;
+
+	data->dev_private = internal;
+	data->port_id = eth_dev->data->port_id;
+	memmove(data->name, eth_dev->data->name, sizeof(data->name));
+	data->nb_rx_queues = queues;
+	data->nb_tx_queues = queues;
+	data->dev_link = pmd_link;
+	data->mac_addrs = eth_addr;
+
+	/* We'll replace the 'data' originally allocated by eth_dev. So the
+	 * vhost PMD resources won't be shared between multiple processes.
+	 */
+	eth_dev->data = data;
+	eth_dev->driver = &rte_vhost_pmd;
+	eth_dev->dev_ops = &ops;
+	eth_dev->pci_dev = pci_dev;
+
+	/* finally assign rx and tx ops */
+	eth_dev->rx_pkt_burst = eth_vhost_rx;
+	eth_dev->tx_pkt_burst = eth_vhost_tx;
+
+	return data->port_id;
+
+error:
+	rte_free(data);
+	rte_free(pci_dev);
+	rte_free(internal);
+	rte_free(eth_addr);
+
+	return -1;
+}
+
+static inline int
+open_iface(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	const char **iface_name = extra_args;
+
+	if (value == NULL)
+		return -1;
+
+	*iface_name = value;
+
+	return 0;
+}
+
+static inline int
+open_queues(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	uint16_t *q = extra_args;
+
+	if ((value == NULL) || (extra_args == NULL))
+		return -EINVAL;
+
+	*q = (uint16_t)strtoul(value, NULL, 0);
+	if ((*q == USHRT_MAX) && (errno == ERANGE))
+		return -1;
+
+	if (*q > RTE_MAX_QUEUES_PER_PORT)
+		return -1;
+
+	return 0;
+}
+
+static int
+rte_pmd_vhost_devinit(const char *name, const char *params)
+{
+	struct rte_kvargs *kvlist = NULL;
+	int ret = 0;
+	int index;
+	char *iface_name;
+	uint16_t queues;
+
+	RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
+
+	kvlist = rte_kvargs_parse(params, valid_arguments);
+	if (kvlist == NULL)
+		return -1;
+
+	if (strlen(name) < strlen("eth_vhost")) {
+		ret = -1;
+		goto out_free;
+	}
+
+	errno = 0;
+	index = strtol(name + strlen("eth_vhost"), NULL, 0);
+	if (errno == ERANGE) {
+		ret = -1;
+		goto out_free;
+	}
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
+					 &open_iface, &iface_name);
+		if (ret < 0)
+			goto out_free;
+	}
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
+					 &open_queues, &queues);
+		if (ret < 0)
+			goto out_free;
+
+	} else
+		queues = 1;
+
+	eth_dev_vhost_create(name, index,
+			iface_name, queues, rte_socket_id());
+
+out_free:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static int
+rte_pmd_vhost_devuninit(const char *name)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internal *internal;
+
+	RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
+
+	if (name == NULL)
+		return -EINVAL;
+
+	/* find an ethdev entry */
+	eth_dev = rte_eth_dev_allocated(name);
+	if (eth_dev == NULL)
+		return -ENODEV;
+
+	internal = eth_dev->data->dev_private;
+
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_REMOVE(&internals_list, internal, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
+	eth_dev_stop(eth_dev);
+
+	if ((internal) && (internal->dev_name))
+		free(internal->dev_name);
+	if ((internal) && (internal->iface_name))
+		free(internal->iface_name);
+	rte_free(eth_dev->data->dev_private);
+	rte_free(eth_dev->data);
+	rte_free(eth_dev->pci_dev);
+
+	rte_eth_dev_release_port(eth_dev);
+	return 0;
+}
+
+static struct rte_driver pmd_vhost_drv = {
+	.name = "eth_vhost",
+	.type = PMD_VDEV,
+	.init = rte_pmd_vhost_devinit,
+	.uninit = rte_pmd_vhost_devuninit,
+};
+
+struct
+virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id)
+{
+	struct rte_eth_dev *eth_dev;
+
+	if (rte_eth_dev_is_valid_port(port_id) == 0)
+		return NULL;
+
+	eth_dev = &rte_eth_devices[port_id];
+	if (eth_dev->driver == &rte_vhost_pmd) {
+		struct pmd_internal *internal;
+		struct vhost_queue *vq;
+
+		internal = eth_dev->data->dev_private;
+		vq = &internal->rx_vhost_queues[0];
+		if (vq->device)
+			return vq->device;
+	}
+
+	return NULL;
+}
+
+PMD_REGISTER_DRIVER(pmd_vhost_drv);
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
new file mode 100644
index 0000000..0c4d4b5
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -0,0 +1,65 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_VHOST_H_
+#define _RTE_ETH_VHOST_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_virtio_net.h>
+
+/**
+ * The function converts a specified port_id to the corresponding virtio-net
+ * device structure. The returned device can be used with vhost library APIs.
+ * To use the vhost library APIs and the vhost PMD in parallel, the API below
+ * should not be called, because it will be called by the vhost PMD.
+ * - rte_vhost_driver_session_start()
+ * Once a device is managed by the vhost PMD, the API below should not be
+ * called either.
+ * - rte_vhost_driver_unregister()
+ * To unregister the device, call the Port Hotplug APIs instead.
+ *
+ * @param port_id
+ *  port number
+ * @return
+ *  virtio net device structure corresponding to the specified port
+ *  NULL will be returned in error cases.
+ */
+struct virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
new file mode 100644
index 0000000..bf0361a
--- /dev/null
+++ b/drivers/net/vhost/rte_pmd_vhost_version.map
@@ -0,0 +1,8 @@
+DPDK_2.2 {
+
+	global:
+
+	rte_eth_vhost_portid2vdev;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 3871205..1c42fb1 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -144,7 +144,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP)       += -lrte_pmd_pcap
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL)       += -lrte_pmd_null
 
-endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
+
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
+endif # $(CONFIG_RTE_BUILD_SHARED_LIB)
 
 endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 200+ messages in thread
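[Editorial note: the `allow_queuing`/`while_queuing` pair used throughout the datapath of the patch above implements a small drain gate: `eth_vhost_rx()`/`eth_vhost_tx()` check the flag, announce entry, then re-check, while `destroy_device()` clears the flag and spins until the burst functions have left. The following is a self-contained, single-threaded model of that protocol using C11 atomics — not the DPDK code itself; the names `gate`, `burst_enter`, `burst_exit`, and `gate_drain` are hypothetical stand-ins for the patch's `rte_atomic32_t` operations.]

```c
#include <assert.h>
#include <stdatomic.h>

/* Model of the patch's allow_queuing / while_queuing gate. */
struct gate {
	atomic_int allow_queuing;  /* control plane: may the datapath run? */
	atomic_int while_queuing;  /* datapath: currently inside a burst? */
};

/* Datapath side: mirrors the entry/exit checks in eth_vhost_rx(). */
static int burst_enter(struct gate *g)
{
	if (atomic_load(&g->allow_queuing) == 0)
		return 0;                        /* queuing disabled */
	atomic_store(&g->while_queuing, 1);      /* announce we are inside */
	/* Re-check: the control plane may have cleared allow_queuing
	 * between the first load and the store above. */
	if (atomic_load(&g->allow_queuing) == 0) {
		atomic_store(&g->while_queuing, 0);
		return 0;
	}
	return 1;                                /* safe to touch the device */
}

static void burst_exit(struct gate *g)
{
	atomic_store(&g->while_queuing, 0);
}

/* Control side: mirrors the drain loop in destroy_device(). */
static void gate_drain(struct gate *g)
{
	atomic_store(&g->allow_queuing, 0);
	while (atomic_load(&g->while_queuing))
		;                                /* spin until the burst leaves */
}
```

The double-check in `burst_enter()` is what closes the race: without it, a burst could observe `allow_queuing == 1`, be preempted before setting `while_queuing`, and then touch the device after `gate_drain()` believed the datapath was idle.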

* Re: [RFC PATCH v2] vhost: Add VHOST PMD
  2015-10-21 10:22                     ` Bruce Richardson
@ 2015-10-22  9:50                       ` Tetsuya Mukawa
  2015-10-27 13:44                         ` Traynor, Kevin
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-10-22  9:50 UTC (permalink / raw)
  To: Bruce Richardson, Panu Matilainen; +Cc: dev, ann.zhuangyanying

On 2015/10/21 19:22, Bruce Richardson wrote:
> On Wed, Oct 21, 2015 at 09:25:12AM +0300, Panu Matilainen wrote:
>> On 10/21/2015 07:35 AM, Tetsuya Mukawa wrote:
>>> On 2015/10/19 22:27, Richardson, Bruce wrote:
>>>>> -----Original Message-----
>>>>> From: Panu Matilainen [mailto:pmatilai@redhat.com]
>>>>> Sent: Monday, October 19, 2015 2:26 PM
>>>>> To: Tetsuya Mukawa <mukawa@igel.co.jp>; Richardson, Bruce
>>>>> <bruce.richardson@intel.com>; Loftus, Ciara <ciara.loftus@intel.com>
>>>>> Cc: dev@dpdk.org; ann.zhuangyanying@huawei.com
>>>>> Subject: Re: [dpdk-dev] [RFC PATCH v2] vhost: Add VHOST PMD
>>>>>
>>>>> On 10/19/2015 01:50 PM, Tetsuya Mukawa wrote:
>>>>>> On 2015/10/19 18:45, Bruce Richardson wrote:
>>>>>>> On Mon, Oct 19, 2015 at 10:32:50AM +0100, Loftus, Ciara wrote:
>>>>>>>>> On 2015/10/16 21:52, Bruce Richardson wrote:
>>>>>>>>>> On Mon, Aug 31, 2015 at 12:55:26PM +0900, Tetsuya Mukawa wrote:
>>>>>>>>>>> The patch introduces a new PMD. This PMD is implemented as thin
>>>>>>>>> wrapper
>>>>>>>>>>> of librte_vhost. It means librte_vhost is also needed to compile
>>>>> the PMD.
>>>>>>>>>>> The PMD can have 'iface' parameter like below to specify a path
>>>>>>>>>>> to
>>>>>>>>> connect
>>>>>>>>>>> to a virtio-net device.
>>>>>>>>>>>
>>>>>>>>>>> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
>>>>>>>>>>>
>>>>>>>>>>> To connect above testpmd, here is qemu command example.
>>>>>>>>>>>
>>>>>>>>>>> $ qemu-system-x86_64 \
>>>>>>>>>>>          <snip>
>>>>>>>>>>>          -chardev socket,id=chr0,path=/tmp/sock0 \
>>>>>>>>>>>          -netdev vhost-user,id=net0,chardev=chr0,vhostforce \
>>>>>>>>>>>          -device virtio-net-pci,netdev=net0
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
>>>>>>>>>> With this PMD in place, is there any need to keep the existing
>>>>>>>>>> vhost library around as a separate entity? Can the existing
>>>>>>>>>> library be
>>>>>>>>> subsumed/converted into
>>>>>>>>>> a standard PMD?
>>>>>>>>>>
>>>>>>>>>> /Bruce
>>>>>>>>> Hi Bruce,
>>>>>>>>>
>>>>>>>>> I concern about whether the PMD has all features of librte_vhost,
>>>>>>>>> because librte_vhost provides more features and freedom than ethdev
>>>>>>>>> API provides.
>>>>>>>>> In some cases, user needs to choose limited implementation without
>>>>>>>>> librte_vhost.
>>>>>>>>> I am going to eliminate such cases while implementing the PMD.
>>>>>>>>> But I don't have strong belief that we can remove librte_vhost now.
>>>>>>>>>
>>>>>>>>> So how about keeping current separation in next DPDK?
>>>>>>>>> I guess people will try to replace librte_vhost to vhost PMD,
>>>>>>>>> because apparently using ethdev APIs will be useful in many cases.
>>>>>>>>> And we will get feedbacks like "vhost PMD needs to support like this
>>>>> usage".
>>>>>>>>> (Or we will not have feedbacks, but it's also OK.) Then, we will be
>>>>>>>>> able to merge librte_vhost and vhost PMD.
>>>>>>>> I agree with the above. One the concerns I had when reviewing the
>>>>> patch was that the PMD removes some freedom that is available with the
>>>>> library. Eg. Ability to implement the new_device and destroy_device
>>>>> callbacks. If using the PMD you are constrained to the implementations of
>>>>> these in the PMD driver, but if using librte_vhost, you can implement your
>>>>> own with whatever functionality you like - a good example of this can be
>>>>> seen in the vhost sample app.
>>>>>>>> On the other hand, the PMD is useful in that it removes a lot of
>>>>> complexity for the user and may work for some more general use cases. So I
>>>>> would be in favour of having both options available too.
>>>>>>>> Ciara
>>>>>>>>
>>>>>>> Thanks.
>>>>>>> However, just because the libraries are merged does not mean that you
>>>>>>> need be limited by PMD functionality. Many PMDs provide additional
>>>>>>> library-specific functions over and above their PMD capabilities. The
>>>>>>> bonded PMD is a good example here, as it has a whole set of extra
>>>>>>> functions to create and manipulate bonded devices - things that are
>>>>>>> obviously not part of the general ethdev API. Other vPMDs similarly
>>>>> include functions to allow them to be created on the fly too.
>>>>>>> regards,
>>>>>>> /Bruce
>>>>>> Hi Bruce,
>>>>>>
>>>>>> I appreciate for showing a good example. I haven't noticed the PMD.
>>>>>> I will check the bonding PMD, and try to remove librte_vhost without
>>>>>> losing freedom and features of the library.
>>>>> Hi,
>>>>>
>>>>> Just a gentle reminder - if you consider removing (even if by just
>>>>> replacing/renaming) an entire library, it needs to go through the ABI
>>>>> deprecation process.
>>>>>
>>>>> It seems obvious enough. But for all the ABI policing here, somehow we all
>>>>> failed to notice the two compatibility breaking rename-elephants in the
>>>>> room during 2.1 development:
>>>>> - libintel_dpdk was renamed to libdpdk
>>>>> - librte_pmd_virtio_uio was renamed to librte_pmd_virtio
>>>>>
>>>>> Of course these cases are easy to work around with symlinks, and are
>>>>> unrelated to the matter at hand. Just wanting to make sure such things
>>>>> don't happen again.
>>>>>
>>>>> 	- Panu -
>>>> Still doesn't hurt to remind us, Panu! Thanks. :-)
>>> Hi,
>>>
>>> Thanks for reminder. I've checked the DPDK documentation.
>>> I will submit deprecation notice to follow DPDK deprecation process.
>>> (Probably we will be able to remove vhost library in DPDK-2.3 or later.)
>>>
>>> BTW, I will merge vhost library and PMD like below.
>>> Step1. Move vhost library under vhost PMD.
>>> Step2. Rename current APIs.
>>> Step3. Add a function to get a pointer of "struct virtio_net device" by
>>> a portno.
>>>
>>> The last step allows us to convert a portno to the pointer of the
>>> corresponding virtio_net device.
>>> And we can still use the features and freedom the vhost library APIs
>>> provide.
>> Just wondering, is that *really* worth the price of breaking every single
>> vhost library user out there?
>>
>> I mean, this is not about removing some bitrotten function or two which
>> nobody cares about anymore but removing (by renaming) one of the more widely
>> (AFAICS) used libraries and its entire API.
>>
>> If current APIs are kept then compatibility is largely a matter of planting
>> a strategic symlink or two, but it might make the API look inconsistent.
>>
>> But just wondering about the benefit of this merge thing, compared to just
>> adding a vhost pmd and leaving the library be. The ABI process is not there
>> to make life miserable for DPDK developers, its there to help make DPDK
>> nicer for *other* developers. And the first and the foremost rule is simply:
>> dont break backwards compatibility. Not unless there's a damn good reason to
>> doing so, and I fail to see that reason here.
>>
>> 	- Panu -
>>
> Good question, and I'll accept that maybe it's not worth doing. I'm not that
> much of an expert on the internals and APIs of vhost library.
>
> However, the merge I was looking for was more from a code locality point
> of view, to have all the vhost code in one directory (under drivers/net),
> than spread across multiple ones. What API's need to be deprecated
> or not as part of that work, is a separate question, and so in theory we could
> create a combined vhost library that does not deprecate anything (though to
> avoid a build-up of technical debt, we'll probably want to deprecate some 
> functions).
>
> I'll leave it up to the vhost experts to decide what's best, but for me, any
> library that handles transmission and reception of packets outside of a DPDK
> app should be a PMD library using ethdev rx/tx burst routines, and located
> under drivers/net. (KNI is another obvious target for such a move and conversion).
>
> Regards,
> /Bruce
>

Hi,

I have submitted the latest patches.
I will keep the vhost library until we have agreement to merge it into
the vhost PMD.

Regards,
Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread
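[Editorial note: Step 3 of the merge plan discussed above — looking up the backing virtio-net device from an ethdev port number, as later exposed by `rte_eth_vhost_portid2vdev()` — boils down to a bounds-checked table lookup that returns NULL for invalid or detached ports. Here is a self-contained toy model of that contract; `toy_vdev`, `port_attach`, and `portid2vdev` are illustrative names, not DPDK APIs.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PORTS 8

/* Stand-in for the patch's struct virtio_net. */
struct toy_vdev {
	int id;
};

/* One slot per ethdev port; NULL means no vhost device attached. */
static struct toy_vdev *port_table[MAX_PORTS];

/* Register a device under a port number (models new_device()). */
static int port_attach(uint16_t port_id, struct toy_vdev *dev)
{
	if (port_id >= MAX_PORTS || port_table[port_id] != NULL)
		return -1;
	port_table[port_id] = dev;
	return 0;
}

/* Models rte_eth_vhost_portid2vdev(): NULL for invalid or idle ports. */
static struct toy_vdev *portid2vdev(uint16_t port_id)
{
	if (port_id >= MAX_PORTS)
		return NULL;
	return port_table[port_id];
}
```

The real implementation additionally verifies that the port is driven by the vhost PMD before dereferencing its private data, which is why the NULL return doubles as a "not a vhost port" signal.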

* Re: [RFC PATCH v3 2/2] vhost: Add VHOST PMD
  2015-10-22  9:45     ` [RFC PATCH v3 2/2] vhost: " Tetsuya Mukawa
@ 2015-10-22 12:49       ` Bruce Richardson
  2015-10-23  3:48         ` Tetsuya Mukawa
  2015-10-29 14:25       ` Xie, Huawei
  1 sibling, 1 reply; 200+ messages in thread
From: Bruce Richardson @ 2015-10-22 12:49 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Thu, Oct 22, 2015 at 06:45:50PM +0900, Tetsuya Mukawa wrote:
> The patch introduces a new PMD. This PMD is implemented as a thin wrapper
> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
> The vhost messages will be handled only when a port is started. So start
> a port first, then invoke QEMU.
> 
> The PMD has 2 parameters.
>  - iface:  The parameter is used to specify a path connect to a
>            virtio-net device.
>  - queues: The parameter is used to specify the number of the queues
>            virtio-net device has.
>            (Default: 1)
> 
> Here is an example.
> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=3' -- -i
> 
> To connect above testpmd, here is qemu command example.
> 
> $ qemu-system-x86_64 \
>         <snip>
>         -chardev socket,id=chr0,path=/tmp/sock0 \
>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce \
>         -device virtio-net-pci,netdev=net0
> 
> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>

Hi Tetsuya,

a few comments inline below.

/Bruce

> ---
>  config/common_linuxapp                      |   6 +
<snip>
> index 0000000..66bfc2b
> --- /dev/null
> +++ b/drivers/net/vhost/rte_eth_vhost.c
> @@ -0,0 +1,735 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright (c) 2010-2015 Intel Corporation.
This is probably not the copyright line you want on your new files.

> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +#include <unistd.h>
> +#include <pthread.h>
> +
> +#include <rte_mbuf.h>
> +#include <rte_ethdev.h>
> +#include <rte_malloc.h>
> +#include <rte_memcpy.h>
> +#include <rte_dev.h>
> +#include <rte_kvargs.h>
> +#include <rte_virtio_net.h>
> +
> +#include "rte_eth_vhost.h"
> +
> +#define ETH_VHOST_IFACE_ARG		"iface"
> +#define ETH_VHOST_QUEUES_ARG		"queues"
> +
> +static const char *drivername = "VHOST PMD";
> +
> +static const char *valid_arguments[] = {
> +	ETH_VHOST_IFACE_ARG,
> +	ETH_VHOST_QUEUES_ARG,
> +	NULL
> +};
> +
> +static struct ether_addr base_eth_addr = {
> +	.addr_bytes = {
> +		0x56 /* V */,
> +		0x48 /* H */,
> +		0x4F /* O */,
> +		0x53 /* S */,
> +		0x54 /* T */,
> +		0x00
> +	}
> +};
> +
> +struct vhost_queue {
> +	struct virtio_net *device;
> +	struct pmd_internal *internal;
> +	struct rte_mempool *mb_pool;
> +	rte_atomic32_t allow_queuing;
> +	rte_atomic32_t while_queuing;
> +	rte_atomic64_t rx_pkts;
> +	rte_atomic64_t tx_pkts;
> +	rte_atomic64_t err_pkts;
> +};
> +
> +struct pmd_internal {
> +	TAILQ_ENTRY(pmd_internal) next;
> +	char *dev_name;
> +	char *iface_name;
> +	unsigned nb_rx_queues;
> +	unsigned nb_tx_queues;
> +
> +	struct vhost_queue rx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
> +	struct vhost_queue tx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
> +
> +	volatile uint16_t once;
> +	pthread_t session_th;
> +};
> +
> +TAILQ_HEAD(pmd_internal_head, pmd_internal);
> +static struct pmd_internal_head internals_list =
> +	TAILQ_HEAD_INITIALIZER(internals_list);
> +
> +static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
> +
> +static struct rte_eth_link pmd_link = {
> +		.link_speed = 10000,
> +		.link_duplex = ETH_LINK_FULL_DUPLEX,
> +		.link_status = 0
> +};
> +
> +static uint16_t
> +eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> +{
> +	struct vhost_queue *r = q;
> +	uint16_t nb_rx = 0;
> +
> +	if (unlikely(r->internal == NULL))
> +		return 0;
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		return 0;
> +
> +	rte_atomic32_set(&r->while_queuing, 1);
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		goto out;
> +
> +	/* Dequeue packets from guest TX queue */
> +	nb_rx = (uint16_t)rte_vhost_dequeue_burst(r->device,
> +			VIRTIO_TXQ, r->mb_pool, bufs, nb_bufs);
> +
> +	rte_atomic64_add(&(r->rx_pkts), nb_rx);

Do we really need to use atomics here? It will slow things down a lot. For
other PMDs the assumption is always that only a single thread can access each
queue at a time - it's up to the app to use locks to enforce that restriction
if necessary.

> +
> +out:
> +	rte_atomic32_set(&r->while_queuing, 0);
> +
> +	return nb_rx;
> +}
> +
> +static uint16_t
> +eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> +{
> +	struct vhost_queue *r = q;
> +	uint16_t i, nb_tx = 0;
> +
> +	if (unlikely(r->internal == NULL))
> +		return 0;
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		return 0;
> +
> +	rte_atomic32_set(&r->while_queuing, 1);
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		goto out;
> +
> +	/* Enqueue packets to guest RX queue */
> +	nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
> +			VIRTIO_RXQ, bufs, nb_bufs);
> +
> +	rte_atomic64_add(&(r->tx_pkts), nb_tx);
> +	rte_atomic64_add(&(r->err_pkts), nb_bufs - nb_tx);
> +
> +	for (i = 0; likely(i < nb_tx); i++)
> +		rte_pktmbuf_free(bufs[i]);
> +
> +out:
> +	rte_atomic32_set(&r->while_queuing, 0);
> +
> +	return nb_tx;
> +}
> +
> +static int
> +eth_dev_configure(struct rte_eth_dev *dev __rte_unused) { return 0; }
> +
> +static inline struct pmd_internal *
> +find_internal_resource(char *ifname)
> +{
> +	int found = 0;
> +	struct pmd_internal *internal;
> +
> +	if (ifname == NULL)
> +		return NULL;
> +
> +	pthread_mutex_lock(&internal_list_lock);
> +
> +	TAILQ_FOREACH(internal, &internals_list, next) {
> +		if (!strcmp(internal->iface_name, ifname)) {
> +			found = 1;
> +			break;
> +		}
> +	}
> +
> +	pthread_mutex_unlock(&internal_list_lock);
> +
> +	if (!found)
> +		return NULL;
> +
> +	return internal;
> +}
> +
> +static int
> +new_device(struct virtio_net *dev)
> +{
> +	struct rte_eth_dev *eth_dev;
> +	struct pmd_internal *internal;
> +	struct vhost_queue *vq;
> +	uint16_t queues;
> +	unsigned i;
> +
> +	if (dev == NULL) {
> +		RTE_LOG(INFO, PMD, "invalid argument\n");
> +		return -1;
> +	}
> +
> +	internal = find_internal_resource(dev->ifname);
> +	if (internal == NULL) {
> +		RTE_LOG(INFO, PMD, "invalid device name\n");
> +		return -1;
> +	}
> +
> +	/*
> +	 * Todo: To support multi queue, get the number of queues here.
> +	 * So far, vhost provides only one queue.
> +	 */
> +	queues = 1;
> +
> +	if ((queues < internal->nb_rx_queues) ||
> +			(queues < internal->nb_tx_queues)) {
> +		RTE_LOG(INFO, PMD, "Not enough queues\n");
> +		return -1;
> +	}
> +
> +	eth_dev = rte_eth_dev_allocated(internal->dev_name);
> +	if (eth_dev == NULL) {
> +		RTE_LOG(INFO, PMD, "failuer to find ethdev\n");

typo "failure". Probably should also be written just as "Failed to find ethdev".

> +		return -1;
> +	}
> +
> +	for (i = 0; i < internal->nb_rx_queues; i++) {
> +		vq = &internal->rx_vhost_queues[i];
> +		vq->device = dev;
> +		vq->internal = internal;
> +	}
> +	for (i = 0; i < internal->nb_tx_queues; i++) {
> +		vq = &internal->tx_vhost_queues[i];
> +		vq->device = dev;
> +		vq->internal = internal;
> +	}
> +
> +	dev->flags |= VIRTIO_DEV_RUNNING;
> +	dev->pmd_priv = eth_dev;
> +	eth_dev->data->dev_link.link_status = 1;
> +
> +	for (i = 0; i < internal->nb_rx_queues; i++) {
> +		vq = &internal->rx_vhost_queues[i];
> +		rte_atomic32_set(&vq->allow_queuing, 1);
> +	}
> +	for (i = 0; i < internal->nb_tx_queues; i++) {
> +		vq = &internal->tx_vhost_queues[i];
> +		rte_atomic32_set(&vq->allow_queuing, 1);
> +	}
> +	RTE_LOG(INFO, PMD, "New connection established\n");
> +
> +	return 0;
> +}
> +
> +static void
> +destroy_device(volatile struct virtio_net *dev)
> +{
> +	struct rte_eth_dev *eth_dev;
> +	struct pmd_internal *internal;
> +	struct vhost_queue *vq;
> +	unsigned i;
> +
> +	if (dev == NULL) {
> +		RTE_LOG(INFO, PMD, "invalid argument\n");
> +		return;
> +	}
> +
> +	eth_dev = (struct rte_eth_dev *)dev->pmd_priv;
> +	if (eth_dev == NULL) {
> +		RTE_LOG(INFO, PMD, "failuer to find a ethdev\n");
> +		return;
> +	}
> +
> +	internal = eth_dev->data->dev_private;
> +
> +	/* Wait until rx/tx_pkt_burst stops accessing vhost device */
> +	for (i = 0; i < internal->nb_rx_queues; i++) {
> +		vq = &internal->rx_vhost_queues[i];
> +		rte_atomic32_set(&vq->allow_queuing, 0);
> +		while (rte_atomic32_read(&vq->while_queuing))
> +			rte_pause();
> +	}
> +	for (i = 0; i < internal->nb_tx_queues; i++) {
> +		vq = &internal->tx_vhost_queues[i];
> +		rte_atomic32_set(&vq->allow_queuing, 0);
> +		while (rte_atomic32_read(&vq->while_queuing))
> +			rte_pause();
> +	}
> +
> +	eth_dev->data->dev_link.link_status = 0;
> +
> +	dev->pmd_priv = NULL;
> +	dev->flags &= ~VIRTIO_DEV_RUNNING;
> +
> +	for (i = 0; i < internal->nb_rx_queues; i++) {
> +		vq = &internal->rx_vhost_queues[i];
> +		vq->device = NULL;
> +	}
> +	for (i = 0; i < internal->nb_tx_queues; i++) {
> +		vq = &internal->tx_vhost_queues[i];
> +		vq->device = NULL;
> +	}
> +
> +	RTE_LOG(INFO, PMD, "Connection closed\n");
> +}
> +
> +static void *vhost_driver_session(void *param __rte_unused)
> +{
> +	static struct virtio_net_device_ops *vhost_ops;
> +
> +	vhost_ops = rte_zmalloc(NULL, sizeof(*vhost_ops), 0);
> +	if (vhost_ops == NULL)
> +		rte_panic("Can't allocate memory\n");
> +
> +	/* set vhost arguments */
> +	vhost_ops->new_device = new_device;
> +	vhost_ops->destroy_device = destroy_device;
> +	if (rte_vhost_driver_pmd_callback_register(vhost_ops) < 0)
> +		rte_panic("Can't register callbacks\n");
> +
> +	/* start event handling */
> +	rte_vhost_driver_session_start();
> +
> +	rte_free(vhost_ops);
> +	pthread_exit(0);
> +}
> +
> +static void vhost_driver_session_start(struct pmd_internal *internal)
> +{
> +	int ret;
> +
> +	ret = pthread_create(&internal->session_th,
> +			NULL, vhost_driver_session, NULL);
> +	if (ret)
> +		rte_panic("Can't create a thread\n");
> +}
> +
> +static void vhost_driver_session_stop(struct pmd_internal *internal)
> +{
> +	int ret;
> +
> +	ret = pthread_cancel(internal->session_th);
> +	if (ret)
> +		rte_panic("Can't cancel the thread\n");
> +
> +	ret = pthread_join(internal->session_th, NULL);
> +	if (ret)
> +		rte_panic("Can't join the thread\n");
> +}
> +
> +static int
> +eth_dev_start(struct rte_eth_dev *dev)
> +{
> +	int ret;
> +	struct pmd_internal *internal = dev->data->dev_private;
> +
> +	if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
> +		ret = rte_vhost_driver_register(internal->iface_name);
> +		if (ret)
> +			return ret;
> +
> +		vhost_driver_session_start(internal);
> +	}
> +	return 0;
> +}
> +
> +static void
> +eth_dev_stop(struct rte_eth_dev *dev)
> +{
> +	struct pmd_internal *internal = dev->data->dev_private;
> +
> +	if (rte_atomic16_cmpset(&internal->once, 1, 0)) {
> +		rte_vhost_driver_unregister(internal->iface_name);
> +		vhost_driver_session_stop(internal);
> +	}
> +}
> +
> +static int
> +eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
> +		   uint16_t nb_rx_desc __rte_unused,
> +		   unsigned int socket_id __rte_unused,
> +		   const struct rte_eth_rxconf *rx_conf __rte_unused,
> +		   struct rte_mempool *mb_pool)
> +{
> +	struct pmd_internal *internal = dev->data->dev_private;
> +
> +	internal->rx_vhost_queues[rx_queue_id].mb_pool = mb_pool;
> +	dev->data->rx_queues[rx_queue_id] = &internal->rx_vhost_queues[rx_queue_id];
> +	return 0;
> +}
> +
> +static int
> +eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
> +		   uint16_t nb_tx_desc __rte_unused,
> +		   unsigned int socket_id __rte_unused,
> +		   const struct rte_eth_txconf *tx_conf __rte_unused)
> +{
> +	struct pmd_internal *internal = dev->data->dev_private;
> +
> +	dev->data->tx_queues[tx_queue_id] = &internal->tx_vhost_queues[tx_queue_id];
> +	return 0;
> +}
> +
> +
> +static void
> +eth_dev_info(struct rte_eth_dev *dev,
> +	     struct rte_eth_dev_info *dev_info)
> +{
> +	struct pmd_internal *internal = dev->data->dev_private;
> +
> +	dev_info->driver_name = drivername;
> +	dev_info->max_mac_addrs = 1;
> +	dev_info->max_rx_pktlen = (uint32_t)-1;
> +	dev_info->max_rx_queues = (uint16_t)internal->nb_rx_queues;
> +	dev_info->max_tx_queues = (uint16_t)internal->nb_tx_queues;
> +	dev_info->min_rx_bufsize = 0;
> +	dev_info->pci_dev = dev->pci_dev;
> +}
> +
> +static void
> +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
> +{
> +	unsigned i;
> +	unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
> +	const struct pmd_internal *internal = dev->data->dev_private;
> +
> +	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
> +	     i < internal->nb_rx_queues; i++) {
> +		igb_stats->q_ipackets[i] = internal->rx_vhost_queues[i].rx_pkts.cnt;
> +		rx_total += igb_stats->q_ipackets[i];
> +	}
> +
> +	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
> +	     i < internal->nb_tx_queues; i++) {
> +		igb_stats->q_opackets[i] = internal->tx_vhost_queues[i].tx_pkts.cnt;
> +		igb_stats->q_errors[i] = internal->tx_vhost_queues[i].err_pkts.cnt;
> +		tx_total += igb_stats->q_opackets[i];
> +		tx_err_total += igb_stats->q_errors[i];
> +	}
> +
> +	igb_stats->ipackets = rx_total;
> +	igb_stats->opackets = tx_total;
> +	igb_stats->oerrors = tx_err_total;
> +}
> +
> +static void
> +eth_stats_reset(struct rte_eth_dev *dev)
> +{
> +	unsigned i;
> +	struct pmd_internal *internal = dev->data->dev_private;
> +
> +	for (i = 0; i < internal->nb_rx_queues; i++)
> +		internal->rx_vhost_queues[i].rx_pkts.cnt = 0;
> +	for (i = 0; i < internal->nb_tx_queues; i++) {
> +		internal->tx_vhost_queues[i].tx_pkts.cnt = 0;
> +		internal->tx_vhost_queues[i].err_pkts.cnt = 0;
> +	}
> +}
> +
> +static void
> +eth_queue_release(void *q __rte_unused) { ; }
> +static int
> +eth_link_update(struct rte_eth_dev *dev __rte_unused,
> +		int wait_to_complete __rte_unused) { return 0; }
> +
> +static const struct eth_dev_ops ops = {
> +	.dev_start = eth_dev_start,
> +	.dev_stop = eth_dev_stop,
> +	.dev_configure = eth_dev_configure,
> +	.dev_infos_get = eth_dev_info,
> +	.rx_queue_setup = eth_rx_queue_setup,
> +	.tx_queue_setup = eth_tx_queue_setup,
> +	.rx_queue_release = eth_queue_release,
> +	.tx_queue_release = eth_queue_release,
> +	.link_update = eth_link_update,
> +	.stats_get = eth_stats_get,
> +	.stats_reset = eth_stats_reset,
> +};
> +
> +static struct eth_driver rte_vhost_pmd = {
> +	.pci_drv = {
> +		.name = "rte_vhost_pmd",
> +		.drv_flags = RTE_PCI_DRV_DETACHABLE,
> +	},
> +};

If you base this patchset on top of Bernard's patchset to remove the PCI devices
then you shouldn't need these pci_dev and id_table structures.

> +
> +static struct rte_pci_id id_table;
> +
> +static int
> +eth_dev_vhost_create(const char *name, int index,
> +		     char *iface_name,
> +		     int16_t queues,
> +		     const unsigned numa_node)
> +{
> +	struct rte_eth_dev_data *data = NULL;
> +	struct rte_pci_device *pci_dev = NULL;
> +	struct pmd_internal *internal = NULL;
> +	struct rte_eth_dev *eth_dev = NULL;
> +	struct ether_addr *eth_addr = NULL;
> +
> +	RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
> +		numa_node);
> +
> +	/* now do all data allocation - for eth_dev structure, dummy pci driver
> +	 * and internal (private) data
> +	 */
> +	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
> +	if (data == NULL)
> +		goto error;
> +
> +	pci_dev = rte_zmalloc_socket(name, sizeof(*pci_dev), 0, numa_node);
> +	if (pci_dev == NULL)
> +		goto error;
> +
> +	internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
> +	if (internal == NULL)
> +		goto error;
> +
> +	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
> +	if (eth_addr == NULL)
> +		goto error;
> +	*eth_addr = base_eth_addr;
> +	eth_addr->addr_bytes[5] = index;
> +
> +	/* reserve an ethdev entry */
> +	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
> +	if (eth_dev == NULL)
> +		goto error;
> +
> +	/* now put it all together
> +	 * - store queue data in internal,
> +	 * - store numa_node info in pci_driver
> +	 * - point eth_dev_data to internal and pci_driver
> +	 * - and point eth_dev structure to new eth_dev_data structure
> +	 */
> +	internal->nb_rx_queues = queues;
> +	internal->nb_tx_queues = queues;
> +	internal->dev_name = strdup(name);
> +	if (internal->dev_name == NULL)
> +		goto error;
> +	internal->iface_name = strdup(iface_name);
> +	if (internal->iface_name == NULL)
> +		goto error;
> +
> +	pthread_mutex_lock(&internal_list_lock);
> +	TAILQ_INSERT_TAIL(&internals_list, internal, next);
> +	pthread_mutex_unlock(&internal_list_lock);
> +
> +	rte_vhost_pmd.pci_drv.name = drivername;
> +	rte_vhost_pmd.pci_drv.id_table = &id_table;
> +
> +	pci_dev->numa_node = numa_node;
> +	pci_dev->driver = &rte_vhost_pmd.pci_drv;
> +
> +	data->dev_private = internal;
> +	data->port_id = eth_dev->data->port_id;
> +	memmove(data->name, eth_dev->data->name, sizeof(data->name));
> +	data->nb_rx_queues = queues;
> +	data->nb_tx_queues = queues;
> +	data->dev_link = pmd_link;
> +	data->mac_addrs = eth_addr;
> +
> +	/* We'll replace the 'data' originally allocated by eth_dev. So the
> +	 * vhost PMD resources won't be shared between multi processes.
> +	 */
> +	eth_dev->data = data;
> +	eth_dev->driver = &rte_vhost_pmd;
> +	eth_dev->dev_ops = &ops;
> +	eth_dev->pci_dev = pci_dev;
> +
> +	/* finally assign rx and tx ops */
> +	eth_dev->rx_pkt_burst = eth_vhost_rx;
> +	eth_dev->tx_pkt_burst = eth_vhost_tx;
> +
> +	return data->port_id;
> +
> +error:
> +	rte_free(data);
> +	rte_free(pci_dev);
> +	rte_free(internal);
> +	rte_free(eth_addr);
> +
> +	return -1;
> +}
> +
> +static inline int
> +open_iface(const char *key __rte_unused, const char *value, void *extra_args)
> +{
> +	const char **iface_name = extra_args;
> +
> +	if (value == NULL)
> +		return -1;
> +
> +	*iface_name = value;
> +
> +	return 0;
> +}
> +
> +static inline int
> +open_queues(const char *key __rte_unused, const char *value, void *extra_args)
> +{
> +	uint16_t *q = extra_args;
> +
> +	if ((value == NULL) || (extra_args == NULL))
> +		return -EINVAL;
> +
> +	*q = (uint16_t)strtoul(value, NULL, 0);
> +	if ((*q == USHRT_MAX) && (errno == ERANGE))
> +		return -1;
> +
> +	if (*q > RTE_MAX_QUEUES_PER_PORT)
> +		return -1;
> +
> +	return 0;
> +}
> +
> +static int
> +rte_pmd_vhost_devinit(const char *name, const char *params)
> +{
> +	struct rte_kvargs *kvlist = NULL;
> +	int ret = 0;
> +	int index;
> +	char *iface_name;
> +	uint16_t queues;
> +
> +	RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
> +
> +	kvlist = rte_kvargs_parse(params, valid_arguments);
> +	if (kvlist == NULL)
> +		return -1;
> +
> +	if (strlen(name) < strlen("eth_vhost"))
> +		return -1;
> +
> +	index = strtol(name + strlen("eth_vhost"), NULL, 0);
> +	if (errno == ERANGE)
> +		return -1;
> +
> +	if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
> +		ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
> +					 &open_iface, &iface_name);
> +		if (ret < 0)
> +			goto out_free;
> +	}
> +
> +	if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
> +		ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
> +					 &open_queues, &queues);
> +		if (ret < 0)
> +			goto out_free;
> +
> +	} else
> +		queues = 1;
> +
> +	eth_dev_vhost_create(name, index,
> +			iface_name, queues, rte_socket_id());
> +
> +out_free:
> +	rte_kvargs_free(kvlist);
> +	return ret;
> +}
> +
> +static int
> +rte_pmd_vhost_devuninit(const char *name)
> +{
> +	struct rte_eth_dev *eth_dev = NULL;
> +	struct pmd_internal *internal;
> +
> +	RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
> +
> +	if (name == NULL)
> +		return -EINVAL;
> +
> +	/* find an ethdev entry */
> +	eth_dev = rte_eth_dev_allocated(name);
> +	if (eth_dev == NULL)
> +		return -ENODEV;
> +
> +	internal = eth_dev->data->dev_private;
> +
> +	pthread_mutex_lock(&internal_list_lock);
> +	TAILQ_REMOVE(&internals_list, internal, next);
> +	pthread_mutex_unlock(&internal_list_lock);
> +
> +	eth_dev_stop(eth_dev);
> +
> +	if ((internal) && (internal->dev_name))
> +		free(internal->dev_name);
> +	if ((internal) && (internal->iface_name))
> +		free(internal->iface_name);
> +	rte_free(eth_dev->data->dev_private);
> +	rte_free(eth_dev->data);
> +	rte_free(eth_dev->pci_dev);
> +
> +	rte_eth_dev_release_port(eth_dev);
> +	return 0;
> +}
> +
> +static struct rte_driver pmd_vhost_drv = {
> +	.name = "eth_vhost",
> +	.type = PMD_VDEV,
> +	.init = rte_pmd_vhost_devinit,
> +	.uninit = rte_pmd_vhost_devuninit,
> +};
> +
> +struct
> +virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id)
> +{
> +	struct rte_eth_dev *eth_dev;
> +
> +	if (rte_eth_dev_is_valid_port(port_id) == 0)
> +		return NULL;
> +
> +	eth_dev = &rte_eth_devices[port_id];
> +	if (eth_dev->driver == &rte_vhost_pmd) {
> +		struct pmd_internal *internal;
> +		struct vhost_queue *vq;
> +
> +		internal = eth_dev->data->dev_private;
> +		vq = &internal->rx_vhost_queues[0];
> +		if (vq->device)
> +			return vq->device;
> +	}
> +
> +	return NULL;
> +}
> +
> +PMD_REGISTER_DRIVER(pmd_vhost_drv);
> diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
> new file mode 100644
> index 0000000..0c4d4b5
> --- /dev/null
> +++ b/drivers/net/vhost/rte_eth_vhost.h
> @@ -0,0 +1,65 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2015 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _RTE_ETH_AF_PACKET_H_
> +#define _RTE_ETH_AF_PACKET_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <rte_virtio_net.h>
> +
> +/**
> + * The function converts the specified port_id to a virtio device structure.
> + * The returned device can be used for vhost library APIs.
> + * To use vhost library APIs and the vhost PMD in parallel, the below API
> + * should not be called, because it will be called by the vhost PMD.
> + * - rte_vhost_driver_session_start()
> + * Once a device is managed by the vhost PMD, the below API should not be
> + * called.
> + * - rte_vhost_driver_unregister()
> + * To unregister the device, call the Port Hotplug APIs.
> + *
> + * @param port_id
> + *  port number
> + * @return
> + *  virtio net device structure corresponding to the specified port
> + *  NULL will be returned in error cases.
> + */
> +struct virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif
> diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
> new file mode 100644
> index 0000000..bf0361a
> --- /dev/null
> +++ b/drivers/net/vhost/rte_pmd_vhost_version.map
> @@ -0,0 +1,8 @@
> +DPDK_2.2 {
> +
> +	global:
> +
> +	rte_eth_vhost_portid2vdev;
> +
> +	local: *;
> +};
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index 3871205..1c42fb1 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -144,7 +144,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP)       += -lrte_pmd_pcap
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL)       += -lrte_pmd_null
>  
> -endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
> +ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
> +
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
> +
> +endif # ! $(CONFIG_RTE_LIBRTE_VHOST)
> +
> +endif # $(CONFIG_RTE_BUILD_SHARED_LIB)
>  
>  endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS
>  
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH v3 2/2] vhost: Add VHOST PMD
  2015-10-22 12:49       ` Bruce Richardson
@ 2015-10-23  3:48         ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-10-23  3:48 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, ann.zhuangyanying

On 2015/10/22 21:49, Bruce Richardson wrote:
> On Thu, Oct 22, 2015 at 06:45:50PM +0900, Tetsuya Mukawa wrote:
>> The patch introduces a new PMD. This PMD is implemented as thin wrapper
>> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
>> The vhost messages will be handled only when a port is started. So start
>> a port first, then invoke QEMU.
>>
>> The PMD has 2 parameters.
>>  - iface:  The parameter is used to specify a path connect to a
>>            virtio-net device.
>>  - queues: The parameter is used to specify the number of the queues
>>            virtio-net device has.
>>            (Default: 1)
>>
>> Here is an example.
>> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=3' -- -i
>>
>> To connect above testpmd, here is qemu command example.
>>
>> $ qemu-system-x86_64 \
>>         <snip>
>>         -chardev socket,id=chr0,path=/tmp/sock0 \
>>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce \
>>         -device virtio-net-pci,netdev=net0
>>
>> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> Hi Tetsuya,
>
> a few comments inline below.
>
> /Bruce
>
>> ---
>>  config/common_linuxapp                      |   6 +
> <snip>
>> index 0000000..66bfc2b
>> --- /dev/null
>> +++ b/drivers/net/vhost/rte_eth_vhost.c
>> @@ -0,0 +1,735 @@
>> +/*-
>> + *   BSD LICENSE
>> + *
>> + *   Copyright (c) 2010-2015 Intel Corporation.
> This is probably not the copyright line you want on your new files.

Hi Bruce,

I appreciate your comments.
Yes, I will change above.

>> +
>> +static uint16_t
>> +eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
>> +{
>> +	struct vhost_queue *r = q;
>> +	uint16_t nb_rx = 0;
>> +
>> +	if (unlikely(r->internal == NULL))
>> +		return 0;
>> +
>> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
>> +		return 0;
>> +
>> +	rte_atomic32_set(&r->while_queuing, 1);
>> +
>> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
>> +		goto out;
>> +
>> +	/* Dequeue packets from guest TX queue */
>> +	nb_rx = (uint16_t)rte_vhost_dequeue_burst(r->device,
>> +			VIRTIO_TXQ, r->mb_pool, bufs, nb_bufs);
>> +
>> +	rte_atomic64_add(&(r->rx_pkts), nb_rx);
> Do we really need to use atomics here? It will slow things down a lot. For
> other PMDs the assumption is always that only a single thread can access each
> queue at a time - it's up to the app to use locks to enforce that restriction
> if necessary.

I agree we don't need to use atomic here.
I will change it in the next patches.

>> +static int
>> +new_device(struct virtio_net *dev)
>> +{
>> +	struct rte_eth_dev *eth_dev;
>> +	struct pmd_internal *internal;
>> +	struct vhost_queue *vq;
>> +	uint16_t queues;
>> +	unsigned i;
>> +
>> +	if (dev == NULL) {
>> +		RTE_LOG(INFO, PMD, "invalid argument\n");
>> +		return -1;
>> +	}
>> +
>> +	internal = find_internal_resource(dev->ifname);
>> +	if (internal == NULL) {
>> +		RTE_LOG(INFO, PMD, "invalid device name\n");
>> +		return -1;
>> +	}
>> +
>> +	/*
>> +	 * Todo: To support multi queue, get the number of queues here.
>> +	 * So far, vhost provides only one queue.
>> +	 */
>> +	queues = 1;
>> +
>> +	if ((queues < internal->nb_rx_queues) ||
>> +			(queues < internal->nb_tx_queues)) {
>> +		RTE_LOG(INFO, PMD, "Not enough queues\n");
>> +		return -1;
>> +	}
>> +
>> +	eth_dev = rte_eth_dev_allocated(internal->dev_name);
>> +	if (eth_dev == NULL) {
>> +		RTE_LOG(INFO, PMD, "failuer to find ethdev\n");
typo "failure". Probably should also be written just as "Failed to find ethdev".

Thanks, I will fix it.

>> +static struct eth_driver rte_vhost_pmd = {
>> +	.pci_drv = {
>> +		.name = "rte_vhost_pmd",
>> +		.drv_flags = RTE_PCI_DRV_DETACHABLE,
>> +	},
>> +};
> If you base this patchset on top of Bernard's patchset to remove the PCI devices
> then you shouldn't need these pci_dev and id_table structures.

Sure, I will check his latest patches and rebase on them.

Regards,
Tetsuya


* [PATCH 0/3] Add VHOST PMD
  2015-10-22  9:45     ` [RFC PATCH v3 1/2] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
@ 2015-10-27  6:12       ` Tetsuya Mukawa
  2015-10-27  6:12         ` [PATCH 1/3] vhost: Fix wrong handling of virtqueue array index Tetsuya Mukawa
                           ` (3 more replies)
  0 siblings, 4 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-10-27  6:12 UTC (permalink / raw)
  To: dev; +Cc: ann.zhuangyanying

The patch introduces a new PMD. This PMD is implemented as a thin wrapper
of librte_vhost. The patch applies on top of the below patch series.
 - [PATCH v5 00/28] remove pci driver from vdevs

* Known issue.
We may see issues while handling the RESET_OWNER message.
This handling is done in the vhost library, so it is not part of the vhost PMD.
So far, we are waiting for a QEMU fix.

PATCH v4 changes:
 - Support vhost multiple queues.
 - Rebase on "remove pci driver from vdevs".
 - Optimize RX/TX functions.
 - Fix resource leaks.
 - Fix compile issue.
 - Add patch to fix vhost library.

PATCH v3 changes:
 - Optimize performance.
   In RX/TX functions, change code to access only per core data.
 - Add the below API to allow users to use vhost library APIs for a port
   managed by the vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
    - rte_eth_vhost_portid2vdev()
   To support this functionality, the vhost library is also changed.
   Anyway, if users don't use the vhost PMD, they can still fully use the
   vhost library APIs.
 - Add code to support vhost multiple queues.
   Actually, the multiple queues functionality is not enabled yet.

PATCH v2 changes:
 - Fix issues reported by checkpatch.pl
   (Thanks to Stephen Hemminger)


Tetsuya Mukawa (3):
  vhost: Fix wrong handling of virtqueue array index
  vhost: Add callback and private data for vhost PMD
  vhost: Add VHOST PMD

 config/common_linuxapp                        |   6 +
 doc/guides/nics/index.rst                     |   1 +
 doc/guides/nics/vhost.rst                     |  82 +++
 doc/guides/rel_notes/release_2_2.rst          |   2 +
 drivers/net/Makefile                          |   4 +
 drivers/net/vhost/Makefile                    |  62 +++
 drivers/net/vhost/rte_eth_vhost.c             | 765 ++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h             |  65 +++
 drivers/net/vhost/rte_pmd_vhost_version.map   |   8 +
 lib/librte_vhost/rte_vhost_version.map        |   6 +
 lib/librte_vhost/rte_virtio_net.h             |   3 +
 lib/librte_vhost/vhost_user/virtio-net-user.c |  33 +-
 lib/librte_vhost/virtio-net.c                 |  61 +-
 lib/librte_vhost/virtio-net.h                 |   4 +-
 mk/rte.app.mk                                 |   8 +-
 15 files changed, 1085 insertions(+), 25 deletions(-)
 create mode 100644 doc/guides/nics/vhost.rst
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

-- 
2.1.4


* [PATCH 1/3] vhost: Fix wrong handling of virtqueue array index
  2015-10-27  6:12       ` [PATCH 0/3] Add VHOST PMD Tetsuya Mukawa
@ 2015-10-27  6:12         ` Tetsuya Mukawa
  2015-10-27  6:29           ` Yuanhan Liu
  2015-10-27  6:47           ` Yuanhan Liu
  2015-10-27  6:12         ` [PATCH 2/3] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
                           ` (2 subsequent siblings)
  3 siblings, 2 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-10-27  6:12 UTC (permalink / raw)
  To: dev; +Cc: ann.zhuangyanying

The patch fixes incorrect handling of the virtqueue array index.

GET_VRING_BASE:
The vhost backend receives this message once per virtqueue, so the
destroy callback should be called only after both the RXQ and the TXQ
have received it.

SET_BACKEND:
Because the vhost library supports multiple queues, the index may
exceed 2, and a vhost frontend (QEMU) may send such an index.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
 lib/librte_vhost/vhost_user/virtio-net-user.c | 22 +++++++++++-----------
 lib/librte_vhost/virtio-net.c                 |  5 +++--
 2 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index a998ad8..3e8dfea 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -283,12 +283,10 @@ user_get_vring_base(struct vhost_device_ctx ctx,
 	struct vhost_vring_state *state)
 {
 	struct virtio_net *dev = get_device(ctx);
+	uint16_t base_idx = state->index / VIRTIO_QNUM;
 
 	if (dev == NULL)
 		return -1;
-	/* We have to stop the queue (virtio) if it is running. */
-	if (dev->flags & VIRTIO_DEV_RUNNING)
-		notify_ops->destroy_device(dev);
 
 	/* Here we are safe to get the last used index */
 	ops->get_vring_base(ctx, state->index, state);
@@ -300,15 +298,17 @@ user_get_vring_base(struct vhost_device_ctx ctx,
 	 * sent and only sent in vhost_vring_stop.
 	 * TODO: cleanup the vring, it isn't usable since here.
 	 */
-	if (dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd >= 0) {
-		close(dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd);
-		dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd = -1;
-	}
-	if (dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd >= 0) {
-		close(dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd);
-		dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd = -1;
+	if (dev->virtqueue[state->index]->kickfd >= 0) {
+		close(dev->virtqueue[state->index]->kickfd);
+		dev->virtqueue[state->index]->kickfd = -1;
 	}
 
+	/* We have to stop the queue (virtio) if it is running. */
+	if ((dev->flags & VIRTIO_DEV_RUNNING) &&
+			(dev->virtqueue[base_idx + VIRTIO_RXQ]->kickfd == -1) &&
+			(dev->virtqueue[base_idx + VIRTIO_TXQ]->kickfd == -1))
+		notify_ops->destroy_device(dev);
+
 	return 0;
 }
 
@@ -321,7 +321,7 @@ user_set_vring_enable(struct vhost_device_ctx ctx,
 		      struct vhost_vring_state *state)
 {
 	struct virtio_net *dev = get_device(ctx);
-	uint16_t base_idx = state->index;
+	uint16_t base_idx = state->index / VIRTIO_QNUM;
 	int enable = (int)state->num;
 
 	RTE_LOG(INFO, VHOST_CONFIG,
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 97213c5..ee2e84d 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -778,6 +778,7 @@ static int
 set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 {
 	struct virtio_net *dev;
+	uint32_t base_idx = file->index / VIRTIO_QNUM;
 
 	dev = get_device(ctx);
 	if (dev == NULL)
@@ -791,8 +792,8 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 	 * we add the device.
 	 */
 	if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
-		if (((int)dev->virtqueue[VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED) &&
-			((int)dev->virtqueue[VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED)) {
+		if (((int)dev->virtqueue[base_idx + VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED) &&
+			((int)dev->virtqueue[base_idx + VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED)) {
 			return notify_ops->new_device(dev);
 		}
 	/* Otherwise we remove it. */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 200+ messages in thread
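The index arithmetic the patch above relies on can be sketched in isolation. Each queue pair occupies VIRTIO_QNUM (= 2) consecutive virtqueue slots, so a per-vring message index maps to its queue-pair index by integer division, and the device may be torn down only once both rings of a pair have had their kickfds closed (reset to -1). The helper names below are ours:

```c
#include <assert.h>

/* Constants mirroring the vhost library: each queue pair consists of
 * an RX ring and a TX ring. */
enum { VIRTIO_RXQ = 0, VIRTIO_TXQ = 1, VIRTIO_QNUM = 2 };

/* GET_VRING_BASE arrives once per vring; recover the queue-pair
 * ("base") index by division, as the patch does. */
static unsigned int qp_index(unsigned int vring_index)
{
	return vring_index / VIRTIO_QNUM;
}

/* The destroy callback may run only after BOTH rings of the pair
 * have been stopped, i.e. both kickfds were closed and set to -1. */
static int pair_stopped(const int *kickfd, unsigned int qp)
{
	return kickfd[qp * VIRTIO_QNUM + VIRTIO_RXQ] == -1 &&
	       kickfd[qp * VIRTIO_QNUM + VIRTIO_TXQ] == -1;
}
```

With this layout, vrings 0 and 1 form pair 0, vrings 2 and 3 form pair 1, and so on, which is why SET_BACKEND may legitimately see indices above 2 once multiple queues are in use.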

* [PATCH 2/3] vhost: Add callback and private data for vhost PMD
  2015-10-27  6:12       ` [PATCH 0/3] Add VHOST PMD Tetsuya Mukawa
  2015-10-27  6:12         ` [PATCH 1/3] vhost: Fix wrong handling of virtqueue array index Tetsuya Mukawa
@ 2015-10-27  6:12         ` Tetsuya Mukawa
  2015-10-30 17:49           ` Loftus, Ciara
  2015-10-27  6:12         ` [PATCH 3/3] vhost: Add VHOST PMD Tetsuya Mukawa
  2015-10-27  7:54         ` [PATCH 0/3] " Tetsuya Mukawa
  3 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-10-27  6:12 UTC (permalink / raw)
  To: dev; +Cc: ann.zhuangyanying

These variables are needed to manage a virtio device through both the
vhost library APIs and the vhost PMD.
For example, if the vhost PMD used the callback handler and private
data currently provided by the vhost library, a DPDK application
linked against the vhost library could not use some of the library's
APIs. To avoid this, a separate callback and private data for the
vhost PMD are needed.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
 lib/librte_vhost/rte_vhost_version.map        |  6 +++
 lib/librte_vhost/rte_virtio_net.h             |  3 ++
 lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++----
 lib/librte_vhost/virtio-net.c                 | 56 +++++++++++++++++++++++++--
 lib/librte_vhost/virtio-net.h                 |  4 +-
 5 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index 3d8709e..00a9ce5 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -20,3 +20,9 @@ DPDK_2.1 {
 	rte_vhost_driver_unregister;
 
 } DPDK_2.0;
+
+DPDK_2.2 {
+	global:
+
+	rte_vhost_driver_pmd_callback_register;
+} DPDK_2.1;
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 426a70d..08e77af 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -106,6 +106,7 @@ struct virtio_net {
 	char			ifname[IF_NAME_SZ];	/**< Name of the tap device or socket path. */
 	uint32_t		virt_qp_nb;	/**< number of queue pair we have allocated */
 	void			*priv;		/**< private context */
+	void			*pmd_priv;	/**< private context for vhost PMD */
 	struct vhost_virtqueue	*virtqueue[VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX];	/**< Contains all virtqueue information. */
 } __rte_cache_aligned;
 
@@ -202,6 +203,8 @@ int rte_vhost_driver_unregister(const char *dev_name);
 
 /* Register callbacks. */
 int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const);
+/* Register callbacks for vhost PMD (Only for internal). */
+int rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const * const);
 /* Start vhost driver session blocking loop. */
 int rte_vhost_driver_session_start(void);
 
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index 3e8dfea..dad083b 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -111,7 +111,7 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
 
 	/* Remove from the data plane. */
 	if (dev->flags & VIRTIO_DEV_RUNNING)
-		notify_ops->destroy_device(dev);
+		notify_destroy_device(dev);
 
 	if (dev->mem) {
 		free_mem_region(dev);
@@ -272,7 +272,7 @@ user_set_vring_kick(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
 
 	if (virtio_is_ready(dev) &&
 		!(dev->flags & VIRTIO_DEV_RUNNING))
-			notify_ops->new_device(dev);
+			notify_new_device(dev);
 }
 
 /*
@@ -307,7 +307,7 @@ user_get_vring_base(struct vhost_device_ctx ctx,
 	if ((dev->flags & VIRTIO_DEV_RUNNING) &&
 			(dev->virtqueue[base_idx + VIRTIO_RXQ]->kickfd == -1) &&
 			(dev->virtqueue[base_idx + VIRTIO_TXQ]->kickfd == -1))
-		notify_ops->destroy_device(dev);
+		notify_destroy_device(dev);
 
 	return 0;
 }
@@ -328,10 +328,7 @@ user_set_vring_enable(struct vhost_device_ctx ctx,
 		"set queue enable: %d to qp idx: %d\n",
 		enable, state->index);
 
-	if (notify_ops->vring_state_changed) {
-		notify_ops->vring_state_changed(dev, base_idx / VIRTIO_QNUM,
-						enable);
-	}
+	notify_vring_state_changed(dev, base_idx / VIRTIO_QNUM, enable);
 
 	dev->virtqueue[base_idx + VIRTIO_RXQ]->enabled = enable;
 	dev->virtqueue[base_idx + VIRTIO_TXQ]->enabled = enable;
@@ -345,7 +342,7 @@ user_destroy_device(struct vhost_device_ctx ctx)
 	struct virtio_net *dev = get_device(ctx);
 
 	if (dev && (dev->flags & VIRTIO_DEV_RUNNING))
-		notify_ops->destroy_device(dev);
+		notify_destroy_device(dev);
 
 	if (dev && dev->mem) {
 		free_mem_region(dev);
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index ee2e84d..de5d8ff 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -65,6 +65,8 @@ struct virtio_net_config_ll {
 
 /* device ops to add/remove device to/from data core. */
 struct virtio_net_device_ops const *notify_ops;
+/* device ops for vhost PMD to add/remove device to/from data core. */
+struct virtio_net_device_ops const *pmd_notify_ops;
 /* root address of the linked list of managed virtio devices */
 static struct virtio_net_config_ll *ll_root;
 
@@ -80,6 +82,43 @@ static struct virtio_net_config_ll *ll_root;
 static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
 
 
+int
+notify_new_device(struct virtio_net *dev)
+{
+	if ((pmd_notify_ops != NULL) && (pmd_notify_ops->new_device != NULL)) {
+		int ret = pmd_notify_ops->new_device(dev);
+		if (ret != 0)
+			return ret;
+	}
+	if ((notify_ops != NULL) && (notify_ops->new_device != NULL))
+		return notify_ops->new_device(dev);
+
+	return 0;
+}
+
+void
+notify_destroy_device(volatile struct virtio_net *dev)
+{
+	if ((pmd_notify_ops != NULL) && (pmd_notify_ops->destroy_device != NULL))
+		pmd_notify_ops->destroy_device(dev);
+	if ((notify_ops != NULL) && (notify_ops->destroy_device != NULL))
+		notify_ops->destroy_device(dev);
+}
+
+int
+notify_vring_state_changed(struct virtio_net *dev, uint16_t queue_id, int enable)
+{
+	if ((pmd_notify_ops != NULL) && (pmd_notify_ops->vring_state_changed != NULL)) {
+		int ret = pmd_notify_ops->vring_state_changed(dev, queue_id, enable);
+		if (ret != 0)
+			return ret;
+	}
+	if ((notify_ops != NULL) && (notify_ops->vring_state_changed != NULL))
+		return notify_ops->vring_state_changed(dev, queue_id, enable);
+
+	return 0;
+}
+
 /*
  * Converts QEMU virtual address to Vhost virtual address. This function is
  * used to convert the ring addresses to our address space.
@@ -377,7 +416,7 @@ destroy_device(struct vhost_device_ctx ctx)
 			 * the function to remove it from the data core.
 			 */
 			if ((ll_dev_cur->dev.flags & VIRTIO_DEV_RUNNING))
-				notify_ops->destroy_device(&(ll_dev_cur->dev));
+				notify_destroy_device(&(ll_dev_cur->dev));
 			ll_dev_cur = rm_config_ll_entry(ll_dev_cur,
 					ll_dev_last);
 		} else {
@@ -794,12 +833,12 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 	if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
 		if (((int)dev->virtqueue[base_idx + VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED) &&
 			((int)dev->virtqueue[base_idx + VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED)) {
-			return notify_ops->new_device(dev);
+			return notify_new_device(dev);
 		}
 	/* Otherwise we remove it. */
 	} else
 		if (file->fd == VIRTIO_DEV_STOPPED)
-			notify_ops->destroy_device(dev);
+			notify_destroy_device(dev);
 	return 0;
 }
 
@@ -883,3 +922,14 @@ rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const op
 
 	return 0;
 }
+
+/*
+ * Register ops so that we can add/remove device to data core.
+ */
+int
+rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const * const ops)
+{
+	pmd_notify_ops = ops;
+
+	return 0;
+}
diff --git a/lib/librte_vhost/virtio-net.h b/lib/librte_vhost/virtio-net.h
index 75fb57e..0816e71 100644
--- a/lib/librte_vhost/virtio-net.h
+++ b/lib/librte_vhost/virtio-net.h
@@ -37,7 +37,9 @@
 #include "vhost-net.h"
 #include "rte_virtio_net.h"
 
-struct virtio_net_device_ops const *notify_ops;
 struct virtio_net *get_device(struct vhost_device_ctx ctx);
 
+int notify_new_device(struct virtio_net *dev);
+void notify_destroy_device(volatile struct virtio_net *dev);
+int notify_vring_state_changed(struct virtio_net *dev, uint16_t queue_id, int enable);
 #endif
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 200+ messages in thread
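The dispatch order introduced by this patch can be sketched on its own: two ops tables are kept (the application's and the PMD's), the PMD's hook runs first and may veto, and the application's hook runs afterwards. The table and helper names below are simplified stand-ins for struct virtio_net_device_ops and notify_new_device():

```c
#include <assert.h>
#include <stddef.h>

/* Simplified ops table: only the new_device hook. */
struct dev_ops {
	int (*new_device)(void *dev);
};

/* Two registration slots, mirroring notify_ops (application) and
 * pmd_notify_ops (vhost PMD) in the patch. */
static const struct dev_ops *app_ops;
static const struct dev_ops *pmd_ops;

/* Dispatch in the same order as notify_new_device(): the PMD hook
 * runs first and may veto; the application hook runs afterwards. */
static int notify_new_device(void *dev)
{
	if (pmd_ops != NULL && pmd_ops->new_device != NULL) {
		int ret = pmd_ops->new_device(dev);

		if (ret != 0)
			return ret;
	}
	if (app_ops != NULL && app_ops->new_device != NULL)
		return app_ops->new_device(dev);
	return 0;
}

/* --- demonstration hooks recording which hooks ran, as a bitmask --- */
static int calls;
static int pmd_hook(void *dev)  { (void)dev; calls |= 1; return 0; }
static int app_hook(void *dev)  { (void)dev; calls |= 2; return 0; }
static int veto_hook(void *dev) { (void)dev; calls |= 4; return -1; }

static const struct dev_ops pmd_table  = { pmd_hook };
static const struct dev_ops app_table  = { app_hook };
static const struct dev_ops veto_table = { veto_hook };

/* Both hooks registered: both run, PMD first. Returns the bitmask. */
static int demo_both(void)
{
	pmd_ops = &pmd_table;
	app_ops = &app_table;
	calls = 0;
	notify_new_device(NULL);
	return calls;
}

/* PMD hook vetoes: the application hook must not run. */
static int demo_veto(void)
{
	pmd_ops = &veto_table;
	app_ops = &app_table;
	calls = 0;
	return notify_new_device(NULL) == -1 && calls == 4;
}
```

This ordering is what lets the PMD update its own queue state before the application's callback observes the new device.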

* [PATCH 3/3] vhost: Add VHOST PMD
  2015-10-27  6:12       ` [PATCH 0/3] Add VHOST PMD Tetsuya Mukawa
  2015-10-27  6:12         ` [PATCH 1/3] vhost: Fix wrong handling of virtqueue array index Tetsuya Mukawa
  2015-10-27  6:12         ` [PATCH 2/3] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
@ 2015-10-27  6:12         ` Tetsuya Mukawa
  2015-11-02  3:58           ` [PATCH v2 0/2] " Tetsuya Mukawa
  2015-11-09 22:25           ` [PATCH 3/3] vhost: " Stephen Hemminger
  2015-10-27  7:54         ` [PATCH 0/3] " Tetsuya Mukawa
  3 siblings, 2 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-10-27  6:12 UTC (permalink / raw)
  To: dev; +Cc: ann.zhuangyanying

The patch introduces a new PMD. This PMD is implemented as a thin
wrapper of librte_vhost, which means librte_vhost is also needed to
compile the PMD. Vhost messages are handled only while a port is
started, so start the port first, then invoke QEMU.

The PMD has 2 parameters.
 - iface:  Specifies a path used to connect to a virtio-net device.
 - queues: Specifies the number of queues the virtio-net device has.
           (Default: 1)

Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i

To connect to the above testpmd instance, here is an example QEMU command.

$ qemu-system-x86_64 \
        <snip>
        -chardev socket,id=chr0,path=/tmp/sock0 \
        -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
        -device virtio-net-pci,netdev=net0

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
 config/common_linuxapp                      |   6 +
 doc/guides/nics/index.rst                   |   1 +
 doc/guides/nics/vhost.rst                   |  82 +++
 doc/guides/rel_notes/release_2_2.rst        |   2 +
 drivers/net/Makefile                        |   4 +
 drivers/net/vhost/Makefile                  |  62 +++
 drivers/net/vhost/rte_eth_vhost.c           | 765 ++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h           |  65 +++
 drivers/net/vhost/rte_pmd_vhost_version.map |   8 +
 mk/rte.app.mk                               |   8 +-
 10 files changed, 1002 insertions(+), 1 deletion(-)
 create mode 100644 doc/guides/nics/vhost.rst
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 0de43d5..7310240 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -446,6 +446,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
 CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 
 #
+# Compile vhost PMD
+# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
+#
+CONFIG_RTE_LIBRTE_PMD_VHOST=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index d1a92f8..44792fe 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -46,6 +46,7 @@ Network Interface Controller Drivers
     intel_vf
     mlx4
     virtio
+    vhost
     vmxnet3
     pcap_ring
 
diff --git a/doc/guides/nics/vhost.rst b/doc/guides/nics/vhost.rst
new file mode 100644
index 0000000..2ec8d79
--- /dev/null
+++ b/doc/guides/nics/vhost.rst
@@ -0,0 +1,82 @@
+..  BSD LICENSE
+    Copyright(c) 2015 IGEL Co., Ltd.. All rights reserved.
+    All rights reserved.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+    * Neither the name of IGEL Co., Ltd. nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Poll Mode Driver that wraps vhost library
+=========================================
+
+This PMD is a thin wrapper around the DPDK vhost library.
+The user can handle virtqueues as a normal DPDK port.
+
+Vhost Implementation in DPDK
+----------------------------
+
+Please refer to the "Vhost Library" chapter of the Programmer's Guide for details of vhost.
+
+Features and Limitations of vhost PMD
+-------------------------------------
+
+In this release, the vhost PMD provides the basic functionality of packet reception and transmission.
+
+*   It provides a function to convert a port_id to a pointer to the
+    virtio_net device, which allows the user to use the vhost library
+    in parallel with the PMD.
+
+*   It supports multiple queues.
+
+*   It supports Port Hotplug functionality.
+
+*   There is no need to stop RX/TX when the user wants to stop a guest
+    or the virtio-net driver on the guest.
+
+Vhost PMD with testpmd application
+----------------------------------
+
+This section demonstrates vhost PMD with testpmd DPDK sample application.
+
+#.  Launch the testpmd with vhost PMD:
+
+    .. code-block:: console
+
+        ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
+
+    Complete other basic DPDK preparations, such as enabling hugepages, here.
+    Please refer to the *DPDK Getting Started Guide* for detailed instructions.
+
+#.  Launch the QEMU:
+
+    .. code-block:: console
+
+       qemu-system-x86_64 <snip>
+                   -chardev socket,id=chr0,path=/tmp/sock0 \
+                   -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
+                   -device virtio-net-pci,netdev=net0
+
+    This command generates one virtio-net device for QEMU.
+    Once the device is recognized by the guest, the user can handle it as a
+    normal virtio-net device.
+    When the initialization process between the virtio-net driver and the
+    vhost library is done, the port status of testpmd will be linked up.
diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 639f129..5930e70 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -11,6 +11,8 @@ New Features
 
 * **Added vhost-user multiple queue support.**
 
+* **Added vhost PMD.**
+
 * **Removed the PCI device from vdev PMD's.**
 
   * This change required modifications to librte_ether and all vdev and pdev PMD's.
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 5ebf963..e46a38e 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -49,5 +49,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
 
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
 include $(RTE_SDK)/mk/rte.sharelib.mk
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
new file mode 100644
index 0000000..8186a80
--- /dev/null
+++ b/drivers/net/vhost/Makefile
@@ -0,0 +1,62 @@
+#   BSD LICENSE
+#
+#   Copyright (c) 2010-2015 Intel Corporation.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_vhost.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_vhost_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
+
+#
+# Export include files
+#
+SYMLINK-y-include += rte_eth_vhost.h
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_vhost
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
new file mode 100644
index 0000000..004fdaf
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -0,0 +1,765 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) 2015 IGEL Co., Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co.,Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <unistd.h>
+#include <pthread.h>
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_dev.h>
+#include <rte_kvargs.h>
+#include <rte_virtio_net.h>
+
+#include "rte_eth_vhost.h"
+
+#define ETH_VHOST_IFACE_ARG		"iface"
+#define ETH_VHOST_QUEUES_ARG		"queues"
+
+static const char *drivername = "VHOST PMD";
+
+static const char *valid_arguments[] = {
+	ETH_VHOST_IFACE_ARG,
+	ETH_VHOST_QUEUES_ARG,
+	NULL
+};
+
+static struct ether_addr base_eth_addr = {
+	.addr_bytes = {
+		0x56 /* V */,
+		0x48 /* H */,
+		0x4F /* O */,
+		0x53 /* S */,
+		0x54 /* T */,
+		0x00
+	}
+};
+
+struct vhost_queue {
+	struct virtio_net *device;
+	struct pmd_internal *internal;
+	struct rte_mempool *mb_pool;
+	rte_atomic32_t allow_queuing;
+	rte_atomic32_t while_queuing;
+	uint64_t rx_pkts;
+	uint64_t tx_pkts;
+	uint64_t err_pkts;
+};
+
+struct pmd_internal {
+	TAILQ_ENTRY(pmd_internal) next;
+	char *dev_name;
+	char *iface_name;
+	unsigned nb_rx_queues;
+	unsigned nb_tx_queues;
+
+	struct vhost_queue *rx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+	struct vhost_queue *tx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+
+	volatile uint16_t once;
+	pthread_t session_th;
+};
+
+TAILQ_HEAD(pmd_internal_head, pmd_internal);
+static struct pmd_internal_head internals_list =
+	TAILQ_HEAD_INITIALIZER(internals_list);
+
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static struct rte_eth_link pmd_link = {
+		.link_speed = 10000,
+		.link_duplex = ETH_LINK_FULL_DUPLEX,
+		.link_status = 0
+};
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t nb_rx = 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Dequeue packets from guest TX queue */
+	nb_rx = (uint16_t)rte_vhost_dequeue_burst(r->device,
+			VIRTIO_TXQ, r->mb_pool, bufs, nb_bufs);
+
+	r->rx_pkts += nb_rx;
+
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t i, nb_tx = 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Enqueue packets to guest RX queue */
+	nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
+			VIRTIO_RXQ, bufs, nb_bufs);
+
+	r->tx_pkts += nb_tx;
+	r->err_pkts += nb_bufs - nb_tx;
+
+	for (i = 0; likely(i < nb_tx); i++)
+		rte_pktmbuf_free(bufs[i]);
+
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused) { return 0; }
+
+static inline struct pmd_internal *
+find_internal_resource(char *ifname)
+{
+	int found = 0;
+	struct pmd_internal *internal;
+
+	if (ifname == NULL)
+		return NULL;
+
+	pthread_mutex_lock(&internal_list_lock);
+
+	TAILQ_FOREACH(internal, &internals_list, next) {
+		if (!strcmp(internal->iface_name, ifname)) {
+			found = 1;
+			break;
+		}
+	}
+
+	pthread_mutex_unlock(&internal_list_lock);
+
+	if (!found)
+		return NULL;
+
+	return internal;
+}
+
+static int
+new_device(struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct pmd_internal *internal;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "invalid argument\n");
+		return -1;
+	}
+
+	internal = find_internal_resource(dev->ifname);
+	if (internal == NULL) {
+		RTE_LOG(INFO, PMD, "invalid device name\n");
+		return -1;
+	}
+
+	if ((dev->virt_qp_nb < internal->nb_rx_queues) ||
+			(dev->virt_qp_nb < internal->nb_tx_queues)) {
+		RTE_LOG(INFO, PMD, "Not enough queues\n");
+		return -1;
+	}
+
+	eth_dev = rte_eth_dev_allocated(internal->dev_name);
+	if (eth_dev == NULL) {
+		RTE_LOG(INFO, PMD, "Failed to find ethdev\n");
+		return -1;
+	}
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = internal->rx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = dev;
+		vq->internal = internal;
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = internal->tx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = dev;
+		vq->internal = internal;
+	}
+
+	dev->flags |= VIRTIO_DEV_RUNNING;
+	dev->pmd_priv = eth_dev;
+	eth_dev->data->dev_link.link_status = 1;
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = internal->rx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = internal->tx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+	RTE_LOG(INFO, PMD, "New connection established\n");
+
+	return 0;
+}
+
+static void
+destroy_device(volatile struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct pmd_internal *internal;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "invalid argument\n");
+		return;
+	}
+
+	eth_dev = (struct rte_eth_dev *)dev->pmd_priv;
+	if (eth_dev == NULL) {
+		RTE_LOG(INFO, PMD, "failed to find the ethdev\n");
+		return;
+	}
+
+	internal = eth_dev->data->dev_private;
+
+	/* Wait until rx/tx_pkt_burst stops accessing the vhost device */
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = internal->rx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = internal->tx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+
+	eth_dev->data->dev_link.link_status = 0;
+
+	dev->pmd_priv = NULL;
+	dev->flags &= ~VIRTIO_DEV_RUNNING;
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = internal->rx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = NULL;
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = internal->tx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = NULL;
+	}
+
+	RTE_LOG(INFO, PMD, "Connection closed\n");
+}
+
+static void *vhost_driver_session(void *param __rte_unused)
+{
+	static struct virtio_net_device_ops *vhost_ops;
+
+	vhost_ops = rte_zmalloc(NULL, sizeof(*vhost_ops), 0);
+	if (vhost_ops == NULL)
+		rte_panic("Can't allocate memory\n");
+
+	/* set vhost arguments */
+	vhost_ops->new_device = new_device;
+	vhost_ops->destroy_device = destroy_device;
+	if (rte_vhost_driver_pmd_callback_register(vhost_ops) < 0)
+		rte_panic("Can't register callbacks\n");
+
+	/* start event handling */
+	rte_vhost_driver_session_start();
+
+	rte_free(vhost_ops);
+	pthread_exit(0);
+}
+
+static void vhost_driver_session_start(struct pmd_internal *internal)
+{
+	int ret;
+
+	ret = pthread_create(&internal->session_th,
+			NULL, vhost_driver_session, NULL);
+	if (ret)
+		rte_panic("Can't create a thread\n");
+}
+
+static void vhost_driver_session_stop(struct pmd_internal *internal)
+{
+	int ret;
+
+	ret = pthread_cancel(internal->session_th);
+	if (ret)
+		rte_panic("Can't cancel the thread\n");
+
+	ret = pthread_join(internal->session_th, NULL);
+	if (ret)
+		rte_panic("Can't join the thread\n");
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	int ret;
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
+		ret = rte_vhost_driver_register(internal->iface_name);
+		if (ret)
+			return ret;
+
+		vhost_driver_session_start(internal);
+	}
+	return 0;
+}
+
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	if (rte_atomic16_cmpset(&internal->once, 1, 0)) {
+		rte_vhost_driver_unregister(internal->iface_name);
+		vhost_driver_session_stop(internal);
+	}
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+	struct vhost_queue *vq;
+
+	if (internal->rx_vhost_queues[rx_queue_id] != NULL)
+		rte_free(internal->rx_vhost_queues[rx_queue_id]);
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
+		return -ENOMEM;
+	}
+
+	vq->mb_pool = mb_pool;
+	internal->rx_vhost_queues[rx_queue_id] = vq;
+	dev->data->rx_queues[rx_queue_id] = vq;
+	return 0;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+	struct vhost_queue *vq;
+
+	if (internal->tx_vhost_queues[tx_queue_id] != NULL)
+		rte_free(internal->tx_vhost_queues[tx_queue_id]);
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for tx queue\n");
+		return -ENOMEM;
+	}
+
+	internal->tx_vhost_queues[tx_queue_id] = vq;
+	dev->data->tx_queues[tx_queue_id] = vq;
+	return 0;
+}
+
+
+static void
+eth_dev_info(struct rte_eth_dev *dev,
+	     struct rte_eth_dev_info *dev_info)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	dev_info->driver_name = drivername;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = (uint32_t)-1;
+	dev_info->max_rx_queues = (uint16_t)internal->nb_rx_queues;
+	dev_info->max_tx_queues = (uint16_t)internal->nb_tx_queues;
+	dev_info->min_rx_bufsize = 0;
+}
+
+static void
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
+{
+	unsigned i;
+	unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
+	const struct pmd_internal *internal = dev->data->dev_private;
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+	     i < internal->nb_rx_queues; i++) {
+		if (internal->rx_vhost_queues[i] == NULL)
+			continue;
+		igb_stats->q_ipackets[i] = internal->rx_vhost_queues[i]->rx_pkts;
+		rx_total += igb_stats->q_ipackets[i];
+	}
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+	     i < internal->nb_tx_queues; i++) {
+		if (internal->tx_vhost_queues[i] == NULL)
+			continue;
+		igb_stats->q_opackets[i] = internal->tx_vhost_queues[i]->tx_pkts;
+		igb_stats->q_errors[i] = internal->tx_vhost_queues[i]->err_pkts;
+		tx_total += igb_stats->q_opackets[i];
+		tx_err_total += igb_stats->q_errors[i];
+	}
+
+	igb_stats->ipackets = rx_total;
+	igb_stats->opackets = tx_total;
+	igb_stats->oerrors = tx_err_total;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	unsigned i;
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		if (internal->rx_vhost_queues[i] == NULL)
+			continue;
+		internal->rx_vhost_queues[i]->rx_pkts = 0;
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		if (internal->tx_vhost_queues[i] == NULL)
+			continue;
+		internal->tx_vhost_queues[i]->tx_pkts = 0;
+		internal->tx_vhost_queues[i]->err_pkts = 0;
+	}
+}
+
+static void
+eth_queue_release(void *q __rte_unused) { ; }
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused) { return 0; }
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+static int
+eth_dev_vhost_create(const char *name, int index,
+		     char *iface_name,
+		     int16_t queues,
+		     const unsigned numa_node)
+{
+	struct rte_eth_dev_data *data = NULL;
+	struct pmd_internal *internal = NULL;
+	struct rte_eth_dev *eth_dev = NULL;
+	struct ether_addr *eth_addr = NULL;
+
+	RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
+		numa_node);
+
+	/* now do all data allocation - for eth_dev structure, dummy pci driver
+	 * and internal (private) data
+	 */
+	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+	if (data == NULL)
+		goto error;
+
+	internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
+	if (internal == NULL)
+		goto error;
+
+	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
+	if (eth_addr == NULL)
+		goto error;
+	*eth_addr = base_eth_addr;
+	eth_addr->addr_bytes[5] = index;
+
+	/* reserve an ethdev entry */
+	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+	if (eth_dev == NULL)
+		goto error;
+
+	/* now put it all together
+	 * - store queue data in internal,
+	 * - store numa_node info in ethdev data
+	 * - point eth_dev_data to internals
+	 * - and point eth_dev structure to new eth_dev_data structure
+	 */
+	internal->nb_rx_queues = queues;
+	internal->nb_tx_queues = queues;
+	internal->dev_name = strdup(name);
+	if (internal->dev_name == NULL)
+		goto error;
+	internal->iface_name = strdup(iface_name);
+	if (internal->iface_name == NULL)
+		goto error;
+
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_INSERT_TAIL(&internals_list, internal, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
+	data->dev_private = internal;
+	data->port_id = eth_dev->data->port_id;
+	memmove(data->name, eth_dev->data->name, sizeof(data->name));
+	data->nb_rx_queues = queues;
+	data->nb_tx_queues = queues;
+	data->dev_link = pmd_link;
+	data->mac_addrs = eth_addr;
+
+	/* We'll replace the 'data' originally allocated by eth_dev. So the
+	 * vhost PMD resources won't be shared between multi processes.
+	 */
+	eth_dev->data = data;
+	eth_dev->dev_ops = &ops;
+	eth_dev->driver = NULL;
+	eth_dev->data->dev_flags = RTE_ETH_DEV_DETACHABLE;
+	eth_dev->data->kdrv = RTE_KDRV_NONE;
+	eth_dev->data->drv_name = internal->dev_name;
+	eth_dev->data->numa_node = numa_node;
+
+	/* finally assign rx and tx ops */
+	eth_dev->rx_pkt_burst = eth_vhost_rx;
+	eth_dev->tx_pkt_burst = eth_vhost_tx;
+
+	return data->port_id;
+
+error:
+	rte_free(data);
+	rte_free(internal);
+	rte_free(eth_addr);
+
+	return -1;
+}
+
+static inline int
+open_iface(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	const char **iface_name = extra_args;
+
+	if (value == NULL)
+		return -1;
+
+	*iface_name = value;
+
+	return 0;
+}
+
+static inline int
+open_queues(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	uint16_t *q = extra_args;
+
+	if ((value == NULL) || (extra_args == NULL))
+		return -EINVAL;
+
+	errno = 0;
+	*q = (uint16_t)strtoul(value, NULL, 0);
+	if ((*q == USHRT_MAX) && (errno == ERANGE))
+		return -1;
+
+	if (*q > RTE_MAX_QUEUES_PER_PORT)
+		return -1;
+
+	return 0;
+}
+
+static int
+rte_pmd_vhost_devinit(const char *name, const char *params)
+{
+	struct rte_kvargs *kvlist = NULL;
+	int ret = 0;
+	int index;
+	char *iface_name;
+	uint16_t queues;
+
+	RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
+
+	kvlist = rte_kvargs_parse(params, valid_arguments);
+	if (kvlist == NULL)
+		return -1;
+
+	if (strlen(name) < strlen("eth_vhost")) {
+		ret = -1;
+		goto out_free;
+	}
+
+	errno = 0;
+	index = strtol(name + strlen("eth_vhost"), NULL, 0);
+	if (errno == ERANGE) {
+		ret = -1;
+		goto out_free;
+	}
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
+					 &open_iface, &iface_name);
+		if (ret < 0)
+			goto out_free;
+	} else {
+		/* 'iface' is mandatory; iface_name must not stay uninitialized */
+		ret = -1;
+		goto out_free;
+	}
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
+					 &open_queues, &queues);
+		if (ret < 0)
+			goto out_free;
+
+	} else
+		queues = 1;
+
+	if (eth_dev_vhost_create(name, index,
+			iface_name, queues, rte_socket_id()) < 0)
+		ret = -1;
+
+out_free:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static int
+rte_pmd_vhost_devuninit(const char *name)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internal *internal;
+	unsigned int i;
+
+	RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
+
+	if (name == NULL)
+		return -EINVAL;
+
+	/* find an ethdev entry */
+	eth_dev = rte_eth_dev_allocated(name);
+	if (eth_dev == NULL)
+		return -ENODEV;
+
+	internal = eth_dev->data->dev_private;
+
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_REMOVE(&internals_list, internal, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
+	eth_dev_stop(eth_dev);
+
+	if ((internal) && (internal->dev_name))
+		free(internal->dev_name);
+	if ((internal) && (internal->iface_name))
+		free(internal->iface_name);
+
+	rte_free(eth_dev->data->mac_addrs);
+	rte_free(eth_dev->data);
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		if (internal->rx_vhost_queues[i] != NULL)
+			rte_free(internal->rx_vhost_queues[i]);
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		if (internal->tx_vhost_queues[i] != NULL)
+			rte_free(internal->tx_vhost_queues[i]);
+	}
+	rte_free(internal);
+
+	rte_eth_dev_release_port(eth_dev);
+
+	return 0;
+}
+
+static struct rte_driver pmd_vhost_drv = {
+	.name = "eth_vhost",
+	.type = PMD_VDEV,
+	.init = rte_pmd_vhost_devinit,
+	.uninit = rte_pmd_vhost_devuninit,
+};
+
+struct virtio_net *
+rte_eth_vhost_portid2vdev(uint16_t port_id)
+{
+	struct rte_eth_dev *eth_dev;
+
+	if (rte_eth_dev_is_valid_port(port_id) == 0)
+		return NULL;
+
+	eth_dev = &rte_eth_devices[port_id];
+	if (strncmp("eth_vhost", eth_dev->data->drv_name,
+				strlen("eth_vhost")) == 0) {
+		struct pmd_internal *internal;
+		struct vhost_queue *vq;
+
+		internal = eth_dev->data->dev_private;
+		vq = internal->rx_vhost_queues[0];
+		if ((vq != NULL) && (vq->device != NULL))
+			return vq->device;
+	}
+
+	return NULL;
+}
+
+PMD_REGISTER_DRIVER(pmd_vhost_drv);
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
new file mode 100644
index 0000000..22a880f
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -0,0 +1,65 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 IGEL Co., Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co., Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_VHOST_H_
+#define _RTE_ETH_VHOST_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_virtio_net.h>
+
+/**
+ * The function converts the specified port_id to a virtio device structure.
+ * The returned device can be used with the vhost library APIs.
+ * To use the vhost library APIs and the vhost PMD in parallel, the API
+ * below should not be called, because it is called by the vhost PMD.
+ * - rte_vhost_driver_session_start()
+ * Once a device is managed by the vhost PMD, the API below should not be called.
+ * - rte_vhost_driver_unregister()
+ * To unregister the device, use the Port Hotplug APIs instead.
+ *
+ * @param port_id
+ *  port number
+ * @return
+ *  virtio net device structure corresponding to the specified port;
+ *  NULL is returned in error cases.
+ */
+struct virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
new file mode 100644
index 0000000..bf0361a
--- /dev/null
+++ b/drivers/net/vhost/rte_pmd_vhost_version.map
@@ -0,0 +1,8 @@
+DPDK_2.2 {
+
+	global:
+
+	rte_eth_vhost_portid2vdev;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 9e1909e..806e45c 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -143,7 +143,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP)       += -lrte_pmd_pcap
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL)       += -lrte_pmd_null
 
-endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
+
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
+endif # $(CONFIG_RTE_BUILD_SHARED_LIB)
 
 endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 200+ messages in thread
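The destroy_device() handler in the patch above quiesces each queue by clearing allow_queuing and spinning until while_queuing drops. Below is a minimal, hedged sketch of that handshake, with C11 atomics and sched_yield() standing in for the rte_atomic API and rte_pause(); all names here are illustrative, not part of the PMD.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Illustrative stand-in for the vhost_queue flags used above. */
struct queue {
	atomic_int allow_queuing;   /* control -> datapath: polling allowed */
	atomic_int while_queuing;   /* datapath -> control: inside burst fn */
	long pkts;
};

/* Datapath side: the guard a burst function would run on every call. */
static long poll_once(struct queue *q)
{
	long n = -1;

	if (!atomic_load(&q->allow_queuing))
		return -1;
	atomic_store(&q->while_queuing, 1);
	/* Re-check after raising the flag, closing the race with stop. */
	if (atomic_load(&q->allow_queuing))
		n = ++q->pkts;          /* "receive" one packet */
	atomic_store(&q->while_queuing, 0);
	return n;
}

static void *worker(void *arg)
{
	struct queue *q = arg;

	while (poll_once(q) >= 0)
		;
	return NULL;
}

/* Control side: the quiesce loop destroy_device() runs per queue. */
static void stop_queue(struct queue *q)
{
	atomic_store(&q->allow_queuing, 0);
	while (atomic_load(&q->while_queuing))
		sched_yield();          /* rte_pause() in the PMD */
}
```

The re-check of allow_queuing after while_queuing is raised is what makes the simple "clear allow, wait for while to drop" loop on the control side sufficient: once stop_queue() returns, no worker can still be touching the virtqueue.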

* Re: [PATCH 1/3] vhost: Fix wrong handling of virtqueue array index
  2015-10-27  6:12         ` [PATCH 1/3] vhost: Fix wrong handling of virtqueue array index Tetsuya Mukawa
@ 2015-10-27  6:29           ` Yuanhan Liu
  2015-10-27  6:33             ` Yuanhan Liu
  2015-10-27  6:47           ` Yuanhan Liu
  1 sibling, 1 reply; 200+ messages in thread
From: Yuanhan Liu @ 2015-10-27  6:29 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Tue, Oct 27, 2015 at 03:12:53PM +0900, Tetsuya Mukawa wrote:
> The patch fixes wrong handling of virtqueue array index.
> 
> GET_VRING_BASE:
> The vhost backend will receive the message per virtqueue.

No, that's not right: we get GET_VRING_BASE once per queue pair,
covering both the RX and TX virtqueues, not once per virtqueue.

> Also we should call a destroy callback when both RXQ and TXQ receive
> the message.
> 
> SET_BACKEND:
> Because the vhost library supports multiple queues, the index may be over 2.
> Also a vhost frontend (QEMU) may send such an index.
> 
> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> ---
>  lib/librte_vhost/vhost_user/virtio-net-user.c | 22 +++++++++++-----------
>  lib/librte_vhost/virtio-net.c                 |  5 +++--
>  2 files changed, 14 insertions(+), 13 deletions(-)
> 
> diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
> index a998ad8..3e8dfea 100644
> --- a/lib/librte_vhost/vhost_user/virtio-net-user.c
> +++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
> @@ -283,12 +283,10 @@ user_get_vring_base(struct vhost_device_ctx ctx,
>  	struct vhost_vring_state *state)
>  {
>  	struct virtio_net *dev = get_device(ctx);
> +	uint16_t base_idx = state->index / VIRTIO_QNUM;

For the Nth queue (pair), "state->index" equals N * 2.
So, dividing it by VIRTIO_QNUM (2) is wrong here.

	--yliu

>  
>  	if (dev == NULL)
>  		return -1;
> -	/* We have to stop the queue (virtio) if it is running. */
> -	if (dev->flags & VIRTIO_DEV_RUNNING)
> -		notify_ops->destroy_device(dev);
>  
>  	/* Here we are safe to get the last used index */
>  	ops->get_vring_base(ctx, state->index, state);
> @@ -300,15 +298,17 @@ user_get_vring_base(struct vhost_device_ctx ctx,
>  	 * sent and only sent in vhost_vring_stop.
>  	 * TODO: cleanup the vring, it isn't usable since here.
>  	 */
> -	if (dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd >= 0) {
> -		close(dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd);
> -		dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd = -1;
> -	}
> -	if (dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd >= 0) {
> -		close(dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd);
> -		dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd = -1;
> +	if (dev->virtqueue[state->index]->kickfd >= 0) {
> +		close(dev->virtqueue[state->index]->kickfd);
> +		dev->virtqueue[state->index]->kickfd = -1;
>  	}
>  
> +	/* We have to stop the queue (virtio) if it is running. */
> +	if ((dev->flags & VIRTIO_DEV_RUNNING) &&
> +			(dev->virtqueue[base_idx + VIRTIO_RXQ]->kickfd == -1) &&
> +			(dev->virtqueue[base_idx + VIRTIO_TXQ]->kickfd == -1))
> +		notify_ops->destroy_device(dev);
> +
>  	return 0;
>  }
>  
> @@ -321,7 +321,7 @@ user_set_vring_enable(struct vhost_device_ctx ctx,
>  		      struct vhost_vring_state *state)
>  {
>  	struct virtio_net *dev = get_device(ctx);
> -	uint16_t base_idx = state->index;
> +	uint16_t base_idx = state->index / VIRTIO_QNUM;
>  	int enable = (int)state->num;
>  
>  	RTE_LOG(INFO, VHOST_CONFIG,
> diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
> index 97213c5..ee2e84d 100644
> --- a/lib/librte_vhost/virtio-net.c
> +++ b/lib/librte_vhost/virtio-net.c
> @@ -778,6 +778,7 @@ static int
>  set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
>  {
>  	struct virtio_net *dev;
> +	uint32_t base_idx = file->index / VIRTIO_QNUM;
>  
>  	dev = get_device(ctx);
>  	if (dev == NULL)
> @@ -791,8 +792,8 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
>  	 * we add the device.
>  	 */
>  	if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
> -		if (((int)dev->virtqueue[VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED) &&
> -			((int)dev->virtqueue[VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED)) {
> +		if (((int)dev->virtqueue[base_idx + VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED) &&
> +			((int)dev->virtqueue[base_idx + VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED)) {
>  			return notify_ops->new_device(dev);
>  		}
>  	/* Otherwise we remove it. */
> -- 
> 2.1.4

^ permalink raw reply	[flat|nested] 200+ messages in thread
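The behavior under discussion in the patch — close each virtqueue's kickfd as its GET_VRING_BASE arrives, and invoke the destroy callback only once both virtqueues of the pair are stopped — can be modeled in a few lines. This is a hypothetical sketch of the patch's logic, not librte_vhost code:

```c
#include <stdbool.h>

#define VIRTIO_QNUM 2   /* virtqueues per queue pair (RX + TX) */

struct vq {
	int kickfd;     /* -1 once the queue has been stopped */
};

/* Model of the patched user_get_vring_base(): returns true when the
 * destroy callback should fire for this device. */
static bool handle_get_vring_base(struct vq *vqs, unsigned int idx,
				  bool running)
{
	unsigned int base = idx / VIRTIO_QNUM * VIRTIO_QNUM;

	if (vqs[idx].kickfd >= 0)
		vqs[idx].kickfd = -1;   /* close(kickfd) in the real code */

	/* Destroy only when the whole pair has stopped. */
	return running &&
	       vqs[base].kickfd == -1 &&
	       vqs[base + 1].kickfd == -1;
}
```

Driving this with the RX message first and the TX message second shows the destroy condition firing exactly once, on the second message of the pair.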

* Re: [PATCH 1/3] vhost: Fix wrong handling of virtqueue array index
  2015-10-27  6:29           ` Yuanhan Liu
@ 2015-10-27  6:33             ` Yuanhan Liu
  0 siblings, 0 replies; 200+ messages in thread
From: Yuanhan Liu @ 2015-10-27  6:33 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Tue, Oct 27, 2015 at 02:29:25PM +0800, Yuanhan Liu wrote:
> On Tue, Oct 27, 2015 at 03:12:53PM +0900, Tetsuya Mukawa wrote:
> > The patch fixes wrong handling of virtqueue array index.
> > 
> > GET_VRING_BASE:
> > The vhost backend will receive the message per virtqueue.
> 
> No, that's not right, we will get GET_VRING_BASE for each queue pair,
> including RX and TX virt queue, but not each virt queue.

Oops, you are right. I was right the first time, till Huawei pointed
out that I had made some unexpected change; I then checked the code and
made a wrong decision. A bit too tired recently :(

So, I will look at this patch, again.

Sorry for that.

	--yliu
> 
> > Also we should call a destroy callback when both RXQ and TXQ receives
> > the message.
> > 
> > SET_BACKEND:
> > Because vhost library supports multiple queue, the index may be over 2.
> > Also a vhost frontend(QEMU) may send such a index.
> > 
> > Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> > ---
> >  lib/librte_vhost/vhost_user/virtio-net-user.c | 22 +++++++++++-----------
> >  lib/librte_vhost/virtio-net.c                 |  5 +++--
> >  2 files changed, 14 insertions(+), 13 deletions(-)
> > 
> > diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
> > index a998ad8..3e8dfea 100644
> > --- a/lib/librte_vhost/vhost_user/virtio-net-user.c
> > +++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
> > @@ -283,12 +283,10 @@ user_get_vring_base(struct vhost_device_ctx ctx,
> >  	struct vhost_vring_state *state)
> >  {
> >  	struct virtio_net *dev = get_device(ctx);
> > +	uint16_t base_idx = state->index / VIRTIO_QNUM;
> 
> For the Nth queue (pair), the "state->index" equals to  N * 2.
> So, dividing it by VIRTIO_QNUM (2) is wrong here.
> 
> 	--yliu
> 
> >  
> >  	if (dev == NULL)
> >  		return -1;
> > -	/* We have to stop the queue (virtio) if it is running. */
> > -	if (dev->flags & VIRTIO_DEV_RUNNING)
> > -		notify_ops->destroy_device(dev);
> >  
> >  	/* Here we are safe to get the last used index */
> >  	ops->get_vring_base(ctx, state->index, state);
> > @@ -300,15 +298,17 @@ user_get_vring_base(struct vhost_device_ctx ctx,
> >  	 * sent and only sent in vhost_vring_stop.
> >  	 * TODO: cleanup the vring, it isn't usable since here.
> >  	 */
> > -	if (dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd >= 0) {
> > -		close(dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd);
> > -		dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd = -1;
> > -	}
> > -	if (dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd >= 0) {
> > -		close(dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd);
> > -		dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd = -1;
> > +	if (dev->virtqueue[state->index]->kickfd >= 0) {
> > +		close(dev->virtqueue[state->index]->kickfd);
> > +		dev->virtqueue[state->index]->kickfd = -1;
> >  	}
> >  
> > +	/* We have to stop the queue (virtio) if it is running. */
> > +	if ((dev->flags & VIRTIO_DEV_RUNNING) &&
> > +			(dev->virtqueue[base_idx + VIRTIO_RXQ]->kickfd == -1) &&
> > +			(dev->virtqueue[base_idx + VIRTIO_TXQ]->kickfd == -1))
> > +		notify_ops->destroy_device(dev);
> > +
> >  	return 0;
> >  }
> >  
> > @@ -321,7 +321,7 @@ user_set_vring_enable(struct vhost_device_ctx ctx,
> >  		      struct vhost_vring_state *state)
> >  {
> >  	struct virtio_net *dev = get_device(ctx);
> > -	uint16_t base_idx = state->index;
> > +	uint16_t base_idx = state->index / VIRTIO_QNUM;
> >  	int enable = (int)state->num;
> >  
> >  	RTE_LOG(INFO, VHOST_CONFIG,
> > diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
> > index 97213c5..ee2e84d 100644
> > --- a/lib/librte_vhost/virtio-net.c
> > +++ b/lib/librte_vhost/virtio-net.c
> > @@ -778,6 +778,7 @@ static int
> >  set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
> >  {
> >  	struct virtio_net *dev;
> > +	uint32_t base_idx = file->index / VIRTIO_QNUM;
> >  
> >  	dev = get_device(ctx);
> >  	if (dev == NULL)
> > @@ -791,8 +792,8 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
> >  	 * we add the device.
> >  	 */
> >  	if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
> > -		if (((int)dev->virtqueue[VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED) &&
> > -			((int)dev->virtqueue[VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED)) {
> > +		if (((int)dev->virtqueue[base_idx + VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED) &&
> > +			((int)dev->virtqueue[base_idx + VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED)) {
> >  			return notify_ops->new_device(dev);
> >  		}
> >  	/* Otherwise we remove it. */
> > -- 
> > 2.1.4

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 1/3] vhost: Fix wrong handling of virtqueue array index
  2015-10-27  6:12         ` [PATCH 1/3] vhost: Fix wrong handling of virtqueue array index Tetsuya Mukawa
  2015-10-27  6:29           ` Yuanhan Liu
@ 2015-10-27  6:47           ` Yuanhan Liu
  2015-10-27  7:28             ` Tetsuya Mukawa
  1 sibling, 1 reply; 200+ messages in thread
From: Yuanhan Liu @ 2015-10-27  6:47 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Tue, Oct 27, 2015 at 03:12:53PM +0900, Tetsuya Mukawa wrote:
> The patch fixes wrong handling of virtqueue array index.
> 
> GET_VRING_BASE:
> The vhost backend will receive the message per virtqueue.
> Also we should call a destroy callback when both RXQ and TXQ receive
> the message.
> 
> SET_BACKEND:
> Because the vhost library supports multiple queues, the index may be over 2.
> Also a vhost frontend (QEMU) may send such an index.

Note that only vhost-user supports MQ. vhost-cuse does not.

> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> ---
>  lib/librte_vhost/vhost_user/virtio-net-user.c | 22 +++++++++++-----------
>  lib/librte_vhost/virtio-net.c                 |  5 +++--
>  2 files changed, 14 insertions(+), 13 deletions(-)
> 
> diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
> index a998ad8..3e8dfea 100644
> --- a/lib/librte_vhost/vhost_user/virtio-net-user.c
> +++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
> @@ -283,12 +283,10 @@ user_get_vring_base(struct vhost_device_ctx ctx,
>  	struct vhost_vring_state *state)
>  {
>  	struct virtio_net *dev = get_device(ctx);
> +	uint16_t base_idx = state->index / VIRTIO_QNUM;

So, fixing what my first reply said: for the Nth queue pair, state->index
is "N * 2 + is_tx", so the base should be "state->index / 2 * 2".

>  
>  	if (dev == NULL)
>  		return -1;
> -	/* We have to stop the queue (virtio) if it is running. */
> -	if (dev->flags & VIRTIO_DEV_RUNNING)
> -		notify_ops->destroy_device(dev);
>  
>  	/* Here we are safe to get the last used index */
>  	ops->get_vring_base(ctx, state->index, state);
> @@ -300,15 +298,17 @@ user_get_vring_base(struct vhost_device_ctx ctx,
>  	 * sent and only sent in vhost_vring_stop.
>  	 * TODO: cleanup the vring, it isn't usable since here.
>  	 */
> -	if (dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd >= 0) {
> -		close(dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd);
> -		dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd = -1;
> -	}
> -	if (dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd >= 0) {
> -		close(dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd);
> -		dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd = -1;
> +	if (dev->virtqueue[state->index]->kickfd >= 0) {
> +		close(dev->virtqueue[state->index]->kickfd);
> +		dev->virtqueue[state->index]->kickfd = -1;
>  	}
>  
> +	/* We have to stop the queue (virtio) if it is running. */
> +	if ((dev->flags & VIRTIO_DEV_RUNNING) &&
> +			(dev->virtqueue[base_idx + VIRTIO_RXQ]->kickfd == -1) &&
> +			(dev->virtqueue[base_idx + VIRTIO_TXQ]->kickfd == -1))
> +		notify_ops->destroy_device(dev);

This is a proper fix then. (You just need to fix base_idx.)

>  	return 0;
>  }
>  
> @@ -321,7 +321,7 @@ user_set_vring_enable(struct vhost_device_ctx ctx,
>  		      struct vhost_vring_state *state)
>  {
>  	struct virtio_net *dev = get_device(ctx);
> -	uint16_t base_idx = state->index;
> +	uint16_t base_idx = state->index / VIRTIO_QNUM;

user_set_vring_enable is sent per queue pair (I'm sure this time), so
base_idx equals state->index. No need to fix here.

>  	int enable = (int)state->num;
>  
>  	RTE_LOG(INFO, VHOST_CONFIG,
> diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
> index 97213c5..ee2e84d 100644
> --- a/lib/librte_vhost/virtio-net.c
> +++ b/lib/librte_vhost/virtio-net.c
> @@ -778,6 +778,7 @@ static int
>  set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
>  {
>  	struct virtio_net *dev;
> +	uint32_t base_idx = file->index / VIRTIO_QNUM;

As stated, vhost-cuse does not support MQ.

	--yliu
>  
>  	dev = get_device(ctx);
>  	if (dev == NULL)
> @@ -791,8 +792,8 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
>  	 * we add the device.
>  	 */
>  	if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
> -		if (((int)dev->virtqueue[VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED) &&
> -			((int)dev->virtqueue[VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED)) {
> +		if (((int)dev->virtqueue[base_idx + VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED) &&
> +			((int)dev->virtqueue[base_idx + VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED)) {
>  			return notify_ops->new_device(dev);
>  		}
>  	/* Otherwise we remove it. */
> -- 
> 2.1.4

^ permalink raw reply	[flat|nested] 200+ messages in thread
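The index arithmetic corrected in the reply above is easy to state as code: for queue pair N, vhost-user's per-virtqueue index is N * 2 + is_tx, and rounding down to an even value recovers the pair's base virtqueue index. A small sketch follows; the helper names are made up, while the constants match librte_vhost's:

```c
#include <stdint.h>

#define VIRTIO_RXQ  0
#define VIRTIO_TXQ  1
#define VIRTIO_QNUM 2   /* virtqueues per queue pair */

/* Per-virtqueue index sent by the frontend for queue pair `pair`. */
static uint16_t vring_idx(uint16_t pair, int is_tx)
{
	return pair * VIRTIO_QNUM + (is_tx ? VIRTIO_TXQ : VIRTIO_RXQ);
}

/* Base virtqueue index of the pair that `idx` belongs to:
 * "state->index / 2 * 2", not "state->index / 2". */
static uint16_t vring_base_idx(uint16_t idx)
{
	return idx / VIRTIO_QNUM * VIRTIO_QNUM;
}
```

The rejected formula "index / VIRTIO_QNUM" yields the pair number (0, 1, 2, ...), which is not itself a valid virtqueue array index once more than one pair exists.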

* Re: [PATCH 1/3] vhost: Fix wrong handling of virtqueue array index
  2015-10-27  6:47           ` Yuanhan Liu
@ 2015-10-27  7:28             ` Tetsuya Mukawa
  2015-10-27  7:34               ` Yuanhan Liu
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-10-27  7:28 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, ann.zhuangyanying

Hi Yuanhan,

I appreciate your review.
I hadn't noticed that SET_BACKEND is only supported by vhost-cuse. :-(
I will follow your comments and submit again.

Thanks,
Tetsuya

On 2015/10/27 15:47, Yuanhan Liu wrote:
> On Tue, Oct 27, 2015 at 03:12:53PM +0900, Tetsuya Mukawa wrote:
>> The patch fixes wrong handling of virtqueue array index.
>>
>> GET_VRING_BASE:
>> The vhost backend will receive the message per virtqueue.
>> Also we should call a destroy callback when both RXQ and TXQ receives
>> the message.
>>
>> SET_BACKEND:
>> Because vhost library supports multiple queue, the index may be over 2.
>> Also a vhost frontend(QEMU) may send such a index.
> Note that only vhost-user supports MQ. vhost-cuse does not.
>
>> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
>> ---
>>  lib/librte_vhost/vhost_user/virtio-net-user.c | 22 +++++++++++-----------
>>  lib/librte_vhost/virtio-net.c                 |  5 +++--
>>  2 files changed, 14 insertions(+), 13 deletions(-)
>>
>> diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
>> index a998ad8..3e8dfea 100644
>> --- a/lib/librte_vhost/vhost_user/virtio-net-user.c
>> +++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
>> @@ -283,12 +283,10 @@ user_get_vring_base(struct vhost_device_ctx ctx,
>>  	struct vhost_vring_state *state)
>>  {
>>  	struct virtio_net *dev = get_device(ctx);
>> +	uint16_t base_idx = state->index / VIRTIO_QNUM;
> So, fixing what my 1st reply said, for Nth queue pair, state->index
> is "N * 2 + is_tx". So, the base should be "state->index / 2 * 2".
>
>>  
>>  	if (dev == NULL)
>>  		return -1;
>> -	/* We have to stop the queue (virtio) if it is running. */
>> -	if (dev->flags & VIRTIO_DEV_RUNNING)
>> -		notify_ops->destroy_device(dev);
>>  
>>  	/* Here we are safe to get the last used index */
>>  	ops->get_vring_base(ctx, state->index, state);
>> @@ -300,15 +298,17 @@ user_get_vring_base(struct vhost_device_ctx ctx,
>>  	 * sent and only sent in vhost_vring_stop.
>>  	 * TODO: cleanup the vring, it isn't usable since here.
>>  	 */
>> -	if (dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd >= 0) {
>> -		close(dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd);
>> -		dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd = -1;
>> -	}
>> -	if (dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd >= 0) {
>> -		close(dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd);
>> -		dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd = -1;
>> +	if (dev->virtqueue[state->index]->kickfd >= 0) {
>> +		close(dev->virtqueue[state->index]->kickfd);
>> +		dev->virtqueue[state->index]->kickfd = -1;
>>  	}
>>  
>> +	/* We have to stop the queue (virtio) if it is running. */
>> +	if ((dev->flags & VIRTIO_DEV_RUNNING) &&
>> +			(dev->virtqueue[base_idx + VIRTIO_RXQ]->kickfd == -1) &&
>> +			(dev->virtqueue[base_idx + VIRTIO_TXQ]->kickfd == -1))
>> +		notify_ops->destroy_device(dev);
> This is a proper fix then. (You just need to fix base_idx.)
>
>>  	return 0;
>>  }
>>  
>> @@ -321,7 +321,7 @@ user_set_vring_enable(struct vhost_device_ctx ctx,
>>  		      struct vhost_vring_state *state)
>>  {
>>  	struct virtio_net *dev = get_device(ctx);
>> -	uint16_t base_idx = state->index;
>> +	uint16_t base_idx = state->index / VIRTIO_QNUM;
> user_set_vring_enable is sent per queue pair (I'm sure this time), so
> base_idx equals state->index. No need to fix here.
>
>>  	int enable = (int)state->num;
>>  
>>  	RTE_LOG(INFO, VHOST_CONFIG,
>> diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
>> index 97213c5..ee2e84d 100644
>> --- a/lib/librte_vhost/virtio-net.c
>> +++ b/lib/librte_vhost/virtio-net.c
>> @@ -778,6 +778,7 @@ static int
>>  set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
>>  {
>>  	struct virtio_net *dev;
>> +	uint32_t base_idx = file->index / VIRTIO_QNUM;
> As stated, vhost-cuse does not support MQ.
>
> 	--yliu
>>  
>>  	dev = get_device(ctx);
>>  	if (dev == NULL)
>> @@ -791,8 +792,8 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
>>  	 * we add the device.
>>  	 */
>>  	if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
>> -		if (((int)dev->virtqueue[VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED) &&
>> -			((int)dev->virtqueue[VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED)) {
>> +		if (((int)dev->virtqueue[base_idx + VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED) &&
>> +			((int)dev->virtqueue[base_idx + VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED)) {
>>  			return notify_ops->new_device(dev);
>>  		}
>>  	/* Otherwise we remove it. */
>> -- 
>> 2.1.4

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 1/3] vhost: Fix wrong handling of virtqueue array index
  2015-10-27  7:28             ` Tetsuya Mukawa
@ 2015-10-27  7:34               ` Yuanhan Liu
  0 siblings, 0 replies; 200+ messages in thread
From: Yuanhan Liu @ 2015-10-27  7:34 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Tue, Oct 27, 2015 at 04:28:58PM +0900, Tetsuya Mukawa wrote:
> Hi Yuanhan,
> 
> I appreciate your checking.

You're welcome! And thank you for catching my mistakes.

	--yliu
> I hadn't noticed SET_BACKEND is only supported by vhost-cuse. :-(
> I will follow your comments, then submit again.
> 
> Thanks,
> Tetsuya
> 
> On 2015/10/27 15:47, Yuanhan Liu wrote:
> > On Tue, Oct 27, 2015 at 03:12:53PM +0900, Tetsuya Mukawa wrote:
> >> The patch fixes wrong handling of virtqueue array index.
> >>
> >> GET_VRING_BASE:
> >> The vhost backend will receive the message per virtqueue.
> >> Also we should call a destroy callback when both RXQ and TXQ receives
> >> the message.
> >>
> >> SET_BACKEND:
> >> Because the vhost library supports multiple queues, the index may be greater than 2.
> >> Also a vhost frontend (QEMU) may send such an index.
> > Note that only vhost-user supports MQ. vhost-cuse does not.
> >
> >> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> >> ---
> >>  lib/librte_vhost/vhost_user/virtio-net-user.c | 22 +++++++++++-----------
> >>  lib/librte_vhost/virtio-net.c                 |  5 +++--
> >>  2 files changed, 14 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
> >> index a998ad8..3e8dfea 100644
> >> --- a/lib/librte_vhost/vhost_user/virtio-net-user.c
> >> +++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
> >> @@ -283,12 +283,10 @@ user_get_vring_base(struct vhost_device_ctx ctx,
> >>  	struct vhost_vring_state *state)
> >>  {
> >>  	struct virtio_net *dev = get_device(ctx);
> >> +	uint16_t base_idx = state->index / VIRTIO_QNUM;
> > So, fixing what my 1st reply said: for the Nth queue pair, state->index
> > is "N * 2 + is_tx". So, the base should be "state->index / 2 * 2".
> >
> >>  
> >>  	if (dev == NULL)
> >>  		return -1;
> >> -	/* We have to stop the queue (virtio) if it is running. */
> >> -	if (dev->flags & VIRTIO_DEV_RUNNING)
> >> -		notify_ops->destroy_device(dev);
> >>  
> >>  	/* Here we are safe to get the last used index */
> >>  	ops->get_vring_base(ctx, state->index, state);
> >> @@ -300,15 +298,17 @@ user_get_vring_base(struct vhost_device_ctx ctx,
> >>  	 * sent and only sent in vhost_vring_stop.
> >>  	 * TODO: cleanup the vring, it isn't usable since here.
> >>  	 */
> >> -	if (dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd >= 0) {
> >> -		close(dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd);
> >> -		dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd = -1;
> >> -	}
> >> -	if (dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd >= 0) {
> >> -		close(dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd);
> >> -		dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd = -1;
> >> +	if (dev->virtqueue[state->index]->kickfd >= 0) {
> >> +		close(dev->virtqueue[state->index]->kickfd);
> >> +		dev->virtqueue[state->index]->kickfd = -1;
> >>  	}
> >>  
> >> +	/* We have to stop the queue (virtio) if it is running. */
> >> +	if ((dev->flags & VIRTIO_DEV_RUNNING) &&
> >> +			(dev->virtqueue[base_idx + VIRTIO_RXQ]->kickfd == -1) &&
> >> +			(dev->virtqueue[base_idx + VIRTIO_TXQ]->kickfd == -1))
> >> +		notify_ops->destroy_device(dev);
> > This is a proper fix then. (You just need to fix base_idx.)
> >
> >>  	return 0;
> >>  }
> >>  
> >> @@ -321,7 +321,7 @@ user_set_vring_enable(struct vhost_device_ctx ctx,
> >>  		      struct vhost_vring_state *state)
> >>  {
> >>  	struct virtio_net *dev = get_device(ctx);
> >> -	uint16_t base_idx = state->index;
> >> +	uint16_t base_idx = state->index / VIRTIO_QNUM;
> > user_set_vring_enable is sent per queue pair (I'm sure this time), so
> > base_idx equals state->index. No need to fix here.
> >
> >>  	int enable = (int)state->num;
> >>  
> >>  	RTE_LOG(INFO, VHOST_CONFIG,
> >> diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
> >> index 97213c5..ee2e84d 100644
> >> --- a/lib/librte_vhost/virtio-net.c
> >> +++ b/lib/librte_vhost/virtio-net.c
> >> @@ -778,6 +778,7 @@ static int
> >>  set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
> >>  {
> >>  	struct virtio_net *dev;
> >> +	uint32_t base_idx = file->index / VIRTIO_QNUM;
> > As stated, vhost-cuse does not support MQ.
> >
> > 	--yliu
> >>  
> >>  	dev = get_device(ctx);
> >>  	if (dev == NULL)
> >> @@ -791,8 +792,8 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
> >>  	 * we add the device.
> >>  	 */
> >>  	if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
> >> -		if (((int)dev->virtqueue[VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED) &&
> >> -			((int)dev->virtqueue[VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED)) {
> >> +		if (((int)dev->virtqueue[base_idx + VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED) &&
> >> +			((int)dev->virtqueue[base_idx + VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED)) {
> >>  			return notify_ops->new_device(dev);
> >>  		}
> >>  	/* Otherwise we remove it. */
> >> -- 
> >> 2.1.4

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 0/3] Add VHOST PMD
  2015-10-27  6:12       ` [PATCH 0/3] Add VHOST PMD Tetsuya Mukawa
                           ` (2 preceding siblings ...)
  2015-10-27  6:12         ` [PATCH 3/3] vhost: Add VHOST PMD Tetsuya Mukawa
@ 2015-10-27  7:54         ` Tetsuya Mukawa
  2015-10-30 18:30           ` Thomas Monjalon
  3 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-10-27  7:54 UTC (permalink / raw)
  To: dev; +Cc: ann.zhuangyanying

The patch below has been submitted as a separate patch.

-  [dpdk-dev,1/3] vhost: Fix wrong handling of virtqueue array index
    (http://dpdk.org/dev/patchwork/patch/8038/)

Tetsuya

On 2015/10/27 15:12, Tetsuya Mukawa wrote:
> The patch introduces a new PMD. This PMD is implemented as a thin wrapper
> of librte_vhost. The patch applies on top of the patch series below.
>  - [PATCH v5 00/28] remove pci driver from vdevs
>
> * Known issue.
> We may see issues while handling the RESET_OWNER message.
> This handling is done in the vhost library, so it is not part of the vhost PMD.
> So far, we are waiting for a QEMU fix.
>
> PATCH v4 changes:
>  - Support vhost multiple queues.
>  - Rebase on "remove pci driver from vdevs".
>  - Optimize RX/TX functions.
>  - Fix resource leaks.
>  - Fix compile issue.
>  - Add patch to fix vhost library.
>
> PATCH v3 changes:
>  - Optimize performance.
>    In RX/TX functions, change code to access only per-core data.
>  - Add below API to allow user to use vhost library APIs for a port managed
>    by vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
>     - rte_eth_vhost_portid2vdev()
>    To support this functionality, vhost library is also changed.
>    Anyway, if users don't use the vhost PMD, they can fully use the vhost library APIs.
>  - Add code to support vhost multiple queues.
>    Actually, the multiple queue functionality is not enabled yet.
>
> PATCH v2 changes:
>  - Fix issues reported by checkpatch.pl
>    (Thanks to Stephen Hemminger)
>
>
> Tetsuya Mukawa (3):
>   vhost: Fix wrong handling of virtqueue array index
>   vhost: Add callback and private data for vhost PMD
>   vhost: Add VHOST PMD
>
>  config/common_linuxapp                        |   6 +
>  doc/guides/nics/index.rst                     |   1 +
>  doc/guides/nics/vhost.rst                     |  82 +++
>  doc/guides/rel_notes/release_2_2.rst          |   2 +
>  drivers/net/Makefile                          |   4 +
>  drivers/net/vhost/Makefile                    |  62 +++
>  drivers/net/vhost/rte_eth_vhost.c             | 765 ++++++++++++++++++++++++++
>  drivers/net/vhost/rte_eth_vhost.h             |  65 +++
>  drivers/net/vhost/rte_pmd_vhost_version.map   |   8 +
>  lib/librte_vhost/rte_vhost_version.map        |   6 +
>  lib/librte_vhost/rte_virtio_net.h             |   3 +
>  lib/librte_vhost/vhost_user/virtio-net-user.c |  33 +-
>  lib/librte_vhost/virtio-net.c                 |  61 +-
>  lib/librte_vhost/virtio-net.h                 |   4 +-
>  mk/rte.app.mk                                 |   8 +-
>  15 files changed, 1085 insertions(+), 25 deletions(-)
>  create mode 100644 doc/guides/nics/vhost.rst
>  create mode 100644 drivers/net/vhost/Makefile
>  create mode 100644 drivers/net/vhost/rte_eth_vhost.c
>  create mode 100644 drivers/net/vhost/rte_eth_vhost.h
>  create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
>

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH v2] vhost: Add VHOST PMD
  2015-10-22  9:50                       ` Tetsuya Mukawa
@ 2015-10-27 13:44                         ` Traynor, Kevin
  2015-10-28  2:24                           ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Traynor, Kevin @ 2015-10-27 13:44 UTC (permalink / raw)
  To: Tetsuya Mukawa, Richardson, Bruce, Panu Matilainen; +Cc: dev, ann.zhuangyanying


> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Tetsuya Mukawa

[snip]

> 
> Hi,
> 
> I have submitted the latest patches.
> I will keep the vhost library until we have agreement to merge it into
> the vhost PMD.

Longer term there are pros and cons to keeping the vhost library. Personally
I think it would make sense to remove it at some point, as maintaining two APIs
has a cost, but I think adding a deprecation notice in DPDK 2.2 for removal in
DPDK 2.3 is very premature. Until it's proven *in the field* that the vhost PMD
is a suitable, fully functioning replacement for the vhost library and users
have had time to migrate, please don't remove it.

> 
> Regards,
> Testuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH v2] vhost: Add VHOST PMD
  2015-10-27 13:44                         ` Traynor, Kevin
@ 2015-10-28  2:24                           ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-10-28  2:24 UTC (permalink / raw)
  To: Traynor, Kevin; +Cc: dev, ann.zhuangyanying

On 2015/10/27 22:44, Traynor, Kevin wrote:
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Tetsuya Mukawa
> [snip]
>
>> Hi,
>>
>> I have submitted the latest patches.
>> I will keep the vhost library until we have agreement to merge it into
>> the vhost PMD.
> Longer term there are pros and cons to keeping the vhost library. Personally
> I think it would make sense to remove it at some point, as maintaining two APIs
> has a cost, but I think adding a deprecation notice in DPDK 2.2 for removal in
> DPDK 2.3 is very premature. Until it's proven *in the field* that the vhost PMD
> is a suitable, fully functioning replacement for the vhost library and users
> have had time to migrate, please don't remove it.

Hi Kevin,

Thanks for commenting. I agree it's not the time to add a deprecation notice.
(I haven't included it in the vhost PMD patches)

Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH v3 2/2] vhost: Add VHOST PMD
  2015-10-22  9:45     ` [RFC PATCH v3 2/2] vhost: " Tetsuya Mukawa
  2015-10-22 12:49       ` Bruce Richardson
@ 2015-10-29 14:25       ` Xie, Huawei
  2015-10-30  1:18         ` Tetsuya Mukawa
  1 sibling, 1 reply; 200+ messages in thread
From: Xie, Huawei @ 2015-10-29 14:25 UTC (permalink / raw)
  To: Tetsuya Mukawa, dev, Loftus, Ciara; +Cc: ann.zhuangyanying

On 10/22/2015 5:48 PM, Tetsuya Mukawa wrote:
> The patch introduces a new PMD. This PMD is implemented as a thin wrapper
> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
> The vhost messages will be handled only when a port is started. So start
> a port first, then invoke QEMU.
Hi Tetsuya:
I haven't had the bandwidth to review the details of this patch, but I think
it is the right thing to do. It is still an RFC patch. Is your goal to
make it into 2.2?


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH v3 2/2] vhost: Add VHOST PMD
  2015-10-29 14:25       ` Xie, Huawei
@ 2015-10-30  1:18         ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-10-30  1:18 UTC (permalink / raw)
  To: Xie, Huawei, dev, Loftus, Ciara; +Cc: ann.zhuangyanying

On 2015/10/29 23:25, Xie, Huawei wrote:
> On 10/22/2015 5:48 PM, Tetsuya Mukawa wrote:
>> The patch introduces a new PMD. This PMD is implemented as a thin wrapper
>> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
>> The vhost messages will be handled only when a port is started. So start
>> a port first, then invoke QEMU.
> Hi Tetsuya:
> I haven't had the bandwidth to review the details of this patch, but I think
> it is the right thing to do. It is still an RFC patch. Is your goal to
> make it into 2.2?
>

Hi Xie,

Thanks for caring about it. Yes, I want to merge it into DPDK 2.2.
I've already sent non-RFC patches. Could you please check below?

Subject: [PATCH 0/3] Add VHOST PMD
Date: Tue, 27 Oct 2015 15:12:52 +0900
Message-Id: <1445926375-18986-1-git-send-email-mukawa@igel.co.jp>

The following patch from the above series was submitted as a separate patch.
So please ignore it.
 - [PATCH 1/3] vhost: Fix wrong handling of virtqueue array index 

Thanks,
Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 2/3] vhost: Add callback and private data for vhost PMD
  2015-10-27  6:12         ` [PATCH 2/3] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
@ 2015-10-30 17:49           ` Loftus, Ciara
  2015-11-02  3:15             ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Loftus, Ciara @ 2015-10-30 17:49 UTC (permalink / raw)
  To: Tetsuya Mukawa, dev; +Cc: ann.zhuangyanying

> 
> These variables are needed to be able to manage a virtio device
> using both the vhost library APIs and the vhost PMD.
> For example, if the vhost PMD uses the current callback handler and private
> data provided by the vhost library, a DPDK application that links the vhost
> library cannot use some of the vhost library APIs. To avoid this, a separate
> callback and private data for the vhost PMD are needed.
> 
> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> ---
>  lib/librte_vhost/rte_vhost_version.map        |  6 +++
>  lib/librte_vhost/rte_virtio_net.h             |  3 ++
>  lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++----
>  lib/librte_vhost/virtio-net.c                 | 56 +++++++++++++++++++++++++--
>  lib/librte_vhost/virtio-net.h                 |  4 +-
>  5 files changed, 70 insertions(+), 12 deletions(-)
> 
> diff --git a/lib/librte_vhost/rte_vhost_version.map
> b/lib/librte_vhost/rte_vhost_version.map
> index 3d8709e..00a9ce5 100644
> --- a/lib/librte_vhost/rte_vhost_version.map
> +++ b/lib/librte_vhost/rte_vhost_version.map
> @@ -20,3 +20,9 @@ DPDK_2.1 {
>  	rte_vhost_driver_unregister;
> 
>  } DPDK_2.0;
> +
> +DPDK_2.2 {
> +	global:
> +
> +	rte_vhost_driver_pmd_callback_register;
> +} DPDK_2.1;
> diff --git a/lib/librte_vhost/rte_virtio_net.h
> b/lib/librte_vhost/rte_virtio_net.h
> index 426a70d..08e77af 100644
> --- a/lib/librte_vhost/rte_virtio_net.h
> +++ b/lib/librte_vhost/rte_virtio_net.h
> @@ -106,6 +106,7 @@ struct virtio_net {
>  	char			ifname[IF_NAME_SZ];	/**< Name of the tap
> device or socket path. */
>  	uint32_t		virt_qp_nb;	/**< number of queue pair
> we have allocated */
>  	void			*priv;		/**< private context */
> +	void			*pmd_priv;	/**< private context for
> vhost PMD */
>  	struct vhost_virtqueue
> 	*virtqueue[VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX];	/**< Contains
> all virtqueue information. */
>  } __rte_cache_aligned;
> 
> @@ -202,6 +203,8 @@ int rte_vhost_driver_unregister(const char
> *dev_name);
> 
>  /* Register callbacks. */
>  int rte_vhost_driver_callback_register(struct virtio_net_device_ops const *
> const);
> +/* Register callbacks for vhost PMD (Only for internal). */
> +int rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops
> const * const);
>  /* Start vhost driver session blocking loop. */
>  int rte_vhost_driver_session_start(void);
> 
> diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c
> b/lib/librte_vhost/vhost_user/virtio-net-user.c
> index 3e8dfea..dad083b 100644
> --- a/lib/librte_vhost/vhost_user/virtio-net-user.c
> +++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
> @@ -111,7 +111,7 @@ user_set_mem_table(struct vhost_device_ctx ctx,
> struct VhostUserMsg *pmsg)
> 
>  	/* Remove from the data plane. */
>  	if (dev->flags & VIRTIO_DEV_RUNNING)
> -		notify_ops->destroy_device(dev);
> +		notify_destroy_device(dev);
> 
>  	if (dev->mem) {
>  		free_mem_region(dev);
> @@ -272,7 +272,7 @@ user_set_vring_kick(struct vhost_device_ctx ctx,
> struct VhostUserMsg *pmsg)
> 
>  	if (virtio_is_ready(dev) &&
>  		!(dev->flags & VIRTIO_DEV_RUNNING))
> -			notify_ops->new_device(dev);
> +			notify_new_device(dev);
>  }
> 
>  /*
> @@ -307,7 +307,7 @@ user_get_vring_base(struct vhost_device_ctx ctx,
>  	if ((dev->flags & VIRTIO_DEV_RUNNING) &&
>  			(dev->virtqueue[base_idx + VIRTIO_RXQ]->kickfd ==
> -1) &&
>  			(dev->virtqueue[base_idx + VIRTIO_TXQ]->kickfd ==
> -1))
> -		notify_ops->destroy_device(dev);
> +		notify_destroy_device(dev);
> 
>  	return 0;
>  }
> @@ -328,10 +328,7 @@ user_set_vring_enable(struct vhost_device_ctx ctx,
>  		"set queue enable: %d to qp idx: %d\n",
>  		enable, state->index);
> 
> -	if (notify_ops->vring_state_changed) {
> -		notify_ops->vring_state_changed(dev, base_idx /
> VIRTIO_QNUM,
> -						enable);
> -	}
> +	notify_vring_state_changed(dev, base_idx / VIRTIO_QNUM,
> enable);
> 
>  	dev->virtqueue[base_idx + VIRTIO_RXQ]->enabled = enable;
>  	dev->virtqueue[base_idx + VIRTIO_TXQ]->enabled = enable;
> @@ -345,7 +342,7 @@ user_destroy_device(struct vhost_device_ctx ctx)
>  	struct virtio_net *dev = get_device(ctx);
> 
>  	if (dev && (dev->flags & VIRTIO_DEV_RUNNING))
> -		notify_ops->destroy_device(dev);
> +		notify_destroy_device(dev);
> 
>  	if (dev && dev->mem) {
>  		free_mem_region(dev);
> diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
> index ee2e84d..de5d8ff 100644
> --- a/lib/librte_vhost/virtio-net.c
> +++ b/lib/librte_vhost/virtio-net.c
> @@ -65,6 +65,8 @@ struct virtio_net_config_ll {
> 
>  /* device ops to add/remove device to/from data core. */
>  struct virtio_net_device_ops const *notify_ops;
> +/* device ops for vhost PMD to add/remove device to/from data core. */
> +struct virtio_net_device_ops const *pmd_notify_ops;
>  /* root address of the linked list of managed virtio devices */
>  static struct virtio_net_config_ll *ll_root;
> 
> @@ -80,6 +82,43 @@ static struct virtio_net_config_ll *ll_root;
>  static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
> 
> 
> +int
> +notify_new_device(struct virtio_net *dev)
> +{
> +	if ((pmd_notify_ops != NULL) && (pmd_notify_ops->new_device !=
> NULL)) {
> +		int ret = pmd_notify_ops->new_device(dev);
> +		if (ret != 0)
> +			return ret;
> +	}
> +	if ((notify_ops != NULL) && (notify_ops->new_device != NULL))
> +		return notify_ops->new_device(dev);
> +
> +	return 0;
> +}
> +
> +void
> +notify_destroy_device(volatile struct virtio_net *dev)
> +{
> +	if ((pmd_notify_ops != NULL) && (pmd_notify_ops->destroy_device
> != NULL))
> +		pmd_notify_ops->destroy_device(dev);
> +	if ((notify_ops != NULL) && (notify_ops->destroy_device != NULL))
> +		notify_ops->destroy_device(dev);
> +}
> +
> +int
> +notify_vring_state_changed(struct virtio_net *dev, uint16_t queue_id, int
> enable)
> +{
> +	if ((pmd_notify_ops != NULL) && (pmd_notify_ops-
> >vring_state_changed != NULL)) {
> +		int ret = pmd_notify_ops->vring_state_changed(dev,
> queue_id, enable);
> +		if (ret != 0)
> +			return ret;
> +	}
> +	if ((notify_ops != NULL) && (notify_ops->vring_state_changed !=
> NULL))
> +		return notify_ops->vring_state_changed(dev, queue_id,
> enable);
> +
> +	return 0;
> +}
> +
>  /*
>   * Converts QEMU virtual address to Vhost virtual address. This function is
>   * used to convert the ring addresses to our address space.
> @@ -377,7 +416,7 @@ destroy_device(struct vhost_device_ctx ctx)
>  			 * the function to remove it from the data core.
>  			 */
>  			if ((ll_dev_cur->dev.flags & VIRTIO_DEV_RUNNING))
> -				notify_ops->destroy_device(&(ll_dev_cur-
> >dev));
> +				notify_destroy_device(&(ll_dev_cur->dev));
>  			ll_dev_cur = rm_config_ll_entry(ll_dev_cur,
>  					ll_dev_last);
>  		} else {
> @@ -794,12 +833,12 @@ set_backend(struct vhost_device_ctx ctx, struct
> vhost_vring_file *file)
>  	if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
>  		if (((int)dev->virtqueue[base_idx + VIRTIO_RXQ]->backend
> != VIRTIO_DEV_STOPPED) &&
>  			((int)dev->virtqueue[base_idx + VIRTIO_TXQ]-
> >backend != VIRTIO_DEV_STOPPED)) {
> -			return notify_ops->new_device(dev);
> +			return notify_new_device(dev);
>  		}
>  	/* Otherwise we remove it. */
>  	} else
>  		if (file->fd == VIRTIO_DEV_STOPPED)
> -			notify_ops->destroy_device(dev);
> +			notify_destroy_device(dev);
>  	return 0;
>  }
> 
> @@ -883,3 +922,14 @@ rte_vhost_driver_callback_register(struct
> virtio_net_device_ops const * const op
> 
>  	return 0;
>  }
> +
> +/*
> + * Register ops so that we can add/remove device to data core.
> + */
> +int
> +rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops
> const * const ops)
> +{
> +	pmd_notify_ops = ops;
> +
> +	return 0;
> +}
> diff --git a/lib/librte_vhost/virtio-net.h b/lib/librte_vhost/virtio-net.h
> index 75fb57e..0816e71 100644
> --- a/lib/librte_vhost/virtio-net.h
> +++ b/lib/librte_vhost/virtio-net.h
> @@ -37,7 +37,9 @@
>  #include "vhost-net.h"
>  #include "rte_virtio_net.h"
> 
> -struct virtio_net_device_ops const *notify_ops;
>  struct virtio_net *get_device(struct vhost_device_ctx ctx);
> 
> +int notify_new_device(struct virtio_net *dev);
> +void notify_destroy_device(volatile struct virtio_net *dev);
> +int notify_vring_state_changed(struct virtio_net *dev, uint16_t queue_id,
> int enable);
>  #endif
> --
> 2.1.4

Hi Tetsuya,

Thanks for implementing this. I haven't had a chance to actually test it, but if these changes allow users of the PMD to implement their own new_device and destroy_device functions etc., that's good news.

Thanks,
Ciara

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 0/3] Add VHOST PMD
  2015-10-27  7:54         ` [PATCH 0/3] " Tetsuya Mukawa
@ 2015-10-30 18:30           ` Thomas Monjalon
  2015-11-02  3:15             ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Thomas Monjalon @ 2015-10-30 18:30 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

2015-10-27 16:54, Tetsuya Mukawa:
> The patch below has been submitted as a separate patch.
> 
> -  [dpdk-dev,1/3] vhost: Fix wrong handling of virtqueue array index
>     (http://dpdk.org/dev/patchwork/patch/8038/)

Please could you rebase only the two last patches?
Thanks

PS:
WARNING:TYPO_SPELLING: 'failuer' may be misspelled - perhaps 'failure'?
#606: FILE: drivers/net/vhost/rte_eth_vhost.c:272:
+               RTE_LOG(INFO, PMD, "failuer to find a ethdev\n");
WARNING:TYPO_SPELLING: 'accesing' may be misspelled - perhaps 'accessing'?
#612: FILE: drivers/net/vhost/rte_eth_vhost.c:278:
+       /* Wait until rx/tx_pkt_burst stops accesing vhost device */

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 2/3] vhost: Add callback and private data for vhost PMD
  2015-10-30 17:49           ` Loftus, Ciara
@ 2015-11-02  3:15             ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-02  3:15 UTC (permalink / raw)
  To: Loftus, Ciara, dev; +Cc: ann.zhuangyanying

On 2015/10/31 2:49, Loftus, Ciara wrote:
>> These variables are needed to be able to manage a virtio device
>> using both the vhost library APIs and the vhost PMD.
>> For example, if the vhost PMD uses the current callback handler and private
>> data provided by the vhost library, a DPDK application that links the vhost
>> library cannot use some of the vhost library APIs. To avoid this, a separate
>> callback and private data for the vhost PMD are needed.
>>
>> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
>> ---
>>  lib/librte_vhost/rte_vhost_version.map        |  6 +++
>>  lib/librte_vhost/rte_virtio_net.h             |  3 ++
>>  lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++----
>>  lib/librte_vhost/virtio-net.c                 | 56 +++++++++++++++++++++++++--
>>  lib/librte_vhost/virtio-net.h                 |  4 +-
>>  5 files changed, 70 insertions(+), 12 deletions(-)
>>
> Hi Tetsuya,
>
> Thanks for implementing this. I haven't had a chance to actually test it, but if these changes allow users of the PMD to implement their own new_device and destroy_device functions etc., that's good news.
>
> Thanks,
> Ciara

Hi Ciara,

Yes, the patch works like you said.

Thanks,
Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 0/3] Add VHOST PMD
  2015-10-30 18:30           ` Thomas Monjalon
@ 2015-11-02  3:15             ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-02  3:15 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, ann.zhuangyanying

On 2015/10/31 3:30, Thomas Monjalon wrote:
> 2015-10-27 16:54, Tetsuya Mukawa:
>> The patch below has been submitted as a separate patch.
>>
>> -  [dpdk-dev,1/3] vhost: Fix wrong handling of virtqueue array index
>>     (http://dpdk.org/dev/patchwork/patch/8038/)
> Please could you rebase only the two last patches?
> Thanks
>
> PS:
> WARNING:TYPO_SPELLING: 'failuer' may be misspelled - perhaps 'failure'?
> #606: FILE: drivers/net/vhost/rte_eth_vhost.c:272:
> +               RTE_LOG(INFO, PMD, "failuer to find a ethdev\n");
> WARNING:TYPO_SPELLING: 'accesing' may be misspelled - perhaps 'accessing'?
> #612: FILE: drivers/net/vhost/rte_eth_vhost.c:278:
> +       /* Wait until rx/tx_pkt_burst stops accesing vhost device */
>

Hi Thomas,

Thank you so much for checking my patches.
I have fixed a few typos, and rebased on latest tree (with Bernard's patch).
I will submit again soon.

Regards,
Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH v2 0/2] Add VHOST PMD
  2015-10-27  6:12         ` [PATCH 3/3] vhost: Add VHOST PMD Tetsuya Mukawa
@ 2015-11-02  3:58           ` Tetsuya Mukawa
  2015-11-02  3:58             ` [PATCH v2 1/2] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
                               ` (2 more replies)
  2015-11-09 22:25           ` [PATCH 3/3] vhost: " Stephen Hemminger
  1 sibling, 3 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-02  3:58 UTC (permalink / raw)
  To: dev; +Cc: ann.zhuangyanying

The patch introduces a new PMD. This PMD is implemented as a thin wrapper
of librte_vhost. The patch applies on top of the patch series below.
 - [PATCH v7 00/28] remove pci driver from vdevs

* Known issue.
We may see issues while handling the RESET_OWNER message.
This handling is done in the vhost library, so it is not part of the vhost PMD.
So far, we are waiting for a QEMU fix.

PATCH v2 changes:
 - Remove the below patch that fixes the vhost library.
   The patch was applied as a separate patch.
   - vhost: fix crash with multiqueue enabled
 - Fix typos.
   (Thanks to Thomas Monjalon)
 - Rebase on the latest tree with Bernard's patches above.

PATCH v1 changes:
 - Support vhost multiple queues.
 - Rebase on "remove pci driver from vdevs".
 - Optimize RX/TX functions.
 - Fix resource leaks.
 - Fix compile issue.
 - Add patch to fix vhost library.

RFC PATCH v3 changes:
 - Optimize performance.
   In RX/TX functions, change code to access only per-core data.
 - Add below API to allow user to use vhost library APIs for a port managed
   by vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
    - rte_eth_vhost_portid2vdev()
   To support this functionality, vhost library is also changed.
   Anyway, if users don't use the vhost PMD, they can fully use the vhost library APIs.
 - Add code to support vhost multiple queues.
   Actually, the multiple queue functionality is not enabled yet.

RFC PATCH v2 changes:
 - Fix issues reported by checkpatch.pl
   (Thanks to Stephen Hemminger)


Tetsuya Mukawa (2):
  vhost: Add callback and private data for vhost PMD
  vhost: Add VHOST PMD

 config/common_linuxapp                        |   6 +
 doc/guides/nics/index.rst                     |   1 +
 doc/guides/nics/vhost.rst                     |  82 +++
 doc/guides/rel_notes/release_2_2.rst          |   2 +
 drivers/net/Makefile                          |   4 +
 drivers/net/vhost/Makefile                    |  62 +++
 drivers/net/vhost/rte_eth_vhost.c             | 765 ++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h             |  65 +++
 drivers/net/vhost/rte_pmd_vhost_version.map   |   8 +
 lib/librte_vhost/rte_vhost_version.map        |   6 +
 lib/librte_vhost/rte_virtio_net.h             |   3 +
 lib/librte_vhost/vhost_user/virtio-net-user.c |  13 +-
 lib/librte_vhost/virtio-net.c                 |  56 +-
 lib/librte_vhost/virtio-net.h                 |   4 +-
 mk/rte.app.mk                                 |   8 +-
 15 files changed, 1072 insertions(+), 13 deletions(-)
 create mode 100644 doc/guides/nics/vhost.rst
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

-- 
2.1.4

^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH v2 1/2] vhost: Add callback and private data for vhost PMD
  2015-11-02  3:58           ` [PATCH v2 0/2] " Tetsuya Mukawa
@ 2015-11-02  3:58             ` Tetsuya Mukawa
  2015-11-09  5:16               ` [PATCH v3 0/2] Add VHOST PMD Tetsuya Mukawa
  2015-11-02  3:58             ` [PATCH v2 2/2] vhost: " Tetsuya Mukawa
  2015-11-05  2:17             ` [PATCH v2 0/2] " Tetsuya Mukawa
  2 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-02  3:58 UTC (permalink / raw)
  To: dev; +Cc: ann.zhuangyanying

These variables are needed to be able to manage a virtio device
using both the vhost library APIs and the vhost PMD.
For example, if the vhost PMD uses the current callback handler and private
data provided by the vhost library, a DPDK application that links the vhost
library cannot use some of the vhost library APIs. To avoid this, a separate
callback and private data for the vhost PMD are needed.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
 lib/librte_vhost/rte_vhost_version.map        |  6 +++
 lib/librte_vhost/rte_virtio_net.h             |  3 ++
 lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++----
 lib/librte_vhost/virtio-net.c                 | 56 +++++++++++++++++++++++++--
 lib/librte_vhost/virtio-net.h                 |  4 +-
 5 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index 3d8709e..00a9ce5 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -20,3 +20,9 @@ DPDK_2.1 {
 	rte_vhost_driver_unregister;
 
 } DPDK_2.0;
+
+DPDK_2.2 {
+	global:
+
+	rte_vhost_driver_pmd_callback_register;
+} DPDK_2.1;
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index b6386f9..033edde 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -121,6 +121,7 @@ struct virtio_net {
 	char			ifname[IF_NAME_SZ];	/**< Name of the tap device or socket path. */
 	uint32_t		virt_qp_nb;	/**< number of queue pair we have allocated */
 	void			*priv;		/**< private context */
+	void			*pmd_priv;	/**< private context for vhost PMD */
 	struct vhost_virtqueue	*virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];	/**< Contains all virtqueue information. */
 } __rte_cache_aligned;
 
@@ -217,6 +218,8 @@ int rte_vhost_driver_unregister(const char *dev_name);
 
 /* Register callbacks. */
 int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const);
+/* Register callbacks for vhost PMD (internal use only). */
+int rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const * const);
 /* Start vhost driver session blocking loop. */
 int rte_vhost_driver_session_start(void);
 
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index d07452a..d8ae2fc 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -111,7 +111,7 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
 
 	/* Remove from the data plane. */
 	if (dev->flags & VIRTIO_DEV_RUNNING)
-		notify_ops->destroy_device(dev);
+		notify_destroy_device(dev);
 
 	if (dev->mem) {
 		free_mem_region(dev);
@@ -272,7 +272,7 @@ user_set_vring_kick(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
 
 	if (virtio_is_ready(dev) &&
 		!(dev->flags & VIRTIO_DEV_RUNNING))
-			notify_ops->new_device(dev);
+			notify_new_device(dev);
 }
 
 /*
@@ -288,7 +288,7 @@ user_get_vring_base(struct vhost_device_ctx ctx,
 		return -1;
 	/* We have to stop the queue (virtio) if it is running. */
 	if (dev->flags & VIRTIO_DEV_RUNNING)
-		notify_ops->destroy_device(dev);
+		notify_destroy_device(dev);
 
 	/* Here we are safe to get the last used index */
 	ops->get_vring_base(ctx, state->index, state);
@@ -324,10 +324,7 @@ user_set_vring_enable(struct vhost_device_ctx ctx,
 		"set queue enable: %d to qp idx: %d\n",
 		enable, state->index);
 
-	if (notify_ops->vring_state_changed) {
-		notify_ops->vring_state_changed(dev, base_idx / VIRTIO_QNUM,
-						enable);
-	}
+	notify_vring_state_changed(dev, base_idx / VIRTIO_QNUM, enable);
 
 	dev->virtqueue[base_idx + VIRTIO_RXQ]->enabled = enable;
 	dev->virtqueue[base_idx + VIRTIO_TXQ]->enabled = enable;
@@ -341,7 +338,7 @@ user_destroy_device(struct vhost_device_ctx ctx)
 	struct virtio_net *dev = get_device(ctx);
 
 	if (dev && (dev->flags & VIRTIO_DEV_RUNNING))
-		notify_ops->destroy_device(dev);
+		notify_destroy_device(dev);
 
 	if (dev && dev->mem) {
 		free_mem_region(dev);
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 3e82605..ee54beb 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -65,6 +65,8 @@ struct virtio_net_config_ll {
 
 /* device ops to add/remove device to/from data core. */
 struct virtio_net_device_ops const *notify_ops;
+/* device ops for vhost PMD to add/remove device to/from data core. */
+struct virtio_net_device_ops const *pmd_notify_ops;
 /* root address of the linked list of managed virtio devices */
 static struct virtio_net_config_ll *ll_root;
 
@@ -80,6 +82,43 @@ static struct virtio_net_config_ll *ll_root;
 static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
 
 
+int
+notify_new_device(struct virtio_net *dev)
+{
+	if ((pmd_notify_ops != NULL) && (pmd_notify_ops->new_device != NULL)) {
+		int ret = pmd_notify_ops->new_device(dev);
+		if (ret != 0)
+			return ret;
+	}
+	if ((notify_ops != NULL) && (notify_ops->new_device != NULL))
+		return notify_ops->new_device(dev);
+
+	return 0;
+}
+
+void
+notify_destroy_device(volatile struct virtio_net *dev)
+{
+	if ((pmd_notify_ops != NULL) && (pmd_notify_ops->destroy_device != NULL))
+		pmd_notify_ops->destroy_device(dev);
+	if ((notify_ops != NULL) && (notify_ops->destroy_device != NULL))
+		notify_ops->destroy_device(dev);
+}
+
+int
+notify_vring_state_changed(struct virtio_net *dev, uint16_t queue_id, int enable)
+{
+	if ((pmd_notify_ops != NULL) && (pmd_notify_ops->vring_state_changed != NULL)) {
+		int ret = pmd_notify_ops->vring_state_changed(dev, queue_id, enable);
+		if (ret != 0)
+			return ret;
+	}
+	if ((notify_ops != NULL) && (notify_ops->vring_state_changed != NULL))
+		return notify_ops->vring_state_changed(dev, queue_id, enable);
+
+	return 0;
+}
+
 /*
  * Converts QEMU virtual address to Vhost virtual address. This function is
  * used to convert the ring addresses to our address space.
@@ -377,7 +416,7 @@ destroy_device(struct vhost_device_ctx ctx)
 			 * the function to remove it from the data core.
 			 */
 			if ((ll_dev_cur->dev.flags & VIRTIO_DEV_RUNNING))
-				notify_ops->destroy_device(&(ll_dev_cur->dev));
+				notify_destroy_device(&(ll_dev_cur->dev));
 			ll_dev_cur = rm_config_ll_entry(ll_dev_cur,
 					ll_dev_last);
 		} else {
@@ -793,12 +832,12 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 	if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
 		if (((int)dev->virtqueue[VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED) &&
 			((int)dev->virtqueue[VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED)) {
-			return notify_ops->new_device(dev);
+			return notify_new_device(dev);
 		}
 	/* Otherwise we remove it. */
 	} else
 		if (file->fd == VIRTIO_DEV_STOPPED)
-			notify_ops->destroy_device(dev);
+			notify_destroy_device(dev);
 	return 0;
 }
 
@@ -882,3 +921,14 @@ rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const op
 
 	return 0;
 }
+
+/*
+ * Register ops so that we can add/remove devices to/from the data core.
+ */
+int
+rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const * const ops)
+{
+	pmd_notify_ops = ops;
+
+	return 0;
+}
diff --git a/lib/librte_vhost/virtio-net.h b/lib/librte_vhost/virtio-net.h
index 75fb57e..0816e71 100644
--- a/lib/librte_vhost/virtio-net.h
+++ b/lib/librte_vhost/virtio-net.h
@@ -37,7 +37,9 @@
 #include "vhost-net.h"
 #include "rte_virtio_net.h"
 
-struct virtio_net_device_ops const *notify_ops;
 struct virtio_net *get_device(struct vhost_device_ctx ctx);
 
+int notify_new_device(struct virtio_net *dev);
+void notify_destroy_device(volatile struct virtio_net *dev);
+int notify_vring_state_changed(struct virtio_net *dev, uint16_t queue_id, int enable);
 #endif
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v2 2/2] vhost: Add VHOST PMD
  2015-11-02  3:58           ` [PATCH v2 0/2] " Tetsuya Mukawa
  2015-11-02  3:58             ` [PATCH v2 1/2] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
@ 2015-11-02  3:58             ` Tetsuya Mukawa
  2015-11-06  2:22               ` Yuanhan Liu
  2015-11-05  2:17             ` [PATCH v2 0/2] " Tetsuya Mukawa
  2 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-02  3:58 UTC (permalink / raw)
  To: dev; +Cc: ann.zhuangyanying

The patch introduces a new PMD. This PMD is implemented as a thin wrapper
around librte_vhost, which means librte_vhost is also needed to compile the
PMD. Vhost messages are handled only while a port is started, so start
the port first, then invoke QEMU.

The PMD has 2 parameters.
 - iface:  Specifies the path used to connect to a virtio-net device.
 - queues: Specifies the number of queues the virtio-net device has.
           (Default: 1)

Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i

To connect to the testpmd instance above, here is a QEMU command example.

$ qemu-system-x86_64 \
        <snip>
        -chardev socket,id=chr0,path=/tmp/sock0 \
        -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
        -device virtio-net-pci,netdev=net0

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
 config/common_linuxapp                      |   6 +
 doc/guides/nics/index.rst                   |   1 +
 doc/guides/nics/vhost.rst                   |  82 +++
 doc/guides/rel_notes/release_2_2.rst        |   2 +
 drivers/net/Makefile                        |   4 +
 drivers/net/vhost/Makefile                  |  62 +++
 drivers/net/vhost/rte_eth_vhost.c           | 765 ++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h           |  65 +++
 drivers/net/vhost/rte_pmd_vhost_version.map |   8 +
 mk/rte.app.mk                               |   8 +-
 10 files changed, 1002 insertions(+), 1 deletion(-)
 create mode 100644 doc/guides/nics/vhost.rst
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

diff --git a/config/common_linuxapp b/config/common_linuxapp
index c1d4bbd..fd103e7 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -457,6 +457,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
 CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 
 #
+# Compile vhost PMD
+# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
+#
+CONFIG_RTE_LIBRTE_PMD_VHOST=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 2d4936d..57d1041 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -47,6 +47,7 @@ Network Interface Controller Drivers
     mlx4
     mlx5
     virtio
+    vhost
     vmxnet3
     pcap_ring
 
diff --git a/doc/guides/nics/vhost.rst b/doc/guides/nics/vhost.rst
new file mode 100644
index 0000000..2ec8d79
--- /dev/null
+++ b/doc/guides/nics/vhost.rst
@@ -0,0 +1,82 @@
+..  BSD LICENSE
+    Copyright(c) 2015 IGEL Co., Ltd.. All rights reserved.
+    All rights reserved.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+    * Neither the name of IGEL Co., Ltd. nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Poll Mode Driver that wraps vhost library
+=========================================
+
+This PMD is a thin wrapper around the DPDK vhost library.
+The user can handle virtqueues as a normal DPDK port.
+
+Vhost Implementation in DPDK
+----------------------------
+
+Please refer to the "Vhost Library" chapter of the Programmer's Guide for details of vhost.
+
+Features and Limitations of vhost PMD
+-------------------------------------
+
+In this release, the vhost PMD provides the basic functionality of packet reception and transmission.
+
+*   It provides a function to convert a port_id to a pointer to the virtio_net device.
+    This allows the user to use the vhost library alongside the PMD.
+
+*   It supports multiple queues.
+
+*   It supports Port Hotplug functionality.
+
+*   RX/TX does not need to be stopped when the user stops a guest or a virtio-net driver in the guest.
+
+Vhost PMD with testpmd application
+----------------------------------
+
+This section demonstrates vhost PMD with testpmd DPDK sample application.
+
+#.  Launch the testpmd with vhost PMD:
+
+    .. code-block:: console
+
+        ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
+
+    Other basic DPDK preparations, such as hugepage enabling, are required here.
+    Please refer to the *DPDK Getting Started Guide* for detailed instructions.
+
+#.  Launch the QEMU:
+
+    .. code-block:: console
+
+       qemu-system-x86_64 <snip>
+                   -chardev socket,id=chr0,path=/tmp/sock0 \
+                   -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
+                   -device virtio-net-pci,netdev=net0
+
+    This command creates one virtio-net device for QEMU.
+    Once the device is recognized by the guest, the user can handle it as a
+    normal virtio-net device.
+    When initialization between the virtio-net driver and the vhost library is done, the port status in testpmd will change to linked up.
diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 429dfe6..466c1de 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -58,6 +58,8 @@ New Features
 * **Added port hotplug support to xenvirt.**
 
 
+* **Added vhost PMD.**
+
 * **Removed the PCI device from vdev PMD's.**
 
   * This change required modifications to librte_ether and all vdev and pdev PMD's.
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 6da1ce2..66eb63d 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -50,5 +50,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
 
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
 include $(RTE_SDK)/mk/rte.sharelib.mk
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
new file mode 100644
index 0000000..8186a80
--- /dev/null
+++ b/drivers/net/vhost/Makefile
@@ -0,0 +1,62 @@
+#   BSD LICENSE
+#
+#   Copyright (c) 2010-2015 Intel Corporation.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_vhost.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_vhost_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
+
+#
+# Export include files
+#
+SYMLINK-y-include += rte_eth_vhost.h
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_vhost
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
new file mode 100644
index 0000000..5e6da9a
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -0,0 +1,765 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) 2015 IGEL Co., Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co.,Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <unistd.h>
+#include <pthread.h>
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_dev.h>
+#include <rte_kvargs.h>
+#include <rte_virtio_net.h>
+
+#include "rte_eth_vhost.h"
+
+#define ETH_VHOST_IFACE_ARG		"iface"
+#define ETH_VHOST_QUEUES_ARG		"queues"
+
+static const char *drivername = "VHOST PMD";
+
+static const char *valid_arguments[] = {
+	ETH_VHOST_IFACE_ARG,
+	ETH_VHOST_QUEUES_ARG,
+	NULL
+};
+
+static struct ether_addr base_eth_addr = {
+	.addr_bytes = {
+		0x56 /* V */,
+		0x48 /* H */,
+		0x4F /* O */,
+		0x53 /* S */,
+		0x54 /* T */,
+		0x00
+	}
+};
+
+struct vhost_queue {
+	struct virtio_net *device;
+	struct pmd_internal *internal;
+	struct rte_mempool *mb_pool;
+	rte_atomic32_t allow_queuing;
+	rte_atomic32_t while_queuing;
+	uint64_t rx_pkts;
+	uint64_t tx_pkts;
+	uint64_t err_pkts;
+};
+
+struct pmd_internal {
+	TAILQ_ENTRY(pmd_internal) next;
+	char *dev_name;
+	char *iface_name;
+	unsigned nb_rx_queues;
+	unsigned nb_tx_queues;
+
+	struct vhost_queue *rx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+	struct vhost_queue *tx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+
+	volatile uint16_t once;
+	pthread_t session_th;
+};
+
+TAILQ_HEAD(pmd_internal_head, pmd_internal);
+static struct pmd_internal_head internals_list =
+	TAILQ_HEAD_INITIALIZER(internals_list);
+
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static struct rte_eth_link pmd_link = {
+		.link_speed = 10000,
+		.link_duplex = ETH_LINK_FULL_DUPLEX,
+		.link_status = 0
+};
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t nb_rx = 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Dequeue packets from guest TX queue */
+	nb_rx = (uint16_t)rte_vhost_dequeue_burst(r->device,
+			VIRTIO_TXQ, r->mb_pool, bufs, nb_bufs);
+
+	r->rx_pkts += nb_rx;
+
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t i, nb_tx = 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Enqueue packets to guest RX queue */
+	nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
+			VIRTIO_RXQ, bufs, nb_bufs);
+
+	r->tx_pkts += nb_tx;
+	r->err_pkts += nb_bufs - nb_tx;
+
+	for (i = 0; likely(i < nb_tx); i++)
+		rte_pktmbuf_free(bufs[i]);
+
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused) { return 0; }
+
+static inline struct pmd_internal *
+find_internal_resource(char *ifname)
+{
+	int found = 0;
+	struct pmd_internal *internal;
+
+	if (ifname == NULL)
+		return NULL;
+
+	pthread_mutex_lock(&internal_list_lock);
+
+	TAILQ_FOREACH(internal, &internals_list, next) {
+		if (!strcmp(internal->iface_name, ifname)) {
+			found = 1;
+			break;
+		}
+	}
+
+	pthread_mutex_unlock(&internal_list_lock);
+
+	if (!found)
+		return NULL;
+
+	return internal;
+}
+
+static int
+new_device(struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct pmd_internal *internal;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid argument\n");
+		return -1;
+	}
+
+	internal = find_internal_resource(dev->ifname);
+	if (internal == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid device name\n");
+		return -1;
+	}
+
+	if ((dev->virt_qp_nb < internal->nb_rx_queues) ||
+			(dev->virt_qp_nb < internal->nb_tx_queues)) {
+		RTE_LOG(INFO, PMD, "Not enough queues\n");
+		return -1;
+	}
+
+	eth_dev = rte_eth_dev_allocated(internal->dev_name);
+	if (eth_dev == NULL) {
+		RTE_LOG(INFO, PMD, "Failed to find an ethdev\n");
+		return -1;
+	}
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = internal->rx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = dev;
+		vq->internal = internal;
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = internal->tx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = dev;
+		vq->internal = internal;
+	}
+
+	dev->flags |= VIRTIO_DEV_RUNNING;
+	dev->pmd_priv = eth_dev;
+	eth_dev->data->dev_link.link_status = 1;
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = internal->rx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = internal->tx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+	RTE_LOG(INFO, PMD, "New connection established\n");
+
+	return 0;
+}
+
+static void
+destroy_device(volatile struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct pmd_internal *internal;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid argument\n");
+		return;
+	}
+
+	eth_dev = (struct rte_eth_dev *)dev->pmd_priv;
+	if (eth_dev == NULL) {
+		RTE_LOG(INFO, PMD, "Failed to find an ethdev\n");
+		return;
+	}
+
+	internal = eth_dev->data->dev_private;
+
+	/* Wait until rx/tx_pkt_burst stops accessing vhost device */
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = internal->rx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = internal->tx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+
+	eth_dev->data->dev_link.link_status = 0;
+
+	dev->pmd_priv = NULL;
+	dev->flags &= ~VIRTIO_DEV_RUNNING;
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = internal->rx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = NULL;
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = internal->tx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = NULL;
+	}
+
+	RTE_LOG(INFO, PMD, "Connection closed\n");
+}
+
+static void *vhost_driver_session(void *param __rte_unused)
+{
+	static struct virtio_net_device_ops *vhost_ops;
+
+	vhost_ops = rte_zmalloc(NULL, sizeof(*vhost_ops), 0);
+	if (vhost_ops == NULL)
+		rte_panic("Can't allocate memory\n");
+
+	/* set vhost arguments */
+	vhost_ops->new_device = new_device;
+	vhost_ops->destroy_device = destroy_device;
+	if (rte_vhost_driver_pmd_callback_register(vhost_ops) < 0)
+		rte_panic("Can't register callbacks\n");
+
+	/* start event handling */
+	rte_vhost_driver_session_start();
+
+	rte_free(vhost_ops);
+	pthread_exit(0);
+}
+
+static void vhost_driver_session_start(struct pmd_internal *internal)
+{
+	int ret;
+
+	ret = pthread_create(&internal->session_th,
+			NULL, vhost_driver_session, NULL);
+	if (ret)
+		rte_panic("Can't create a thread\n");
+}
+
+static void vhost_driver_session_stop(struct pmd_internal *internal)
+{
+	int ret;
+
+	ret = pthread_cancel(internal->session_th);
+	if (ret)
+		rte_panic("Can't cancel the thread\n");
+
+	ret = pthread_join(internal->session_th, NULL);
+	if (ret)
+		rte_panic("Can't join the thread\n");
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	int ret;
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
+		ret = rte_vhost_driver_register(internal->iface_name);
+		if (ret)
+			return ret;
+
+		vhost_driver_session_start(internal);
+	}
+	return 0;
+}
+
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	if (rte_atomic16_cmpset(&internal->once, 1, 0)) {
+		rte_vhost_driver_unregister(internal->iface_name);
+		vhost_driver_session_stop(internal);
+	}
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+	struct vhost_queue *vq;
+
+	if (internal->rx_vhost_queues[rx_queue_id] != NULL)
+		rte_free(internal->rx_vhost_queues[rx_queue_id]);
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
+		return -ENOMEM;
+	}
+
+	vq->mb_pool = mb_pool;
+	internal->rx_vhost_queues[rx_queue_id] = vq;
+	dev->data->rx_queues[rx_queue_id] = vq;
+	return 0;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+	struct vhost_queue *vq;
+
+	if (internal->tx_vhost_queues[tx_queue_id] != NULL)
+		rte_free(internal->tx_vhost_queues[tx_queue_id]);
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for tx queue\n");
+		return -ENOMEM;
+	}
+
+	internal->tx_vhost_queues[tx_queue_id] = vq;
+	dev->data->tx_queues[tx_queue_id] = vq;
+	return 0;
+}
+
+
+static void
+eth_dev_info(struct rte_eth_dev *dev,
+	     struct rte_eth_dev_info *dev_info)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	dev_info->driver_name = drivername;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = (uint32_t)-1;
+	dev_info->max_rx_queues = (uint16_t)internal->nb_rx_queues;
+	dev_info->max_tx_queues = (uint16_t)internal->nb_tx_queues;
+	dev_info->min_rx_bufsize = 0;
+}
+
+static void
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
+{
+	unsigned i;
+	unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
+	const struct pmd_internal *internal = dev->data->dev_private;
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+	     i < internal->nb_rx_queues; i++) {
+		if (internal->rx_vhost_queues[i] == NULL)
+			continue;
+		igb_stats->q_ipackets[i] = internal->rx_vhost_queues[i]->rx_pkts;
+		rx_total += igb_stats->q_ipackets[i];
+	}
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+	     i < internal->nb_tx_queues; i++) {
+		if (internal->tx_vhost_queues[i] == NULL)
+			continue;
+		igb_stats->q_opackets[i] = internal->tx_vhost_queues[i]->tx_pkts;
+		igb_stats->q_errors[i] = internal->tx_vhost_queues[i]->err_pkts;
+		tx_total += igb_stats->q_opackets[i];
+		tx_err_total += igb_stats->q_errors[i];
+	}
+
+	igb_stats->ipackets = rx_total;
+	igb_stats->opackets = tx_total;
+	igb_stats->oerrors = tx_err_total;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	unsigned i;
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		if (internal->rx_vhost_queues[i] == NULL)
+			continue;
+		internal->rx_vhost_queues[i]->rx_pkts = 0;
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		if (internal->tx_vhost_queues[i] == NULL)
+			continue;
+		internal->tx_vhost_queues[i]->tx_pkts = 0;
+		internal->tx_vhost_queues[i]->err_pkts = 0;
+	}
+}
+
+static void
+eth_queue_release(void *q __rte_unused) { ; }
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused) { return 0; }
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+static int
+eth_dev_vhost_create(const char *name, int index,
+		     char *iface_name,
+		     int16_t queues,
+		     const unsigned numa_node)
+{
+	struct rte_eth_dev_data *data = NULL;
+	struct pmd_internal *internal = NULL;
+	struct rte_eth_dev *eth_dev = NULL;
+	struct ether_addr *eth_addr = NULL;
+
+	RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
+		numa_node);
+
+	/* now do all data allocation - for eth_dev structure, dummy pci driver
+	 * and internal (private) data
+	 */
+	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+	if (data == NULL)
+		goto error;
+
+	internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
+	if (internal == NULL)
+		goto error;
+
+	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
+	if (eth_addr == NULL)
+		goto error;
+	*eth_addr = base_eth_addr;
+	eth_addr->addr_bytes[5] = index;
+
+	/* reserve an ethdev entry */
+	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+	if (eth_dev == NULL)
+		goto error;
+
+	/* now put it all together
+	 * - store queue data in internal,
+	 * - store numa_node info in ethdev data
+	 * - point eth_dev_data to internals
+	 * - and point eth_dev structure to new eth_dev_data structure
+	 */
+	internal->nb_rx_queues = queues;
+	internal->nb_tx_queues = queues;
+	internal->dev_name = strdup(name);
+	if (internal->dev_name == NULL)
+		goto error;
+	internal->iface_name = strdup(iface_name);
+	if (internal->iface_name == NULL)
+		goto error;
+
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_INSERT_TAIL(&internals_list, internal, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
+	data->dev_private = internal;
+	data->port_id = eth_dev->data->port_id;
+	memmove(data->name, eth_dev->data->name, sizeof(data->name));
+	data->nb_rx_queues = queues;
+	data->nb_tx_queues = queues;
+	data->dev_link = pmd_link;
+	data->mac_addrs = eth_addr;
+
+	/* We'll replace the 'data' originally allocated by eth_dev. So the
+	 * vhost PMD resources won't be shared between multi processes.
+	 */
+	eth_dev->data = data;
+	eth_dev->dev_ops = &ops;
+	eth_dev->driver = NULL;
+	eth_dev->data->dev_flags = RTE_ETH_DEV_DETACHABLE;
+	eth_dev->data->kdrv = RTE_KDRV_NONE;
+	eth_dev->data->drv_name = internal->dev_name;
+	eth_dev->data->numa_node = numa_node;
+
+	/* finally assign rx and tx ops */
+	eth_dev->rx_pkt_burst = eth_vhost_rx;
+	eth_dev->tx_pkt_burst = eth_vhost_tx;
+
+	return data->port_id;
+
+error:
+	rte_free(data);
+	rte_free(internal);
+	rte_free(eth_addr);
+
+	return -1;
+}
+
+static inline int
+open_iface(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	const char **iface_name = extra_args;
+
+	if (value == NULL)
+		return -1;
+
+	*iface_name = value;
+
+	return 0;
+}
+
+static inline int
+open_queues(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	uint16_t *q = extra_args;
+
+	if ((value == NULL) || (extra_args == NULL))
+		return -EINVAL;
+
+	*q = (uint16_t)strtoul(value, NULL, 0);
+	if ((*q == USHRT_MAX) && (errno == ERANGE))
+		return -1;
+
+	if (*q > RTE_MAX_QUEUES_PER_PORT)
+		return -1;
+
+	return 0;
+}
+
+static int
+rte_pmd_vhost_devinit(const char *name, const char *params)
+{
+	struct rte_kvargs *kvlist = NULL;
+	int ret = 0;
+	int index;
+	char *iface_name;
+	uint16_t queues;
+
+	RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
+
+	if (strlen(name) < strlen("eth_vhost"))
+		return -1;
+
+	index = strtol(name + strlen("eth_vhost"), NULL, 0);
+	if (errno == ERANGE)
+		return -1;
+
+	kvlist = rte_kvargs_parse(params, valid_arguments);
+	if (kvlist == NULL)
+		return -1;
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
+					 &open_iface, &iface_name);
+		if (ret < 0)
+			goto out_free;
+	}
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
+					 &open_queues, &queues);
+		if (ret < 0)
+			goto out_free;
+
+	} else
+		queues = 1;
+
+	eth_dev_vhost_create(name, index,
+			iface_name, queues, rte_socket_id());
+
+out_free:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static int
+rte_pmd_vhost_devuninit(const char *name)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internal *internal;
+	unsigned int i;
+
+	RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
+
+	if (name == NULL)
+		return -EINVAL;
+
+	/* find an ethdev entry */
+	eth_dev = rte_eth_dev_allocated(name);
+	if (eth_dev == NULL)
+		return -ENODEV;
+
+	internal = eth_dev->data->dev_private;
+
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_REMOVE(&internals_list, internal, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
+	eth_dev_stop(eth_dev);
+
+	if ((internal) && (internal->dev_name))
+		free(internal->dev_name);
+	if ((internal) && (internal->iface_name))
+		free(internal->iface_name);
+
+	rte_free(eth_dev->data->mac_addrs);
+	rte_free(eth_dev->data);
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		if (internal->rx_vhost_queues[i] != NULL)
+			rte_free(internal->rx_vhost_queues[i]);
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		if (internal->tx_vhost_queues[i] != NULL)
+			rte_free(internal->tx_vhost_queues[i]);
+	}
+	rte_free(internal);
+
+	rte_eth_dev_release_port(eth_dev);
+
+	return 0;
+}
+
+static struct rte_driver pmd_vhost_drv = {
+	.name = "eth_vhost",
+	.type = PMD_VDEV,
+	.init = rte_pmd_vhost_devinit,
+	.uninit = rte_pmd_vhost_devuninit,
+};
+
+struct
+virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id)
+{
+	struct rte_eth_dev *eth_dev;
+
+	if (rte_eth_dev_is_valid_port(port_id) == 0)
+		return NULL;
+
+	eth_dev = &rte_eth_devices[port_id];
+	if (strncmp("eth_vhost", eth_dev->data->drv_name,
+				strlen("eth_vhost")) == 0) {
+		struct pmd_internal *internal;
+		struct vhost_queue *vq;
+
+		internal = eth_dev->data->dev_private;
+		vq = internal->rx_vhost_queues[0];
+		if ((vq != NULL) && (vq->device != NULL))
+			return vq->device;
+	}
+
+	return NULL;
+}
+
+PMD_REGISTER_DRIVER(pmd_vhost_drv);
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
new file mode 100644
index 0000000..22a880f
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -0,0 +1,65 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 IGEL Co., Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co., Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_VHOST_H_
+#define _RTE_ETH_VHOST_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_virtio_net.h>
+
+/**
+ * The function converts the specified port_id to a virtio device structure.
+ * The returned device can be used with vhost library APIs.
+ * To use vhost library APIs and the vhost PMD in parallel, the below API
+ * should not be called, because it will be called by the vhost PMD.
+ * - rte_vhost_driver_session_start()
+ * Once the vhost PMD manages a device, the below API should not be called.
+ * - rte_vhost_driver_unregister()
+ * To unregister the device, call the Port Hotplug APIs.
+ *
+ * @param port_id
+ *  port number
+ * @return
+ *  virtio net device structure corresponding to the specified port
+ *  NULL will be returned in error cases.
+ */
+struct virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
new file mode 100644
index 0000000..bf0361a
--- /dev/null
+++ b/drivers/net/vhost/rte_pmd_vhost_version.map
@@ -0,0 +1,8 @@
+DPDK_2.2 {
+
+	global:
+
+	rte_eth_vhost_portid2vdev;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 724efa7..1af4bb3 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -148,7 +148,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP)       += -lrte_pmd_pcap
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL)       += -lrte_pmd_null
 
-endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
+
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
+endif # $(CONFIG_RTE_BUILD_SHARED_LIB)
 
 endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 200+ messages in thread

* Re: [PATCH v2 0/2] Add VHOST PMD
  2015-11-02  3:58           ` [PATCH v2 0/2] " Tetsuya Mukawa
  2015-11-02  3:58             ` [PATCH v2 1/2] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
  2015-11-02  3:58             ` [PATCH v2 2/2] vhost: " Tetsuya Mukawa
@ 2015-11-05  2:17             ` Tetsuya Mukawa
  2 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-05  2:17 UTC (permalink / raw)
  To: dev; +Cc: ann.zhuangyanying

Hi,

Could someone please review the patch series below?

Regards,
Tetsuya

On 2015/11/02 12:58, Tetsuya Mukawa wrote:
> The patch introduces a new PMD. This PMD is implemented as thin wrapper
> of librte_vhost. The patch will work on below patch series.
>  - [PATCH v7 00/28] remove pci driver from vdevs
>
> * Known issue.
> We may see issues while handling RESET_OWNER message.
> These handlings are done in vhost library, so not a part of vhost PMD.
> So far, we are waiting for QEMU fixing.
>
> PATCH v2 changes:
>  - Remove a below patch that fixes vhost library.
>    The patch was applied as a separate patch.
>    - vhost: fix crash with multiqueue enabled
>  - Fix typos.
>    (Thanks to Thomas, Monjalon)
>  - Rebase on latest tree with above bernard's patches.
>
> PATCH v1 changes:
>  - Support vhost multiple queues.
>  - Rebase on "remove pci driver from vdevs".
>  - Optimize RX/TX functions.
>  - Fix resource leaks.
>  - Fix compile issue.
>  - Add patch to fix vhost library.
>
> RFC PATCH v3 changes:
>  - Optimize performance.
>    In RX/TX functions, change code to access only per core data.
>  - Add below API to allow user to use vhost library APIs for a port managed
>    by vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
>     - rte_eth_vhost_portid2vdev()
>    To support this functionality, vhost library is also changed.
>    Anyway, if users doesn't use vhost PMD, can fully use vhost library APIs.
>  - Add code to support vhost multiple queues.
>    Actually, multiple queues functionality is not enabled so far.
>
> RFC PATCH v2 changes:
>  - Fix issues reported by checkpatch.pl
>    (Thanks to Stephen Hemminger)
>
>
> Tetsuya Mukawa (2):
>   vhost: Add callback and private data for vhost PMD
>   vhost: Add VHOST PMD
>
>  config/common_linuxapp                        |   6 +
>  doc/guides/nics/index.rst                     |   1 +
>  doc/guides/nics/vhost.rst                     |  82 +++
>  doc/guides/rel_notes/release_2_2.rst          |   2 +
>  drivers/net/Makefile                          |   4 +
>  drivers/net/vhost/Makefile                    |  62 +++
>  drivers/net/vhost/rte_eth_vhost.c             | 765 ++++++++++++++++++++++++++
>  drivers/net/vhost/rte_eth_vhost.h             |  65 +++
>  drivers/net/vhost/rte_pmd_vhost_version.map   |   8 +
>  lib/librte_vhost/rte_vhost_version.map        |   6 +
>  lib/librte_vhost/rte_virtio_net.h             |   3 +
>  lib/librte_vhost/vhost_user/virtio-net-user.c |  13 +-
>  lib/librte_vhost/virtio-net.c                 |  56 +-
>  lib/librte_vhost/virtio-net.h                 |   4 +-
>  mk/rte.app.mk                                 |   8 +-
>  15 files changed, 1072 insertions(+), 13 deletions(-)
>  create mode 100644 doc/guides/nics/vhost.rst
>  create mode 100644 drivers/net/vhost/Makefile
>  create mode 100644 drivers/net/vhost/rte_eth_vhost.c
>  create mode 100644 drivers/net/vhost/rte_eth_vhost.h
>  create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
>


* Re: [PATCH v2 2/2] vhost: Add VHOST PMD
  2015-11-02  3:58             ` [PATCH v2 2/2] vhost: " Tetsuya Mukawa
@ 2015-11-06  2:22               ` Yuanhan Liu
  2015-11-06  3:54                 ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Yuanhan Liu @ 2015-11-06  2:22 UTC (permalink / raw)
  To: Tetsuya Mukawa, Michael S. Tsirkin; +Cc: dev, ann.zhuangyanying

On Mon, Nov 02, 2015 at 12:58:57PM +0900, Tetsuya Mukawa wrote:
...
> +
> +static uint16_t
> +eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> +{
> +	struct vhost_queue *r = q;
> +	uint16_t nb_rx = 0;
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		return 0;
> +
> +	rte_atomic32_set(&r->while_queuing, 1);
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		goto out;
> +
> +	/* Dequeue packets from guest TX queue */
> +	nb_rx = (uint16_t)rte_vhost_dequeue_burst(r->device,
> +			VIRTIO_TXQ, r->mb_pool, bufs, nb_bufs);
> +
> +	r->rx_pkts += nb_rx;
> +
> +out:
> +	rte_atomic32_set(&r->while_queuing, 0);
> +
> +	return nb_rx;
> +}
> +
> +static uint16_t
> +eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> +{
> +	struct vhost_queue *r = q;
> +	uint16_t i, nb_tx = 0;
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		return 0;
> +
> +	rte_atomic32_set(&r->while_queuing, 1);
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		goto out;
> +
> +	/* Enqueue packets to guest RX queue */
> +	nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
> +			VIRTIO_RXQ, bufs, nb_bufs);
> +

Michael, I'm wondering here might be the better place to do "automatic
receive steering in multiqueue mode". I mean, as a library function,
queueing/dequeueing packets to/from a specific virt queue is reasonable
to me. It's up to the caller to pick the right queue, doing the queue
steering.

As an eth dev, I guess that's the proper place to do things like that.

Or, I'm thinking we could introduce another vhost function, for not
breaking current API, to do that, returning the right queue, so that
other applications (instead of the vhost pmd only) can use that as well.

Tetsuya, just in case you missed the early discussion about automatic
receive steering, here is a link:

  http://dpdk.org/ml/archives/dev/2015-October/025779.html


	--yliu


* Re: [PATCH v2 2/2] vhost: Add VHOST PMD
  2015-11-06  2:22               ` Yuanhan Liu
@ 2015-11-06  3:54                 ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-06  3:54 UTC (permalink / raw)
  To: Yuanhan Liu, Michael S. Tsirkin; +Cc: dev, ann.zhuangyanying

On 2015/11/06 11:22, Yuanhan Liu wrote:
> On Mon, Nov 02, 2015 at 12:58:57PM +0900, Tetsuya Mukawa wrote:
> ...
>> +
>> +static uint16_t
>> +eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
>> +{
>> +	struct vhost_queue *r = q;
>> +	uint16_t nb_rx = 0;
>> +
>> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
>> +		return 0;
>> +
>> +	rte_atomic32_set(&r->while_queuing, 1);
>> +
>> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
>> +		goto out;
>> +
>> +	/* Dequeue packets from guest TX queue */
>> +	nb_rx = (uint16_t)rte_vhost_dequeue_burst(r->device,
>> +			VIRTIO_TXQ, r->mb_pool, bufs, nb_bufs);
>> +
>> +	r->rx_pkts += nb_rx;
>> +
>> +out:
>> +	rte_atomic32_set(&r->while_queuing, 0);
>> +
>> +	return nb_rx;
>> +}
>> +
>> +static uint16_t
>> +eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
>> +{
>> +	struct vhost_queue *r = q;
>> +	uint16_t i, nb_tx = 0;
>> +
>> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
>> +		return 0;
>> +
>> +	rte_atomic32_set(&r->while_queuing, 1);
>> +
>> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
>> +		goto out;
>> +
>> +	/* Enqueue packets to guest RX queue */
>> +	nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
>> +			VIRTIO_RXQ, bufs, nb_bufs);
>> +
> Michael, I'm wondering here might be the better place to do "automatic
> receive steering in multiqueue mode". I mean, as a library function,
> queueing/dequeueing packets to/from a specific virt queue is reasonable
> to me. It's upto the caller to pick the right queue, doing the queue
> steering.

Hi Liu,

Oops, I've found a bug here.
To support multiple queues in vhost PMD, I needed to store "queue_id" in
"vhost_queue" structure.
Then, I should call rte_vhost_enqueue_burst() with the value.

> As an eth dev, I guess that's the proper place to do things like that.
>
> Or, I'm thinking we could introduce another vhost function, for not
> breaking current API, to do that, returning the right queue, so that
> other applications (instead of the vhost pmd only) can use that as well.

I may not understand the steering function well enough, but if we support
steering in the vhost library or the vhost PMD, how can we handle the
"queue_id" parameter of the TX functions?
Probably, we would need to ignore the value in some cases.
This may confuse users, because they could not observe the packets on
their specified queue.

So I guess it should be the application's responsibility to return packets
to the correct queue.
(But we should document this properly.)

> Tetsuya, just in case you missed the early discussion about automic
> receive steering, here is a link:
>
>   http://dpdk.org/ml/archives/dev/2015-October/025779.html
>

Thanks, I've checked it!

Tetsuya


* [PATCH v3 0/2] Add VHOST PMD
  2015-11-02  3:58             ` [PATCH v2 1/2] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
@ 2015-11-09  5:16               ` Tetsuya Mukawa
  2015-11-09  5:17                 ` [PATCH v3 1/2] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
                                   ` (2 more replies)
  0 siblings, 3 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-09  5:16 UTC (permalink / raw)
  To: dev; +Cc: ann.zhuangyanying

The patch introduces a new PMD. This PMD is implemented as a thin wrapper
of librte_vhost.

* Known issue.
We may see issues while handling the RESET_OWNER message.
This handling is done in the vhost library, so it is not part of the vhost
PMD. For now, we are waiting for a fix in QEMU.

PATCH v3 changes:
 - Rebase on latest master
 - Specify correct queue_id in RX/TX function.

PATCH v2 changes:
 - Remove a below patch that fixes vhost library.
   The patch was applied as a separate patch.
   - vhost: fix crash with multiqueue enabled
 - Fix typos.
   (Thanks to Thomas, Monjalon)
 - Rebase on latest tree with above bernard's patches.

PATCH v1 changes:
 - Support vhost multiple queues.
 - Rebase on "remove pci driver from vdevs".
 - Optimize RX/TX functions.
 - Fix resource leaks.
 - Fix compile issue.
 - Add patch to fix vhost library.

RFC PATCH v3 changes:
 - Optimize performance.
   In RX/TX functions, change code to access only per core data.
 - Add below API to allow user to use vhost library APIs for a port managed
   by vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
    - rte_eth_vhost_portid2vdev()
   To support this functionality, vhost library is also changed.
   Anyway, if users don't use the vhost PMD, they can fully use the vhost
   library APIs.
 - Add code to support vhost multiple queues.
   Actually, multiple queues functionality is not enabled so far.

RFC PATCH v2 changes:
 - Fix issues reported by checkpatch.pl
   (Thanks to Stephen Hemminger)


Tetsuya Mukawa (2):
  vhost: Add callback and private data for vhost PMD
  vhost: Add VHOST PMD

 config/common_linuxapp                        |   6 +
 doc/guides/nics/index.rst                     |   1 +
 doc/guides/rel_notes/release_2_2.rst          |   2 +
 drivers/net/Makefile                          |   4 +
 drivers/net/vhost/Makefile                    |  62 +++
 drivers/net/vhost/rte_eth_vhost.c             | 768 ++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h             |  65 +++
 drivers/net/vhost/rte_pmd_vhost_version.map   |   8 +
 lib/librte_vhost/rte_vhost_version.map        |   6 +
 lib/librte_vhost/rte_virtio_net.h             |   3 +
 lib/librte_vhost/vhost_user/virtio-net-user.c |  13 +-
 lib/librte_vhost/virtio-net.c                 |  56 +-
 lib/librte_vhost/virtio-net.h                 |   4 +-
 mk/rte.app.mk                                 |   8 +-
 14 files changed, 993 insertions(+), 13 deletions(-)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

-- 
2.1.4


* [PATCH v3 1/2] vhost: Add callback and private data for vhost PMD
  2015-11-09  5:16               ` [PATCH v3 0/2] Add VHOST PMD Tetsuya Mukawa
@ 2015-11-09  5:17                 ` Tetsuya Mukawa
  2015-11-09 18:16                   ` Aaron Conole
  2015-11-09  5:17                 ` [PATCH v3 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  2015-11-09  5:42                 ` [PATCH v3 " Yuanhan Liu
  2 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-09  5:17 UTC (permalink / raw)
  To: dev; +Cc: ann.zhuangyanying

These variables are needed to be able to manage one of the virtio devices
using both the vhost library APIs and the vhost PMD.
For example, if the vhost PMD used the current callback handler and private
data provided by the vhost library, a DPDK application that links the vhost
library could not use some of the vhost library APIs. To avoid this, a
separate callback and private data for the vhost PMD are needed.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
 lib/librte_vhost/rte_vhost_version.map        |  6 +++
 lib/librte_vhost/rte_virtio_net.h             |  3 ++
 lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++----
 lib/librte_vhost/virtio-net.c                 | 56 +++++++++++++++++++++++++--
 lib/librte_vhost/virtio-net.h                 |  4 +-
 5 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index 3d8709e..00a9ce5 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -20,3 +20,9 @@ DPDK_2.1 {
 	rte_vhost_driver_unregister;
 
 } DPDK_2.0;
+
+DPDK_2.2 {
+	global:
+
+	rte_vhost_driver_pmd_callback_register;
+} DPDK_2.1;
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 5687452..3ef6e58 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -128,6 +128,7 @@ struct virtio_net {
 	char			ifname[IF_NAME_SZ];	/**< Name of the tap device or socket path. */
 	uint32_t		virt_qp_nb;	/**< number of queue pair we have allocated */
 	void			*priv;		/**< private context */
+	void			*pmd_priv;	/**< private context for vhost PMD */
 	struct vhost_virtqueue	*virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];	/**< Contains all virtqueue information. */
 } __rte_cache_aligned;
 
@@ -224,6 +225,8 @@ int rte_vhost_driver_unregister(const char *dev_name);
 
 /* Register callbacks. */
 int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const);
+/* Register callbacks for vhost PMD (Only for internal). */
+int rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const * const);
 /* Start vhost driver session blocking loop. */
 int rte_vhost_driver_session_start(void);
 
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index d07452a..d8ae2fc 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -111,7 +111,7 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
 
 	/* Remove from the data plane. */
 	if (dev->flags & VIRTIO_DEV_RUNNING)
-		notify_ops->destroy_device(dev);
+		notify_destroy_device(dev);
 
 	if (dev->mem) {
 		free_mem_region(dev);
@@ -272,7 +272,7 @@ user_set_vring_kick(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
 
 	if (virtio_is_ready(dev) &&
 		!(dev->flags & VIRTIO_DEV_RUNNING))
-			notify_ops->new_device(dev);
+			notify_new_device(dev);
 }
 
 /*
@@ -288,7 +288,7 @@ user_get_vring_base(struct vhost_device_ctx ctx,
 		return -1;
 	/* We have to stop the queue (virtio) if it is running. */
 	if (dev->flags & VIRTIO_DEV_RUNNING)
-		notify_ops->destroy_device(dev);
+		notify_destroy_device(dev);
 
 	/* Here we are safe to get the last used index */
 	ops->get_vring_base(ctx, state->index, state);
@@ -324,10 +324,7 @@ user_set_vring_enable(struct vhost_device_ctx ctx,
 		"set queue enable: %d to qp idx: %d\n",
 		enable, state->index);
 
-	if (notify_ops->vring_state_changed) {
-		notify_ops->vring_state_changed(dev, base_idx / VIRTIO_QNUM,
-						enable);
-	}
+	notify_vring_state_changed(dev, base_idx / VIRTIO_QNUM, enable);
 
 	dev->virtqueue[base_idx + VIRTIO_RXQ]->enabled = enable;
 	dev->virtqueue[base_idx + VIRTIO_TXQ]->enabled = enable;
@@ -341,7 +338,7 @@ user_destroy_device(struct vhost_device_ctx ctx)
 	struct virtio_net *dev = get_device(ctx);
 
 	if (dev && (dev->flags & VIRTIO_DEV_RUNNING))
-		notify_ops->destroy_device(dev);
+		notify_destroy_device(dev);
 
 	if (dev && dev->mem) {
 		free_mem_region(dev);
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 14278de..a5aef08 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -65,6 +65,8 @@ struct virtio_net_config_ll {
 
 /* device ops to add/remove device to/from data core. */
 struct virtio_net_device_ops const *notify_ops;
+/* device ops for vhost PMD to add/remove device to/from data core. */
+struct virtio_net_device_ops const *pmd_notify_ops;
 /* root address of the linked list of managed virtio devices */
 static struct virtio_net_config_ll *ll_root;
 
@@ -81,6 +83,43 @@ static struct virtio_net_config_ll *ll_root;
 static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
 
 
+int
+notify_new_device(struct virtio_net *dev)
+{
+	if ((pmd_notify_ops != NULL) && (pmd_notify_ops->new_device != NULL)) {
+		int ret = pmd_notify_ops->new_device(dev);
+		if (ret != 0)
+			return ret;
+	}
+	if ((notify_ops != NULL) && (notify_ops->new_device != NULL))
+		return notify_ops->new_device(dev);
+
+	return 0;
+}
+
+void
+notify_destroy_device(volatile struct virtio_net *dev)
+{
+	if ((pmd_notify_ops != NULL) && (pmd_notify_ops->destroy_device != NULL))
+		pmd_notify_ops->destroy_device(dev);
+	if ((notify_ops != NULL) && (notify_ops->destroy_device != NULL))
+		notify_ops->destroy_device(dev);
+}
+
+int
+notify_vring_state_changed(struct virtio_net *dev, uint16_t queue_id, int enable)
+{
+	if ((pmd_notify_ops != NULL) && (pmd_notify_ops->vring_state_changed != NULL)) {
+		int ret = pmd_notify_ops->vring_state_changed(dev, queue_id, enable);
+		if (ret != 0)
+			return ret;
+	}
+	if ((notify_ops != NULL) && (notify_ops->vring_state_changed != NULL))
+		return notify_ops->vring_state_changed(dev, queue_id, enable);
+
+	return 0;
+}
+
 /*
  * Converts QEMU virtual address to Vhost virtual address. This function is
  * used to convert the ring addresses to our address space.
@@ -378,7 +417,7 @@ destroy_device(struct vhost_device_ctx ctx)
 			 * the function to remove it from the data core.
 			 */
 			if ((ll_dev_cur->dev.flags & VIRTIO_DEV_RUNNING))
-				notify_ops->destroy_device(&(ll_dev_cur->dev));
+				notify_destroy_device(&(ll_dev_cur->dev));
 			ll_dev_cur = rm_config_ll_entry(ll_dev_cur,
 					ll_dev_last);
 		} else {
@@ -794,12 +833,12 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 	if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
 		if (((int)dev->virtqueue[VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED) &&
 			((int)dev->virtqueue[VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED)) {
-			return notify_ops->new_device(dev);
+			return notify_new_device(dev);
 		}
 	/* Otherwise we remove it. */
 	} else
 		if (file->fd == VIRTIO_DEV_STOPPED)
-			notify_ops->destroy_device(dev);
+			notify_destroy_device(dev);
 	return 0;
 }
 
@@ -883,3 +922,14 @@ rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const op
 
 	return 0;
 }
+
+/*
+ * Register ops so that we can add/remove device to data core.
+ */
+int
+rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const * const ops)
+{
+	pmd_notify_ops = ops;
+
+	return 0;
+}
diff --git a/lib/librte_vhost/virtio-net.h b/lib/librte_vhost/virtio-net.h
index 75fb57e..0816e71 100644
--- a/lib/librte_vhost/virtio-net.h
+++ b/lib/librte_vhost/virtio-net.h
@@ -37,7 +37,9 @@
 #include "vhost-net.h"
 #include "rte_virtio_net.h"
 
-struct virtio_net_device_ops const *notify_ops;
 struct virtio_net *get_device(struct vhost_device_ctx ctx);
 
+int notify_new_device(struct virtio_net *dev);
+void notify_destroy_device(volatile struct virtio_net *dev);
+int notify_vring_state_changed(struct virtio_net *dev, uint16_t queue_id, int enable);
 #endif
-- 
2.1.4


* [PATCH v3 2/2] vhost: Add VHOST PMD
  2015-11-09  5:16               ` [PATCH v3 0/2] Add VHOST PMD Tetsuya Mukawa
  2015-11-09  5:17                 ` [PATCH v3 1/2] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
@ 2015-11-09  5:17                 ` Tetsuya Mukawa
  2015-11-09  6:21                   ` Yuanhan Liu
                                     ` (4 more replies)
  2015-11-09  5:42                 ` [PATCH v3 " Yuanhan Liu
  2 siblings, 5 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-09  5:17 UTC (permalink / raw)
  To: dev; +Cc: ann.zhuangyanying

The patch introduces a new PMD. This PMD is implemented as a thin wrapper
of librte_vhost, which means librte_vhost is also needed to compile the PMD.
Vhost messages are handled only while a port is started, so start the
port first, then invoke QEMU.

The PMD has 2 parameters.
 - iface:  The parameter is used to specify a path to connect to a
           virtio-net device.
 - queues: The parameter is used to specify the number of queues the
           virtio-net device has.
           (Default: 1)

Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i

To connect to the testpmd instance above, here is an example QEMU command.

$ qemu-system-x86_64 \
        <snip>
        -chardev socket,id=chr0,path=/tmp/sock0 \
        -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
        -device virtio-net-pci,netdev=net0

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
 config/common_linuxapp                      |   6 +
 doc/guides/nics/index.rst                   |   1 +
 doc/guides/rel_notes/release_2_2.rst        |   2 +
 drivers/net/Makefile                        |   4 +
 drivers/net/vhost/Makefile                  |  62 +++
 drivers/net/vhost/rte_eth_vhost.c           | 768 ++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h           |  65 +++
 drivers/net/vhost/rte_pmd_vhost_version.map |   8 +
 mk/rte.app.mk                               |   8 +-
 9 files changed, 923 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 7248262..a264c11 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -458,6 +458,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
 CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 
 #
+# Compile vhost PMD
+# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
+#
+CONFIG_RTE_LIBRTE_PMD_VHOST=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 2d4936d..57d1041 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -47,6 +47,7 @@ Network Interface Controller Drivers
     mlx4
     mlx5
     virtio
+    vhost
     vmxnet3
     pcap_ring
 
diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 59dda59..4b5644d 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -90,6 +90,8 @@ New Features
 
 * **Added vhost-user multiple queue support.**
 
+* **Added vhost PMD.**
+
 * **Added port hotplug support to vmxnet3.**
 
 * **Added port hotplug support to xenvirt.**
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 6da1ce2..66eb63d 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -50,5 +50,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
 
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
 include $(RTE_SDK)/mk/rte.sharelib.mk
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
new file mode 100644
index 0000000..8186a80
--- /dev/null
+++ b/drivers/net/vhost/Makefile
@@ -0,0 +1,62 @@
+#   BSD LICENSE
+#
+#   Copyright (c) 2010-2015 Intel Corporation.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_vhost.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_vhost_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
+
+#
+# Export include files
+#
+SYMLINK-y-include += rte_eth_vhost.h
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_vhost
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
new file mode 100644
index 0000000..ff983b5
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -0,0 +1,768 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) 2015 IGEL Co., Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co.,Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <unistd.h>
+#include <pthread.h>
+#include <errno.h>
+#include <limits.h>
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_dev.h>
+#include <rte_kvargs.h>
+#include <rte_virtio_net.h>
+
+#include "rte_eth_vhost.h"
+
+#define ETH_VHOST_IFACE_ARG		"iface"
+#define ETH_VHOST_QUEUES_ARG		"queues"
+
+static const char *drivername = "VHOST PMD";
+
+static const char *valid_arguments[] = {
+	ETH_VHOST_IFACE_ARG,
+	ETH_VHOST_QUEUES_ARG,
+	NULL
+};
+
+static struct ether_addr base_eth_addr = {
+	.addr_bytes = {
+		0x56 /* V */,
+		0x48 /* H */,
+		0x4F /* O */,
+		0x53 /* S */,
+		0x54 /* T */,
+		0x00
+	}
+};
+
+struct vhost_queue {
+	rte_atomic32_t allow_queuing;
+	rte_atomic32_t while_queuing;
+	struct virtio_net *device;
+	struct pmd_internal *internal;
+	struct rte_mempool *mb_pool;
+	uint16_t virtqueue_id;
+	uint64_t rx_pkts;
+	uint64_t tx_pkts;
+	uint64_t err_pkts;
+};
+
+struct pmd_internal {
+	TAILQ_ENTRY(pmd_internal) next;
+	char *dev_name;
+	char *iface_name;
+	unsigned nb_rx_queues;
+	unsigned nb_tx_queues;
+
+	struct vhost_queue *rx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+	struct vhost_queue *tx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+
+	volatile uint16_t once;
+	pthread_t session_th;
+};
+
+TAILQ_HEAD(pmd_internal_head, pmd_internal);
+static struct pmd_internal_head internals_list =
+	TAILQ_HEAD_INITIALIZER(internals_list);
+
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static struct rte_eth_link pmd_link = {
+		.link_speed = 10000,
+		.link_duplex = ETH_LINK_FULL_DUPLEX,
+		.link_status = 0
+};
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t nb_rx = 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Dequeue packets from guest TX queue */
+	nb_rx = (uint16_t)rte_vhost_dequeue_burst(r->device,
+			r->virtqueue_id, r->mb_pool, bufs, nb_bufs);
+
+	r->rx_pkts += nb_rx;
+
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t i, nb_tx = 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Enqueue packets to guest RX queue */
+	nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
+			r->virtqueue_id, bufs, nb_bufs);
+
+	r->tx_pkts += nb_tx;
+	r->err_pkts += nb_bufs - nb_tx;
+
+	for (i = 0; likely(i < nb_tx); i++)
+		rte_pktmbuf_free(bufs[i]);
+
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused) { return 0; }
+
+static inline struct pmd_internal *
+find_internal_resource(char *ifname)
+{
+	int found = 0;
+	struct pmd_internal *internal;
+
+	if (ifname == NULL)
+		return NULL;
+
+	pthread_mutex_lock(&internal_list_lock);
+
+	TAILQ_FOREACH(internal, &internals_list, next) {
+		if (!strcmp(internal->iface_name, ifname)) {
+			found = 1;
+			break;
+		}
+	}
+
+	pthread_mutex_unlock(&internal_list_lock);
+
+	if (!found)
+		return NULL;
+
+	return internal;
+}
+
+static int
+new_device(struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct pmd_internal *internal;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid argument\n");
+		return -1;
+	}
+
+	internal = find_internal_resource(dev->ifname);
+	if (internal == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid device name\n");
+		return -1;
+	}
+
+	if ((dev->virt_qp_nb < internal->nb_rx_queues) ||
+			(dev->virt_qp_nb < internal->nb_tx_queues)) {
+		RTE_LOG(INFO, PMD, "Not enough queues\n");
+		return -1;
+	}
+
+	eth_dev = rte_eth_dev_allocated(internal->dev_name);
+	if (eth_dev == NULL) {
+		RTE_LOG(INFO, PMD, "Failed to find a ethdev\n");
+		return -1;
+	}
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = internal->rx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = dev;
+		vq->internal = internal;
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = internal->tx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = dev;
+		vq->internal = internal;
+	}
+
+	dev->flags |= VIRTIO_DEV_RUNNING;
+	dev->pmd_priv = eth_dev;
+	eth_dev->data->dev_link.link_status = 1;
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = internal->rx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = internal->tx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+	RTE_LOG(INFO, PMD, "New connection established\n");
+
+	return 0;
+}
+
+static void
+destroy_device(volatile struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct pmd_internal *internal;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid argument\n");
+		return;
+	}
+
+	eth_dev = (struct rte_eth_dev *)dev->pmd_priv;
+	if (eth_dev == NULL) {
+		RTE_LOG(INFO, PMD, "Failed to find a ethdev\n");
+		return;
+	}
+
+	internal = eth_dev->data->dev_private;
+
+	/* Wait until rx/tx_pkt_burst stops accessing vhost device */
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = internal->rx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = internal->tx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+
+	eth_dev->data->dev_link.link_status = 0;
+
+	dev->pmd_priv = NULL;
+	dev->flags &= ~VIRTIO_DEV_RUNNING;
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = internal->rx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = NULL;
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = internal->tx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = NULL;
+	}
+
+	RTE_LOG(INFO, PMD, "Connection closed\n");
+}
+
+static void *vhost_driver_session(void *param __rte_unused)
+{
+	static struct virtio_net_device_ops *vhost_ops;
+
+	vhost_ops = rte_zmalloc(NULL, sizeof(*vhost_ops), 0);
+	if (vhost_ops == NULL)
+		rte_panic("Can't allocate memory\n");
+
+	/* set vhost arguments */
+	vhost_ops->new_device = new_device;
+	vhost_ops->destroy_device = destroy_device;
+	if (rte_vhost_driver_pmd_callback_register(vhost_ops) < 0)
+		rte_panic("Can't register callbacks\n");
+
+	/* start event handling */
+	rte_vhost_driver_session_start();
+
+	rte_free(vhost_ops);
+	pthread_exit(0);
+}
+
+static void vhost_driver_session_start(struct pmd_internal *internal)
+{
+	int ret;
+
+	ret = pthread_create(&internal->session_th,
+			NULL, vhost_driver_session, NULL);
+	if (ret)
+		rte_panic("Can't create a thread\n");
+}
+
+static void vhost_driver_session_stop(struct pmd_internal *internal)
+{
+	int ret;
+
+	ret = pthread_cancel(internal->session_th);
+	if (ret)
+		rte_panic("Can't cancel the thread\n");
+
+	ret = pthread_join(internal->session_th, NULL);
+	if (ret)
+		rte_panic("Can't join the thread\n");
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	int ret;
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
+		ret = rte_vhost_driver_register(internal->iface_name);
+		if (ret)
+			return ret;
+
+		vhost_driver_session_start(internal);
+	}
+	return 0;
+}
+
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	if (rte_atomic16_cmpset(&internal->once, 1, 0)) {
+		rte_vhost_driver_unregister(internal->iface_name);
+		vhost_driver_session_stop(internal);
+	}
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+	struct vhost_queue *vq;
+
+	if (internal->rx_vhost_queues[rx_queue_id] != NULL)
+		rte_free(internal->rx_vhost_queues[rx_queue_id]);
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
+		return -ENOMEM;
+	}
+
+	vq->mb_pool = mb_pool;
+	vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
+	internal->rx_vhost_queues[rx_queue_id] = vq;
+	dev->data->rx_queues[rx_queue_id] = vq;
+	return 0;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+	struct vhost_queue *vq;
+
+	if (internal->tx_vhost_queues[tx_queue_id] != NULL)
+		rte_free(internal->tx_vhost_queues[tx_queue_id]);
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for tx queue\n");
+		return -ENOMEM;
+	}
+
+	vq->virtqueue_id = tx_queue_id * VIRTIO_QNUM + VIRTIO_RXQ;
+	internal->tx_vhost_queues[tx_queue_id] = vq;
+	dev->data->tx_queues[tx_queue_id] = vq;
+	return 0;
+}
+
+
+static void
+eth_dev_info(struct rte_eth_dev *dev,
+	     struct rte_eth_dev_info *dev_info)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	dev_info->driver_name = drivername;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = (uint32_t)-1;
+	dev_info->max_rx_queues = (uint16_t)internal->nb_rx_queues;
+	dev_info->max_tx_queues = (uint16_t)internal->nb_tx_queues;
+	dev_info->min_rx_bufsize = 0;
+}
+
+static void
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
+{
+	unsigned i;
+	unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
+	const struct pmd_internal *internal = dev->data->dev_private;
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+	     i < internal->nb_rx_queues; i++) {
+		if (internal->rx_vhost_queues[i] == NULL)
+			continue;
+		igb_stats->q_ipackets[i] = internal->rx_vhost_queues[i]->rx_pkts;
+		rx_total += igb_stats->q_ipackets[i];
+	}
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+	     i < internal->nb_tx_queues; i++) {
+		if (internal->tx_vhost_queues[i] == NULL)
+			continue;
+		igb_stats->q_opackets[i] = internal->tx_vhost_queues[i]->tx_pkts;
+		igb_stats->q_errors[i] = internal->tx_vhost_queues[i]->err_pkts;
+		tx_total += igb_stats->q_opackets[i];
+		tx_err_total += igb_stats->q_errors[i];
+	}
+
+	igb_stats->ipackets = rx_total;
+	igb_stats->opackets = tx_total;
+	igb_stats->oerrors = tx_err_total;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	unsigned i;
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		if (internal->rx_vhost_queues[i] == NULL)
+			continue;
+		internal->rx_vhost_queues[i]->rx_pkts = 0;
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		if (internal->tx_vhost_queues[i] == NULL)
+			continue;
+		internal->tx_vhost_queues[i]->tx_pkts = 0;
+		internal->tx_vhost_queues[i]->err_pkts = 0;
+	}
+}
+
+static void
+eth_queue_release(void *q __rte_unused) { ; }
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused) { return 0; }
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+static int
+eth_dev_vhost_create(const char *name, int index,
+		     char *iface_name,
+		     int16_t queues,
+		     const unsigned numa_node)
+{
+	struct rte_eth_dev_data *data = NULL;
+	struct pmd_internal *internal = NULL;
+	struct rte_eth_dev *eth_dev = NULL;
+	struct ether_addr *eth_addr = NULL;
+
+	RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
+		numa_node);
+
+	/* now do all data allocation - for eth_dev structure, dummy pci driver
+	 * and internal (private) data
+	 */
+	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+	if (data == NULL)
+		goto error;
+
+	internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
+	if (internal == NULL)
+		goto error;
+
+	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
+	if (eth_addr == NULL)
+		goto error;
+	*eth_addr = base_eth_addr;
+	eth_addr->addr_bytes[5] = index;
+
+	/* reserve an ethdev entry */
+	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+	if (eth_dev == NULL)
+		goto error;
+
+	/* now put it all together
+	 * - store queue data in internal,
+	 * - store numa_node info in ethdev data
+	 * - point eth_dev_data to internals
+	 * - and point eth_dev structure to new eth_dev_data structure
+	 */
+	internal->nb_rx_queues = queues;
+	internal->nb_tx_queues = queues;
+	internal->dev_name = strdup(name);
+	if (internal->dev_name == NULL)
+		goto error;
+	internal->iface_name = strdup(iface_name);
+	if (internal->iface_name == NULL)
+		goto error;
+
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_INSERT_TAIL(&internals_list, internal, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
+	data->dev_private = internal;
+	data->port_id = eth_dev->data->port_id;
+	memmove(data->name, eth_dev->data->name, sizeof(data->name));
+	data->nb_rx_queues = queues;
+	data->nb_tx_queues = queues;
+	data->dev_link = pmd_link;
+	data->mac_addrs = eth_addr;
+
+	/* We'll replace the 'data' originally allocated by eth_dev. So the
+	 * vhost PMD resources won't be shared between multi processes.
+	 */
+	eth_dev->data = data;
+	eth_dev->dev_ops = &ops;
+	eth_dev->driver = NULL;
+	eth_dev->data->dev_flags = RTE_ETH_DEV_DETACHABLE;
+	eth_dev->data->kdrv = RTE_KDRV_NONE;
+	eth_dev->data->drv_name = internal->dev_name;
+	eth_dev->data->numa_node = numa_node;
+
+	/* finally assign rx and tx ops */
+	eth_dev->rx_pkt_burst = eth_vhost_rx;
+	eth_dev->tx_pkt_burst = eth_vhost_tx;
+
+	return data->port_id;
+
+error:
+	rte_free(data);
+	rte_free(internal);
+	rte_free(eth_addr);
+
+	return -1;
+}
+
+static inline int
+open_iface(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	const char **iface_name = extra_args;
+
+	if (value == NULL)
+		return -1;
+
+	*iface_name = value;
+
+	return 0;
+}
+
+static inline int
+open_queues(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	uint16_t *q = extra_args;
+
+	if ((value == NULL) || (extra_args == NULL))
+		return -EINVAL;
+
+	*q = (uint16_t)strtoul(value, NULL, 0);
+	if ((*q == USHRT_MAX) && (errno == ERANGE))
+		return -1;
+
+	if (*q > RTE_MAX_QUEUES_PER_PORT)
+		return -1;
+
+	return 0;
+}
+
+static int
+rte_pmd_vhost_devinit(const char *name, const char *params)
+{
+	struct rte_kvargs *kvlist = NULL;
+	int ret = 0;
+	int index;
+	char *iface_name;
+	uint16_t queues;
+
+	RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
+
+	kvlist = rte_kvargs_parse(params, valid_arguments);
+	if (kvlist == NULL)
+		return -1;
+
+	if (strlen(name) < strlen("eth_vhost"))
+		return -1;
+
+	index = strtol(name + strlen("eth_vhost"), NULL, 0);
+	if (errno == ERANGE)
+		return -1;
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
+					 &open_iface, &iface_name);
+		if (ret < 0)
+			goto out_free;
+	}
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
+					 &open_queues, &queues);
+		if (ret < 0)
+			goto out_free;
+
+	} else
+		queues = 1;
+
+	eth_dev_vhost_create(name, index,
+			iface_name, queues, rte_socket_id());
+
+out_free:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static int
+rte_pmd_vhost_devuninit(const char *name)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internal *internal;
+	unsigned int i;
+
+	RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
+
+	if (name == NULL)
+		return -EINVAL;
+
+	/* find an ethdev entry */
+	eth_dev = rte_eth_dev_allocated(name);
+	if (eth_dev == NULL)
+		return -ENODEV;
+
+	internal = eth_dev->data->dev_private;
+
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_REMOVE(&internals_list, internal, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
+	eth_dev_stop(eth_dev);
+
+	if ((internal) && (internal->dev_name))
+		free(internal->dev_name);
+	if ((internal) && (internal->iface_name))
+		free(internal->iface_name);
+
+	rte_free(eth_dev->data->mac_addrs);
+	rte_free(eth_dev->data);
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		if (internal->rx_vhost_queues[i] != NULL)
+			rte_free(internal->rx_vhost_queues[i]);
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		if (internal->tx_vhost_queues[i] != NULL)
+			rte_free(internal->tx_vhost_queues[i]);
+	}
+	rte_free(internal);
+
+	rte_eth_dev_release_port(eth_dev);
+
+	return 0;
+}
+
+static struct rte_driver pmd_vhost_drv = {
+	.name = "eth_vhost",
+	.type = PMD_VDEV,
+	.init = rte_pmd_vhost_devinit,
+	.uninit = rte_pmd_vhost_devuninit,
+};
+
+struct virtio_net *
+rte_eth_vhost_portid2vdev(uint16_t port_id)
+{
+	struct rte_eth_dev *eth_dev;
+
+	if (rte_eth_dev_is_valid_port(port_id) == 0)
+		return NULL;
+
+	eth_dev = &rte_eth_devices[port_id];
+	if (strncmp("eth_vhost", eth_dev->data->drv_name,
+				strlen("eth_vhost")) == 0) {
+		struct pmd_internal *internal;
+		struct vhost_queue *vq;
+
+		internal = eth_dev->data->dev_private;
+		vq = internal->rx_vhost_queues[0];
+		if ((vq != NULL) && (vq->device != NULL))
+			return vq->device;
+	}
+
+	return NULL;
+}
+
+PMD_REGISTER_DRIVER(pmd_vhost_drv);
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
new file mode 100644
index 0000000..22a880f
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -0,0 +1,65 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 IGEL Co., Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co., Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_VHOST_H_
+#define _RTE_ETH_VHOST_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_virtio_net.h>
+
+/**
+ * The function converts the specified port_id to a virtio device structure.
+ * The returned device can be used with the vhost library APIs.
+ * To use the vhost library APIs and the vhost PMD in parallel, the API below
+ * should not be called, because it is already called by the vhost PMD.
+ * - rte_vhost_driver_session_start()
+ * Once a device is managed by the vhost PMD, the API below should not be called.
+ * - rte_vhost_driver_unregister()
+ * To unregister the device, use the Port Hotplug APIs instead.
+ *
+ * @param port_id
+ *  port number
+ * @return
+ *  virtio net device structure corresponding to the specified port
+ *  NULL will be returned in error cases.
+ */
+struct virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
new file mode 100644
index 0000000..bf0361a
--- /dev/null
+++ b/drivers/net/vhost/rte_pmd_vhost_version.map
@@ -0,0 +1,8 @@
+DPDK_2.2 {
+
+	global:
+
+	rte_eth_vhost_portid2vdev;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 724efa7..1af4bb3 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -148,7 +148,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP)       += -lrte_pmd_pcap
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL)       += -lrte_pmd_null
 
-endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
+
+endif # ! $(CONFIG_RTE_LIBRTE_VHOST)
+
+endif # $(CONFIG_RTE_BUILD_SHARED_LIB)
 
 endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS
 
-- 
2.1.4


* Re: [PATCH v3 0/2] Add VHOST PMD
  2015-11-09  5:16               ` [PATCH v3 0/2] Add VHOST PMD Tetsuya Mukawa
  2015-11-09  5:17                 ` [PATCH v3 1/2] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
  2015-11-09  5:17                 ` [PATCH v3 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
@ 2015-11-09  5:42                 ` Yuanhan Liu
  2 siblings, 0 replies; 200+ messages in thread
From: Yuanhan Liu @ 2015-11-09  5:42 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Mon, Nov 09, 2015 at 02:16:59PM +0900, Tetsuya Mukawa wrote:
> The patch introduces a new PMD. This PMD is implemented as a thin wrapper
> around librte_vhost.
> 
> * Known issue.
> We may see issues while handling the RESET_OWNER message.
> This handling is done in the vhost library, so it is not part of the vhost PMD.
> So far, we are waiting for a fix in QEMU.

I will try to fix them in this week.

	--yliu
> 
> PATCH v3 changes:
>  - Rebase on latest master
>  - Specify correct queue_id in RX/TX function.
> 
> PATCH v2 changes:
>  - Remove a below patch that fixes vhost library.
>    The patch was applied as a separate patch.
>    - vhost: fix crash with multiqueue enabled
>  - Fix typos.
>    (Thanks to Thomas, Monjalon)
>  - Rebase on latest tree with above bernard's patches.
> 
> PATCH v1 changes:
>  - Support vhost multiple queues.
>  - Rebase on "remove pci driver from vdevs".
>  - Optimize RX/TX functions.
>  - Fix resource leaks.
>  - Fix compile issue.
>  - Add patch to fix vhost library.
> 
> RFC PATCH v3 changes:
>  - Optimize performance.
>    In RX/TX functions, change code to access only per core data.
>  - Add below API to allow user to use vhost library APIs for a port managed
>    by vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
>     - rte_eth_vhost_portid2vdev()
>    To support this functionality, vhost library is also changed.
>    Anyway, users who don't use the vhost PMD can fully use the vhost library APIs.
>  - Add code to support vhost multiple queues.
>    Actually, multiple queues functionality is not enabled so far.
> 
> RFC PATCH v2 changes:
>  - Fix issues reported by checkpatch.pl
>    (Thanks to Stephen Hemminger)
> 
> 
> Tetsuya Mukawa (2):
>   vhost: Add callback and private data for vhost PMD
>   vhost: Add VHOST PMD
> 
>  config/common_linuxapp                        |   6 +
>  doc/guides/nics/index.rst                     |   1 +
>  doc/guides/rel_notes/release_2_2.rst          |   2 +
>  drivers/net/Makefile                          |   4 +
>  drivers/net/vhost/Makefile                    |  62 +++
>  drivers/net/vhost/rte_eth_vhost.c             | 768 ++++++++++++++++++++++++++
>  drivers/net/vhost/rte_eth_vhost.h             |  65 +++
>  drivers/net/vhost/rte_pmd_vhost_version.map   |   8 +
>  lib/librte_vhost/rte_vhost_version.map        |   6 +
>  lib/librte_vhost/rte_virtio_net.h             |   3 +
>  lib/librte_vhost/vhost_user/virtio-net-user.c |  13 +-
>  lib/librte_vhost/virtio-net.c                 |  56 +-
>  lib/librte_vhost/virtio-net.h                 |   4 +-
>  mk/rte.app.mk                                 |   8 +-
>  14 files changed, 993 insertions(+), 13 deletions(-)
>  create mode 100644 drivers/net/vhost/Makefile
>  create mode 100644 drivers/net/vhost/rte_eth_vhost.c
>  create mode 100644 drivers/net/vhost/rte_eth_vhost.h
>  create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
> 
> -- 
> 2.1.4

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v3 2/2] vhost: Add VHOST PMD
  2015-11-09  5:17                 ` [PATCH v3 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
@ 2015-11-09  6:21                   ` Yuanhan Liu
  2015-11-09  6:27                     ` Tetsuya Mukawa
  2015-11-09 22:22                   ` Stephen Hemminger
                                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 200+ messages in thread
From: Yuanhan Liu @ 2015-11-09  6:21 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

Hi Tetsuya,

Here are just some minor nits after a very rough first glance.

On Mon, Nov 09, 2015 at 02:17:01PM +0900, Tetsuya Mukawa wrote:
...
> +static uint16_t
> +eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> +{
> +	struct vhost_queue *r = q;
> +	uint16_t nb_rx = 0;
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		return 0;
> +
> +	rte_atomic32_set(&r->while_queuing, 1);
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		goto out;
> +
> +	/* Dequeue packets from guest TX queue */
> +	nb_rx = (uint16_t)rte_vhost_dequeue_burst(r->device,
> +			r->virtqueue_id, r->mb_pool, bufs, nb_bufs);

Unnecessary cast, as rte_vhost_dequeue_burst is defined with a uint16_t
return type.
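
To illustrate with a stub in place of the real librte_vhost call (the stub name is hypothetical), the cast drops out because the callee's return type already matches the variable:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for rte_vhost_dequeue_burst(); like the real API it returns
 * uint16_t, so no cast is needed at the call site. */
static uint16_t stub_dequeue_burst(uint16_t nb_bufs)
{
	return nb_bufs; /* pretend every requested buffer was filled */
}

/* Call site without the redundant (uint16_t) cast. */
static uint16_t demo_rx(uint16_t nb_bufs)
{
	uint16_t nb_rx = stub_dequeue_burst(nb_bufs);
	return nb_rx;
}
```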

> +
> +	r->rx_pkts += nb_rx;
> +
> +out:
> +	rte_atomic32_set(&r->while_queuing, 0);
> +
> +	return nb_rx;
> +}
> +
> +static uint16_t
> +eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> +{
> +	struct vhost_queue *r = q;
> +	uint16_t i, nb_tx = 0;
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		return 0;
> +
> +	rte_atomic32_set(&r->while_queuing, 1);
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		goto out;
> +
> +	/* Enqueue packets to guest RX queue */
> +	nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
> +			r->virtqueue_id, bufs, nb_bufs);

Ditto.

> +
> +	r->tx_pkts += nb_tx;
> +	r->err_pkts += nb_bufs - nb_tx;
> +
> +	for (i = 0; likely(i < nb_tx); i++)
> +		rte_pktmbuf_free(bufs[i]);
> +
> +out:
> +	rte_atomic32_set(&r->while_queuing, 0);
> +
> +	return nb_tx;
> +}
> +
> +static int
> +eth_dev_configure(struct rte_eth_dev *dev __rte_unused) { return 0; }

I personally would not prefer saving a few lines of code at the cost of
readability.

> +
> +static int
> +eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
> +		   uint16_t nb_rx_desc __rte_unused,
> +		   unsigned int socket_id,
> +		   const struct rte_eth_rxconf *rx_conf __rte_unused,
> +		   struct rte_mempool *mb_pool)
> +{
> +	struct pmd_internal *internal = dev->data->dev_private;
> +	struct vhost_queue *vq;
> +
> +	if (internal->rx_vhost_queues[rx_queue_id] != NULL)
> +		rte_free(internal->rx_vhost_queues[rx_queue_id]);

Such a NULL check is unnecessary; rte_free will handle it.

> +
> +	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
> +			RTE_CACHE_LINE_SIZE, socket_id);
> +	if (vq == NULL) {
> +		RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
> +		return -ENOMEM;
> +	}
> +
> +	vq->mb_pool = mb_pool;
> +	vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
> +	internal->rx_vhost_queues[rx_queue_id] = vq;
> +	dev->data->rx_queues[rx_queue_id] = vq;
> +	return 0;
> +}
> +
> +static int
> +eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
> +		   uint16_t nb_tx_desc __rte_unused,
> +		   unsigned int socket_id,
> +		   const struct rte_eth_txconf *tx_conf __rte_unused)
> +{
> +	struct pmd_internal *internal = dev->data->dev_private;
> +	struct vhost_queue *vq;
> +
> +	if (internal->tx_vhost_queues[tx_queue_id] != NULL)
> +		rte_free(internal->tx_vhost_queues[tx_queue_id]);

Ditto.

> +
> +	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
> +			RTE_CACHE_LINE_SIZE, socket_id);
> +	if (vq == NULL) {
> +		RTE_LOG(ERR, PMD, "Failed to allocate memory for tx queue\n");
> +		return -ENOMEM;
> +	}
> +
> +	vq->virtqueue_id = tx_queue_id * VIRTIO_QNUM + VIRTIO_RXQ;
> +	internal->tx_vhost_queues[tx_queue_id] = vq;
> +	dev->data->tx_queues[tx_queue_id] = vq;
> +	return 0;
> +}
> +
> +
> +static void
> +eth_dev_info(struct rte_eth_dev *dev,
> +	     struct rte_eth_dev_info *dev_info)
> +{
> +	struct pmd_internal *internal = dev->data->dev_private;
> +
> +	dev_info->driver_name = drivername;
> +	dev_info->max_mac_addrs = 1;
> +	dev_info->max_rx_pktlen = (uint32_t)-1;
> +	dev_info->max_rx_queues = (uint16_t)internal->nb_rx_queues;
> +	dev_info->max_tx_queues = (uint16_t)internal->nb_tx_queues;
> +	dev_info->min_rx_bufsize = 0;
> +}
> +
> +static void
> +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
> +{
> +	unsigned i;
> +	unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
> +	const struct pmd_internal *internal = dev->data->dev_private;
> +
> +	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
> +	     i < internal->nb_rx_queues; i++) {
> +		if (internal->rx_vhost_queues[i] == NULL)
> +			continue;
> +		igb_stats->q_ipackets[i] = internal->rx_vhost_queues[i]->rx_pkts;
> +		rx_total += igb_stats->q_ipackets[i];
> +	}
> +
> +	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
> +	     i < internal->nb_tx_queues; i++) {
> +		if (internal->tx_vhost_queues[i] == NULL)
> +			continue;
> +		igb_stats->q_opackets[i] = internal->tx_vhost_queues[i]->tx_pkts;
> +		igb_stats->q_errors[i] = internal->tx_vhost_queues[i]->err_pkts;
> +		tx_total += igb_stats->q_opackets[i];
> +		tx_err_total += igb_stats->q_errors[i];
> +	}
> +
> +	igb_stats->ipackets = rx_total;
> +	igb_stats->opackets = tx_total;
> +	igb_stats->oerrors = tx_err_total;
> +}
> +
> +static void
> +eth_stats_reset(struct rte_eth_dev *dev)
> +{
> +	unsigned i;
> +	struct pmd_internal *internal = dev->data->dev_private;
> +
> +	for (i = 0; i < internal->nb_rx_queues; i++) {
> +		if (internal->rx_vhost_queues[i] == NULL)
> +			continue;
> +		internal->rx_vhost_queues[i]->rx_pkts = 0;
> +	}
> +	for (i = 0; i < internal->nb_tx_queues; i++) {
> +		if (internal->tx_vhost_queues[i] == NULL)
> +			continue;
> +		internal->tx_vhost_queues[i]->tx_pkts = 0;
> +		internal->tx_vhost_queues[i]->err_pkts = 0;
> +	}
> +}
> +
> +static void
> +eth_queue_release(void *q __rte_unused) { ; }
> +static int
> +eth_link_update(struct rte_eth_dev *dev __rte_unused,
> +		int wait_to_complete __rte_unused) { return 0; }

Ditto.

> +
> +static const struct eth_dev_ops ops = {
> +	.dev_start = eth_dev_start,
> +	.dev_stop = eth_dev_stop,
> +	.dev_configure = eth_dev_configure,
> +	.dev_infos_get = eth_dev_info,
> +	.rx_queue_setup = eth_rx_queue_setup,
> +	.tx_queue_setup = eth_tx_queue_setup,
> +	.rx_queue_release = eth_queue_release,
> +	.tx_queue_release = eth_queue_release,
> +	.link_update = eth_link_update,
> +	.stats_get = eth_stats_get,
> +	.stats_reset = eth_stats_reset,
> +};
> +
> +static int
> +eth_dev_vhost_create(const char *name, int index,
> +		     char *iface_name,
> +		     int16_t queues,
> +		     const unsigned numa_node)
> +{
> +	struct rte_eth_dev_data *data = NULL;
> +	struct pmd_internal *internal = NULL;
> +	struct rte_eth_dev *eth_dev = NULL;
> +	struct ether_addr *eth_addr = NULL;
> +
> +	RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
> +		numa_node);
> +
> +	/* now do all data allocation - for eth_dev structure, dummy pci driver
> +	 * and internal (private) data
> +	 */
> +	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
> +	if (data == NULL)
> +		goto error;
> +
> +	internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
> +	if (internal == NULL)
> +		goto error;
> +
> +	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
> +	if (eth_addr == NULL)
> +		goto error;
> +	*eth_addr = base_eth_addr;
> +	eth_addr->addr_bytes[5] = index;
> +
> +	/* reserve an ethdev entry */
> +	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
> +	if (eth_dev == NULL)
> +		goto error;
> +
> +	/* now put it all together
> +	 * - store queue data in internal,
> +	 * - store numa_node info in ethdev data
> +	 * - point eth_dev_data to internals
> +	 * - and point eth_dev structure to new eth_dev_data structure
> +	 */
> +	internal->nb_rx_queues = queues;
> +	internal->nb_tx_queues = queues;
> +	internal->dev_name = strdup(name);
> +	if (internal->dev_name == NULL)
> +		goto error;
> +	internal->iface_name = strdup(iface_name);
> +	if (internal->iface_name == NULL)
> +		goto error;

If the allocation fails here, internal->dev_name is never freed.

> +
> +	pthread_mutex_lock(&internal_list_lock);
> +	TAILQ_INSERT_TAIL(&internals_list, internal, next);
> +	pthread_mutex_unlock(&internal_list_lock);
> +
> +	data->dev_private = internal;
> +	data->port_id = eth_dev->data->port_id;
> +	memmove(data->name, eth_dev->data->name, sizeof(data->name));
> +	data->nb_rx_queues = queues;
> +	data->nb_tx_queues = queues;
> +	data->dev_link = pmd_link;
> +	data->mac_addrs = eth_addr;
> +
> +	/* We'll replace the 'data' originally allocated by eth_dev. So the
> +	 * vhost PMD resources won't be shared between multi processes.
> +	 */
> +	eth_dev->data = data;
> +	eth_dev->dev_ops = &ops;
> +	eth_dev->driver = NULL;
> +	eth_dev->data->dev_flags = RTE_ETH_DEV_DETACHABLE;
> +	eth_dev->data->kdrv = RTE_KDRV_NONE;
> +	eth_dev->data->drv_name = internal->dev_name;
> +	eth_dev->data->numa_node = numa_node;
> +
> +	/* finally assign rx and tx ops */
> +	eth_dev->rx_pkt_burst = eth_vhost_rx;
> +	eth_dev->tx_pkt_burst = eth_vhost_tx;
> +
> +	return data->port_id;
> +
> +error:
> +	rte_free(data);
> +	rte_free(internal);
> +	rte_free(eth_addr);
> +
> +	return -1;
> +}
...
...
> +
> +	if ((internal) && (internal->dev_name))
> +		free(internal->dev_name);
> +	if ((internal) && (internal->iface_name))
> +		free(internal->iface_name);
> +
> +	rte_free(eth_dev->data->mac_addrs);
> +	rte_free(eth_dev->data);
> +
> +	for (i = 0; i < internal->nb_rx_queues; i++) {
> +		if (internal->rx_vhost_queues[i] != NULL)
> +			rte_free(internal->rx_vhost_queues[i]);
> +	}
> +	for (i = 0; i < internal->nb_tx_queues; i++) {
> +		if (internal->tx_vhost_queues[i] != NULL)
> +			rte_free(internal->tx_vhost_queues[i]);

Ditto.

(Hopefully I can do a detailed review later, say next week.)

	--yliu


* Re: [PATCH v3 2/2] vhost: Add VHOST PMD
  2015-11-09  6:21                   ` Yuanhan Liu
@ 2015-11-09  6:27                     ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-09  6:27 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, ann.zhuangyanying

Hi Liu,

Thank you so much for your reviewing.
I will fix them, then submit again this week.

Thanks,
Tetsuya


On 2015/11/09 15:21, Yuanhan Liu wrote:
> Hi Tetsuya,
>
> Here are just some minor nits after a very rough first glance.
>
> On Mon, Nov 09, 2015 at 02:17:01PM +0900, Tetsuya Mukawa wrote:
> ...
>> +static uint16_t
>> +eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
>> +{
>> +	struct vhost_queue *r = q;
>> +	uint16_t nb_rx = 0;
>> +
>> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
>> +		return 0;
>> +
>> +	rte_atomic32_set(&r->while_queuing, 1);
>> +
>> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
>> +		goto out;
>> +
>> +	/* Dequeue packets from guest TX queue */
>> +	nb_rx = (uint16_t)rte_vhost_dequeue_burst(r->device,
>> +			r->virtqueue_id, r->mb_pool, bufs, nb_bufs);
> Unnecessary cast, as rte_vhost_dequeue_burst is defined with a uint16_t
> return type.
>
>> +
>> +	r->rx_pkts += nb_rx;
>> +
>> +out:
>> +	rte_atomic32_set(&r->while_queuing, 0);
>> +
>> +	return nb_rx;
>> +}
>> +
>> +static uint16_t
>> +eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
>> +{
>> +	struct vhost_queue *r = q;
>> +	uint16_t i, nb_tx = 0;
>> +
>> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
>> +		return 0;
>> +
>> +	rte_atomic32_set(&r->while_queuing, 1);
>> +
>> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
>> +		goto out;
>> +
>> +	/* Enqueue packets to guest RX queue */
>> +	nb_tx = (uint16_t)rte_vhost_enqueue_burst(r->device,
>> +			r->virtqueue_id, bufs, nb_bufs);
> Ditto.
>
>> +
>> +	r->tx_pkts += nb_tx;
>> +	r->err_pkts += nb_bufs - nb_tx;
>> +
>> +	for (i = 0; likely(i < nb_tx); i++)
>> +		rte_pktmbuf_free(bufs[i]);
>> +
>> +out:
>> +	rte_atomic32_set(&r->while_queuing, 0);
>> +
>> +	return nb_tx;
>> +}
>> +
>> +static int
>> +eth_dev_configure(struct rte_eth_dev *dev __rte_unused) { return 0; }
> I personally would not prefer saving a few lines of code at the cost of
> readability.
>
>> +
>> +static int
>> +eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
>> +		   uint16_t nb_rx_desc __rte_unused,
>> +		   unsigned int socket_id,
>> +		   const struct rte_eth_rxconf *rx_conf __rte_unused,
>> +		   struct rte_mempool *mb_pool)
>> +{
>> +	struct pmd_internal *internal = dev->data->dev_private;
>> +	struct vhost_queue *vq;
>> +
>> +	if (internal->rx_vhost_queues[rx_queue_id] != NULL)
>> +		rte_free(internal->rx_vhost_queues[rx_queue_id]);
> Such a NULL check is unnecessary; rte_free will handle it.
>
>> +
>> +	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
>> +			RTE_CACHE_LINE_SIZE, socket_id);
>> +	if (vq == NULL) {
>> +		RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
>> +		return -ENOMEM;
>> +	}
>> +
>> +	vq->mb_pool = mb_pool;
>> +	vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
>> +	internal->rx_vhost_queues[rx_queue_id] = vq;
>> +	dev->data->rx_queues[rx_queue_id] = vq;
>> +	return 0;
>> +}
>> +
>> +static int
>> +eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
>> +		   uint16_t nb_tx_desc __rte_unused,
>> +		   unsigned int socket_id,
>> +		   const struct rte_eth_txconf *tx_conf __rte_unused)
>> +{
>> +	struct pmd_internal *internal = dev->data->dev_private;
>> +	struct vhost_queue *vq;
>> +
>> +	if (internal->tx_vhost_queues[tx_queue_id] != NULL)
>> +		rte_free(internal->tx_vhost_queues[tx_queue_id]);
> Ditto.
>
>> +
>> +	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
>> +			RTE_CACHE_LINE_SIZE, socket_id);
>> +	if (vq == NULL) {
>> +		RTE_LOG(ERR, PMD, "Failed to allocate memory for tx queue\n");
>> +		return -ENOMEM;
>> +	}
>> +
>> +	vq->virtqueue_id = tx_queue_id * VIRTIO_QNUM + VIRTIO_RXQ;
>> +	internal->tx_vhost_queues[tx_queue_id] = vq;
>> +	dev->data->tx_queues[tx_queue_id] = vq;
>> +	return 0;
>> +}
>> +
>> +
>> +static void
>> +eth_dev_info(struct rte_eth_dev *dev,
>> +	     struct rte_eth_dev_info *dev_info)
>> +{
>> +	struct pmd_internal *internal = dev->data->dev_private;
>> +
>> +	dev_info->driver_name = drivername;
>> +	dev_info->max_mac_addrs = 1;
>> +	dev_info->max_rx_pktlen = (uint32_t)-1;
>> +	dev_info->max_rx_queues = (uint16_t)internal->nb_rx_queues;
>> +	dev_info->max_tx_queues = (uint16_t)internal->nb_tx_queues;
>> +	dev_info->min_rx_bufsize = 0;
>> +}
>> +
>> +static void
>> +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
>> +{
>> +	unsigned i;
>> +	unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
>> +	const struct pmd_internal *internal = dev->data->dev_private;
>> +
>> +	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
>> +	     i < internal->nb_rx_queues; i++) {
>> +		if (internal->rx_vhost_queues[i] == NULL)
>> +			continue;
>> +		igb_stats->q_ipackets[i] = internal->rx_vhost_queues[i]->rx_pkts;
>> +		rx_total += igb_stats->q_ipackets[i];
>> +	}
>> +
>> +	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
>> +	     i < internal->nb_tx_queues; i++) {
>> +		if (internal->tx_vhost_queues[i] == NULL)
>> +			continue;
>> +		igb_stats->q_opackets[i] = internal->tx_vhost_queues[i]->tx_pkts;
>> +		igb_stats->q_errors[i] = internal->tx_vhost_queues[i]->err_pkts;
>> +		tx_total += igb_stats->q_opackets[i];
>> +		tx_err_total += igb_stats->q_errors[i];
>> +	}
>> +
>> +	igb_stats->ipackets = rx_total;
>> +	igb_stats->opackets = tx_total;
>> +	igb_stats->oerrors = tx_err_total;
>> +}
>> +
>> +static void
>> +eth_stats_reset(struct rte_eth_dev *dev)
>> +{
>> +	unsigned i;
>> +	struct pmd_internal *internal = dev->data->dev_private;
>> +
>> +	for (i = 0; i < internal->nb_rx_queues; i++) {
>> +		if (internal->rx_vhost_queues[i] == NULL)
>> +			continue;
>> +		internal->rx_vhost_queues[i]->rx_pkts = 0;
>> +	}
>> +	for (i = 0; i < internal->nb_tx_queues; i++) {
>> +		if (internal->tx_vhost_queues[i] == NULL)
>> +			continue;
>> +		internal->tx_vhost_queues[i]->tx_pkts = 0;
>> +		internal->tx_vhost_queues[i]->err_pkts = 0;
>> +	}
>> +}
>> +
>> +static void
>> +eth_queue_release(void *q __rte_unused) { ; }
>> +static int
>> +eth_link_update(struct rte_eth_dev *dev __rte_unused,
>> +		int wait_to_complete __rte_unused) { return 0; }
> Ditto.
>
>> +
>> +static const struct eth_dev_ops ops = {
>> +	.dev_start = eth_dev_start,
>> +	.dev_stop = eth_dev_stop,
>> +	.dev_configure = eth_dev_configure,
>> +	.dev_infos_get = eth_dev_info,
>> +	.rx_queue_setup = eth_rx_queue_setup,
>> +	.tx_queue_setup = eth_tx_queue_setup,
>> +	.rx_queue_release = eth_queue_release,
>> +	.tx_queue_release = eth_queue_release,
>> +	.link_update = eth_link_update,
>> +	.stats_get = eth_stats_get,
>> +	.stats_reset = eth_stats_reset,
>> +};
>> +
>> +static int
>> +eth_dev_vhost_create(const char *name, int index,
>> +		     char *iface_name,
>> +		     int16_t queues,
>> +		     const unsigned numa_node)
>> +{
>> +	struct rte_eth_dev_data *data = NULL;
>> +	struct pmd_internal *internal = NULL;
>> +	struct rte_eth_dev *eth_dev = NULL;
>> +	struct ether_addr *eth_addr = NULL;
>> +
>> +	RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
>> +		numa_node);
>> +
>> +	/* now do all data allocation - for eth_dev structure, dummy pci driver
>> +	 * and internal (private) data
>> +	 */
>> +	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
>> +	if (data == NULL)
>> +		goto error;
>> +
>> +	internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
>> +	if (internal == NULL)
>> +		goto error;
>> +
>> +	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
>> +	if (eth_addr == NULL)
>> +		goto error;
>> +	*eth_addr = base_eth_addr;
>> +	eth_addr->addr_bytes[5] = index;
>> +
>> +	/* reserve an ethdev entry */
>> +	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
>> +	if (eth_dev == NULL)
>> +		goto error;
>> +
>> +	/* now put it all together
>> +	 * - store queue data in internal,
>> +	 * - store numa_node info in ethdev data
>> +	 * - point eth_dev_data to internals
>> +	 * - and point eth_dev structure to new eth_dev_data structure
>> +	 */
>> +	internal->nb_rx_queues = queues;
>> +	internal->nb_tx_queues = queues;
>> +	internal->dev_name = strdup(name);
>> +	if (internal->dev_name == NULL)
>> +		goto error;
>> +	internal->iface_name = strdup(iface_name);
>> +	if (internal->iface_name == NULL)
>> +		goto error;
> If the allocation fails here, internal->dev_name is never freed.
>
>> +
>> +	pthread_mutex_lock(&internal_list_lock);
>> +	TAILQ_INSERT_TAIL(&internals_list, internal, next);
>> +	pthread_mutex_unlock(&internal_list_lock);
>> +
>> +	data->dev_private = internal;
>> +	data->port_id = eth_dev->data->port_id;
>> +	memmove(data->name, eth_dev->data->name, sizeof(data->name));
>> +	data->nb_rx_queues = queues;
>> +	data->nb_tx_queues = queues;
>> +	data->dev_link = pmd_link;
>> +	data->mac_addrs = eth_addr;
>> +
>> +	/* We'll replace the 'data' originally allocated by eth_dev. So the
>> +	 * vhost PMD resources won't be shared between multi processes.
>> +	 */
>> +	eth_dev->data = data;
>> +	eth_dev->dev_ops = &ops;
>> +	eth_dev->driver = NULL;
>> +	eth_dev->data->dev_flags = RTE_ETH_DEV_DETACHABLE;
>> +	eth_dev->data->kdrv = RTE_KDRV_NONE;
>> +	eth_dev->data->drv_name = internal->dev_name;
>> +	eth_dev->data->numa_node = numa_node;
>> +
>> +	/* finally assign rx and tx ops */
>> +	eth_dev->rx_pkt_burst = eth_vhost_rx;
>> +	eth_dev->tx_pkt_burst = eth_vhost_tx;
>> +
>> +	return data->port_id;
>> +
>> +error:
>> +	rte_free(data);
>> +	rte_free(internal);
>> +	rte_free(eth_addr);
>> +
>> +	return -1;
>> +}
> ...
> ...
>> +
>> +	if ((internal) && (internal->dev_name))
>> +		free(internal->dev_name);
>> +	if ((internal) && (internal->iface_name))
>> +		free(internal->iface_name);
>> +
>> +	rte_free(eth_dev->data->mac_addrs);
>> +	rte_free(eth_dev->data);
>> +
>> +	for (i = 0; i < internal->nb_rx_queues; i++) {
>> +		if (internal->rx_vhost_queues[i] != NULL)
>> +			rte_free(internal->rx_vhost_queues[i]);
>> +	}
>> +	for (i = 0; i < internal->nb_tx_queues; i++) {
>> +		if (internal->tx_vhost_queues[i] != NULL)
>> +			rte_free(internal->tx_vhost_queues[i]);
> Ditto.
>
> (Hopefully I can do a detailed review later, say next week.)
>
> 	--yliu


* Re: [PATCH v3 1/2] vhost: Add callback and private data for vhost PMD
  2015-11-09  5:17                 ` [PATCH v3 1/2] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
@ 2015-11-09 18:16                   ` Aaron Conole
  2015-11-10  3:13                     ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Aaron Conole @ 2015-11-09 18:16 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

Greetings,

Tetsuya Mukawa <mukawa@igel.co.jp> writes:
> These variables are needed to be able to manage a virtio device using
> both the vhost library APIs and the vhost PMD.
> For example, if the vhost PMD uses the current callback handler and private
> data provided by the vhost library, a DPDK application that links the vhost
> library cannot use some of the vhost library APIs. To avoid this, a separate
> callback and private data for the vhost PMD are needed.
>
> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> ---
>  lib/librte_vhost/rte_vhost_version.map        |  6 +++
>  lib/librte_vhost/rte_virtio_net.h             |  3 ++
>  lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++----
>  lib/librte_vhost/virtio-net.c                 | 56 +++++++++++++++++++++++++--
>  lib/librte_vhost/virtio-net.h                 |  4 +-
>  5 files changed, 70 insertions(+), 12 deletions(-)
>
> diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
> index 3d8709e..00a9ce5 100644
> --- a/lib/librte_vhost/rte_vhost_version.map
> +++ b/lib/librte_vhost/rte_vhost_version.map
> @@ -20,3 +20,9 @@ DPDK_2.1 {
>  	rte_vhost_driver_unregister;
>  
>  } DPDK_2.0;
> +
> +DPDK_2.2 {
> +	global:
> +
> +	rte_vhost_driver_pmd_callback_register;
> +} DPDK_2.1;
> diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
> index 5687452..3ef6e58 100644
> --- a/lib/librte_vhost/rte_virtio_net.h
> +++ b/lib/librte_vhost/rte_virtio_net.h
> @@ -128,6 +128,7 @@ struct virtio_net {
>  	char			ifname[IF_NAME_SZ];	/**< Name of the tap device or socket path. */
>  	uint32_t		virt_qp_nb;	/**< number of queue pair we have allocated */
>  	void			*priv;		/**< private context */
> +	void			*pmd_priv;	/**< private context for vhost PMD */
>  	struct vhost_virtqueue	*virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];	/**< Contains all virtqueue information. */
>  } __rte_cache_aligned;

Sorry if I'm missing something, but this is an ABI breaker, isn't it? I
think this needs the RTE_NEXT_ABI tag around it.
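
One conventional way to stage such a layout change is to guard the new member, as sketched below with a dummy struct; in a real tree RTE_NEXT_ABI comes from the DPDK build configuration, and is defined inline here only so the sketch is self-contained:

```c
#include <assert.h>
#include <stddef.h>

#define RTE_NEXT_ABI 1	/* normally provided by the DPDK build config */

struct demo_virtio_net {
	void *priv;	/* existing member */
#ifdef RTE_NEXT_ABI
	void *pmd_priv;	/* new member, present only when the next ABI is on */
#endif
};
```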


* Re: [PATCH v3 2/2] vhost: Add VHOST PMD
  2015-11-09  5:17                 ` [PATCH v3 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  2015-11-09  6:21                   ` Yuanhan Liu
@ 2015-11-09 22:22                   ` Stephen Hemminger
  2015-11-10  3:14                     ` Tetsuya Mukawa
  2015-11-12 12:52                   ` Wang, Zhihong
                                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 200+ messages in thread
From: Stephen Hemminger @ 2015-11-09 22:22 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Mon,  9 Nov 2015 14:17:01 +0900
Tetsuya Mukawa <mukawa@igel.co.jp> wrote:

> +	rte_atomic32_set(&r->while_queuing, 1);
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		goto out;

Your special two-variable custom locking here is buggy.
If you hit the second atomic test, you will leave while_queuing set.
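
The invariant under discussion can be sketched with C11 atomics (names hypothetical, standing in for the rte_atomic32 calls): once while_queuing is raised, every exit path out of the burst function, including the second allow_queuing re-check, must clear it again before returning.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static atomic_int allow_queuing = 1;
static atomic_int while_queuing;

/* Sketch of the allow/while_queuing handshake: the second allow_queuing
 * check must branch to a path that still clears while_queuing. */
static uint16_t guarded_burst(uint16_t nb_bufs)
{
	uint16_t done = 0;

	if (atomic_load(&allow_queuing) == 0)
		return 0;
	atomic_store(&while_queuing, 1);
	if (atomic_load(&allow_queuing) == 0)
		goto out;	/* must not return without the clear below */
	done = nb_bufs;		/* stand-in for the real dequeue/enqueue work */
out:
	atomic_store(&while_queuing, 0);
	return done;
}
```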


* Re: [PATCH 3/3] vhost: Add VHOST PMD
  2015-10-27  6:12         ` [PATCH 3/3] vhost: Add VHOST PMD Tetsuya Mukawa
  2015-11-02  3:58           ` [PATCH v2 0/2] " Tetsuya Mukawa
@ 2015-11-09 22:25           ` Stephen Hemminger
  2015-11-10  3:27             ` Tetsuya Mukawa
  1 sibling, 1 reply; 200+ messages in thread
From: Stephen Hemminger @ 2015-11-09 22:25 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Tue, 27 Oct 2015 15:12:55 +0900
Tetsuya Mukawa <mukawa@igel.co.jp> wrote:

> The patch introduces a new PMD. This PMD is implemented as a thin wrapper
> around librte_vhost, which means librte_vhost is also needed to compile the PMD.
> The vhost messages will be handled only when a port is started. So start
> a port first, then invoke QEMU.
> 
> The PMD has 2 parameters.
>  - iface:  The parameter is used to specify a path to connect to a
>            virtio-net device.
>  - queues: The parameter is used to specify the number of the queues
>            virtio-net device has.
>            (Default: 1)
> 
> Here is an example.
> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
> 
> To connect above testpmd, here is qemu command example.
> 
> $ qemu-system-x86_64 \
>         <snip>
>         -chardev socket,id=chr0,path=/tmp/sock0 \
>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
>         -device virtio-net-pci,netdev=net0
> 
> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>

Brocade developed a much simpler vhost PMD, without all the atomics and
locking.


/*-
 *   BSD LICENSE
 *
 *   Copyright (C) Brocade Communications Systems, Inc.
 *   All rights reserved.
 *
 *   Redistribution and use in source and binary forms, with or without
 *   modification, are permitted provided that the following conditions
 *   are met:
 *
 *     * Redistributions of source code must retain the above copyright
 *       notice, this list of conditions and the following disclaimer.
 *     * Redistributions in binary form must reproduce the above copyright
 *       notice, this list of conditions and the following disclaimer in
 *       the documentation and/or other materials provided with the
 *       distribution.
 *     * Neither the name of Brocade Communications Systems, Inc.
 *       nor the names of its contributors may be used to endorse
 *       or promote products derived from this software without specific
 *       prior written permission.
 *
 *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
 *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
 *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

#include <rte_mbuf.h>
#include <rte_ether.h>
#include <rte_ethdev.h>
#include <rte_malloc.h>
#include <rte_memcpy.h>
#include <rte_dev.h>
#include <rte_log.h>

#include "../librte_vhost/rte_virtio_net.h"
#include "../librte_vhost/virtio-net.h"

struct pmd_internals;

struct vhost_queue {
	struct pmd_internals *internals;

	struct rte_mempool *mb_pool;

	uint64_t	pkts;
	uint64_t	bytes;
};

struct pmd_internals {
	struct virtio_net *dev;
	unsigned numa_node;
	struct eth_driver *eth_drv;

	unsigned nb_rx_queues;
	unsigned nb_tx_queues;

	struct vhost_queue rx_queues[1];
	struct vhost_queue tx_queues[1];
	uint8_t port_id;
};


static const char *drivername = "Vhost PMD";

static struct rte_eth_link pmd_link = {
	.link_speed = 10000,
	.link_duplex = ETH_LINK_FULL_DUPLEX,
	.link_status = 0
};

static uint16_t
eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
{
	int ret, i;
	struct vhost_queue *h = q;

	ret = rte_vhost_dequeue_burst(h->internals->dev,
		VIRTIO_TXQ, h->mb_pool, bufs, nb_bufs);

	for (i = 0; i < ret ; i++) {
		struct rte_mbuf *m = bufs[i];

		m->port = h->internals->port_id;
		++h->pkts;
		h->bytes += rte_pktmbuf_pkt_len(m);
	}
	return ret;
}

static uint16_t
eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
{
	int ret, i;
	struct vhost_queue *h = q;

	ret = rte_vhost_enqueue_burst(h->internals->dev,
		VIRTIO_RXQ, bufs, nb_bufs);

	for (i = 0; i < ret; i++) {
		struct rte_mbuf *m = bufs[i];

		++h->pkts;
		h->bytes += rte_pktmbuf_pkt_len(m);
		rte_pktmbuf_free(m);
	}

	return ret;
}

static int
eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
{
	return 0;
}

static int
eth_dev_start(struct rte_eth_dev *dev)
{
	struct pmd_internals *internals = dev->data->dev_private;

	dev->data->dev_link.link_status = 1;
	RTE_LOG(INFO, PMD, "vhost(%s): link up\n", internals->dev->ifname);
	return 0;
}

static void
eth_dev_stop(struct rte_eth_dev *dev)
{
	struct pmd_internals *internals = dev->data->dev_private;

	dev->data->dev_link.link_status = 0;
	RTE_LOG(INFO, PMD, "vhost(%s): link down\n", internals->dev->ifname);
}

static int
eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
		uint16_t nb_rx_desc __rte_unused,
		unsigned int socket_id __rte_unused,
		const struct rte_eth_rxconf *rx_conf __rte_unused,
		struct rte_mempool *mb_pool)
{
	struct pmd_internals *internals = dev->data->dev_private;

	internals->rx_queues[rx_queue_id].mb_pool = mb_pool;
	dev->data->rx_queues[rx_queue_id] =
		&internals->rx_queues[rx_queue_id];
	internals->rx_queues[rx_queue_id].internals = internals;

	return 0;
}

static int
eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
		uint16_t nb_tx_desc __rte_unused,
		unsigned int socket_id __rte_unused,
		const struct rte_eth_txconf *tx_conf __rte_unused)
{
	struct pmd_internals *internals = dev->data->dev_private;

	dev->data->tx_queues[tx_queue_id] =
		&internals->tx_queues[tx_queue_id];
	internals->tx_queues[tx_queue_id].internals = internals;

	return 0;
}


static void
eth_dev_info(struct rte_eth_dev *dev,
		struct rte_eth_dev_info *dev_info)
{
	struct pmd_internals *internals = dev->data->dev_private;

	dev_info->driver_name = drivername;
	dev_info->max_mac_addrs = 1;
	dev_info->max_rx_pktlen = -1;
	dev_info->max_rx_queues = (uint16_t)internals->nb_rx_queues;
	dev_info->max_tx_queues = (uint16_t)internals->nb_tx_queues;
	dev_info->min_rx_bufsize = 0;
	dev_info->pci_dev = NULL;
}

static void
eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
{
	const struct pmd_internals *internal = dev->data->dev_private;
	unsigned i;

	for (i = 0; i < internal->nb_rx_queues; i++) {
		const struct vhost_queue *h = &internal->rx_queues[i];

		stats->ipackets += h->pkts;
		stats->ibytes += h->bytes;

		if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
			stats->q_ibytes[i] = h->bytes;
			stats->q_ipackets[i] = h->pkts;
		}
	}

	for (i = 0; i < internal->nb_tx_queues; i++) {
		const struct vhost_queue *h = &internal->tx_queues[i];

		stats->opackets += h->pkts;
		stats->obytes += h->bytes;

		if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
			stats->q_obytes[i] = h->bytes;
			stats->q_opackets[i] = h->pkts;
		}
	}
}

static void
eth_stats_reset(struct rte_eth_dev *dev)
{
	unsigned i;
	struct pmd_internals *internal = dev->data->dev_private;

	for (i = 0; i < internal->nb_rx_queues; i++) {
		internal->rx_queues[i].pkts = 0;
		internal->rx_queues[i].bytes = 0;
	}

	for (i = 0; i < internal->nb_tx_queues; i++) {
		internal->tx_queues[i].pkts = 0;
		internal->tx_queues[i].bytes = 0;
	}
}

static struct eth_driver rte_vhost_pmd = {
	.pci_drv = {
		.name = "rte_vhost_pmd",
		.drv_flags = RTE_PCI_DRV_DETACHABLE,
	},
};

static void
eth_queue_release(void *q __rte_unused)
{
}

static int
eth_link_update(struct rte_eth_dev *dev __rte_unused,
		int wait_to_complete __rte_unused)
{
	return 0;
}

static struct eth_dev_ops eth_ops = {
	.dev_start = eth_dev_start,
	.dev_stop = eth_dev_stop,
	.dev_configure = eth_dev_configure,
	.dev_infos_get = eth_dev_info,
	.rx_queue_setup = eth_rx_queue_setup,
	.tx_queue_setup = eth_tx_queue_setup,
	.rx_queue_release = eth_queue_release,
	.tx_queue_release = eth_queue_release,
	.link_update = eth_link_update,
	.stats_get = eth_stats_get,
	.stats_reset = eth_stats_reset,
};

static int
eth_dev_vhost_create(const char *name, const unsigned numa_node)
{
	const unsigned nb_rx_queues = 1;
	const unsigned nb_tx_queues = 1;
	struct rte_eth_dev_data *data = NULL;
	struct rte_pci_device *pci_dev = NULL;
	struct pmd_internals *internals = NULL;
	struct rte_eth_dev *eth_dev = NULL;
	struct virtio_net *vhost_dev = NULL;
	struct eth_driver *eth_drv = NULL;
	struct rte_pci_id *id_table = NULL;
	struct ether_addr *eth_addr = NULL;

	if (name == NULL)
		return -EINVAL;

	vhost_dev = get_device_by_name(name);

	if (vhost_dev == NULL)
		return -EINVAL;

	RTE_LOG(INFO, PMD, "Creating vhost ethdev on numa socket %u\n",
			numa_node);

	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
	if (data == NULL)
		goto error;

	pci_dev = rte_zmalloc_socket(name, sizeof(*pci_dev), 0, numa_node);
	if (pci_dev == NULL)
		goto error;

	id_table = rte_zmalloc_socket(name, sizeof(*id_table), 0, numa_node);
	if (id_table == NULL)
		goto error;

	internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node);
	if (internals == NULL)
		goto error;

	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
	if (eth_addr == NULL)
		goto error;

	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
	if (eth_dev == NULL)
		goto error;

	eth_drv = rte_zmalloc_socket(name, sizeof(*eth_drv), 0, numa_node);
	if (eth_drv == NULL)
		goto error;

	internals->nb_rx_queues = nb_rx_queues;
	internals->nb_tx_queues = nb_tx_queues;
	internals->numa_node = numa_node;
	internals->dev = vhost_dev;

	internals->port_id = eth_dev->data->port_id;

	eth_drv->pci_drv.name = drivername;
	eth_drv->pci_drv.id_table = id_table;
	internals->eth_drv = eth_drv;

	pci_dev->numa_node = numa_node;
	pci_dev->driver = &eth_drv->pci_drv;

	data->dev_private = internals;
	data->port_id = eth_dev->data->port_id;
	data->nb_rx_queues = (uint16_t)nb_rx_queues;
	data->nb_tx_queues = (uint16_t)nb_tx_queues;
	data->dev_link = pmd_link;
	eth_random_addr(&eth_addr->addr_bytes[0]);
	data->mac_addrs = eth_addr;
	snprintf(data->name, sizeof(data->name), "%s", eth_dev->data->name);

	eth_dev->data = data;
	eth_dev->dev_ops = &eth_ops;
	eth_dev->pci_dev = pci_dev;
	eth_dev->driver = &rte_vhost_pmd;
	eth_dev->rx_pkt_burst = eth_vhost_rx;
	eth_dev->tx_pkt_burst = eth_vhost_tx;
	TAILQ_INIT(&(eth_dev->link_intr_cbs));

	return 0;

error:
	rte_free(data);
	rte_free(pci_dev);
	rte_free(id_table);
	rte_free(eth_drv);
	rte_free(eth_addr);
	rte_free(internals);

	return -1;
}

static int
rte_pmd_vhost_devinit(const char *name,
		      const char *params __attribute__((unused)))
{
	unsigned numa_node;

	if (name == NULL)
		return -EINVAL;

	RTE_LOG(DEBUG, PMD, "Initializing pmd_vhost for %s\n", name);

	numa_node = rte_socket_id();

	return eth_dev_vhost_create(name, numa_node);
}

static int
rte_pmd_vhost_devuninit(const char *name)
{
	struct rte_eth_dev *eth_dev = NULL;
	struct pmd_internals *internals = NULL;

	if (name == NULL)
		return -EINVAL;

	RTE_LOG(DEBUG, PMD, "Closing vhost ethdev on numa socket %u\n",
			rte_socket_id());

	/* reserve an ethdev entry */
	eth_dev = rte_eth_dev_allocated(name);
	if (eth_dev == NULL)
		return -1;

	internals = (struct pmd_internals *)eth_dev->data->dev_private;
	rte_free(internals->eth_drv->pci_drv.id_table);
	rte_free(internals->eth_drv);
	rte_free(eth_dev->data->dev_private);
	rte_free(eth_dev->data->mac_addrs);
	rte_free(eth_dev->data);
	rte_free(eth_dev->pci_dev);

	rte_eth_dev_release_port(eth_dev);

	return 0;
}

static struct rte_driver pmd_vhost_drv = {
	.name = "vhost",
	.type = PMD_VDEV,
	.init = rte_pmd_vhost_devinit,
	.uninit = rte_pmd_vhost_devuninit,
};

PMD_REGISTER_DRIVER(pmd_vhost_drv);

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v3 1/2] vhost: Add callback and private data for vhost PMD
  2015-11-09 18:16                   ` Aaron Conole
@ 2015-11-10  3:13                     ` Tetsuya Mukawa
  2015-11-10  7:16                       ` Panu Matilainen
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-10  3:13 UTC (permalink / raw)
  To: Aaron Conole; +Cc: dev, ann.zhuangyanying

On 2015/11/10 3:16, Aaron Conole wrote:
> Greetings,
>
> Tetsuya Mukawa <mukawa@igel.co.jp> writes:
>> These variables are needed to be able to manage one of virtio devices
>> using both vhost library APIs and vhost PMD.
>> For example, if vhost PMD uses current callback handler and private data
>> provided by vhost library, a DPDK application that links vhost library
>> cannot use some of vhost library APIs. To avoid it, callback and private
>> data for vhost PMD are needed.
>>
>> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
>> ---
>>  lib/librte_vhost/rte_vhost_version.map        |  6 +++
>>  lib/librte_vhost/rte_virtio_net.h             |  3 ++
>>  lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++----
>>  lib/librte_vhost/virtio-net.c                 | 56 +++++++++++++++++++++++++--
>>  lib/librte_vhost/virtio-net.h                 |  4 +-
>>  5 files changed, 70 insertions(+), 12 deletions(-)
>>
>> diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
>> index 3d8709e..00a9ce5 100644
>> --- a/lib/librte_vhost/rte_vhost_version.map
>> +++ b/lib/librte_vhost/rte_vhost_version.map
>> @@ -20,3 +20,9 @@ DPDK_2.1 {
>>  	rte_vhost_driver_unregister;
>>  
>>  } DPDK_2.0;
>> +
>> +DPDK_2.2 {
>> +	global:
>> +
>> +	rte_vhost_driver_pmd_callback_register;
>> +} DPDK_2.1;
>> diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
>> index 5687452..3ef6e58 100644
>> --- a/lib/librte_vhost/rte_virtio_net.h
>> +++ b/lib/librte_vhost/rte_virtio_net.h
>> @@ -128,6 +128,7 @@ struct virtio_net {
>>  	char			ifname[IF_NAME_SZ];	/**< Name of the tap device or socket path. */
>>  	uint32_t		virt_qp_nb;	/**< number of queue pair we have allocated */
>>  	void			*priv;		/**< private context */
>> +	void			*pmd_priv;	/**< private context for vhost PMD */
>>  	struct vhost_virtqueue	*virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];	/**< Contains all virtqueue information. */
>>  } __rte_cache_aligned;
> Sorry if I'm missing something, but this is an ABI breaker, isn't it? I
> think this needs the RTE_NEXT_ABI tag around it.

Hi Aaron,

Thanks for reviewing. Yes, you are correct.
I guess I can implement vhost PMD without this variable, so I will
remove it.

Thanks,
Tetsuya


* Re: [PATCH v3 2/2] vhost: Add VHOST PMD
  2015-11-09 22:22                   ` Stephen Hemminger
@ 2015-11-10  3:14                     ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-10  3:14 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, ann.zhuangyanying

On 2015/11/10 7:22, Stephen Hemminger wrote:
> On Mon,  9 Nov 2015 14:17:01 +0900
> Tetsuya Mukawa <mukawa@igel.co.jp> wrote:
>
>> +	rte_atomic32_set(&r->while_queuing, 1);
>> +
>> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
>> +		goto out;
> You special 2 variable custom locking here is buggy.
> If you hit second atomic test, you will leave while_queuing set.

Hi Stephen,

Thanks for reviewing.
I clear while_queuing on the exit path, like below:

+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_rx;
+}

Thanks,
tetsuya


* Re: [PATCH 3/3] vhost: Add VHOST PMD
  2015-11-09 22:25           ` [PATCH 3/3] vhost: " Stephen Hemminger
@ 2015-11-10  3:27             ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-10  3:27 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, ann.zhuangyanying

On 2015/11/10 7:25, Stephen Hemminger wrote:
> On Tue, 27 Oct 2015 15:12:55 +0900
> Tetsuya Mukawa <mukawa@igel.co.jp> wrote:
>
>> The patch introduces a new PMD. This PMD is implemented as thin wrapper
>> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
>> The vhost messages will be handled only when a port is started. So start
>> a port first, then invoke QEMU.
>>
>> The PMD has 2 parameters.
>>  - iface:  The parameter is used to specify a path to connect to a
>>            virtio-net device.
>>  - queues: The parameter is used to specify the number of the queues
>>            virtio-net device has.
>>            (Default: 1)
>>
>> Here is an example.
>> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
>>
>> To connect above testpmd, here is qemu command example.
>>
>> $ qemu-system-x86_64 \
>>         <snip>
>>         -chardev socket,id=chr0,path=/tmp/sock0 \
>>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
>>         -device virtio-net-pci,netdev=net0
>>
>> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> Brocade developed a much simpler vhost PMD, without all the atomics and
> locking.
>

Hi Stephen,

With your PMD, it seems we need to call some vhost library APIs before
we can start sending and receiving.
That means the DPDK application still has to manage virtio-net device
connections itself anyway.

Also, I think any PMD should be replaceable by another PMD without heavy
modification to the DPDK application.
That is why I tried to manage virtio-net device connections inside the
vhost PMD.

Thanks,
Tetsuya

> /*-
>  *   BSD LICENSE
>  *
>  *   Copyright (C) Brocade Communications Systems, Inc.
>  *   All rights reserved.
>  *
>  *   Redistribution and use in source and binary forms, with or without
>  *   modification, are permitted provided that the following conditions
>  *   are met:
>  *
>  *     * Redistributions of source code must retain the above copyright
>  *       notice, this list of conditions and the following disclaimer.
>  *     * Redistributions in binary form must reproduce the above copyright
>  *       notice, this list of conditions and the following disclaimer in
>  *       the documentation and/or other materials provided with the
>  *       distribution.
>  *     * Neither the name of Brocade Communications Systems, Inc.
>  *       nor the names of its contributors may be used to endorse
>  *       or promote products derived from this software without specific
>  *       prior written permission.
>  *
>  *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>  *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>  *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>  *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>  *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>  *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>  *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>  *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>  *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>  *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>  */
>
> #include <rte_mbuf.h>
> #include <rte_ether.h>
> #include <rte_ethdev.h>
> #include <rte_malloc.h>
> #include <rte_memcpy.h>
> #include <rte_dev.h>
> #include <rte_log.h>
>
> #include "../librte_vhost/rte_virtio_net.h"
> #include "../librte_vhost/virtio-net.h"
>
> struct pmd_internals;
>
> struct vhost_queue {
> 	struct pmd_internals *internals;
>
> 	struct rte_mempool *mb_pool;
>
> 	uint64_t	pkts;
> 	uint64_t	bytes;
> };
>
> struct pmd_internals {
> 	struct virtio_net *dev;
> 	unsigned numa_node;
> 	struct eth_driver *eth_drv;
>
> 	unsigned nb_rx_queues;
> 	unsigned nb_tx_queues;
>
> 	struct vhost_queue rx_queues[1];
> 	struct vhost_queue tx_queues[1];
> 	uint8_t port_id;
> };
>
>
> static const char *drivername = "Vhost PMD";
>
> static struct rte_eth_link pmd_link = {
> 	.link_speed = 10000,
> 	.link_duplex = ETH_LINK_FULL_DUPLEX,
> 	.link_status = 0
> };
>
> static uint16_t
> eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> {
> 	int ret, i;
> 	struct vhost_queue *h = q;
>
> 	ret = rte_vhost_dequeue_burst(h->internals->dev,
> 		VIRTIO_TXQ, h->mb_pool, bufs, nb_bufs);
>
> 	for (i = 0; i < ret ; i++) {
> 		struct rte_mbuf *m = bufs[i];
>
> 		m->port = h->internals->port_id;
> 		++h->pkts;
> 		h->bytes += rte_pktmbuf_pkt_len(m);
> 	}
> 	return ret;
> }
>
> static uint16_t
> eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> {
> 	int ret, i;
> 	struct vhost_queue *h = q;
>
> 	ret = rte_vhost_enqueue_burst(h->internals->dev,
> 		VIRTIO_RXQ, bufs, nb_bufs);
>
> 	for (i = 0; i < ret; i++) {
> 		struct rte_mbuf *m = bufs[i];
>
> 		++h->pkts;
> 		h->bytes += rte_pktmbuf_pkt_len(m);
> 		rte_pktmbuf_free(m);
> 	}
>
> 	return ret;
> }
>
> static int
> eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
> {
> 	return 0;
> }
>
> static int
> eth_dev_start(struct rte_eth_dev *dev)
> {
> 	struct pmd_internals *internals = dev->data->dev_private;
>
> 	dev->data->dev_link.link_status = 1;
> 	RTE_LOG(INFO, PMD, "vhost(%s): link up\n", internals->dev->ifname);
> 	return 0;
> }
>
> static void
> eth_dev_stop(struct rte_eth_dev *dev)
> {
> 	struct pmd_internals *internals = dev->data->dev_private;
>
> 	dev->data->dev_link.link_status = 0;
> 	RTE_LOG(INFO, PMD, "vhost(%s): link down\n", internals->dev->ifname);
> }
>
> static int
> eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
> 		uint16_t nb_rx_desc __rte_unused,
> 		unsigned int socket_id __rte_unused,
> 		const struct rte_eth_rxconf *rx_conf __rte_unused,
> 		struct rte_mempool *mb_pool)
> {
> 	struct pmd_internals *internals = dev->data->dev_private;
>
> 	internals->rx_queues[rx_queue_id].mb_pool = mb_pool;
> 	dev->data->rx_queues[rx_queue_id] =
> 		&internals->rx_queues[rx_queue_id];
> 	internals->rx_queues[rx_queue_id].internals = internals;
>
> 	return 0;
> }
>
> static int
> eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
> 		uint16_t nb_tx_desc __rte_unused,
> 		unsigned int socket_id __rte_unused,
> 		const struct rte_eth_txconf *tx_conf __rte_unused)
> {
> 	struct pmd_internals *internals = dev->data->dev_private;
>
> 	dev->data->tx_queues[tx_queue_id] =
> 		&internals->tx_queues[tx_queue_id];
> 	internals->tx_queues[tx_queue_id].internals = internals;
>
> 	return 0;
> }
>
>
> static void
> eth_dev_info(struct rte_eth_dev *dev,
> 		struct rte_eth_dev_info *dev_info)
> {
> 	struct pmd_internals *internals = dev->data->dev_private;
>
> 	dev_info->driver_name = drivername;
> 	dev_info->max_mac_addrs = 1;
> 	dev_info->max_rx_pktlen = -1;
> 	dev_info->max_rx_queues = (uint16_t)internals->nb_rx_queues;
> 	dev_info->max_tx_queues = (uint16_t)internals->nb_tx_queues;
> 	dev_info->min_rx_bufsize = 0;
> 	dev_info->pci_dev = NULL;
> }
>
> static void
> eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
> {
> 	const struct pmd_internals *internal = dev->data->dev_private;
> 	unsigned i;
>
> 	for (i = 0; i < internal->nb_rx_queues; i++) {
> 		const struct vhost_queue *h = &internal->rx_queues[i];
>
> 		stats->ipackets += h->pkts;
> 		stats->ibytes += h->bytes;
>
> 		if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
> 			stats->q_ibytes[i] = h->bytes;
> 			stats->q_ipackets[i] = h->pkts;
> 		}
> 	}
>
> 	for (i = 0; i < internal->nb_tx_queues; i++) {
> 		const struct vhost_queue *h = &internal->tx_queues[i];
>
> 		stats->opackets += h->pkts;
> 		stats->obytes += h->bytes;
>
> 		if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
> 			stats->q_obytes[i] = h->bytes;
> 			stats->q_opackets[i] = h->pkts;
> 		}
> 	}
> }
>
> static void
> eth_stats_reset(struct rte_eth_dev *dev)
> {
> 	unsigned i;
> 	struct pmd_internals *internal = dev->data->dev_private;
>
> 	for (i = 0; i < internal->nb_rx_queues; i++) {
> 		internal->rx_queues[i].pkts = 0;
> 		internal->rx_queues[i].bytes = 0;
> 	}
>
> 	for (i = 0; i < internal->nb_tx_queues; i++) {
> 		internal->tx_queues[i].pkts = 0;
> 		internal->tx_queues[i].bytes = 0;
> 	}
> }
>
> static struct eth_driver rte_vhost_pmd = {
> 	.pci_drv = {
> 		.name = "rte_vhost_pmd",
> 		.drv_flags = RTE_PCI_DRV_DETACHABLE,
> 	},
> };
>
> static void
> eth_queue_release(void *q __rte_unused)
> {
> }
>
> static int
> eth_link_update(struct rte_eth_dev *dev __rte_unused,
> 		int wait_to_complete __rte_unused)
> {
> 	return 0;
> }
>
> static struct eth_dev_ops eth_ops = {
> 	.dev_start = eth_dev_start,
> 	.dev_stop = eth_dev_stop,
> 	.dev_configure = eth_dev_configure,
> 	.dev_infos_get = eth_dev_info,
> 	.rx_queue_setup = eth_rx_queue_setup,
> 	.tx_queue_setup = eth_tx_queue_setup,
> 	.rx_queue_release = eth_queue_release,
> 	.tx_queue_release = eth_queue_release,
> 	.link_update = eth_link_update,
> 	.stats_get = eth_stats_get,
> 	.stats_reset = eth_stats_reset,
> };
>
> static int
> eth_dev_vhost_create(const char *name, const unsigned numa_node)
> {
> 	const unsigned nb_rx_queues = 1;
> 	const unsigned nb_tx_queues = 1;
> 	struct rte_eth_dev_data *data = NULL;
> 	struct rte_pci_device *pci_dev = NULL;
> 	struct pmd_internals *internals = NULL;
> 	struct rte_eth_dev *eth_dev = NULL;
> 	struct virtio_net *vhost_dev = NULL;
> 	struct eth_driver *eth_drv = NULL;
> 	struct rte_pci_id *id_table = NULL;
> 	struct ether_addr *eth_addr = NULL;
>
> 	if (name == NULL)
> 		return -EINVAL;
>
> 	vhost_dev = get_device_by_name(name);
>
> 	if (vhost_dev == NULL)
> 		return -EINVAL;
>
> 	RTE_LOG(INFO, PMD, "Creating vhost ethdev on numa socket %u\n",
> 			numa_node);
>
> 	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
> 	if (data == NULL)
> 		goto error;
>
> 	pci_dev = rte_zmalloc_socket(name, sizeof(*pci_dev), 0, numa_node);
> 	if (pci_dev == NULL)
> 		goto error;
>
> 	id_table = rte_zmalloc_socket(name, sizeof(*id_table), 0, numa_node);
> 	if (id_table == NULL)
> 		goto error;
>
> 	internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node);
> 	if (internals == NULL)
> 		goto error;
>
> 	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
> 	if (internals == NULL)
> 		goto error;
>
> 	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
> 	if (eth_dev == NULL)
> 		goto error;
>
> 	eth_drv = rte_zmalloc_socket(name, sizeof(*eth_drv), 0, numa_node);
> 	if (eth_drv == NULL)
> 		goto error;
>
> 	internals->nb_rx_queues = nb_rx_queues;
> 	internals->nb_tx_queues = nb_tx_queues;
> 	internals->numa_node = numa_node;
> 	internals->dev = vhost_dev;
>
> 	internals->port_id = eth_dev->data->port_id;
>
> 	eth_drv->pci_drv.name = drivername;
> 	eth_drv->pci_drv.id_table = id_table;
> 	internals->eth_drv = eth_drv;
>
> 	pci_dev->numa_node = numa_node;
> 	pci_dev->driver = &eth_drv->pci_drv;
>
> 	data->dev_private = internals;
> 	data->port_id = eth_dev->data->port_id;
> 	data->nb_rx_queues = (uint16_t)nb_rx_queues;
> 	data->nb_tx_queues = (uint16_t)nb_tx_queues;
> 	data->dev_link = pmd_link;
> 	eth_random_addr(&eth_addr->addr_bytes[0]);
> 	data->mac_addrs = eth_addr;
> 	strncpy(data->name, eth_dev->data->name, strlen(eth_dev->data->name));
>
> 	eth_dev->data = data;
> 	eth_dev->dev_ops = &eth_ops;
> 	eth_dev->pci_dev = pci_dev;
> 	eth_dev->driver = &rte_vhost_pmd;
> 	eth_dev->rx_pkt_burst = eth_vhost_rx;
> 	eth_dev->tx_pkt_burst = eth_vhost_tx;
> 	TAILQ_INIT(&(eth_dev->link_intr_cbs));
>
> 	return 0;
>
> error:
> 	rte_free(data);
> 	rte_free(pci_dev);
> 	rte_free(id_table);
> 	rte_free(eth_drv);
> 	rte_free(eth_addr);
> 	rte_free(internals);
>
> 	return -1;
> }
>
> static int
> rte_pmd_vhost_devinit(const char *name,
> 		      const char *params __attribute__((unused)))
> {
> 	unsigned numa_node;
>
> 	if (name == NULL)
> 		return -EINVAL;
>
> 	RTE_LOG(DEBUG, PMD, "Initializing pmd_vhost for %s\n", name);
>
> 	numa_node = rte_socket_id();
>
> 	return eth_dev_vhost_create(name, numa_node);
> }
>
> static int
> rte_pmd_vhost_devuninit(const char *name)
> {
> 	struct rte_eth_dev *eth_dev = NULL;
> 	struct pmd_internals *internals = NULL;
>
> 	if (name == NULL)
> 		return -EINVAL;
>
> 	RTE_LOG(DEBUG, PMD, "Closing vhost ethdev on numa socket %u\n",
> 			rte_socket_id());
>
> 	/* reserve an ethdev entry */
> 	eth_dev = rte_eth_dev_allocated(name);
> 	if (eth_dev == NULL)
> 		return -1;
>
> 	internals = (struct pmd_internals *)eth_dev->data->dev_private;
> 	rte_free(internals->eth_drv->pci_drv.id_table);
> 	rte_free(internals->eth_drv);
> 	rte_free(eth_dev->data->dev_private);
> 	rte_free(eth_dev->data->mac_addrs);
> 	rte_free(eth_dev->data);
> 	rte_free(eth_dev->pci_dev);
>
> 	rte_eth_dev_release_port(eth_dev);
>
> 	return 0;
> }
>
> static struct rte_driver pmd_vhost_drv = {
> 	.name = "vhost",
> 	.type = PMD_VDEV,
> 	.init = rte_pmd_vhost_devinit,
> 	.uninit = rte_pmd_vhost_devuninit,
> };
>
> PMD_REGISTER_DRIVER(pmd_vhost_drv);


* Re: [PATCH v3 1/2] vhost: Add callback and private data for vhost PMD
  2015-11-10  3:13                     ` Tetsuya Mukawa
@ 2015-11-10  7:16                       ` Panu Matilainen
  2015-11-10  9:48                         ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Panu Matilainen @ 2015-11-10  7:16 UTC (permalink / raw)
  To: Tetsuya Mukawa, Aaron Conole; +Cc: dev, ann.zhuangyanying

On 11/10/2015 05:13 AM, Tetsuya Mukawa wrote:
> On 2015/11/10 3:16, Aaron Conole wrote:
>> Greetings,
>>
>> Tetsuya Mukawa <mukawa@igel.co.jp> writes:
>>> These variables are needed to be able to manage one of virtio devices
>>> using both vhost library APIs and vhost PMD.
>>> For example, if vhost PMD uses current callback handler and private data
>>> provided by vhost library, A DPDK application that links vhost library
>>> cannot use some of vhost library APIs. To avoid it, callback and private
>>> data for vhost PMD are needed.
>>>
>>> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
>>> ---
>>>   lib/librte_vhost/rte_vhost_version.map        |  6 +++
>>>   lib/librte_vhost/rte_virtio_net.h             |  3 ++
>>>   lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++----
>>>   lib/librte_vhost/virtio-net.c                 | 56 +++++++++++++++++++++++++--
>>>   lib/librte_vhost/virtio-net.h                 |  4 +-
>>>   5 files changed, 70 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
>>> index 3d8709e..00a9ce5 100644
>>> --- a/lib/librte_vhost/rte_vhost_version.map
>>> +++ b/lib/librte_vhost/rte_vhost_version.map
>>> @@ -20,3 +20,9 @@ DPDK_2.1 {
>>>   	rte_vhost_driver_unregister;
>>>
>>>   } DPDK_2.0;
>>> +
>>> +DPDK_2.2 {
>>> +	global:
>>> +
>>> +	rte_vhost_driver_pmd_callback_register;
>>> +} DPDK_2.1;
>>> diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
>>> index 5687452..3ef6e58 100644
>>> --- a/lib/librte_vhost/rte_virtio_net.h
>>> +++ b/lib/librte_vhost/rte_virtio_net.h
>>> @@ -128,6 +128,7 @@ struct virtio_net {
>>>   	char			ifname[IF_NAME_SZ];	/**< Name of the tap device or socket path. */
>>>   	uint32_t		virt_qp_nb;	/**< number of queue pair we have allocated */
>>>   	void			*priv;		/**< private context */
>>> +	void			*pmd_priv;	/**< private context for vhost PMD */
>>>   	struct vhost_virtqueue	*virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];	/**< Contains all virtqueue information. */
>>>   } __rte_cache_aligned;
>> Sorry if I'm missing something, but this is an ABI breaker, isn't it? I
>> think this needs the RTE_NEXT_ABI tag around it.
>
> Hi Aaron,
>
> Thanks for reviewing. Yes, you are correct.
> I guess I can implement vhost PMD without this variable, so I will
> remove it.

No need to.

The librte_vhost ABI has already been broken during the DPDK 2.2 cycle 
by the multiqueue changes, but that's okay since it was announced during 
2.1 cycle (in commit 3c848bd7b1c6f4f681b833322a748fdefbb5fb2d).

What is missing right now is bumping the library version, and that must 
happen before 2.2 is released.

	- Panu -
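The bump Panu refers to was, in the DPDK build system of that era, a single assignment in the library Makefile. A sketch for reference (the file path and value follow DPDK convention and are not taken from this thread):

```makefile
# lib/librte_vhost/Makefile (sketch): bump the ABI version after the
# announced ABI break, so the shared object becomes librte_vhost.so.2
# instead of librte_vhost.so.1.
LIBABIVER := 2
```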


* Re: [PATCH v3 1/2] vhost: Add callback and private data for vhost PMD
  2015-11-10  7:16                       ` Panu Matilainen
@ 2015-11-10  9:48                         ` Tetsuya Mukawa
  2015-11-10 10:05                           ` Panu Matilainen
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-10  9:48 UTC (permalink / raw)
  To: Panu Matilainen, Aaron Conole; +Cc: dev, ann.zhuangyanying

On 2015/11/10 16:16, Panu Matilainen wrote:
> On 11/10/2015 05:13 AM, Tetsuya Mukawa wrote:
>> On 2015/11/10 3:16, Aaron Conole wrote:
>>> Greetings,
>>>
>>> Tetsuya Mukawa <mukawa@igel.co.jp> writes:
>>>> These variables are needed to be able to manage one of virtio devices
>>>> using both vhost library APIs and vhost PMD.
>>>> For example, if vhost PMD uses current callback handler and private
>>>> data
>>>> provided by vhost library, A DPDK application that links vhost library
>>>> cannot use some of vhost library APIs. To avoid it, callback and
>>>> private
>>>> data for vhost PMD are needed.
>>>>
>>>> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
>>>> ---
>>>>   lib/librte_vhost/rte_vhost_version.map        |  6 +++
>>>>   lib/librte_vhost/rte_virtio_net.h             |  3 ++
>>>>   lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++----
>>>>   lib/librte_vhost/virtio-net.c                 | 56
>>>> +++++++++++++++++++++++++--
>>>>   lib/librte_vhost/virtio-net.h                 |  4 +-
>>>>   5 files changed, 70 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/lib/librte_vhost/rte_vhost_version.map
>>>> b/lib/librte_vhost/rte_vhost_version.map
>>>> index 3d8709e..00a9ce5 100644
>>>> --- a/lib/librte_vhost/rte_vhost_version.map
>>>> +++ b/lib/librte_vhost/rte_vhost_version.map
>>>> @@ -20,3 +20,9 @@ DPDK_2.1 {
>>>>       rte_vhost_driver_unregister;
>>>>
>>>>   } DPDK_2.0;
>>>> +
>>>> +DPDK_2.2 {
>>>> +    global:
>>>> +
>>>> +    rte_vhost_driver_pmd_callback_register;
>>>> +} DPDK_2.1;
>>>> diff --git a/lib/librte_vhost/rte_virtio_net.h
>>>> b/lib/librte_vhost/rte_virtio_net.h
>>>> index 5687452..3ef6e58 100644
>>>> --- a/lib/librte_vhost/rte_virtio_net.h
>>>> +++ b/lib/librte_vhost/rte_virtio_net.h
>>>> @@ -128,6 +128,7 @@ struct virtio_net {
>>>>       char            ifname[IF_NAME_SZ];    /**< Name of the tap
>>>> device or socket path. */
>>>>       uint32_t        virt_qp_nb;    /**< number of queue pair we
>>>> have allocated */
>>>>       void            *priv;        /**< private context */
>>>> +    void            *pmd_priv;    /**< private context for vhost
>>>> PMD */
>>>>       struct vhost_virtqueue    *virtqueue[VHOST_MAX_QUEUE_PAIRS *
>>>> 2];    /**< Contains all virtqueue information. */
>>>>   } __rte_cache_aligned;
>>> Sorry if I'm missing something, but this is an ABI breaker, isn't it? I
>>> think this needs the RTE_NEXT_ABI tag around it.
>>
>> Hi Aaron,
>>
>> Thanks for reviewing. Yes, you are correct.
>> I guess I can implement vhost PMD without this variable, so I will
>> remove it.
>
> No need to.
>
> The librte_vhost ABI has already been broken during the DPDK 2.2 cycle
> by the multiqueue changes, but that's okay since it was announced
> during 2.1 cycle (in commit 3c848bd7b1c6f4f681b833322a748fdefbb5fb2d).
>
> What is missing right now is bumping the library version, and that
> must happen before 2.2 is released.
>
>     - Panu -
>
>

Hi Panu,

Thank you so much. Let me make sure I understand what you mean.
I guess I need to add RTE_NEXT_ABI tags where pmd_priv is used, so that
we don't break the DPDK 2.1 ABI.
The tags would then be removed when DPDK 2.2 is released, and the vhost
PMD could be used after that.
Is this correct?

Thanks,
Tetsuya


* Re: [PATCH v3 1/2] vhost: Add callback and private data for vhost PMD
  2015-11-10  9:48                         ` Tetsuya Mukawa
@ 2015-11-10 10:05                           ` Panu Matilainen
  2015-11-10 10:15                             ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Panu Matilainen @ 2015-11-10 10:05 UTC (permalink / raw)
  To: Tetsuya Mukawa, Aaron Conole; +Cc: dev, ann.zhuangyanying

On 11/10/2015 11:48 AM, Tetsuya Mukawa wrote:
> On 2015/11/10 16:16, Panu Matilainen wrote:
>> On 11/10/2015 05:13 AM, Tetsuya Mukawa wrote:
>>> On 2015/11/10 3:16, Aaron Conole wrote:
>>>> Greetings,
>>>>
>>>> Tetsuya Mukawa <mukawa@igel.co.jp> writes:
>>>>> These variables are needed to be able to manage one of virtio devices
>>>>> using both vhost library APIs and vhost PMD.
>>>>> For example, if vhost PMD uses current callback handler and private
>>>>> data
>>>>> provided by vhost library, A DPDK application that links vhost library
>>>>> cannot use some of vhost library APIs. To avoid it, callback and
>>>>> private
>>>>> data for vhost PMD are needed.
>>>>>
>>>>> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
>>>>> ---
>>>>>    lib/librte_vhost/rte_vhost_version.map        |  6 +++
>>>>>    lib/librte_vhost/rte_virtio_net.h             |  3 ++
>>>>>    lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++----
>>>>>    lib/librte_vhost/virtio-net.c                 | 56
>>>>> +++++++++++++++++++++++++--
>>>>>    lib/librte_vhost/virtio-net.h                 |  4 +-
>>>>>    5 files changed, 70 insertions(+), 12 deletions(-)
>>>>>
>>>>> diff --git a/lib/librte_vhost/rte_vhost_version.map
>>>>> b/lib/librte_vhost/rte_vhost_version.map
>>>>> index 3d8709e..00a9ce5 100644
>>>>> --- a/lib/librte_vhost/rte_vhost_version.map
>>>>> +++ b/lib/librte_vhost/rte_vhost_version.map
>>>>> @@ -20,3 +20,9 @@ DPDK_2.1 {
>>>>>        rte_vhost_driver_unregister;
>>>>>
>>>>>    } DPDK_2.0;
>>>>> +
>>>>> +DPDK_2.2 {
>>>>> +    global:
>>>>> +
>>>>> +    rte_vhost_driver_pmd_callback_register;
>>>>> +} DPDK_2.1;
>>>>> diff --git a/lib/librte_vhost/rte_virtio_net.h
>>>>> b/lib/librte_vhost/rte_virtio_net.h
>>>>> index 5687452..3ef6e58 100644
>>>>> --- a/lib/librte_vhost/rte_virtio_net.h
>>>>> +++ b/lib/librte_vhost/rte_virtio_net.h
>>>>> @@ -128,6 +128,7 @@ struct virtio_net {
>>>>>        char            ifname[IF_NAME_SZ];    /**< Name of the tap
>>>>> device or socket path. */
>>>>>        uint32_t        virt_qp_nb;    /**< number of queue pair we
>>>>> have allocated */
>>>>>        void            *priv;        /**< private context */
>>>>> +    void            *pmd_priv;    /**< private context for vhost
>>>>> PMD */
>>>>>        struct vhost_virtqueue    *virtqueue[VHOST_MAX_QUEUE_PAIRS *
>>>>> 2];    /**< Contains all virtqueue information. */
>>>>>    } __rte_cache_aligned;
>>>> Sorry if I'm missing something, but this is an ABI breaker, isn't it? I
>>>> think this needs the RTE_NEXT_ABI tag around it.
>>>
>>> Hi Aaron,
>>>
>>> Thanks for reviewing. Yes, you are correct.
>>> I guess I can implement vhost PMD without this variable, so I will
>>> remove it.
>>
>> No need to.
>>
>> The librte_vhost ABI has already been broken during the DPDK 2.2 cycle
>> by the multiqueue changes, but that's okay since it was announced
>> during 2.1 cycle (in commit 3c848bd7b1c6f4f681b833322a748fdefbb5fb2d).
>>
>> What is missing right now is bumping the library version, and that
>> must happen before 2.2 is released.
>>
>>      - Panu -
>>
>>
>
> Hi Panu,
>
> Thank you so much. Let me make sure what you mean.
> I guess I need to add RTE_NEXT_ABI tags where pmd_priv is used. This is
> so that we don't break the DPDK-2.1 ABI.
> Anyway, the tag will be removed when DPDK-2.2 is released, then we can
> use vhost PMD.
> Is this correct?

Not quite. Because the ABI has already been broken between 2.1 and 2.2, 
you can ride the same wave without messing with NEXT_ABI and such.

Like said, librte_vhost is pending a LIBABIVER bump to 2, but that is 
regardless of this patch.

	- Panu -
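
The pending LIBABIVER bump Panu mentions would be a one-line change in the library Makefile. A sketch of what that looks like in the DPDK 2.2-era build system (the surrounding Makefile contents are assumed for illustration):

```makefile
# lib/librte_vhost/Makefile (illustrative fragment)
LIB = librte_vhost.a
EXPORT_MAP := rte_vhost_version.map
LIBABIVER := 2    # bumped from 1: the ABI was broken during the 2.2 cycle
```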

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v3 1/2] vhost: Add callback and private data for vhost PMD
  2015-11-10 10:05                           ` Panu Matilainen
@ 2015-11-10 10:15                             ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-10 10:15 UTC (permalink / raw)
  To: Panu Matilainen, Aaron Conole; +Cc: dev, ann.zhuangyanying

On 2015/11/10 19:05, Panu Matilainen wrote:
> On 11/10/2015 11:48 AM, Tetsuya Mukawa wrote:
>> On 2015/11/10 16:16, Panu Matilainen wrote:
>>> On 11/10/2015 05:13 AM, Tetsuya Mukawa wrote:
>>>> On 2015/11/10 3:16, Aaron Conole wrote:
>>>>> Greetings,
>>>>>
>>>>> Tetsuya Mukawa <mukawa@igel.co.jp> writes:
>>>>>> These variables are needed to be able to manage one of virtio
>>>>>> devices
>>>>>> using both vhost library APIs and vhost PMD.
>>>>>> For example, if vhost PMD uses current callback handler and private
>>>>>> data
>>>>>> provided by vhost library, A DPDK application that links vhost
>>>>>> library
>>>>>> cannot use some of vhost library APIs. To avoid it, callback and
>>>>>> private
>>>>>> data for vhost PMD are needed.
>>>>>>
>>>>>> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
>>>>>> ---
>>>>>>    lib/librte_vhost/rte_vhost_version.map        |  6 +++
>>>>>>    lib/librte_vhost/rte_virtio_net.h             |  3 ++
>>>>>>    lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++----
>>>>>>    lib/librte_vhost/virtio-net.c                 | 56
>>>>>> +++++++++++++++++++++++++--
>>>>>>    lib/librte_vhost/virtio-net.h                 |  4 +-
>>>>>>    5 files changed, 70 insertions(+), 12 deletions(-)
>>>>>>
>>>>>> diff --git a/lib/librte_vhost/rte_vhost_version.map
>>>>>> b/lib/librte_vhost/rte_vhost_version.map
>>>>>> index 3d8709e..00a9ce5 100644
>>>>>> --- a/lib/librte_vhost/rte_vhost_version.map
>>>>>> +++ b/lib/librte_vhost/rte_vhost_version.map
>>>>>> @@ -20,3 +20,9 @@ DPDK_2.1 {
>>>>>>        rte_vhost_driver_unregister;
>>>>>>
>>>>>>    } DPDK_2.0;
>>>>>> +
>>>>>> +DPDK_2.2 {
>>>>>> +    global:
>>>>>> +
>>>>>> +    rte_vhost_driver_pmd_callback_register;
>>>>>> +} DPDK_2.1;
>>>>>> diff --git a/lib/librte_vhost/rte_virtio_net.h
>>>>>> b/lib/librte_vhost/rte_virtio_net.h
>>>>>> index 5687452..3ef6e58 100644
>>>>>> --- a/lib/librte_vhost/rte_virtio_net.h
>>>>>> +++ b/lib/librte_vhost/rte_virtio_net.h
>>>>>> @@ -128,6 +128,7 @@ struct virtio_net {
>>>>>>        char            ifname[IF_NAME_SZ];    /**< Name of the tap
>>>>>> device or socket path. */
>>>>>>        uint32_t        virt_qp_nb;    /**< number of queue pair we
>>>>>> have allocated */
>>>>>>        void            *priv;        /**< private context */
>>>>>> +    void            *pmd_priv;    /**< private context for vhost
>>>>>> PMD */
>>>>>>        struct vhost_virtqueue    *virtqueue[VHOST_MAX_QUEUE_PAIRS *
>>>>>> 2];    /**< Contains all virtqueue information. */
>>>>>>    } __rte_cache_aligned;
>>>>> Sorry if I'm missing something, but this is an ABI breaker, isn't
>>>>> it? I
>>>>> think this needs the RTE_NEXT_ABI tag around it.
>>>>
>>>> Hi Aaron,
>>>>
>>>> Thanks for reviewing. Yes, you are correct.
>>>> I guess I can implement vhost PMD without this variable, so I will
>>>> remove it.
>>>
>>> No need to.
>>>
>>> The librte_vhost ABI has already been broken during the DPDK 2.2 cycle
>>> by the multiqueue changes, but that's okay since it was announced
>>> during 2.1 cycle (in commit 3c848bd7b1c6f4f681b833322a748fdefbb5fb2d).
>>>
>>> What is missing right now is bumping the library version, and that
>>> must happen before 2.2 is released.
>>>
>>>      - Panu -
>>>
>>>
>>
>> Hi Panu,
>>
>> Thank you so much. Let me make sure what you mean.
>> I guess I need to add RTE_NEXT_ABI tags where pmd_priv is used. This is
>> so that we don't break the DPDK-2.1 ABI.
>> Anyway, the tag will be removed when DPDK-2.2 is released, then we can
>> use vhost PMD.
>> Is this correct?
>
> Not quite. Because the ABI has already been broken between 2.1 and
> 2.2, you can ride the same wave without messing with NEXT_ABI and such.
>
> Like said, librte_vhost is pending a LIBABIVER bump to 2, but that is
> regardless of this patch.
>
>     - Panu -
>

Thanks. Now I understand clearly.

Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v3 2/2] vhost: Add VHOST PMD
  2015-11-09  5:17                 ` [PATCH v3 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  2015-11-09  6:21                   ` Yuanhan Liu
  2015-11-09 22:22                   ` Stephen Hemminger
@ 2015-11-12 12:52                   ` Wang, Zhihong
  2015-11-13  3:09                     ` Tetsuya Mukawa
  2015-11-13  4:03                   ` Rich Lane
  2015-11-13  5:20                   ` [PATCH v4 0/2] " Tetsuya Mukawa
  4 siblings, 1 reply; 200+ messages in thread
From: Wang, Zhihong @ 2015-11-12 12:52 UTC (permalink / raw)
  To: Tetsuya Mukawa, dev, Liu, Yuanhan; +Cc: ann.zhuangyanying

Hi Tetsuya,

In my test I created 2 vdev using "--vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' --vdev 'eth_vhost1,iface=/tmp/sock1,queues=1'", and the qemu message got handled in wrong order.
The reason is that two threads are created to handle messages from the two sockets, but their fds are SHARED, so each thread is reading from both sockets.

This can lead to incorrect behaviors; in my case, sometimes the VHOST_USER_SET_MEM_TABLE message got handled after VRING initialization and led to destroy_device().
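
The shared-descriptor hazard described above can be reproduced outside DPDK in a few lines of C: two handler threads read() from one shared fd, and which thread consumes which message is unspecified. This is a standalone model with invented names, not DPDK code:

```c
#include <pthread.h>
#include <unistd.h>

/* Two handler threads reading from ONE shared descriptor, modelling the
 * bug above: both session threads poll the same fd, so each message is
 * consumed by whichever thread wins the race, and per-connection message
 * ordering as seen by a single handler is lost. */

static int shared_fd;               /* read end of a pipe, shared by both */
static int reads_per_thread[2];

static void *handler(void *arg)
{
	int id = *(int *)arg;
	char msg;

	/* Drain the shared fd one byte at a time until EOF. */
	while (read(shared_fd, &msg, 1) == 1)
		reads_per_thread[id]++;
	return NULL;
}

int run_demo(int nmsg)
{
	int fds[2], ids[2] = {0, 1};
	pthread_t th[2];

	if (pipe(fds) < 0)
		return -1;
	shared_fd = fds[0];
	for (int i = 0; i < nmsg; i++)
		if (write(fds[1], "m", 1) != 1)
			return -1;
	close(fds[1]);                  /* readers see EOF once drained */

	pthread_create(&th[0], NULL, handler, &ids[0]);
	pthread_create(&th[1], NULL, handler, &ids[1]);
	pthread_join(th[0], NULL);
	pthread_join(th[1], NULL);

	/* Every message was delivered, but SPLIT between the two threads:
	 * which thread handled which message is unspecified. */
	return reads_per_thread[0] + reads_per_thread[1];
}
```

All messages are consumed exactly once in total, but the split between the two threads varies from run to run, which is exactly why vhost-user messages can be handled out of order.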

Detailed log as shown below: thread 69351 & 69352 are both reading fd 25. Thanks Yuanhan for helping debugging!


Thanks
Zhihong


-----------------------------------------------------------------------------------------------------------------

---->  debug: setting up new vq conn for fd: 23, tid: 69352
VHOST_CONFIG: new virtio connection is 25
VHOST_CONFIG: new device, handle is 0
---->  debug: vserver_message_handler thread id: 69352, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_OWNER
---->  debug: vserver_message_handler thread id: 69352, fd: 25
VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
---->  debug: vserver_message_handler thread id: 69352, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:0 file:26
---->  debug: vserver_message_handler thread id: 69352, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:1 file:27
---->  debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:0 file:28
---->  debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:1 file:26
---->  debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_FEATURES
---->  debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_MEM_TABLE
---->  debug: device_fh: 0: user_set_mem_table
VHOST_CONFIG: mapped region 0 fd:27 to 0x7ff6c0000000 sz:0xa0000 off:0x0
VHOST_CONFIG: mapped region 1 fd:29 to 0x7ff680000000 sz:0x40000000 off:0xc0000
---->  debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
---->  debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
---->  debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
---->  debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
VHOST_CONFIG: vring kick idx:0 file:30
---->  debug: vserver_message_handler thread id: 69352, fd: 25
VHOST_CONFIG: virtio is not ready for processing.
---->  debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
---->  debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
---->  debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
VHOST_CONFIG: vring kick idx:1 file:31
VHOST_CONFIG: virtio is now ready for processing.
PMD: New connection established
VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM

-----------------------------------------------------------------------------------------------------------------

> ...
> +
> +static void *vhost_driver_session(void *param __rte_unused)
> +{
> +	static struct virtio_net_device_ops *vhost_ops;
> +
> +	vhost_ops = rte_zmalloc(NULL, sizeof(*vhost_ops), 0);
> +	if (vhost_ops == NULL)
> +		rte_panic("Can't allocate memory\n");
> +
> +	/* set vhost arguments */
> +	vhost_ops->new_device = new_device;
> +	vhost_ops->destroy_device = destroy_device;
> +	if (rte_vhost_driver_pmd_callback_register(vhost_ops) < 0)
> +		rte_panic("Can't register callbacks\n");
> +
> +	/* start event handling */
> +	rte_vhost_driver_session_start();
> +
> +	rte_free(vhost_ops);
> +	pthread_exit(0);
> +}
> +
> +static void vhost_driver_session_start(struct pmd_internal *internal)
> +{
> +	int ret;
> +
> +	ret = pthread_create(&internal->session_th,
> +			NULL, vhost_driver_session, NULL);
> +	if (ret)
> +		rte_panic("Can't create a thread\n");
> +}
> +
> ...

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v3 2/2] vhost: Add VHOST PMD
  2015-11-12 12:52                   ` Wang, Zhihong
@ 2015-11-13  3:09                     ` Tetsuya Mukawa
  2015-11-13  3:50                       ` Wang, Zhihong
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-13  3:09 UTC (permalink / raw)
  To: Wang, Zhihong, dev, Liu, Yuanhan; +Cc: ann.zhuangyanying

On 2015/11/12 21:52, Wang, Zhihong wrote:
> Hi Tetsuya,
>
> In my test I created 2 vdev using "--vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' --vdev 'eth_vhost1,iface=/tmp/sock1,queues=1'", and the qemu message got handled in wrong order.
> The reason is that two threads are created to handle messages from the two sockets, but their fds are SHARED, so each thread is reading from both sockets.
>
> This can lead to incorrect behaviors; in my case, sometimes the VHOST_USER_SET_MEM_TABLE message got handled after VRING initialization and led to destroy_device().
>
> Detailed log as shown below: thread 69351 & 69352 are both reading fd 25. Thanks Yuanhan for helping debugging!
>

Hi Zhihong and Yuanhan,

Thank you so much for debugging the issue.
I will fix vhost PMD not to create multiple message handling threads.

I am going to submit the PMD today.
Could you please check it again using latest one?

Tetsuya
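
The fix described here (a single message-handling thread in total, no matter how many vdevs are probed) could be sketched with pthread_once(). The names echo the patch, but this is an assumed, illustrative shape, not the actual DPDK change:

```c
#include <pthread.h>

/* Illustrative sketch: guarantee exactly one session thread is created
 * even if several eth_vhost devices are probed, possibly concurrently.
 * vhost_driver_session is a stand-in for the real session loop. */

static pthread_once_t session_once = PTHREAD_ONCE_INIT;
static pthread_t session_th;
static int session_threads_created;

static void *vhost_driver_session(void *arg)
{
	(void)arg;
	/* ... register callbacks, then run the blocking session loop ... */
	return NULL;
}

static void start_session_thread(void)
{
	session_threads_created++;
	pthread_create(&session_th, NULL, vhost_driver_session, NULL);
}

/* Called from each device's probe path; only the first call spawns. */
void vhost_driver_session_start_once(void)
{
	pthread_once(&session_once, start_session_thread);
}

int session_thread_count(void)
{
	return session_threads_created;
}
```

With this shape, a second or third vdev probe becomes a no-op with respect to thread creation, so only one thread ever reads the vhost-user sockets.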


> Thanks
> Zhihong
>
>
> -----------------------------------------------------------------------------------------------------------------
>
> ---->  debug: setting up new vq conn for fd: 23, tid: 69352
> VHOST_CONFIG: new virtio connection is 25
> VHOST_CONFIG: new device, handle is 0
> ---->  debug: vserver_message_handler thread id: 69352, fd: 25
> VHOST_CONFIG: read message VHOST_USER_SET_OWNER
> ---->  debug: vserver_message_handler thread id: 69352, fd: 25
> VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
> ---->  debug: vserver_message_handler thread id: 69352, fd: 25
> VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
> VHOST_CONFIG: vring call idx:0 file:26
> ---->  debug: vserver_message_handler thread id: 69352, fd: 25
> VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
> VHOST_CONFIG: vring call idx:1 file:27
> ---->  debug: vserver_message_handler thread id: 69351, fd: 25
> VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
> VHOST_CONFIG: vring call idx:0 file:28
> ---->  debug: vserver_message_handler thread id: 69351, fd: 25
> VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
> VHOST_CONFIG: vring call idx:1 file:26
> ---->  debug: vserver_message_handler thread id: 69351, fd: 25
> VHOST_CONFIG: read message VHOST_USER_SET_FEATURES
> ---->  debug: vserver_message_handler thread id: 69351, fd: 25
> VHOST_CONFIG: read message VHOST_USER_SET_MEM_TABLE
> ---->  debug: device_fh: 0: user_set_mem_table
> VHOST_CONFIG: mapped region 0 fd:27 to 0x7ff6c0000000 sz:0xa0000 off:0x0
> VHOST_CONFIG: mapped region 1 fd:29 to 0x7ff680000000 sz:0x40000000 off:0xc0000
> ---->  debug: vserver_message_handler thread id: 69351, fd: 25
> VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
> ---->  debug: vserver_message_handler thread id: 69351, fd: 25
> VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
> ---->  debug: vserver_message_handler thread id: 69351, fd: 25
> VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
> ---->  debug: vserver_message_handler thread id: 69351, fd: 25
> VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
> VHOST_CONFIG: vring kick idx:0 file:30
> ---->  debug: vserver_message_handler thread id: 69352, fd: 25
> VHOST_CONFIG: virtio is not ready for processing.
> ---->  debug: vserver_message_handler thread id: 69351, fd: 25
> VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
> ---->  debug: vserver_message_handler thread id: 69351, fd: 25
> VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
> ---->  debug: vserver_message_handler thread id: 69351, fd: 25
> VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
> VHOST_CONFIG: vring kick idx:1 file:31
> VHOST_CONFIG: virtio is now ready for processing.
> PMD: New connection established
> VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
>
> -----------------------------------------------------------------------------------------------------------------
>
>> ...
>> +
>> +static void *vhost_driver_session(void *param __rte_unused)
>> +{
>> +	static struct virtio_net_device_ops *vhost_ops;
>> +
>> +	vhost_ops = rte_zmalloc(NULL, sizeof(*vhost_ops), 0);
>> +	if (vhost_ops == NULL)
>> +		rte_panic("Can't allocate memory\n");
>> +
>> +	/* set vhost arguments */
>> +	vhost_ops->new_device = new_device;
>> +	vhost_ops->destroy_device = destroy_device;
>> +	if (rte_vhost_driver_pmd_callback_register(vhost_ops) < 0)
>> +		rte_panic("Can't register callbacks\n");
>> +
>> +	/* start event handling */
>> +	rte_vhost_driver_session_start();
>> +
>> +	rte_free(vhost_ops);
>> +	pthread_exit(0);
>> +}
>> +
>> +static void vhost_driver_session_start(struct pmd_internal *internal)
>> +{
>> +	int ret;
>> +
>> +	ret = pthread_create(&internal->session_th,
>> +			NULL, vhost_driver_session, NULL);
>> +	if (ret)
>> +		rte_panic("Can't create a thread\n");
>> +}
>> +
>> ...

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v3 2/2] vhost: Add VHOST PMD
  2015-11-13  3:09                     ` Tetsuya Mukawa
@ 2015-11-13  3:50                       ` Wang, Zhihong
  0 siblings, 0 replies; 200+ messages in thread
From: Wang, Zhihong @ 2015-11-13  3:50 UTC (permalink / raw)
  To: Tetsuya Mukawa, dev, Liu, Yuanhan; +Cc: ann.zhuangyanying



> -----Original Message-----
> From: Tetsuya Mukawa [mailto:mukawa@igel.co.jp]
> Sent: Friday, November 13, 2015 11:10 AM
> To: Wang, Zhihong <zhihong.wang@intel.com>; dev@dpdk.org; Liu, Yuanhan
> <yuanhan.liu@intel.com>
> Cc: ann.zhuangyanying@huawei.com
> Subject: Re: [dpdk-dev] [PATCH v3 2/2] vhost: Add VHOST PMD
> 
> On 2015/11/12 21:52, Wang, Zhihong wrote:
> > Hi Tetsuya,
> >
> > In my test I created 2 vdev using "--vdev
> 'eth_vhost0,iface=/tmp/sock0,queues=1' --vdev
> 'eth_vhost1,iface=/tmp/sock1,queues=1'", and the qemu message got handled
> in wrong order.
> > The reason is that: 2 threads are created to handle message from 2 sockets, but
> their fds are SHARED, so each thread is reading from both sockets.
> >
> > This can lead to incorrect behaviors, in my case sometimes the
> VHOST_USER_SET_MEM_TABLE got handled after VRING initialization and led to
> destroy_device().
> >
> > Detailed log as shown below: thread 69351 & 69352 are both reading fd 25.
> Thanks Yuanhan for helping debugging!
> >
> 
> Hi Zhihong and Yuanhan,
> 
> Thank you so much for debugging the issue.
> I will fix vhost PMD not to create multiple message handling threads.
> 
> I am going to submit the PMD today.
> Could you please check it again using latest one?
> 

Looking forward to it!


> Tetsuya
> 
> 
> > Thanks
> > Zhihong
> >
> >
> > ----------------------------------------------------------------------
> > -------------------------------------------
> >
> > ---->  debug: setting up new vq conn for fd: 23, tid: 69352
> > VHOST_CONFIG: new virtio connection is 25
> > VHOST_CONFIG: new device, handle is 0
> > ---->  debug: vserver_message_handler thread id: 69352, fd: 25
> > VHOST_CONFIG: read message VHOST_USER_SET_OWNER
> > ---->  debug: vserver_message_handler thread id: 69352, fd: 25
> > VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
> > ---->  debug: vserver_message_handler thread id: 69352, fd: 25
> > VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
> > VHOST_CONFIG: vring call idx:0 file:26
> > ---->  debug: vserver_message_handler thread id: 69352, fd: 25
> > VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
> > VHOST_CONFIG: vring call idx:1 file:27
> > ---->  debug: vserver_message_handler thread id: 69351, fd: 25
> > VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
> > VHOST_CONFIG: vring call idx:0 file:28
> > ---->  debug: vserver_message_handler thread id: 69351, fd: 25
> > VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
> > VHOST_CONFIG: vring call idx:1 file:26
> > ---->  debug: vserver_message_handler thread id: 69351, fd: 25
> > VHOST_CONFIG: read message VHOST_USER_SET_FEATURES
> > ---->  debug: vserver_message_handler thread id: 69351, fd: 25
> > VHOST_CONFIG: read message VHOST_USER_SET_MEM_TABLE
> > ---->  debug: device_fh: 0: user_set_mem_table
> > VHOST_CONFIG: mapped region 0 fd:27 to 0x7ff6c0000000 sz:0xa0000
> > off:0x0
> > VHOST_CONFIG: mapped region 1 fd:29 to 0x7ff680000000 sz:0x40000000
> > off:0xc0000
> > ---->  debug: vserver_message_handler thread id: 69351, fd: 25
> > VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
> > ---->  debug: vserver_message_handler thread id: 69351, fd: 25
> > VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
> > ---->  debug: vserver_message_handler thread id: 69351, fd: 25
> > VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
> > ---->  debug: vserver_message_handler thread id: 69351, fd: 25
> > VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
> > VHOST_CONFIG: vring kick idx:0 file:30
> > ---->  debug: vserver_message_handler thread id: 69352, fd: 25
> > VHOST_CONFIG: virtio is not ready for processing.
> > ---->  debug: vserver_message_handler thread id: 69351, fd: 25
> > VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
> > ---->  debug: vserver_message_handler thread id: 69351, fd: 25
> > VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
> > ---->  debug: vserver_message_handler thread id: 69351, fd: 25
> > VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
> > VHOST_CONFIG: vring kick idx:1 file:31
> > VHOST_CONFIG: virtio is now ready for processing.
> > PMD: New connection established
> > VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
> >
> > ----------------------------------------------------------------------
> > -------------------------------------------
> >
> >> ...
> >> +
> >> +static void *vhost_driver_session(void *param __rte_unused) {
> >> +	static struct virtio_net_device_ops *vhost_ops;
> >> +
> >> +	vhost_ops = rte_zmalloc(NULL, sizeof(*vhost_ops), 0);
> >> +	if (vhost_ops == NULL)
> >> +		rte_panic("Can't allocate memory\n");
> >> +
> >> +	/* set vhost arguments */
> >> +	vhost_ops->new_device = new_device;
> >> +	vhost_ops->destroy_device = destroy_device;
> >> +	if (rte_vhost_driver_pmd_callback_register(vhost_ops) < 0)
> >> +		rte_panic("Can't register callbacks\n");
> >> +
> >> +	/* start event handling */
> >> +	rte_vhost_driver_session_start();
> >> +
> >> +	rte_free(vhost_ops);
> >> +	pthread_exit(0);
> >> +}
> >> +
> >> +static void vhost_driver_session_start(struct pmd_internal
> >> +*internal) {
> >> +	int ret;
> >> +
> >> +	ret = pthread_create(&internal->session_th,
> >> +			NULL, vhost_driver_session, NULL);
> >> +	if (ret)
> >> +		rte_panic("Can't create a thread\n"); }
> >> +
> >> ...

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v3 2/2] vhost: Add VHOST PMD
  2015-11-09  5:17                 ` [PATCH v3 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
                                     ` (2 preceding siblings ...)
  2015-11-12 12:52                   ` Wang, Zhihong
@ 2015-11-13  4:03                   ` Rich Lane
  2015-11-13  4:29                     ` Tetsuya Mukawa
  2015-11-13  5:20                   ` [PATCH v4 0/2] " Tetsuya Mukawa
  4 siblings, 1 reply; 200+ messages in thread
From: Rich Lane @ 2015-11-13  4:03 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

>
> +       if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
> +               ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
> +                                        &open_iface, &iface_name);
> +               if (ret < 0)
> +                       goto out_free;
> +       }
>

I noticed that the strdup in eth_dev_vhost_create crashes if you don't pass
the iface option, so this should probably return an error if the option
doesn't exist.

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v3 2/2] vhost: Add VHOST PMD
  2015-11-13  4:03                   ` Rich Lane
@ 2015-11-13  4:29                     ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-13  4:29 UTC (permalink / raw)
  To: Rich Lane; +Cc: dev, ann.zhuangyanying

On 2015/11/13 13:03, Rich Lane wrote:
>> +       if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
>> +               ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
>> +                                        &open_iface, &iface_name);
>> +               if (ret < 0)
>> +                       goto out_free;
>> +       }
>>
> I noticed that the strdup in eth_dev_vhost_create crashes if you don't pass
> the iface option, so this should probably return an error if the option
> doesn't exist.
>

Hi Lane,

Yes, you are correct. Thanks for checking!
I will fix it also.

Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH v4 0/2] Add VHOST PMD
  2015-11-09  5:17                 ` [PATCH v3 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
                                     ` (3 preceding siblings ...)
  2015-11-13  4:03                   ` Rich Lane
@ 2015-11-13  5:20                   ` Tetsuya Mukawa
  2015-11-13  5:20                     ` [PATCH v4 1/2] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
                                       ` (2 more replies)
  4 siblings, 3 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-13  5:20 UTC (permalink / raw)
  To: dev, zhihong.wang, yuanhan.liu; +Cc: ann.zhuangyanying

The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost.

* Known issue.
We may see issues while handling the RESET_OWNER message.
This handling is done in the vhost library, so it is not part of the vhost PMD.
So far, we are waiting for a fix in QEMU.

PATCH v4 changes:
 - Rebase on latest DPDK tree.
 - Fix coding style.
 - Fix code not to invoke multiple message handling threads.
 - Fix code to handle vdev parameters correctly.
 - Remove needless cast.
 - Remove needless if-condition before rte_free().

PATCH v3 changes:
 - Rebase on latest master.
 - Specify correct queue_id in RX/TX function.

PATCH v2 changes:
 - Remove a below patch that fixes vhost library.
   The patch was applied as a separate patch.
   - vhost: fix crash with multiqueue enabled
 - Fix typos.
   (Thanks to Thomas Monjalon)
 - Rebase on latest tree with Bernard's patches above.

PATCH v1 changes:
 - Support vhost multiple queues.
 - Rebase on "remove pci driver from vdevs".
 - Optimize RX/TX functions.
 - Fix resource leaks.
 - Fix compile issue.
 - Add patch to fix vhost library.

RFC PATCH v3 changes:
 - Optimize performance.
   In RX/TX functions, change code to access only per core data.
 - Add below API to allow user to use vhost library APIs for a port managed
   by vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
    - rte_eth_vhost_portid2vdev()
   To support this functionality, vhost library is also changed.
   Anyway, if users don't use the vhost PMD, they can still fully use the vhost library APIs.
 - Add code to support vhost multiple queues.
   Actually, multiple queues functionality is not enabled so far.

RFC PATCH v2 changes:
 - Fix issues reported by checkpatch.pl
   (Thanks to Stephen Hemminger)


Tetsuya Mukawa (2):
  vhost: Add callback and private data for vhost PMD
  vhost: Add VHOST PMD

 config/common_linuxapp                        |   6 +
 doc/guides/nics/index.rst                     |   1 +
 doc/guides/rel_notes/release_2_2.rst          |   2 +
 drivers/net/Makefile                          |   4 +
 drivers/net/vhost/Makefile                    |  62 ++
 drivers/net/vhost/rte_eth_vhost.c             | 783 ++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h             |  65 +++
 drivers/net/vhost/rte_pmd_vhost_version.map   |   8 +
 lib/librte_vhost/rte_vhost_version.map        |   6 +
 lib/librte_vhost/rte_virtio_net.h             |   3 +
 lib/librte_vhost/vhost_user/virtio-net-user.c |  13 +-
 lib/librte_vhost/virtio-net.c                 |  60 +-
 lib/librte_vhost/virtio-net.h                 |   4 +-
 mk/rte.app.mk                                 |   8 +-
 14 files changed, 1011 insertions(+), 14 deletions(-)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

-- 
2.1.4

^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH v4 1/2] vhost: Add callback and private data for vhost PMD
  2015-11-13  5:20                   ` [PATCH v4 0/2] " Tetsuya Mukawa
@ 2015-11-13  5:20                     ` Tetsuya Mukawa
  2015-11-17 13:29                       ` Yuanhan Liu
  2015-11-13  5:20                     ` [PATCH v4 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  2015-11-13  5:32                     ` [PATCH v4 0/2] " Yuanhan Liu
  2 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-13  5:20 UTC (permalink / raw)
  To: dev, zhihong.wang, yuanhan.liu; +Cc: ann.zhuangyanying

These variables are needed to be able to manage a virtio device
using both the vhost library APIs and the vhost PMD.
For example, if the vhost PMD uses the current callback handler and private
data provided by the vhost library, a DPDK application that links the vhost
library cannot use some of the vhost library APIs. To avoid this, a separate
callback and private data for the vhost PMD are needed.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
 lib/librte_vhost/rte_vhost_version.map        |  6 +++
 lib/librte_vhost/rte_virtio_net.h             |  3 ++
 lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++---
 lib/librte_vhost/virtio-net.c                 | 60 +++++++++++++++++++++++++--
 lib/librte_vhost/virtio-net.h                 |  4 +-
 5 files changed, 73 insertions(+), 13 deletions(-)

diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index 3d8709e..00a9ce5 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -20,3 +20,9 @@ DPDK_2.1 {
 	rte_vhost_driver_unregister;
 
 } DPDK_2.0;
+
+DPDK_2.2 {
+	global:
+
+	rte_vhost_driver_pmd_callback_register;
+} DPDK_2.1;
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 5687452..3ef6e58 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -128,6 +128,7 @@ struct virtio_net {
 	char			ifname[IF_NAME_SZ];	/**< Name of the tap device or socket path. */
 	uint32_t		virt_qp_nb;	/**< number of queue pair we have allocated */
 	void			*priv;		/**< private context */
+	void			*pmd_priv;	/**< private context for vhost PMD */
 	struct vhost_virtqueue	*virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];	/**< Contains all virtqueue information. */
 } __rte_cache_aligned;
 
@@ -224,6 +225,8 @@ int rte_vhost_driver_unregister(const char *dev_name);
 
 /* Register callbacks. */
 int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const);
+/* Register callbacks for vhost PMD (Only for internal). */
+int rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const * const);
 /* Start vhost driver session blocking loop. */
 int rte_vhost_driver_session_start(void);
 
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index d07452a..d8ae2fc 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -111,7 +111,7 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
 
 	/* Remove from the data plane. */
 	if (dev->flags & VIRTIO_DEV_RUNNING)
-		notify_ops->destroy_device(dev);
+		notify_destroy_device(dev);
 
 	if (dev->mem) {
 		free_mem_region(dev);
@@ -272,7 +272,7 @@ user_set_vring_kick(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
 
 	if (virtio_is_ready(dev) &&
 		!(dev->flags & VIRTIO_DEV_RUNNING))
-			notify_ops->new_device(dev);
+			notify_new_device(dev);
 }
 
 /*
@@ -288,7 +288,7 @@ user_get_vring_base(struct vhost_device_ctx ctx,
 		return -1;
 	/* We have to stop the queue (virtio) if it is running. */
 	if (dev->flags & VIRTIO_DEV_RUNNING)
-		notify_ops->destroy_device(dev);
+		notify_destroy_device(dev);
 
 	/* Here we are safe to get the last used index */
 	ops->get_vring_base(ctx, state->index, state);
@@ -324,10 +324,7 @@ user_set_vring_enable(struct vhost_device_ctx ctx,
 		"set queue enable: %d to qp idx: %d\n",
 		enable, state->index);
 
-	if (notify_ops->vring_state_changed) {
-		notify_ops->vring_state_changed(dev, base_idx / VIRTIO_QNUM,
-						enable);
-	}
+	notify_vring_state_changed(dev, base_idx / VIRTIO_QNUM, enable);
 
 	dev->virtqueue[base_idx + VIRTIO_RXQ]->enabled = enable;
 	dev->virtqueue[base_idx + VIRTIO_TXQ]->enabled = enable;
@@ -341,7 +338,7 @@ user_destroy_device(struct vhost_device_ctx ctx)
 	struct virtio_net *dev = get_device(ctx);
 
 	if (dev && (dev->flags & VIRTIO_DEV_RUNNING))
-		notify_ops->destroy_device(dev);
+		notify_destroy_device(dev);
 
 	if (dev && dev->mem) {
 		free_mem_region(dev);
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index cc917da..886c104 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -65,6 +65,8 @@ struct virtio_net_config_ll {
 
 /* device ops to add/remove device to/from data core. */
 struct virtio_net_device_ops const *notify_ops;
+/* device ops for vhost PMD to add/remove device to/from data core. */
+struct virtio_net_device_ops const *pmd_notify_ops;
 /* root address of the linked list of managed virtio devices */
 static struct virtio_net_config_ll *ll_root;
 
@@ -81,6 +83,45 @@ static struct virtio_net_config_ll *ll_root;
 static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
 
 
+int
+notify_new_device(struct virtio_net *dev)
+{
+	if ((pmd_notify_ops != NULL) && (pmd_notify_ops->new_device != NULL)) {
+		int ret = pmd_notify_ops->new_device(dev);
+
+		if (ret != 0)
+			return ret;
+	}
+	if ((notify_ops != NULL) && (notify_ops->new_device != NULL))
+		return notify_ops->new_device(dev);
+
+	return 0;
+}
+
+void
+notify_destroy_device(volatile struct virtio_net *dev)
+{
+	if ((pmd_notify_ops != NULL) && (pmd_notify_ops->destroy_device != NULL))
+		pmd_notify_ops->destroy_device(dev);
+	if ((notify_ops != NULL) && (notify_ops->destroy_device != NULL))
+		notify_ops->destroy_device(dev);
+}
+
+int
+notify_vring_state_changed(struct virtio_net *dev, uint16_t queue_id, int enable)
+{
+	if ((pmd_notify_ops != NULL) && (pmd_notify_ops->vring_state_changed != NULL)) {
+		int ret = pmd_notify_ops->vring_state_changed(dev, queue_id, enable);
+
+		if (ret != 0)
+			return ret;
+	}
+	if ((notify_ops != NULL) && (notify_ops->vring_state_changed != NULL))
+		return notify_ops->vring_state_changed(dev, queue_id, enable);
+
+	return 0;
+}
+
 /*
  * Converts QEMU virtual address to Vhost virtual address. This function is
  * used to convert the ring addresses to our address space.
@@ -374,7 +415,7 @@ destroy_device(struct vhost_device_ctx ctx)
 			 * the function to remove it from the data core.
 			 */
 			if ((ll_dev_cur->dev.flags & VIRTIO_DEV_RUNNING))
-				notify_ops->destroy_device(&(ll_dev_cur->dev));
+				notify_destroy_device(&(ll_dev_cur->dev));
 			ll_dev_cur = rm_config_ll_entry(ll_dev_cur,
 					ll_dev_last);
 		} else {
@@ -432,7 +473,7 @@ reset_owner(struct vhost_device_ctx ctx)
 		return -1;
 
 	if (dev->flags & VIRTIO_DEV_RUNNING)
-		notify_ops->destroy_device(dev);
+		notify_destroy_device(dev);
 
 	cleanup_device(dev);
 	reset_device(dev);
@@ -790,12 +831,12 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 	if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
 		if (((int)dev->virtqueue[VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED) &&
 			((int)dev->virtqueue[VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED)) {
-			return notify_ops->new_device(dev);
+			return notify_new_device(dev);
 		}
 	/* Otherwise we remove it. */
 	} else
 		if (file->fd == VIRTIO_DEV_STOPPED)
-			notify_ops->destroy_device(dev);
+			notify_destroy_device(dev);
 	return 0;
 }
 
@@ -879,3 +920,14 @@ rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const op
 
 	return 0;
 }
+
+/*
+ * Register ops so that we can add/remove device to data core.
+ */
+int
+rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const * const ops)
+{
+	pmd_notify_ops = ops;
+
+	return 0;
+}
diff --git a/lib/librte_vhost/virtio-net.h b/lib/librte_vhost/virtio-net.h
index 75fb57e..0816e71 100644
--- a/lib/librte_vhost/virtio-net.h
+++ b/lib/librte_vhost/virtio-net.h
@@ -37,7 +37,9 @@
 #include "vhost-net.h"
 #include "rte_virtio_net.h"
 
-struct virtio_net_device_ops const *notify_ops;
 struct virtio_net *get_device(struct vhost_device_ctx ctx);
 
+int notify_new_device(struct virtio_net *dev);
+void notify_destroy_device(volatile struct virtio_net *dev);
+int notify_vring_state_changed(struct virtio_net *dev, uint16_t queue_id, int enable);
 #endif
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v4 2/2] vhost: Add VHOST PMD
  2015-11-13  5:20                   ` [PATCH v4 0/2] " Tetsuya Mukawa
  2015-11-13  5:20                     ` [PATCH v4 1/2] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
@ 2015-11-13  5:20                     ` Tetsuya Mukawa
  2015-11-16  1:57                       ` Wang, Zhihong
                                         ` (3 more replies)
  2015-11-13  5:32                     ` [PATCH v4 0/2] " Yuanhan Liu
  2 siblings, 4 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-13  5:20 UTC (permalink / raw)
  To: dev, zhihong.wang, yuanhan.liu; +Cc: ann.zhuangyanying

The patch introduces a new PMD. This PMD is implemented as a thin wrapper
around librte_vhost, which means librte_vhost is also needed to compile the
PMD. Vhost messages are handled only while a port is started, so start the
port first, then invoke QEMU.

The PMD has 2 parameters.
 - iface:  The parameter specifies a path to connect to a
           virtio-net device.
 - queues: The parameter specifies the number of queues the
           virtio-net device has.
           (Default: 1)
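For reference, the mapping from an ethdev queue id to a virtqueue id that the
PMD uses (following eth_rx_queue_setup()/eth_tx_queue_setup() in this patch)
can be sketched as below. This is an illustration only, not part of the patch;
the constant values match the VIRTIO_RXQ/VIRTIO_TXQ/VIRTIO_QNUM definitions in
librte_vhost.

```python
# Sketch of the PMD's ethdev-queue -> virtqueue-id mapping.
VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM = 0, 1, 2

def rx_virtqueue_id(rx_queue_id):
    # The PMD's RX path dequeues from the guest's TX virtqueue.
    return rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ

def tx_virtqueue_id(tx_queue_id):
    # The PMD's TX path enqueues to the guest's RX virtqueue.
    return tx_queue_id * VIRTIO_QNUM + VIRTIO_RXQ

print(rx_virtqueue_id(0), tx_virtqueue_id(0))  # 1 0
```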

Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i

To connect to the above testpmd, here is a QEMU command example.

$ qemu-system-x86_64 \
        <snip>
        -chardev socket,id=chr0,path=/tmp/sock0 \
        -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
        -device virtio-net-pci,netdev=net0
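For clarity, the devarg handling in rte_pmd_vhost_devinit() (mandatory 'iface',
optional 'queues' defaulting to 1) can be sketched as the following parser.
The helper name parse_vhost_vdev is hypothetical and for illustration only; it
is not DPDK code.

```python
# Illustrative parser (not DPDK code) mirroring how rte_pmd_vhost_devinit()
# extracts the 'iface' and 'queues' devargs from a --vdev string.
def parse_vhost_vdev(vdev):
    name, _, params = vdev.partition(',')
    kv = dict(p.split('=', 1) for p in params.split(',') if p)
    iface = kv['iface']                # mandatory, as in the real PMD
    queues = int(kv.get('queues', 1))  # defaults to 1 queue
    return name, iface, queues

print(parse_vhost_vdev('eth_vhost0,iface=/tmp/sock0,queues=1'))
```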

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
 config/common_linuxapp                      |   6 +
 doc/guides/nics/index.rst                   |   1 +
 doc/guides/rel_notes/release_2_2.rst        |   2 +
 drivers/net/Makefile                        |   4 +
 drivers/net/vhost/Makefile                  |  62 +++
 drivers/net/vhost/rte_eth_vhost.c           | 783 ++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h           |  65 +++
 drivers/net/vhost/rte_pmd_vhost_version.map |   8 +
 mk/rte.app.mk                               |   8 +-
 9 files changed, 938 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 52173d5..1ea23ef 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -461,6 +461,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
 CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 
 #
+# Compile vhost PMD
+# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
+#
+CONFIG_RTE_LIBRTE_PMD_VHOST=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 2d4936d..57d1041 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -47,6 +47,7 @@ Network Interface Controller Drivers
     mlx4
     mlx5
     virtio
+    vhost
     vmxnet3
     pcap_ring
 
diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 1c02ff6..c2284d3 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -94,6 +94,8 @@ New Features
 
 * **Added vhost-user multiple queue support.**
 
+* **Added vhost PMD.**
+
 * **Added port hotplug support to vmxnet3.**
 
 * **Added port hotplug support to xenvirt.**
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 6da1ce2..66eb63d 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -50,5 +50,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
 
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
 include $(RTE_SDK)/mk/rte.sharelib.mk
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
new file mode 100644
index 0000000..8186a80
--- /dev/null
+++ b/drivers/net/vhost/Makefile
@@ -0,0 +1,62 @@
+#   BSD LICENSE
+#
+#   Copyright (c) 2010-2015 Intel Corporation.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_vhost.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_vhost_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
+
+#
+# Export include files
+#
+SYMLINK-y-include += rte_eth_vhost.h
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_vhost
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
new file mode 100644
index 0000000..7fb30fe
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -0,0 +1,783 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) 2015 IGEL Co., Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co.,Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <unistd.h>
+#include <pthread.h>
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_dev.h>
+#include <rte_kvargs.h>
+#include <rte_virtio_net.h>
+
+#include "rte_eth_vhost.h"
+
+#define ETH_VHOST_IFACE_ARG		"iface"
+#define ETH_VHOST_QUEUES_ARG		"queues"
+
+static const char *drivername = "VHOST PMD";
+
+static const char *valid_arguments[] = {
+	ETH_VHOST_IFACE_ARG,
+	ETH_VHOST_QUEUES_ARG,
+	NULL
+};
+
+static struct ether_addr base_eth_addr = {
+	.addr_bytes = {
+		0x56 /* V */,
+		0x48 /* H */,
+		0x4F /* O */,
+		0x53 /* S */,
+		0x54 /* T */,
+		0x00
+	}
+};
+
+struct vhost_queue {
+	rte_atomic32_t allow_queuing;
+	rte_atomic32_t while_queuing;
+	struct virtio_net *device;
+	struct pmd_internal *internal;
+	struct rte_mempool *mb_pool;
+	uint16_t virtqueue_id;
+	uint64_t rx_pkts;
+	uint64_t tx_pkts;
+	uint64_t err_pkts;
+};
+
+struct pmd_internal {
+	TAILQ_ENTRY(pmd_internal) next;
+	char *dev_name;
+	char *iface_name;
+	unsigned nb_rx_queues;
+	unsigned nb_tx_queues;
+
+	struct vhost_queue *rx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+	struct vhost_queue *tx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+
+	volatile uint16_t once;
+};
+
+TAILQ_HEAD(pmd_internal_head, pmd_internal);
+static struct pmd_internal_head internals_list =
+	TAILQ_HEAD_INITIALIZER(internals_list);
+
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static rte_atomic16_t nb_started_ports;
+pthread_t session_th;
+
+static struct rte_eth_link pmd_link = {
+		.link_speed = 10000,
+		.link_duplex = ETH_LINK_FULL_DUPLEX,
+		.link_status = 0
+};
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t nb_rx = 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Dequeue packets from guest TX queue */
+	nb_rx = rte_vhost_dequeue_burst(r->device,
+			r->virtqueue_id, r->mb_pool, bufs, nb_bufs);
+
+	r->rx_pkts += nb_rx;
+
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t i, nb_tx = 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Enqueue packets to guest RX queue */
+	nb_tx = rte_vhost_enqueue_burst(r->device,
+			r->virtqueue_id, bufs, nb_bufs);
+
+	r->tx_pkts += nb_tx;
+	r->err_pkts += nb_bufs - nb_tx;
+
+	for (i = 0; likely(i < nb_tx); i++)
+		rte_pktmbuf_free(bufs[i]);
+
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
+{
+	return 0;
+}
+
+static inline struct pmd_internal *
+find_internal_resource(char *ifname)
+{
+	int found = 0;
+	struct pmd_internal *internal;
+
+	if (ifname == NULL)
+		return NULL;
+
+	pthread_mutex_lock(&internal_list_lock);
+
+	TAILQ_FOREACH(internal, &internals_list, next) {
+		if (!strcmp(internal->iface_name, ifname)) {
+			found = 1;
+			break;
+		}
+	}
+
+	pthread_mutex_unlock(&internal_list_lock);
+
+	if (!found)
+		return NULL;
+
+	return internal;
+}
+
+static int
+new_device(struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct pmd_internal *internal;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid argument\n");
+		return -1;
+	}
+
+	internal = find_internal_resource(dev->ifname);
+	if (internal == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid device name\n");
+		return -1;
+	}
+
+	if ((dev->virt_qp_nb < internal->nb_rx_queues) ||
+			(dev->virt_qp_nb < internal->nb_tx_queues)) {
+		RTE_LOG(INFO, PMD, "Not enough queues\n");
+		return -1;
+	}
+
+	eth_dev = rte_eth_dev_allocated(internal->dev_name);
+	if (eth_dev == NULL) {
+		RTE_LOG(INFO, PMD, "Failed to find an ethdev\n");
+		return -1;
+	}
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = internal->rx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = dev;
+		vq->internal = internal;
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = internal->tx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = dev;
+		vq->internal = internal;
+	}
+
+	dev->flags |= VIRTIO_DEV_RUNNING;
+	dev->pmd_priv = eth_dev;
+	eth_dev->data->dev_link.link_status = 1;
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = internal->rx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = internal->tx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+	RTE_LOG(INFO, PMD, "New connection established\n");
+
+	return 0;
+}
+
+static void
+destroy_device(volatile struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct pmd_internal *internal;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid argument\n");
+		return;
+	}
+
+	eth_dev = (struct rte_eth_dev *)dev->pmd_priv;
+	if (eth_dev == NULL) {
+		RTE_LOG(INFO, PMD, "Failed to find an ethdev\n");
+		return;
+	}
+
+	internal = eth_dev->data->dev_private;
+
+	/* Wait until rx/tx_pkt_burst stops accessing vhost device */
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = internal->rx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = internal->tx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+
+	eth_dev->data->dev_link.link_status = 0;
+
+	dev->pmd_priv = NULL;
+	dev->flags &= ~VIRTIO_DEV_RUNNING;
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = internal->rx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = NULL;
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = internal->tx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = NULL;
+	}
+
+	RTE_LOG(INFO, PMD, "Connection closed\n");
+}
+
+static void *vhost_driver_session(void *param __rte_unused)
+{
+	static struct virtio_net_device_ops *vhost_ops;
+
+	vhost_ops = rte_zmalloc(NULL, sizeof(*vhost_ops), 0);
+	if (vhost_ops == NULL)
+		rte_panic("Can't allocate memory\n");
+
+	/* set vhost arguments */
+	vhost_ops->new_device = new_device;
+	vhost_ops->destroy_device = destroy_device;
+	if (rte_vhost_driver_pmd_callback_register(vhost_ops) < 0)
+		rte_panic("Can't register callbacks\n");
+
+	/* start event handling */
+	rte_vhost_driver_session_start();
+
+	rte_free(vhost_ops);
+	pthread_exit(0);
+}
+
+static void vhost_driver_session_start(void)
+{
+	int ret;
+
+	ret = pthread_create(&session_th,
+			NULL, vhost_driver_session, NULL);
+	if (ret)
+		rte_panic("Can't create a thread\n");
+}
+
+static void vhost_driver_session_stop(void)
+{
+	int ret;
+
+	ret = pthread_cancel(session_th);
+	if (ret)
+		rte_panic("Can't cancel the thread\n");
+
+	ret = pthread_join(session_th, NULL);
+	if (ret)
+		rte_panic("Can't join the thread\n");
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	int ret;
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
+		ret = rte_vhost_driver_register(internal->iface_name);
+		if (ret)
+			return ret;
+	}
+
+	/* We need only one message handling thread */
+	if (rte_atomic16_add_return(&nb_started_ports, 1) == 1)
+		vhost_driver_session_start();
+
+	return 0;
+}
+
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	if (rte_atomic16_cmpset(&internal->once, 1, 0))
+		rte_vhost_driver_unregister(internal->iface_name);
+
+	if (rte_atomic16_sub_return(&nb_started_ports, 1) == 0)
+		vhost_driver_session_stop();
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+	struct vhost_queue *vq;
+
+	rte_free(internal->rx_vhost_queues[rx_queue_id]);
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
+		return -ENOMEM;
+	}
+
+	vq->mb_pool = mb_pool;
+	vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
+	internal->rx_vhost_queues[rx_queue_id] = vq;
+	dev->data->rx_queues[rx_queue_id] = vq;
+	return 0;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+	struct vhost_queue *vq;
+
+	rte_free(internal->tx_vhost_queues[tx_queue_id]);
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for tx queue\n");
+		return -ENOMEM;
+	}
+
+	vq->virtqueue_id = tx_queue_id * VIRTIO_QNUM + VIRTIO_RXQ;
+	internal->tx_vhost_queues[tx_queue_id] = vq;
+	dev->data->tx_queues[tx_queue_id] = vq;
+	return 0;
+}
+
+
+static void
+eth_dev_info(struct rte_eth_dev *dev,
+	     struct rte_eth_dev_info *dev_info)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	dev_info->driver_name = drivername;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = (uint32_t)-1;
+	dev_info->max_rx_queues = (uint16_t)internal->nb_rx_queues;
+	dev_info->max_tx_queues = (uint16_t)internal->nb_tx_queues;
+	dev_info->min_rx_bufsize = 0;
+}
+
+static void
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
+{
+	unsigned i;
+	unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
+	const struct pmd_internal *internal = dev->data->dev_private;
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+	     i < internal->nb_rx_queues; i++) {
+		if (internal->rx_vhost_queues[i] == NULL)
+			continue;
+		igb_stats->q_ipackets[i] = internal->rx_vhost_queues[i]->rx_pkts;
+		rx_total += igb_stats->q_ipackets[i];
+	}
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+	     i < internal->nb_tx_queues; i++) {
+		if (internal->tx_vhost_queues[i] == NULL)
+			continue;
+		igb_stats->q_opackets[i] = internal->tx_vhost_queues[i]->tx_pkts;
+		igb_stats->q_errors[i] = internal->tx_vhost_queues[i]->err_pkts;
+		tx_total += igb_stats->q_opackets[i];
+		tx_err_total += igb_stats->q_errors[i];
+	}
+
+	igb_stats->ipackets = rx_total;
+	igb_stats->opackets = tx_total;
+	igb_stats->oerrors = tx_err_total;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	unsigned i;
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		if (internal->rx_vhost_queues[i] == NULL)
+			continue;
+		internal->rx_vhost_queues[i]->rx_pkts = 0;
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		if (internal->tx_vhost_queues[i] == NULL)
+			continue;
+		internal->tx_vhost_queues[i]->tx_pkts = 0;
+		internal->tx_vhost_queues[i]->err_pkts = 0;
+	}
+}
+
+static void
+eth_queue_release(void *q __rte_unused)
+{
+	return;
+}
+
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused)
+{
+	return 0;
+}
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+static int
+eth_dev_vhost_create(const char *name, int index,
+		     char *iface_name,
+		     int16_t queues,
+		     const unsigned numa_node)
+{
+	struct rte_eth_dev_data *data = NULL;
+	struct pmd_internal *internal = NULL;
+	struct rte_eth_dev *eth_dev = NULL;
+	struct ether_addr *eth_addr = NULL;
+
+	RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
+		numa_node);
+
+	/* now do all data allocation - for eth_dev structure, dummy pci driver
+	 * and internal (private) data
+	 */
+	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+	if (data == NULL)
+		goto error;
+
+	internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
+	if (internal == NULL)
+		goto error;
+
+	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
+	if (eth_addr == NULL)
+		goto error;
+	*eth_addr = base_eth_addr;
+	eth_addr->addr_bytes[5] = index;
+
+	/* reserve an ethdev entry */
+	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+	if (eth_dev == NULL)
+		goto error;
+
+	/* now put it all together
+	 * - store queue data in internal,
+	 * - store numa_node info in ethdev data
+	 * - point eth_dev_data to internals
+	 * - and point eth_dev structure to new eth_dev_data structure
+	 */
+	internal->nb_rx_queues = queues;
+	internal->nb_tx_queues = queues;
+	internal->dev_name = strdup(name);
+	if (internal->dev_name == NULL)
+		goto error;
+	internal->iface_name = strdup(iface_name);
+	if (internal->iface_name == NULL) {
+		free(internal->dev_name);
+		goto error;
+	}
+
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_INSERT_TAIL(&internals_list, internal, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
+	data->dev_private = internal;
+	data->port_id = eth_dev->data->port_id;
+	memmove(data->name, eth_dev->data->name, sizeof(data->name));
+	data->nb_rx_queues = queues;
+	data->nb_tx_queues = queues;
+	data->dev_link = pmd_link;
+	data->mac_addrs = eth_addr;
+
+	/* We'll replace the 'data' originally allocated by eth_dev. So the
+	 * vhost PMD resources won't be shared between multi processes.
+	 */
+	eth_dev->data = data;
+	eth_dev->dev_ops = &ops;
+	eth_dev->driver = NULL;
+	eth_dev->data->dev_flags = RTE_ETH_DEV_DETACHABLE;
+	eth_dev->data->kdrv = RTE_KDRV_NONE;
+	eth_dev->data->drv_name = internal->dev_name;
+	eth_dev->data->numa_node = numa_node;
+
+	/* finally assign rx and tx ops */
+	eth_dev->rx_pkt_burst = eth_vhost_rx;
+	eth_dev->tx_pkt_burst = eth_vhost_tx;
+
+	return data->port_id;
+
+error:
+	rte_free(data);
+	rte_free(internal);
+	rte_free(eth_addr);
+
+	return -1;
+}
+
+static inline int
+open_iface(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	const char **iface_name = extra_args;
+
+	if (value == NULL)
+		return -1;
+
+	*iface_name = value;
+
+	return 0;
+}
+
+static inline int
+open_queues(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	uint16_t *q = extra_args;
+
+	if ((value == NULL) || (extra_args == NULL))
+		return -EINVAL;
+
+	*q = (uint16_t)strtoul(value, NULL, 0);
+	if ((*q == USHRT_MAX) && (errno == ERANGE))
+		return -1;
+
+	if (*q > RTE_MAX_QUEUES_PER_PORT)
+		return -1;
+
+	return 0;
+}
+
+static int
+rte_pmd_vhost_devinit(const char *name, const char *params)
+{
+	struct rte_kvargs *kvlist = NULL;
+	int ret = 0;
+	int index;
+	char *iface_name;
+	uint16_t queues;
+
+	RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
+
+	if (strlen(name) < strlen("eth_vhost"))
+		return -1;
+
+	index = strtol(name + strlen("eth_vhost"), NULL, 0);
+	if (errno == ERANGE)
+		return -1;
+
+	kvlist = rte_kvargs_parse(params, valid_arguments);
+	if (kvlist == NULL)
+		return -1;
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
+					 &open_iface, &iface_name);
+		if (ret < 0)
+			goto out_free;
+	} else {
+		ret = -1;
+		goto out_free;
+	}
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
+					 &open_queues, &queues);
+		if (ret < 0)
+			goto out_free;
+
+	} else
+		queues = 1;
+
+	eth_dev_vhost_create(name, index,
+			iface_name, queues, rte_socket_id());
+
+out_free:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static int
+rte_pmd_vhost_devuninit(const char *name)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internal *internal;
+	unsigned int i;
+
+	if (name == NULL)
+		return -EINVAL;
+
+	RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
+
+	/* find an ethdev entry */
+	eth_dev = rte_eth_dev_allocated(name);
+	if (eth_dev == NULL)
+		return -ENODEV;
+
+	internal = eth_dev->data->dev_private;
+
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_REMOVE(&internals_list, internal, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
+	eth_dev_stop(eth_dev);
+
+	if ((internal) && (internal->dev_name))
+		free(internal->dev_name);
+	if ((internal) && (internal->iface_name))
+		free(internal->iface_name);
+
+	rte_free(eth_dev->data->mac_addrs);
+	rte_free(eth_dev->data);
+
+	for (i = 0; i < internal->nb_rx_queues; i++)
+		rte_free(internal->rx_vhost_queues[i]);
+	for (i = 0; i < internal->nb_tx_queues; i++)
+		rte_free(internal->tx_vhost_queues[i]);
+	rte_free(internal);
+
+	rte_eth_dev_release_port(eth_dev);
+
+	return 0;
+}
+
+static struct rte_driver pmd_vhost_drv = {
+	.name = "eth_vhost",
+	.type = PMD_VDEV,
+	.init = rte_pmd_vhost_devinit,
+	.uninit = rte_pmd_vhost_devuninit,
+};
+
+struct
+virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id)
+{
+	struct rte_eth_dev *eth_dev;
+
+	if (rte_eth_dev_is_valid_port(port_id) == 0)
+		return NULL;
+
+	eth_dev = &rte_eth_devices[port_id];
+	if (strncmp("eth_vhost", eth_dev->data->drv_name,
+				strlen("eth_vhost")) == 0) {
+		struct pmd_internal *internal;
+		struct vhost_queue *vq;
+
+		internal = eth_dev->data->dev_private;
+		vq = internal->rx_vhost_queues[0];
+		if ((vq != NULL) && (vq->device != NULL))
+			return vq->device;
+	}
+
+	return NULL;
+}
+
+PMD_REGISTER_DRIVER(pmd_vhost_drv);
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
new file mode 100644
index 0000000..22a880f
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -0,0 +1,65 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 IGEL Co., Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co., Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_VHOST_H_
+#define _RTE_ETH_VHOST_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_virtio_net.h>
+
+/**
+ * The function converts the specified port_id to a virtio device structure.
+ * The returned device can be used with the vhost library APIs.
+ * To use the vhost library APIs and the vhost PMD in parallel, the API below
+ * should not be called, because it is already called by the vhost PMD:
+ * - rte_vhost_driver_session_start()
+ * Once a device is managed by the vhost PMD, the API below should not be called:
+ * - rte_vhost_driver_unregister()
+ * To unregister the device, use the Port Hotplug APIs instead.
+ *
+ * @param port_id
+ *  port number
+ * @return
+ *  virtio net device structure corresponding to the specified port
+ *  NULL will be returned in error cases.
+ */
+struct virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
new file mode 100644
index 0000000..bf0361a
--- /dev/null
+++ b/drivers/net/vhost/rte_pmd_vhost_version.map
@@ -0,0 +1,8 @@
+DPDK_2.2 {
+
+	global:
+
+	rte_eth_vhost_portid2vdev;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 724efa7..1af4bb3 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -148,7 +148,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP)       += -lrte_pmd_pcap
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL)       += -lrte_pmd_null
 
-endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
+
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
+endif # $(CONFIG_RTE_BUILD_SHARED_LIB)
 
 endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 200+ messages in thread

* Re: [PATCH v4 0/2] Add VHOST PMD
  2015-11-13  5:20                   ` [PATCH v4 0/2] " Tetsuya Mukawa
  2015-11-13  5:20                     ` [PATCH v4 1/2] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
  2015-11-13  5:20                     ` [PATCH v4 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
@ 2015-11-13  5:32                     ` Yuanhan Liu
  2015-11-13  5:37                       ` Tetsuya Mukawa
  2015-11-13  6:50                       ` Tetsuya Mukawa
  2 siblings, 2 replies; 200+ messages in thread
From: Yuanhan Liu @ 2015-11-13  5:32 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Fri, Nov 13, 2015 at 02:20:29PM +0900, Tetsuya Mukawa wrote:
> The patch introduces a new PMD. This PMD is implemented as thin wrapper
> of librte_vhost.
> 
> * Known issue.
> We may see issues while handling RESET_OWNER message.
> These handlings are done in vhost library, so not a part of vhost PMD.
> So far, we are waiting for QEMU fixing.

Fix patches have already been applied. Please help test :)

	--yliu

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v4 0/2] Add VHOST PMD
  2015-11-13  5:32                     ` [PATCH v4 0/2] " Yuanhan Liu
@ 2015-11-13  5:37                       ` Tetsuya Mukawa
  2015-11-13  6:50                       ` Tetsuya Mukawa
  1 sibling, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-13  5:37 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, ann.zhuangyanying

On 2015/11/13 14:32, Yuanhan Liu wrote:
> On Fri, Nov 13, 2015 at 02:20:29PM +0900, Tetsuya Mukawa wrote:
>> The patch introduces a new PMD. This PMD is implemented as thin wrapper
>> of librte_vhost.
>>
>> * Known issue.
>> We may see issues while handling RESET_OWNER message.
>> These handlings are done in vhost library, so not a part of vhost PMD.
>> So far, we are waiting for QEMU fixing.
> Fix patches have already been applied. Please help test :)
>
> 	--yliu

Thanks!
I have checked it, and it worked!

Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v4 0/2] Add VHOST PMD
  2015-11-13  5:32                     ` [PATCH v4 0/2] " Yuanhan Liu
  2015-11-13  5:37                       ` Tetsuya Mukawa
@ 2015-11-13  6:50                       ` Tetsuya Mukawa
  2015-11-17 13:26                         ` Yuanhan Liu
  1 sibling, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-13  6:50 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, ann.zhuangyanying

On 2015/11/13 14:32, Yuanhan Liu wrote:
> On Fri, Nov 13, 2015 at 02:20:29PM +0900, Tetsuya Mukawa wrote:
>> The patch introduces a new PMD. This PMD is implemented as thin wrapper
>> of librte_vhost.
>>
>> * Known issue.
>> We may see issues while handling RESET_OWNER message.
>> These handlings are done in vhost library, so not a part of vhost PMD.
>> So far, we are waiting for QEMU fixing.
> Fix patches have already been applied. Please help test :)
>
> 	--yliu

Hi Yuanhan,

It seems there might be another issue, related to "vq->callfd" in the
vhost library.
We may be missing something in handling the value correctly.

Anyway, here are steps.
1. Apply vhost PMD patch.
(I guess you don't need it to reproduce the issue, but to reproduce it,
using the PMD may be easy)
2. Start testpmd on host with vhost-user PMD.
3. Start QEMU with virtio-net device.
4. Login QEMU.
5. Bind the virtio-net device to igb_uio.
6. Start testpmd in QEMU.
7. Quit testpmd in QEMU.
8. Start testpmd again in QEMU.

It seems that when the last command is executed, testpmd on the host
doesn't receive a SET_VRING_CALL message from QEMU.
Because of this, testpmd on the host assumes the virtio-net device is
not ready.
(I made sure virtio_is_ready() failed on the host.)

According to the QEMU source code, SET_VRING_KICK is sent when a
virtqueue starts, but SET_VRING_CALL is sent when a virtqueue is
initialized.
I am not sure exactly, but perhaps "vq->callfd" should remain valid
while the connection is established?

Also, I've found a workaround.
Please execute the following after step 7.

8. Bind the virtio-net device to virtio-pci kernel driver.
9. Bind the virtio-net device to igb_uio.
10. Start testpmd in QEMU.

When step 8 is executed, the connection will be re-established, and
testpmd on the host will be able to receive SET_VRING_CALL.
Then testpmd on the host can start.

Thanks,
Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v4 2/2] vhost: Add VHOST PMD
  2015-11-13  5:20                     ` [PATCH v4 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
@ 2015-11-16  1:57                       ` Wang, Zhihong
  2015-11-20 11:43                       ` Yuanhan Liu
                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 200+ messages in thread
From: Wang, Zhihong @ 2015-11-16  1:57 UTC (permalink / raw)
  To: Tetsuya Mukawa, dev, Liu, Yuanhan; +Cc: ann.zhuangyanying

A quick glimpse and the bug is gone now :)
Will do more testing later on.

> -----Original Message-----
> From: Tetsuya Mukawa [mailto:mukawa@igel.co.jp]
> Sent: Friday, November 13, 2015 1:21 PM
> To: dev@dpdk.org; Wang, Zhihong <zhihong.wang@intel.com>; Liu, Yuanhan
> <yuanhan.liu@intel.com>
> Cc: Loftus, Ciara <ciara.loftus@intel.com>; pmatilai@redhat.com;
> ann.zhuangyanying@huawei.com; Richardson, Bruce
> <bruce.richardson@intel.com>; Xie, Huawei <huawei.xie@intel.com>;
> thomas.monjalon@6wind.com; stephen@networkplumber.org;
> rich.lane@bigswitch.com; Tetsuya Mukawa <mukawa@igel.co.jp>
> Subject: [PATCH v4 2/2] vhost: Add VHOST PMD
> 
> The patch introduces a new PMD. This PMD is implemented as thin wrapper
> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
> The vhost messages will be handled only when a port is started. So start
> a port first, then invoke QEMU.
> 
> The PMD has 2 parameters.
>  - iface:  The parameter is used to specify a path to connect to a
>            virtio-net device.
>  - queues: The parameter is used to specify the number of the queues
>            virtio-net device has.
>            (Default: 1)
> 
> Here is an example.
> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
> 
> To connect above testpmd, here is qemu command example.
> 
> $ qemu-system-x86_64 \
>         <snip>
>         -chardev socket,id=chr0,path=/tmp/sock0 \
>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
>         -device virtio-net-pci,netdev=net0
> 
> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> ---

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v4 0/2] Add VHOST PMD
  2015-11-13  6:50                       ` Tetsuya Mukawa
@ 2015-11-17 13:26                         ` Yuanhan Liu
  2015-11-19  1:20                           ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Yuanhan Liu @ 2015-11-17 13:26 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Fri, Nov 13, 2015 at 03:50:16PM +0900, Tetsuya Mukawa wrote:
> On 2015/11/13 14:32, Yuanhan Liu wrote:
> > On Fri, Nov 13, 2015 at 02:20:29PM +0900, Tetsuya Mukawa wrote:
> >> The patch introduces a new PMD. This PMD is implemented as thin wrapper
> >> of librte_vhost.
> >>
> >> * Known issue.
> >> We may see issues while handling RESET_OWNER message.
> >> These handlings are done in vhost library, so not a part of vhost PMD.
> >> So far, we are waiting for QEMU fixing.
> > Fix patches have already been applied. Please help test :)
> >
> > 	--yliu
> 
> Hi Yuanhan,
> 
> It seems there might be an another issue related with "vq->callfd" in
> vhost library.
> We may miss something to handle the value correctly.
> 
> Anyway, here are steps.
> 1. Apply vhost PMD patch.
> (I guess you don't need it to reproduce the issue, but to reproduce it,
> using the PMD may be easy)
> 2. Start testpmd on host with vhost-user PMD.
> 3. Start QEMU with virtio-net device.
> 4. Login QEMU.
> 5. Bind the virtio-net device to igb_uio.
> 6. Start testpmd in QEMU.
> 7. Quit testmd in QEMU.
> 8. Start testpmd again in QEMU.
> 
> It seems when last command is executed, testpmd on host doesn't receive
> SET_VRING_CALL message from QEMU.
> Because of this, testpmd on host assumes virtio-net device is not ready.
> (I made sure virtio_is_ready() was failed on host).
> 
> According to QEMU source code, SET_VRING_KICK will be called when
> virtqueue starts, but SET_VRING_CALL will be called when virtqueue is
> initialized.
> Not sure exactly, might be "vq->call" will be valid while connection is
> established?

Yes, it would be valid as long as we don't reset it from another
set_vring_call. So, we should not reset it in reset_device().

	--yliu
> 
> Also I've found a workaround.
> Please execute after step7.
> 
> 8. Bind the virtio-net device to virtio-pci kernel driver.
> 9. Bind the virtio-net device to igb_uio.
> 10. Start testpmd in QEMU.
> 
> When step8 is executed, connection will be re-established, and testpmd
> on host will be able to receive SET_VRING_CALL.
> Then testpmd on host can start.
> 
> Thanks,
> Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v4 1/2] vhost: Add callback and private data for vhost PMD
  2015-11-13  5:20                     ` [PATCH v4 1/2] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
@ 2015-11-17 13:29                       ` Yuanhan Liu
  2015-11-19  2:03                         ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Yuanhan Liu @ 2015-11-17 13:29 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Fri, Nov 13, 2015 at 02:20:30PM +0900, Tetsuya Mukawa wrote:
> These variables are needed to be able to manage one of virtio devices
> using both vhost library APIs and vhost PMD.
> For example, if vhost PMD uses current callback handler and private data
> provided by vhost library, A DPDK application that links vhost library
> cannot use some of vhost library APIs.

Can you be more specific about this?

	--yliu

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v4 0/2] Add VHOST PMD
  2015-11-17 13:26                         ` Yuanhan Liu
@ 2015-11-19  1:20                           ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-19  1:20 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, ann.zhuangyanying

On 2015/11/17 22:26, Yuanhan Liu wrote:
> On Fri, Nov 13, 2015 at 03:50:16PM +0900, Tetsuya Mukawa wrote:
>> On 2015/11/13 14:32, Yuanhan Liu wrote:
>>> On Fri, Nov 13, 2015 at 02:20:29PM +0900, Tetsuya Mukawa wrote:
>>>> The patch introduces a new PMD. This PMD is implemented as thin wrapper
>>>> of librte_vhost.
>>>>
>>>> * Known issue.
>>>> We may see issues while handling RESET_OWNER message.
>>>> These handlings are done in vhost library, so not a part of vhost PMD.
>>>> So far, we are waiting for QEMU fixing.
>>> Fix patches have already been applied. Please help test :)
>>>
>>> 	--yliu
>> Hi Yuanhan,
>>
>> It seems there might be an another issue related with "vq->callfd" in
>> vhost library.
>> We may miss something to handle the value correctly.
>>
>> Anyway, here are steps.
>> 1. Apply vhost PMD patch.
>> (I guess you don't need it to reproduce the issue, but to reproduce it,
>> using the PMD may be easy)
>> 2. Start testpmd on host with vhost-user PMD.
>> 3. Start QEMU with virtio-net device.
>> 4. Login QEMU.
>> 5. Bind the virtio-net device to igb_uio.
>> 6. Start testpmd in QEMU.
>> 7. Quit testmd in QEMU.
>> 8. Start testpmd again in QEMU.
>>
>> It seems when last command is executed, testpmd on host doesn't receive
>> SET_VRING_CALL message from QEMU.
>> Because of this, testpmd on host assumes virtio-net device is not ready.
>> (I made sure virtio_is_ready() was failed on host).
>>
>> According to QEMU source code, SET_VRING_KICK will be called when
>> virtqueue starts, but SET_VRING_CALL will be called when virtqueue is
>> initialized.
>> Not sure exactly, might be "vq->call" will be valid while connection is
>> established?
> Yes, it would be valid as far as we don't reset it from another
> set_vring_call. So, we should not reset it on reset_device().
>
> 	--yliu

Hi Yuanhan,

Thanks for checking.
I will submit the patch for this today.

Tetsuya

>> Also I've found a workaround.
>> Please execute after step7.
>>
>> 8. Bind the virtio-net device to virtio-pci kernel driver.
>> 9. Bind the virtio-net device to igb_uio.
>> 10. Start testpmd in QEMU.
>>
>> When step8 is executed, connection will be re-established, and testpmd
>> on host will be able to receive SET_VRING_CALL.
>> Then testpmd on host can start.
>>
>> Thanks,
>> Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v4 1/2] vhost: Add callback and private data for vhost PMD
  2015-11-17 13:29                       ` Yuanhan Liu
@ 2015-11-19  2:03                         ` Tetsuya Mukawa
  2015-11-19  2:18                           ` Yuanhan Liu
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-19  2:03 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, ann.zhuangyanying

On 2015/11/17 22:29, Yuanhan Liu wrote:
> On Fri, Nov 13, 2015 at 02:20:30PM +0900, Tetsuya Mukawa wrote:
>> These variables are needed to be able to manage one of virtio devices
>> using both vhost library APIs and vhost PMD.
>> For example, if vhost PMD uses current callback handler and private data
>> provided by vhost library, A DPDK application that links vhost library
>> cannot use some of vhost library APIs.
> Can you be more specific about this?
>
> 	--yliu

How about like below?

commit log:
Currently, when a virtio device is created or destroyed, the vhost
library calls one of the callback handlers.
The vhost PMD needs to use this pair of callback handlers to know
which virtio devices are actually connected.
Because only one pair of callbacks can be registered with the vhost
library, if the PMD uses it, DPDK applications have no way to know
the events. This may break legacy DPDK applications that use the
vhost library.
To prevent this, the patch adds one more pair of callbacks to the
vhost library, especially for the vhost PMD.
With the patch, legacy applications can use the vhost PMD even if
they need additional specific handling for virtio device creation
and destruction.
For example, a legacy application can call
rte_vhost_enable_guest_notification() in the callbacks to change the
setting.

Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v4 1/2] vhost: Add callback and private data for vhost PMD
  2015-11-19  2:03                         ` Tetsuya Mukawa
@ 2015-11-19  2:18                           ` Yuanhan Liu
  2015-11-19  3:13                             ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Yuanhan Liu @ 2015-11-19  2:18 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Thu, Nov 19, 2015 at 11:03:50AM +0900, Tetsuya Mukawa wrote:
> On 2015/11/17 22:29, Yuanhan Liu wrote:
> > On Fri, Nov 13, 2015 at 02:20:30PM +0900, Tetsuya Mukawa wrote:
> >> These variables are needed to be able to manage one of virtio devices
> >> using both vhost library APIs and vhost PMD.
> >> For example, if vhost PMD uses current callback handler and private data
> >> provided by vhost library, A DPDK application that links vhost library
> >> cannot use some of vhost library APIs.
> > Can you be more specific about this?
> >
> > 	--yliu
> 
> How about like below?
> 
> commit log:
> Currently, when virtio device is created and destroyed, vhost library
> will call one of callback handlers.
> The vhost PMD need to use this pair of callback handlers to know which
> virtio devices are connected actually.
> Because we can register only one pair of callbacks to vhost library, if
> the PMD use it, DPDK applications
> cannot have a way to know the events.

Will (and why would) the two co-exist at the same time?

	--yliu

> This may break legacy DPDK
> applications that uses vhost library.
> To prevent it, this patch adds one more pair of callbacks to vhost
> library especially for the vhost PMD.
> With the patch, legacy applications can use the vhost PMD even if they
> need additional specific handling
> for virtio device creation and destruction.
> For example, legacy application can call
> rte_vhost_enable_guest_notification() in callbacks to change setting.
> 
> Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v4 1/2] vhost: Add callback and private data for vhost PMD
  2015-11-19  2:18                           ` Yuanhan Liu
@ 2015-11-19  3:13                             ` Tetsuya Mukawa
  2015-11-19  3:33                               ` Yuanhan Liu
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-19  3:13 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, ann.zhuangyanying

On 2015/11/19 11:18, Yuanhan Liu wrote:
> On Thu, Nov 19, 2015 at 11:03:50AM +0900, Tetsuya Mukawa wrote:
>> On 2015/11/17 22:29, Yuanhan Liu wrote:
>>> On Fri, Nov 13, 2015 at 02:20:30PM +0900, Tetsuya Mukawa wrote:
>>>> These variables are needed to be able to manage one of virtio devices
>>>> using both vhost library APIs and vhost PMD.
>>>> For example, if vhost PMD uses current callback handler and private data
>>>> provided by vhost library, A DPDK application that links vhost library
>>>> cannot use some of vhost library APIs.
>>> Can you be more specific about this?
>>>
>>> 	--yliu
>> How about like below?
>>
>> commit log:
>> Currently, when virtio device is created and destroyed, vhost library
>> will call one of callback handlers.
>> The vhost PMD need to use this pair of callback handlers to know which
>> virtio devices are connected actually.
>> Because we can register only one pair of callbacks to vhost library, if
>> the PMD use it, DPDK applications
>> cannot have a way to know the events.
> Will (and why) the two co-exist at same time?

Yes, they will. Sure, I will describe the below in the commit log.

Because we cannot map some of the vhost library APIs to ethdev APIs,
in some cases we still need to use the vhost library APIs for a port
created by the vhost PMD. One example is
rte_vhost_enable_guest_notification().

Thanks,
Tetsuya


>
> 	--yliu
>
>> This may break legacy DPDK
>> applications that uses vhost library.
>> To prevent it, this patch adds one more pair of callbacks to vhost
>> library especially for the vhost PMD.
>> With the patch, legacy applications can use the vhost PMD even if they
>> need additional specific handling
>> for virtio device creation and destruction.
>> For example, legacy application can call
>> rte_vhost_enable_guest_notification() in callbacks to change setting.
>>
>> Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v4 1/2] vhost: Add callback and private data for vhost PMD
  2015-11-19  3:13                             ` Tetsuya Mukawa
@ 2015-11-19  3:33                               ` Yuanhan Liu
  2015-11-19  5:14                                 ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Yuanhan Liu @ 2015-11-19  3:33 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Thu, Nov 19, 2015 at 12:13:38PM +0900, Tetsuya Mukawa wrote:
> On 2015/11/19 11:18, Yuanhan Liu wrote:
> > On Thu, Nov 19, 2015 at 11:03:50AM +0900, Tetsuya Mukawa wrote:
> >> On 2015/11/17 22:29, Yuanhan Liu wrote:
> >>> On Fri, Nov 13, 2015 at 02:20:30PM +0900, Tetsuya Mukawa wrote:
> >>>> These variables are needed to be able to manage one of virtio devices
> >>>> using both vhost library APIs and vhost PMD.
> >>>> For example, if vhost PMD uses current callback handler and private data
> >>>> provided by vhost library, A DPDK application that links vhost library
> >>>> cannot use some of vhost library APIs.
> >>> Can you be more specific about this?
> >>>
> >>> 	--yliu
> >> How about like below?
> >>
> >> commit log:
> >> Currently, when virtio device is created and destroyed, vhost library
> >> will call one of callback handlers.
> >> The vhost PMD need to use this pair of callback handlers to know which
> >> virtio devices are connected actually.
> >> Because we can register only one pair of callbacks to vhost library, if
> >> the PMD use it, DPDK applications
> >> cannot have a way to know the events.
> > Will (and why) the two co-exist at same time?
> 
> Yes it is. Sure, I will describe below in commit log.
> 
> Because we cannot map some of vhost library APIs to ethdev APIs, in some
> cases, we still
> need to use vhost library APIs for a port created by the vhost PMD. One
> of example is
> rte_vhost_enable_guest_notification().

I don't get why it has something to do with a standalone PMD callback.
And if you don't call rte_vhost_enable_guest_notification() inside the
vhost PMD, where else can you call it? I mean, you can't run vhost-pmd
and vhost-switch at the same time, right?

And the PMD callback and the old notify callback will not exist at the
same time in any one case, right? If so, why is that needed?

BTW, if it's a MUST, would you provide a specific example?


	--yliu
> 
> Thanks,
> Tetsuya
> 
> 
> >
> > 	--yliu
> >
> >> This may break legacy DPDK
> >> applications that uses vhost library.
> >> To prevent it, this patch adds one more pair of callbacks to vhost
> >> library especially for the vhost PMD.
> >> With the patch, legacy applications can use the vhost PMD even if they
> >> need additional specific handling
> >> for virtio device creation and destruction.
> >> For example, legacy application can call
> >> rte_vhost_enable_guest_notification() in callbacks to change setting.
> >>
> >> Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v4 1/2] vhost: Add callback and private data for vhost PMD
  2015-11-19  3:33                               ` Yuanhan Liu
@ 2015-11-19  5:14                                 ` Tetsuya Mukawa
  2015-11-19  5:45                                   ` Yuanhan Liu
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-19  5:14 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, ann.zhuangyanying

On 2015/11/19 12:33, Yuanhan Liu wrote:
> On Thu, Nov 19, 2015 at 12:13:38PM +0900, Tetsuya Mukawa wrote:
>> On 2015/11/19 11:18, Yuanhan Liu wrote:
>>> On Thu, Nov 19, 2015 at 11:03:50AM +0900, Tetsuya Mukawa wrote:
>>>> On 2015/11/17 22:29, Yuanhan Liu wrote:
>>>>> On Fri, Nov 13, 2015 at 02:20:30PM +0900, Tetsuya Mukawa wrote:
>>>>>> These variables are needed to be able to manage one of virtio devices
>>>>>> using both vhost library APIs and vhost PMD.
>>>>>> For example, if vhost PMD uses current callback handler and private data
>>>>>> provided by vhost library, A DPDK application that links vhost library
>>>>>> cannot use some of vhost library APIs.
>>>>> Can you be more specific about this?
>>>>>
>>>>> 	--yliu
>>>> How about like below?
>>>>
>>>> commit log:
>>>> Currently, when virtio device is created and destroyed, vhost library
>>>> will call one of callback handlers.
>>>> The vhost PMD need to use this pair of callback handlers to know which
>>>> virtio devices are connected actually.
>>>> Because we can register only one pair of callbacks to vhost library, if
>>>> the PMD use it, DPDK applications
>>>> cannot have a way to know the events.
>>> Will (and why) the two co-exist at same time?
>> Yes it is. Sure, I will describe below in commit log.
>>
>> Because we cannot map some of vhost library APIs to ethdev APIs, in some
>> cases, we still
>> need to use vhost library APIs for a port created by the vhost PMD. One
>> of example is
>> rte_vhost_enable_guest_notification().
> I don't get it why it has something to do with a standalone PMD callback.
> And if you don't call rte_vhost_enable_guest_notification() inside vhost
> PMD, where else can you call that? I mean, you can't start vhost-pmd
> and vhost-swithc in the mean time, right?

No, that's not true; even after connecting to the virtio-net device, you
can change the flag.
It's just a hint for the virtio-net driver, and it will be used while
queuing.
(We may be able to change the flag even while sending or receiving
packets.)

>
> And, pmd callback and the old notify callback will not exist at same
> time in one case, right? If so, why is that needed?
>
> BTW, if it's a MUST, would you provide a specific example?

Actually, this patch is not a MUST.

But the users still need callback handlers to know when a virtio-net
device is connected or disconnected.
This is because the user can call rte_vhost_enable_guest_notification()
only while the connection is established.

Probably we can use the link status changed callback of the PMD for this
purpose.
(The vhost PMD would notify the DPDK application using the link status
callback.)

But I am not sure whether we need to implement a link status changed
callback for this purpose.
While processing this callback handler, the users will only call vhost
library APIs that ethdev APIs cannot map to, or store some variables
related to the vhost library.
If so, this callback handler itself is specific to using the vhost
library, and it may be OK that the callback itself is implemented as one
of the vhost library APIs.

Tetsuya

>
> 	--yliu
>> Thanks,
>> Tetsuya
>>
>>
>>> 	--yliu
>>>
>>>> This may break legacy DPDK
>>>> applications that uses vhost library.
>>>> To prevent it, this patch adds one more pair of callbacks to vhost
>>>> library especially for the vhost PMD.
>>>> With the patch, legacy applications can use the vhost PMD even if they
>>>> need additional specific handling
>>>> for virtio device creation and destruction.
>>>> For example, legacy application can call
>>>> rte_vhost_enable_guest_notification() in callbacks to change setting.
>>>>
>>>> Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v4 1/2] vhost: Add callback and private data for vhost PMD
  2015-11-19  5:14                                 ` Tetsuya Mukawa
@ 2015-11-19  5:45                                   ` Yuanhan Liu
  2015-11-19  5:58                                     ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Yuanhan Liu @ 2015-11-19  5:45 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Thu, Nov 19, 2015 at 02:14:13PM +0900, Tetsuya Mukawa wrote:
> On 2015/11/19 12:33, Yuanhan Liu wrote:
> > On Thu, Nov 19, 2015 at 12:13:38PM +0900, Tetsuya Mukawa wrote:
> >> On 2015/11/19 11:18, Yuanhan Liu wrote:
> >>> On Thu, Nov 19, 2015 at 11:03:50AM +0900, Tetsuya Mukawa wrote:
> >>>> On 2015/11/17 22:29, Yuanhan Liu wrote:
> >>>>> On Fri, Nov 13, 2015 at 02:20:30PM +0900, Tetsuya Mukawa wrote:
> >>>>>> These variables are needed to be able to manage one of virtio devices
> >>>>>> using both vhost library APIs and vhost PMD.
> >>>>>> For example, if vhost PMD uses current callback handler and private data
> >>>>>> provided by vhost library, A DPDK application that links vhost library
> >>>>>> cannot use some of vhost library APIs.
> >>>>> Can you be more specific about this?
> >>>>>
> >>>>> 	--yliu
> >>>> How about like below?
> >>>>
> >>>> commit log:
> >>>> Currently, when virtio device is created and destroyed, vhost library
> >>>> will call one of callback handlers.
> >>>> The vhost PMD need to use this pair of callback handlers to know which
> >>>> virtio devices are connected actually.
> >>>> Because we can register only one pair of callbacks to vhost library, if
> >>>> the PMD use it, DPDK applications
> >>>> cannot have a way to know the events.
> >>> Will (and why) the two co-exist at same time?
> >> Yes it is. Sure, I will describe below in commit log.
> >>
> >> Because we cannot map some of vhost library APIs to ethdev APIs, in some
> >> cases, we still
> >> need to use vhost library APIs for a port created by the vhost PMD. One
> >> of example is
> >> rte_vhost_enable_guest_notification().
> > I don't get it why it has something to do with a standalone PMD callback.
> > And if you don't call rte_vhost_enable_guest_notification() inside vhost
> > PMD, where else can you call that? I mean, you can't start vhost-pmd
> > and vhost-swithc in the mean time, right?
> 
> No it's not true, even after connecting to virtio-net device, you can
> change the flag.
> It's just a hint for virtio-net driver, and it will be used while queuing.
> (We may be able to change the flag, even while sending or receiving packets)
> 
> >
> > And, pmd callback and the old notify callback will not exist at same
> > time in one case, right? If so, why is that needed?
> >
> > BTW, if it's a MUST, would you provide a specific example?
> 
> Actually, this patch is not a MUST.
> 
> But still the users need callback handlers to know when virtio-net
> device is connected or disconnected.
> This is because the user can call rte_vhost_enable_guest_notification()
> only while connection is established.

What does "the user" mean? Is there a second user of the vhost lib
besides the vhost PMD that has to interact with those connected devices?
If so, how?

	--yliu
> 
> Probably we can use link status changed callback of the PMD for this
> purpose.
> (The vhost PMD will notice DPDK application using link status callback)
> 
> But I am not sure whether we need to implement link status changed
> callback for this purpose.
> While processing this callback handler, the users will only calls vhost
> library APIs that ethdev API cannot map, or store some variables related
> with vhost library.
> If so, this callback handler itself is specific for using vhost library.
> And it may be ok that callback itself is implemented as one of vhost
> library APIs.
> 
> Tetsuya
> 
> >
> > 	--yliu
> >> Thanks,
> >> Tetsuya
> >>
> >>
> >>> 	--yliu
> >>>
> >>>> This may break legacy DPDK
> >>>> applications that uses vhost library.
> >>>> To prevent it, this patch adds one more pair of callbacks to vhost
> >>>> library especially for the vhost PMD.
> >>>> With the patch, legacy applications can use the vhost PMD even if they
> >>>> need additional specific handling
> >>>> for virtio device creation and destruction.
> >>>> For example, legacy application can call
> >>>> rte_vhost_enable_guest_notification() in callbacks to change setting.
> >>>>
> >>>> Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v4 1/2] vhost: Add callback and private data for vhost PMD
  2015-11-19  5:45                                   ` Yuanhan Liu
@ 2015-11-19  5:58                                     ` Tetsuya Mukawa
  2015-11-19  6:31                                       ` Yuanhan Liu
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-19  5:58 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, ann.zhuangyanying

On 2015/11/19 14:45, Yuanhan Liu wrote:
> On Thu, Nov 19, 2015 at 02:14:13PM +0900, Tetsuya Mukawa wrote:
>> On 2015/11/19 12:33, Yuanhan Liu wrote:
>>> On Thu, Nov 19, 2015 at 12:13:38PM +0900, Tetsuya Mukawa wrote:
>>>> On 2015/11/19 11:18, Yuanhan Liu wrote:
>>>>> On Thu, Nov 19, 2015 at 11:03:50AM +0900, Tetsuya Mukawa wrote:
>>>>>> On 2015/11/17 22:29, Yuanhan Liu wrote:
>>>>>>> On Fri, Nov 13, 2015 at 02:20:30PM +0900, Tetsuya Mukawa wrote:
>>>>>>>> These variables are needed to be able to manage one of virtio devices
>>>>>>>> using both vhost library APIs and vhost PMD.
>>>>>>>> For example, if vhost PMD uses current callback handler and private data
>>>>>>>> provided by vhost library, A DPDK application that links vhost library
>>>>>>>> cannot use some of vhost library APIs.
>>>>>>> Can you be more specific about this?
>>>>>>>
>>>>>>> 	--yliu
>>>>>> How about like below?
>>>>>>
>>>>>> commit log:
>>>>>> Currently, when virtio device is created and destroyed, vhost library
>>>>>> will call one of callback handlers.
>>>>>> The vhost PMD need to use this pair of callback handlers to know which
>>>>>> virtio devices are connected actually.
>>>>>> Because we can register only one pair of callbacks to vhost library, if
>>>>>> the PMD use it, DPDK applications
>>>>>> cannot have a way to know the events.
>>>>> Will (and why) the two co-exist at same time?
>>>> Yes it is. Sure, I will describe below in commit log.
>>>>
>>>> Because we cannot map some of vhost library APIs to ethdev APIs, in some
>>>> cases, we still
>>>> need to use vhost library APIs for a port created by the vhost PMD. One
>>>> of example is
>>>> rte_vhost_enable_guest_notification().
>>> I don't get it why it has something to do with a standalone PMD callback.
>>> And if you don't call rte_vhost_enable_guest_notification() inside vhost
>>> PMD, where else can you call that? I mean, you can't start vhost-pmd
>>> and vhost-swithc in the mean time, right?
>> No it's not true, even after connecting to virtio-net device, you can
>> change the flag.
>> It's just a hint for virtio-net driver, and it will be used while queuing.
>> (We may be able to change the flag, even while sending or receiving packets)
>>
>>> And, pmd callback and the old notify callback will not exist at same
>>> time in one case, right? If so, why is that needed?
>>>
>>> BTW, if it's a MUST, would you provide a specific example?
>> Actually, this patch is not a MUST.
>>
>> But still the users need callback handlers to know when virtio-net
>> device is connected or disconnected.
>> This is because the user can call rte_vhost_enable_guest_notification()
>> only while connection is established.
> What does "the user" mean? Is there a second user of vhost lib besides
> vhost PMD, that he has to interact with those connected devices? If so,
> how?

Sorry, my wording was unclear.
There is no second user.

For example, if a DPDK application has a port created by the vhost PMD,
it then needs to call rte_vhost_enable_guest_notification() on that port.
The application needs to know when the virtio-net device is connected or
disconnected, because the function is only valid while the connection is
established. Without a callback handler, the application has no way to
know that.

This is what I wanted to explain.

Thanks,
Tetsuya

> 	--yliu
>> Probably we can use link status changed callback of the PMD for this
>> purpose.
>> (The vhost PMD will notice DPDK application using link status callback)
>>
>> But I am not sure whether we need to implement link status changed
>> callback for this purpose.
>> While processing this callback handler, the users will only calls vhost
>> library APIs that ethdev API cannot map, or store some variables related
>> with vhost library.
>> If so, this callback handler itself is specific for using vhost library.
>> And it may be ok that callback itself is implemented as one of vhost
>> library APIs.
>>
>> Tetsuya
>>
>>> 	--yliu
>>>> Thanks,
>>>> Tetsuya
>>>>
>>>>
>>>>> 	--yliu
>>>>>
>>>>>> This may break legacy DPDK
>>>>>> applications that uses vhost library.
>>>>>> To prevent it, this patch adds one more pair of callbacks to vhost
>>>>>> library especially for the vhost PMD.
>>>>>> With the patch, legacy applications can use the vhost PMD even if they
>>>>>> need additional specific handling
>>>>>> for virtio device creation and destruction.
>>>>>> For example, legacy application can call
>>>>>> rte_vhost_enable_guest_notification() in callbacks to change setting.
>>>>>>
>>>>>> Tetsuya


* Re: [PATCH v4 1/2] vhost: Add callback and private data for vhost PMD
  2015-11-19  5:58                                     ` Tetsuya Mukawa
@ 2015-11-19  6:31                                       ` Yuanhan Liu
  2015-11-19  6:37                                         ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Yuanhan Liu @ 2015-11-19  6:31 UTC (permalink / raw)
  To: Tetsuya Mukawa, Xie, Huawei; +Cc: dev, ann.zhuangyanying

On Thu, Nov 19, 2015 at 02:58:56PM +0900, Tetsuya Mukawa wrote:
> On 2015/11/19 14:45, Yuanhan Liu wrote:
> > On Thu, Nov 19, 2015 at 02:14:13PM +0900, Tetsuya Mukawa wrote:
> >> On 2015/11/19 12:33, Yuanhan Liu wrote:
> >>> On Thu, Nov 19, 2015 at 12:13:38PM +0900, Tetsuya Mukawa wrote:
> >>>> On 2015/11/19 11:18, Yuanhan Liu wrote:
> >>>>> On Thu, Nov 19, 2015 at 11:03:50AM +0900, Tetsuya Mukawa wrote:
> >>>>>> On 2015/11/17 22:29, Yuanhan Liu wrote:
> >>>>>>> On Fri, Nov 13, 2015 at 02:20:30PM +0900, Tetsuya Mukawa wrote:
> >>>>>>>> These variables are needed to be able to manage one of virtio devices
> >>>>>>>> using both vhost library APIs and vhost PMD.
> >>>>>>>> For example, if vhost PMD uses current callback handler and private data
> >>>>>>>> provided by vhost library, A DPDK application that links vhost library
> >>>>>>>> cannot use some of vhost library APIs.
> >>>>>>> Can you be more specific about this?
> >>>>>>>
> >>>>>>> 	--yliu
> >>>>>> How about like below?
> >>>>>>
> >>>>>> commit log:
> >>>>>> Currently, when virtio device is created and destroyed, vhost library
> >>>>>> will call one of callback handlers.
> >>>>>> The vhost PMD need to use this pair of callback handlers to know which
> >>>>>> virtio devices are connected actually.
> >>>>>> Because we can register only one pair of callbacks to vhost library, if
> >>>>>> the PMD use it, DPDK applications
> >>>>>> cannot have a way to know the events.
> >>>>> Will (and why) the two co-exist at same time?
> >>>> Yes it is. Sure, I will describe below in commit log.
> >>>>
> >>>> Because we cannot map some of vhost library APIs to ethdev APIs, in some
> >>>> cases, we still
> >>>> need to use vhost library APIs for a port created by the vhost PMD. One
> >>>> of example is
> >>>> rte_vhost_enable_guest_notification().
> >>> I don't get it why it has something to do with a standalone PMD callback.
> >>> And if you don't call rte_vhost_enable_guest_notification() inside vhost
> >>> PMD, where else can you call that? I mean, you can't start vhost-pmd
> >>> and vhost-swithc in the mean time, right?
> >> No it's not true, even after connecting to virtio-net device, you can
> >> change the flag.
> >> It's just a hint for virtio-net driver, and it will be used while queuing.
> >> (We may be able to change the flag, even while sending or receiving packets)
> >>
> >>> And, pmd callback and the old notify callback will not exist at same
> >>> time in one case, right? If so, why is that needed?
> >>>
> >>> BTW, if it's a MUST, would you provide a specific example?
> >> Actually, this patch is not a MUST.
> >>
> >> But still the users need callback handlers to know when virtio-net
> >> device is connected or disconnected.
> >> This is because the user can call rte_vhost_enable_guest_notification()
> >> only while connection is established.
> > What does "the user" mean? Is there a second user of vhost lib besides
> > vhost PMD, that he has to interact with those connected devices? If so,
> > how?
> 
> Sorry, my English is wrong.
> Not a second user.
> 
> For example, If DPDK application has a port created by vhost PMD, then
> needs to call rte_vhost_enable_guest_notification() to the port.

So, you are mixing the usage of vhost PMD and vhost lib in a DPDK
application? Say,

        DPDK application
               start_vhost_pmd
                      rte_vhost_driver_pmd_callback_register
               rte_vhost_driver_callback_register

I know little about PMDs, and I'm not quite sure it's a good combo.

Huawei, comments?

	--yliu

> DPDK application needs to know when virtio-net device is connected or
> disconnected, because the function is only valid while connecting.
> But without callback handler, DPDK application cannot know it.
> 
> This is what I wanted to explain.
> 
> Thanks,
> Tetsuya
> 
> > 	--yliu
> >> Probably we can use link status changed callback of the PMD for this
> >> purpose.
> >> (The vhost PMD will notice DPDK application using link status callback)
> >>
> >> But I am not sure whether we need to implement link status changed
> >> callback for this purpose.
> >> While processing this callback handler, the users will only calls vhost
> >> library APIs that ethdev API cannot map, or store some variables related
> >> with vhost library.
> >> If so, this callback handler itself is specific for using vhost library.
> >> And it may be ok that callback itself is implemented as one of vhost
> >> library APIs.
> >>
> >> Tetsuya
> >>
> >>> 	--yliu
> >>>> Thanks,
> >>>> Tetsuya
> >>>>
> >>>>
> >>>>> 	--yliu
> >>>>>
> >>>>>> This may break legacy DPDK
> >>>>>> applications that uses vhost library.
> >>>>>> To prevent it, this patch adds one more pair of callbacks to vhost
> >>>>>> library especially for the vhost PMD.
> >>>>>> With the patch, legacy applications can use the vhost PMD even if they
> >>>>>> need additional specific handling
> >>>>>> for virtio device creation and destruction.
> >>>>>> For example, legacy application can call
> >>>>>> rte_vhost_enable_guest_notification() in callbacks to change setting.
> >>>>>>
> >>>>>> Tetsuya


* Re: [PATCH v4 1/2] vhost: Add callback and private data for vhost PMD
  2015-11-19  6:31                                       ` Yuanhan Liu
@ 2015-11-19  6:37                                         ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-19  6:37 UTC (permalink / raw)
  To: Yuanhan Liu, Xie, Huawei; +Cc: dev, ann.zhuangyanying

On 2015/11/19 15:31, Yuanhan Liu wrote:
> On Thu, Nov 19, 2015 at 02:58:56PM +0900, Tetsuya Mukawa wrote:
>> On 2015/11/19 14:45, Yuanhan Liu wrote:
>>> On Thu, Nov 19, 2015 at 02:14:13PM +0900, Tetsuya Mukawa wrote:
>>>> On 2015/11/19 12:33, Yuanhan Liu wrote:
>>>>> On Thu, Nov 19, 2015 at 12:13:38PM +0900, Tetsuya Mukawa wrote:
>>>>>> On 2015/11/19 11:18, Yuanhan Liu wrote:
>>>>>>> On Thu, Nov 19, 2015 at 11:03:50AM +0900, Tetsuya Mukawa wrote:
>>>>>>>> On 2015/11/17 22:29, Yuanhan Liu wrote:
>>>>>>>>> On Fri, Nov 13, 2015 at 02:20:30PM +0900, Tetsuya Mukawa wrote:
>>>>>>>>>> These variables are needed to be able to manage one of virtio devices
>>>>>>>>>> using both vhost library APIs and vhost PMD.
>>>>>>>>>> For example, if vhost PMD uses current callback handler and private data
>>>>>>>>>> provided by vhost library, A DPDK application that links vhost library
>>>>>>>>>> cannot use some of vhost library APIs.
>>>>>>>>> Can you be more specific about this?
>>>>>>>>>
>>>>>>>>> 	--yliu
>>>>>>>> How about like below?
>>>>>>>>
>>>>>>>> commit log:
>>>>>>>> Currently, when virtio device is created and destroyed, vhost library
>>>>>>>> will call one of callback handlers.
>>>>>>>> The vhost PMD need to use this pair of callback handlers to know which
>>>>>>>> virtio devices are connected actually.
>>>>>>>> Because we can register only one pair of callbacks to vhost library, if
>>>>>>>> the PMD use it, DPDK applications
>>>>>>>> cannot have a way to know the events.
>>>>>>> Will (and why) the two co-exist at same time?
>>>>>> Yes it is. Sure, I will describe below in commit log.
>>>>>>
>>>>>> Because we cannot map some of vhost library APIs to ethdev APIs, in some
>>>>>> cases, we still
>>>>>> need to use vhost library APIs for a port created by the vhost PMD. One
>>>>>> of example is
>>>>>> rte_vhost_enable_guest_notification().
>>>>> I don't get it why it has something to do with a standalone PMD callback.
>>>>> And if you don't call rte_vhost_enable_guest_notification() inside vhost
>>>>> PMD, where else can you call that? I mean, you can't start vhost-pmd
>>>>> and vhost-swithc in the mean time, right?
>>>> No it's not true, even after connecting to virtio-net device, you can
>>>> change the flag.
>>>> It's just a hint for virtio-net driver, and it will be used while queuing.
>>>> (We may be able to change the flag, even while sending or receiving packets)
>>>>
>>>>> And, pmd callback and the old notify callback will not exist at same
>>>>> time in one case, right? If so, why is that needed?
>>>>>
>>>>> BTW, if it's a MUST, would you provide a specific example?
>>>> Actually, this patch is not a MUST.
>>>>
>>>> But still the users need callback handlers to know when virtio-net
>>>> device is connected or disconnected.
>>>> This is because the user can call rte_vhost_enable_guest_notification()
>>>> only while connection is established.
>>> What does "the user" mean? Is there a second user of vhost lib besides
>>> vhost PMD, that he has to interact with those connected devices? If so,
>>> how?
>> Sorry, my English is wrong.
>> Not a second user.
>>
>> For example, If DPDK application has a port created by vhost PMD, then
>> needs to call rte_vhost_enable_guest_notification() to the port.
> So, you are mixing the usage of vhost PMD and vhost lib in a DPDK
> application? Say,

Yes, that is my intention.
Using ethdev (PMD) APIs together with library-specific APIs on the same
port is also done in the bonding PMD.

Thanks,
Tetsuya

>         DPDK application
>                start_vhost_pmd
>                       rte_vhost_driver_pmd_callback_register
>                rte_vhost_driver_callback_register
>
> I know little about PMD, and not quite sure it's a good combo.
>
> Huawei, comments?
>
> 	--yliu
>
>> DPDK application needs to know when virtio-net device is connected or
>> disconnected, because the function is only valid while connecting.
>> But without callback handler, DPDK application cannot know it.
>>
>> This is what I wanted to explain.
>>
>> Thanks,
>> Tetsuya
>>
>>> 	--yliu
>>>> Probably we can use link status changed callback of the PMD for this
>>>> purpose.
>>>> (The vhost PMD will notice DPDK application using link status callback)
>>>>
>>>> But I am not sure whether we need to implement link status changed
>>>> callback for this purpose.
>>>> While processing this callback handler, the users will only calls vhost
>>>> library APIs that ethdev API cannot map, or store some variables related
>>>> with vhost library.
>>>> If so, this callback handler itself is specific for using vhost library.
>>>> And it may be ok that callback itself is implemented as one of vhost
>>>> library APIs.
>>>>
>>>> Tetsuya
>>>>
>>>>> 	--yliu
>>>>>> Thanks,
>>>>>> Tetsuya
>>>>>>
>>>>>>
>>>>>>> 	--yliu
>>>>>>>
>>>>>>>> This may break legacy DPDK
>>>>>>>> applications that uses vhost library.
>>>>>>>> To prevent it, this patch adds one more pair of callbacks to vhost
>>>>>>>> library especially for the vhost PMD.
>>>>>>>> With the patch, legacy applications can use the vhost PMD even if they
>>>>>>>> need additional specific handling
>>>>>>>> for virtio device creation and destruction.
>>>>>>>> For example, legacy application can call
>>>>>>>> rte_vhost_enable_guest_notification() in callbacks to change setting.
>>>>>>>>
>>>>>>>> Tetsuya


* Re: [PATCH v4 2/2] vhost: Add VHOST PMD
  2015-11-13  5:20                     ` [PATCH v4 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  2015-11-16  1:57                       ` Wang, Zhihong
@ 2015-11-20 11:43                       ` Yuanhan Liu
  2015-11-24  2:48                         ` Tetsuya Mukawa
  2015-11-21  0:15                       ` Rich Lane
  2015-11-24  9:00                       ` [PATCH v5 0/3] " Tetsuya Mukawa
  3 siblings, 1 reply; 200+ messages in thread
From: Yuanhan Liu @ 2015-11-20 11:43 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Fri, Nov 13, 2015 at 02:20:31PM +0900, Tetsuya Mukawa wrote:
....
> +static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
> +
> +static rte_atomic16_t nb_started_ports;
> +pthread_t session_th;

static?

> +
> +static struct rte_eth_link pmd_link = {
> +		.link_speed = 10000,
> +		.link_duplex = ETH_LINK_FULL_DUPLEX,
> +		.link_status = 0
> +};
> +
> +static uint16_t
> +eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> +{
> +	struct vhost_queue *r = q;
> +	uint16_t nb_rx = 0;
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		return 0;
> +
> +	rte_atomic32_set(&r->while_queuing, 1);
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		goto out;
> +
> +	/* Dequeue packets from guest TX queue */
> +	nb_rx = rte_vhost_dequeue_burst(r->device,
> +			r->virtqueue_id, r->mb_pool, bufs, nb_bufs);
> +
> +	r->rx_pkts += nb_rx;
> +
> +out:
> +	rte_atomic32_set(&r->while_queuing, 0);
> +
> +	return nb_rx;
> +}
> +
> +static uint16_t
> +eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> +{
> +	struct vhost_queue *r = q;
> +	uint16_t i, nb_tx = 0;
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		return 0;
> +
> +	rte_atomic32_set(&r->while_queuing, 1);
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		goto out;
> +
> +	/* Enqueue packets to guest RX queue */
> +	nb_tx = rte_vhost_enqueue_burst(r->device,
> +			r->virtqueue_id, bufs, nb_bufs);
> +
> +	r->tx_pkts += nb_tx;
> +	r->err_pkts += nb_bufs - nb_tx;
> +
> +	for (i = 0; likely(i < nb_tx); i++)
> +		rte_pktmbuf_free(bufs[i]);

We should free up to nb_bufs here, not just nb_tx, right?

> +
> +out:
> +	rte_atomic32_set(&r->while_queuing, 0);
> +
> +	return nb_tx;
> +}
> +
> +static int
> +eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
> +{
> +	return 0;
> +}
> +
> +static inline struct pmd_internal *
> +find_internal_resource(char *ifname)
> +{
> +	int found = 0;
> +	struct pmd_internal *internal;
> +
> +	if (ifname == NULL)
> +		return NULL;
> +
> +	pthread_mutex_lock(&internal_list_lock);
> +
> +	TAILQ_FOREACH(internal, &internals_list, next) {
> +		if (!strcmp(internal->iface_name, ifname)) {
> +			found = 1;
> +			break;
> +		}
> +	}
> +
> +	pthread_mutex_unlock(&internal_list_lock);
> +
> +	if (!found)
> +		return NULL;
> +
> +	return internal;
> +}
> +
...
> +static void *vhost_driver_session(void *param __rte_unused)

static void *
vhost_driver_session_start(..)

> +{
> +	static struct virtio_net_device_ops *vhost_ops;
> +
> +	vhost_ops = rte_zmalloc(NULL, sizeof(*vhost_ops), 0);
> +	if (vhost_ops == NULL)
> +		rte_panic("Can't allocate memory\n");

Why not make them static?

> +
> +	/* set vhost arguments */
> +	vhost_ops->new_device = new_device;
> +	vhost_ops->destroy_device = destroy_device;
> +	if (rte_vhost_driver_pmd_callback_register(vhost_ops) < 0)
> +		rte_panic("Can't register callbacks\n");
> +
> +	/* start event handling */
> +	rte_vhost_driver_session_start();
> +
> +	rte_free(vhost_ops);
> +	pthread_exit(0);
> +}
> +
> +static void vhost_driver_session_start(void)

ditto.


> +{
> +	int ret;
> +
> +	ret = pthread_create(&session_th,
> +			NULL, vhost_driver_session, NULL);
> +	if (ret)
> +		rte_panic("Can't create a thread\n");
> +}
> +
> +static void vhost_driver_session_stop(void)

Ditto.

> +{
> +	int ret;
> +
> +	ret = pthread_cancel(session_th);
> +	if (ret)
> +		rte_panic("Can't cancel the thread\n");
> +
> +	ret = pthread_join(session_th, NULL);
> +	if (ret)
> +		rte_panic("Can't join the thread\n");
> +}
> +
> +static int
> +eth_dev_start(struct rte_eth_dev *dev)
...
> +	internal->nb_rx_queues = queues;
> +	internal->nb_tx_queues = queues;
> +	internal->dev_name = strdup(name);
> +	if (internal->dev_name == NULL)
> +		goto error;
> +	internal->iface_name = strdup(iface_name);
> +	if (internal->iface_name == NULL) {
> +		free(internal->dev_name);
> +		goto error;
> +	}

You still haven't addressed my comment from the last email: if the
allocation fails here, internal->dev_name is not freed.

> +
> +	pthread_mutex_lock(&internal_list_lock);
> +	TAILQ_INSERT_TAIL(&internals_list, internal, next);
> +	pthread_mutex_unlock(&internal_list_lock);
> +
...
> +static struct rte_driver pmd_vhost_drv = {
> +	.name = "eth_vhost",
> +	.type = PMD_VDEV,
> +	.init = rte_pmd_vhost_devinit,
> +	.uninit = rte_pmd_vhost_devuninit,
> +};
> +
> +struct
> +virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id)

struct virtio_net *
rte_eth_vhost_portid2vdev()

BTW, why make a special eth API for virtio? This doesn't make much
sense to me.

Besides those minor nits, this patch looks good to me. Thanks for the
work!

	--yliu


* Re: [PATCH v4 2/2] vhost: Add VHOST PMD
  2015-11-13  5:20                     ` [PATCH v4 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  2015-11-16  1:57                       ` Wang, Zhihong
  2015-11-20 11:43                       ` Yuanhan Liu
@ 2015-11-21  0:15                       ` Rich Lane
  2015-11-24  4:41                         ` Tetsuya Mukawa
  2015-11-24  9:00                       ` [PATCH v5 0/3] " Tetsuya Mukawa
  3 siblings, 1 reply; 200+ messages in thread
From: Rich Lane @ 2015-11-21  0:15 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: yuanhan.liu, dev, ann.zhuangyanying

On Thu, Nov 12, 2015 at 9:20 PM, Tetsuya Mukawa <mukawa@igel.co.jp> wrote:

> +static uint16_t
> +eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> +{
>
...

> +
> +       /* Enqueue packets to guest RX queue */
> +       nb_tx = rte_vhost_enqueue_burst(r->device,
> +                       r->virtqueue_id, bufs, nb_bufs);
> +
> +       r->tx_pkts += nb_tx;
> +       r->err_pkts += nb_bufs - nb_tx;
>

I don't think a full TX queue is counted as an error by physical NIC PMDs
like ixgbe and i40e. It is counted as an error by the af_packet, pcap, and
ring PMDs. I'd suggest not counting it as an error because it's a common
and expected condition, and the application might just retry the TX later.

Are the byte counts left out because it would be a performance hit? It
seems like it would be a minimal cost given how much we're already touching
each packet.


> +static int
> +new_device(struct virtio_net *dev)
> +{
>
...

> +
> +       if ((dev->virt_qp_nb < internal->nb_rx_queues) ||
> +                       (dev->virt_qp_nb < internal->nb_tx_queues)) {
> +               RTE_LOG(INFO, PMD, "Not enough queues\n");
> +               return -1;
> +       }
>

Would it make sense to take the minimum of the guest and host queuepairs
and use that below in place of nb_rx_queues/nb_tx_queues? That way the host
can support a large maximum number of queues and each guest can choose how
many it wants to use. The host application will receive vring_state_changed
callbacks for each queue the guest activates.


* Re: [PATCH v4 2/2] vhost: Add VHOST PMD
  2015-11-20 11:43                       ` Yuanhan Liu
@ 2015-11-24  2:48                         ` Tetsuya Mukawa
  2015-11-24  3:40                           ` Yuanhan Liu
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-24  2:48 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, ann.zhuangyanying

On 2015/11/20 20:43, Yuanhan Liu wrote:
> On Fri, Nov 13, 2015 at 02:20:31PM +0900, Tetsuya Mukawa wrote:
> ....
>> +static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
>> +
>> +static rte_atomic16_t nb_started_ports;
>> +pthread_t session_th;
> static?

Hi Yuanhan,

I appreciate your careful review.
I will fix the issues you commented on and submit the patch again.

I have added two comments below.
Could you please check them?

>> +
>> +static uint16_t
>> +eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
>> +{
>> +	struct vhost_queue *r = q;
>> +	uint16_t i, nb_tx = 0;
>> +
>> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
>> +		return 0;
>> +
>> +	rte_atomic32_set(&r->while_queuing, 1);
>> +
>> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
>> +		goto out;
>> +
>> +	/* Enqueue packets to guest RX queue */
>> +	nb_tx = rte_vhost_enqueue_burst(r->device,
>> +			r->virtqueue_id, bufs, nb_bufs);
>> +
>> +	r->tx_pkts += nb_tx;
>> +	r->err_pkts += nb_bufs - nb_tx;
>> +
>> +	for (i = 0; likely(i < nb_tx); i++)
>> +		rte_pktmbuf_free(bufs[i]);
> We should free upto nb_bufs here, but not nb_tx, right?

I don't think we need to free all the packet buffers here.
Could you please check l2fwd_send_burst() in the l2fwd example?
It shows that the DPDK application itself frees the packet buffers that
failed to send.

>> +static struct rte_driver pmd_vhost_drv = {
>> +	.name = "eth_vhost",
>> +	.type = PMD_VDEV,
>> +	.init = rte_pmd_vhost_devinit,
>> +	.uninit = rte_pmd_vhost_devuninit,
>> +};
>> +
>> +struct
>> +virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id)
> struct virtio_net *
> rte_eth_vhost_portid2vdev()
>
> BTW, why making a speical eth API for virtio? This doesn't make too much
> sense to me.

This is a kind of helper function.

I assume that DPDK applications want to know the relation between a
port_id and the virtio device structure.
But in the "new" callback handler that the DPDK application registers,
the application receives the virtio device structure, which does not
tell it which port the device belongs to.

To find out, the DPDK application would probably need these steps:

1. Store the interface name that is specified when the vhost PMD is
invoked.
(For example, record that /tmp/socket0 is for port0 and /tmp/socket1
is for port1.)
2. Compare that interface name against dev->ifname stored in the virtio
device structure; the application can then tell which port it is.

If the DPDK application uses Port Hotplug, I guess the above steps are
easy. But if it doesn't, the interface name is specified in the "--vdev"
EAL command line option, so handling the interface name inside the
application is probably not so easy.
This is why I added the function.

Thanks,
Tetsuya


* Re: [PATCH v4 2/2] vhost: Add VHOST PMD
  2015-11-24  2:48                         ` Tetsuya Mukawa
@ 2015-11-24  3:40                           ` Yuanhan Liu
  2015-11-24  3:44                             ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Yuanhan Liu @ 2015-11-24  3:40 UTC (permalink / raw)
  To: Tetsuya Mukawa, Xie, Huawei; +Cc: dev, ann.zhuangyanying

On Tue, Nov 24, 2015 at 11:48:04AM +0900, Tetsuya Mukawa wrote:
> On 2015/11/20 20:43, Yuanhan Liu wrote:
> > On Fri, Nov 13, 2015 at 02:20:31PM +0900, Tetsuya Mukawa wrote:
> > ....
> >> +static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
> >> +
> >> +static rte_atomic16_t nb_started_ports;
> >> +pthread_t session_th;
> > static?
> 
> Hi Yuanhan,
> 
> I appreciate your carefully reviewing.
> I will fix issues you commented, and submit it again.
> 
> I added below 2 comments.
> Could you please check it?
> 
> >> +
> >> +static uint16_t
> >> +eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> >> +{
> >> +	struct vhost_queue *r = q;
> >> +	uint16_t i, nb_tx = 0;
> >> +
> >> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> >> +		return 0;
> >> +
> >> +	rte_atomic32_set(&r->while_queuing, 1);
> >> +
> >> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> >> +		goto out;
> >> +
> >> +	/* Enqueue packets to guest RX queue */
> >> +	nb_tx = rte_vhost_enqueue_burst(r->device,
> >> +			r->virtqueue_id, bufs, nb_bufs);
> >> +
> >> +	r->tx_pkts += nb_tx;
> >> +	r->err_pkts += nb_bufs - nb_tx;
> >> +
> >> +	for (i = 0; likely(i < nb_tx); i++)
> >> +		rte_pktmbuf_free(bufs[i]);
> > We should free upto nb_bufs here, but not nb_tx, right?
> 
> I guess we don't need to free all packet buffers.
> Could you please check l2fwd_send_burst() in l2fwd example?
> It seems DPDK application frees packet buffers that failed to send.

Yes, you are right. I was thinking it's just a vhost app, and forgot
that this is for rte_eth_tx_burst, sigh ...

> 
> >> +static struct rte_driver pmd_vhost_drv = {
> >> +	.name = "eth_vhost",
> >> +	.type = PMD_VDEV,
> >> +	.init = rte_pmd_vhost_devinit,
> >> +	.uninit = rte_pmd_vhost_devuninit,
> >> +};
> >> +
> >> +struct
> >> +virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id)
> > struct virtio_net *
> > rte_eth_vhost_portid2vdev()
> >
> > BTW, why making a speical eth API for virtio? This doesn't make too much
> > sense to me.
> 
> This is a kind of helper function.

Yeah, I know that. I was thinking that an API prefixed with rte_eth_
should be a common interface for all eth drivers, whereas this one is
for the vhost PMD only.

I then had a quick check of the DPDK code and found a similar example,
bonding, e.g. rte_eth_bond_create(). So it might be okay to introduce
PMD-specific eth APIs?

Anyway, I would suggest you to put it into another patch, so that
it can be reworked (or even dropped) if someone else doesn't like
it (or doesn't think it's necessary).

	--yliu

> 
> I assume that DPDK applications want to know relation between port_id
> and virtio device structure.
> But, in "new" callback handler that DPDK application registers,
> application can receive virtio device structure, but it doesn't tell
> which port is.
> 
> To know it, probably here are steps that DPDK application needs to do.
> 
> 1. Store interface name that is specified when vhost pmd is invoked.
> (For example, store information like /tmp/socket0 is for port0, and
> /tmp/socket1 is for port1)
> 2. Compare above interface name and dev->ifname that is stored in virtio
> device structure, then DPDK application can know which port is.
> 
> If DPDK application uses Port Hotplug, I guess above steps are easy.
> But if they don't, interface name will be specified in "--vdev" EAL
> command line option.
> So probably it's not so easy to handle interface name in DPDK application.
> This is why I added the function.
> 
> Thanks,
> Tetsuya


* Re: [PATCH v4 2/2] vhost: Add VHOST PMD
  2015-11-24  3:40                           ` Yuanhan Liu
@ 2015-11-24  3:44                             ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-24  3:44 UTC (permalink / raw)
  To: Yuanhan Liu, Xie, Huawei; +Cc: dev, ann.zhuangyanying

On 2015/11/24 12:40, Yuanhan Liu wrote:
> On Tue, Nov 24, 2015 at 11:48:04AM +0900, Tetsuya Mukawa wrote:
>> On 2015/11/20 20:43, Yuanhan Liu wrote:
>>> On Fri, Nov 13, 2015 at 02:20:31PM +0900, Tetsuya Mukawa wrote:
>>> ....
>>>> +static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
>>>> +
>>>> +static rte_atomic16_t nb_started_ports;
>>>> +pthread_t session_th;
>>> static?
>> Hi Yuanhan,
>>
>> I appreciate your carefully reviewing.
>> I will fix issues you commented, and submit it again.
>>
>> I added below 2 comments.
>> Could you please check it?
>>
>>>> +
>>>> +static uint16_t
>>>> +eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
>>>> +{
>>>> +	struct vhost_queue *r = q;
>>>> +	uint16_t i, nb_tx = 0;
>>>> +
>>>> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
>>>> +		return 0;
>>>> +
>>>> +	rte_atomic32_set(&r->while_queuing, 1);
>>>> +
>>>> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
>>>> +		goto out;
>>>> +
>>>> +	/* Enqueue packets to guest RX queue */
>>>> +	nb_tx = rte_vhost_enqueue_burst(r->device,
>>>> +			r->virtqueue_id, bufs, nb_bufs);
>>>> +
>>>> +	r->tx_pkts += nb_tx;
>>>> +	r->err_pkts += nb_bufs - nb_tx;
>>>> +
>>>> +	for (i = 0; likely(i < nb_tx); i++)
>>>> +		rte_pktmbuf_free(bufs[i]);
>>> We should free up to nb_bufs here, but not nb_tx, right?
>> I guess we don't need to free all the packet buffers.
>> Could you please check l2fwd_send_burst() in the l2fwd example?
>> It seems the DPDK application frees the packet buffers that failed to send.
> Yes, you are right. I was thinking it's just a vhost app, and forgot
> that this is for rte_eth_tx_burst, sigh ...
>
>>>> +static struct rte_driver pmd_vhost_drv = {
>>>> +	.name = "eth_vhost",
>>>> +	.type = PMD_VDEV,
>>>> +	.init = rte_pmd_vhost_devinit,
>>>> +	.uninit = rte_pmd_vhost_devuninit,
>>>> +};
>>>> +
>>>> +struct
>>>> +virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id)
>>> struct virtio_net *
>>> rte_eth_vhost_portid2vdev()
>>>
>>> BTW, why make a special eth API for virtio? This doesn't make too much
>>> sense to me.
>> This is a kind of helper function.
> Yeah, I know that. I was thinking that an API prefixed with rte_eth_
> should be a common interface for all eth drivers. Here this one is
> for vhost PMD only, though.
>
> I then had a quick check of DPDK code, and found a similar example,
> bond, such as rte_eth_bond_create(). So, it might be okay to introduce
> PMD specific eth APIs?

Yes, I guess so.

>
> Anyway, I would suggest you to put it into another patch, so that
> it can be reworked (or even dropped) if someone else doesn't like
> it (or doesn't think it's necessary).

Sure, that's a nice idea.
I will split the patch.

Tetsuya

> 	--yliu
>
>> I assume that DPDK applications want to know the relation between a port_id
>> and a virtio device structure.
>> But in the "new" callback handler that a DPDK application registers, the
>> application receives the virtio device structure, and that doesn't tell
>> which port it is.
>>
>> To know it, here are probably the steps that a DPDK application needs to do.
>>
>> 1. Store the interface name that is specified when the vhost PMD is invoked.
>> (For example, store information like: /tmp/socket0 is for port0, and
>> /tmp/socket1 is for port1.)
>> 2. Compare the above interface name with dev->ifname stored in the virtio
>> device structure; then the DPDK application can tell which port it is.
>>
>> If the DPDK application uses Port Hotplug, I guess the above steps are easy.
>> But if it doesn't, the interface name will be specified in the "--vdev" EAL
>> command line option, so it's probably not so easy to handle the interface
>> name in the DPDK application.
>> This is why I added the function.
>>
>> Thanks,
>> Tetsuya


* Re: [PATCH v4 2/2] vhost: Add VHOST PMD
  2015-11-21  0:15                       ` Rich Lane
@ 2015-11-24  4:41                         ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-24  4:41 UTC (permalink / raw)
  To: Rich Lane; +Cc: yuanhan.liu, dev, ann.zhuangyanying

On 2015/11/21 9:15, Rich Lane wrote:
> On Thu, Nov 12, 2015 at 9:20 PM, Tetsuya Mukawa <mukawa@igel.co.jp> wrote:
>
>> +static uint16_t
>> +eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
>> +{
>>
> ...
>
>> +
>> +       /* Enqueue packets to guest RX queue */
>> +       nb_tx = rte_vhost_enqueue_burst(r->device,
>> +                       r->virtqueue_id, bufs, nb_bufs);
>> +
>> +       r->tx_pkts += nb_tx;
>> +       r->err_pkts += nb_bufs - nb_tx;
>>
> I don't think a full TX queue is counted as an error by physical NIC PMDs
> like ixgbe and i40e. It is counted as an error by the af_packet, pcap, and
> ring PMDs. I'd suggest not counting it as an error because it's a common
> and expected condition, and the application might just retry the TX later.

Hi Rich,

Thanks for commenting.
I will count it as "imissed".

> Are the byte counts left out because it would be a performance hit? It
> seems like it would be a minimal cost given how much we're already touching
> each packet.

I just ignored it for performance reasons.
But you are correct; I will add it.

>
>> +static int
>> +new_device(struct virtio_net *dev)
>> +{
>>
> ...
>
>> +
>> +       if ((dev->virt_qp_nb < internal->nb_rx_queues) ||
>> +                       (dev->virt_qp_nb < internal->nb_tx_queues)) {
>> +               RTE_LOG(INFO, PMD, "Not enough queues\n");
>> +               return -1;
>> +       }
>>
> Would it make sense to take the minimum of the guest and host queue pairs
> and use that below in place of nb_rx_queues/nb_tx_queues? That way the host
> can support a large maximum number of queues and each guest can choose how
> many it wants to use. The host application will receive vring_state_changed
> callbacks for each queue the guest activates.
>

Thanks for checking this.
I agree with you.

After reading your comment, here is my guess at this PMD's behavior.

This PMD should assume that the virtio-net device (QEMU) has the same
number of queues as, or more queues than, specified in the vhost PMD option.
In case that assumption is broken, the application should handle the
vring_state_changed callback correctly.
(Then stop accessing the disabled queues so as not to waste CPU power.)

Anyway, I will just remove the above if-condition, because of the above
PMD assumption.

Thanks,
Tetsuya


* [PATCH v5 0/3] Add VHOST PMD
  2015-11-13  5:20                     ` [PATCH v4 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
                                         ` (2 preceding siblings ...)
  2015-11-21  0:15                       ` Rich Lane
@ 2015-11-24  9:00                       ` Tetsuya Mukawa
  2015-11-24  9:00                         ` [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
                                           ` (3 more replies)
  3 siblings, 4 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-24  9:00 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, ann.zhuangyanying

The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost.

PATCH v5 changes:
 - Rebase on latest master.
 - Fix RX/TX routine to count RX/TX bytes.
 - Fix RX/TX routine not to count as error packets if enqueue/dequeue
   cannot send all packets.
 - Fix if-condition checking for multiqueues.
 - Add "static" to pthread variable.
 - Fix format.
 - Change default behavior not to receive queueing event from driver.
 - Split the patch to separate rte_eth_vhost_portid2vdev().

PATCH v4 changes:
 - Rebase on latest DPDK tree.
 - Fix coding style.
 - Fix code not to invoke multiple message handling threads.
 - Fix code to handle vdev parameters correctly.
 - Remove needless cast.
 - Remove needless if-condition before rte_free().

PATCH v3 changes:
 - Rebase on latest master.
 - Specify correct queue_id in RX/TX function.

PATCH v2 changes:
 - Remove a below patch that fixes vhost library.
   The patch was applied as a separate patch.
   - vhost: fix crash with multiqueue enabled
 - Fix typos.
   (Thanks to Thomas Monjalon)
 - Rebase on latest tree with above bernard's patches.

PATCH v1 changes:
 - Support vhost multiple queues.
 - Rebase on "remove pci driver from vdevs".
 - Optimize RX/TX functions.
 - Fix resource leaks.
 - Fix compile issue.
 - Add patch to fix vhost library.

RFC PATCH v3 changes:
 - Optimize performance.
   In RX/TX functions, change code to access only per core data.
 - Add below API to allow user to use vhost library APIs for a port managed
   by vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
    - rte_eth_vhost_portid2vdev()
   To support this functionality, vhost library is also changed.
   Anyway, if users don't use the vhost PMD, they can fully use the vhost
   library APIs.
 - Add code to support vhost multiple queues.
   Actually, multiple queues functionality is not enabled so far.

RFC PATCH v2 changes:
 - Fix issues reported by checkpatch.pl
   (Thanks to Stephen Hemminger)


Tetsuya Mukawa (3):
  vhost: Add callback and private data for vhost PMD
  vhost: Add VHOST PMD
  vhost: Add helper function to convert port id to virtio device pointer

 config/common_linuxapp                        |   6 +
 doc/guides/nics/index.rst                     |   1 +
 doc/guides/rel_notes/release_2_2.rst          |   2 +
 drivers/net/Makefile                          |   4 +
 drivers/net/vhost/Makefile                    |  62 ++
 drivers/net/vhost/rte_eth_vhost.c             | 796 ++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h             |  65 +++
 drivers/net/vhost/rte_pmd_vhost_version.map   |   8 +
 lib/librte_vhost/rte_vhost_version.map        |   6 +
 lib/librte_vhost/rte_virtio_net.h             |   3 +
 lib/librte_vhost/vhost_user/virtio-net-user.c |  13 +-
 lib/librte_vhost/virtio-net.c                 |  60 +-
 lib/librte_vhost/virtio-net.h                 |   4 +-
 mk/rte.app.mk                                 |   8 +-
 14 files changed, 1024 insertions(+), 14 deletions(-)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

-- 
2.1.4


* [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-11-24  9:00                       ` [PATCH v5 0/3] " Tetsuya Mukawa
@ 2015-11-24  9:00                         ` Tetsuya Mukawa
  2015-12-17 11:42                           ` Yuanhan Liu
                                             ` (6 more replies)
  2015-11-24  9:00                         ` [PATCH v5 2/3] " Tetsuya Mukawa
                                           ` (2 subsequent siblings)
  3 siblings, 7 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-24  9:00 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, ann.zhuangyanying

The vhost PMD will be a wrapper of the vhost library, but some of the vhost
library APIs cannot be mapped to ethdev library APIs.
Because of this, in some cases, we still need to use the vhost library APIs
for a port created by the vhost PMD.

Currently, when a virtio device is created or destroyed, the vhost library
calls one of the callback handlers. The vhost PMD needs to use this
pair of callback handlers to know which virtio devices are actually
connected.
Because we can register only one pair of callbacks with the vhost library,
if the PMD uses it, DPDK applications have no way to know about the events.

This may break legacy DPDK applications that use the vhost library. To
prevent it, this patch adds one more pair of callbacks to the vhost library,
especially for the vhost PMD.
With the patch, legacy applications can use the vhost PMD even if they need
additional specific handling for virtio device creation and destruction.

For example, a legacy application can call
rte_vhost_enable_guest_notification() in the callbacks to change settings.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
 lib/librte_vhost/rte_vhost_version.map        |  6 +++
 lib/librte_vhost/rte_virtio_net.h             |  3 ++
 lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++---
 lib/librte_vhost/virtio-net.c                 | 60 +++++++++++++++++++++++++--
 lib/librte_vhost/virtio-net.h                 |  4 +-
 5 files changed, 73 insertions(+), 13 deletions(-)

diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index 3d8709e..00a9ce5 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -20,3 +20,9 @@ DPDK_2.1 {
 	rte_vhost_driver_unregister;
 
 } DPDK_2.0;
+
+DPDK_2.2 {
+	global:
+
+	rte_vhost_driver_pmd_callback_register;
+} DPDK_2.1;
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 5687452..3ef6e58 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -128,6 +128,7 @@ struct virtio_net {
 	char			ifname[IF_NAME_SZ];	/**< Name of the tap device or socket path. */
 	uint32_t		virt_qp_nb;	/**< number of queue pair we have allocated */
 	void			*priv;		/**< private context */
+	void			*pmd_priv;	/**< private context for vhost PMD */
 	struct vhost_virtqueue	*virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];	/**< Contains all virtqueue information. */
 } __rte_cache_aligned;
 
@@ -224,6 +225,8 @@ int rte_vhost_driver_unregister(const char *dev_name);
 
 /* Register callbacks. */
 int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const);
+/* Register callbacks for vhost PMD (Only for internal). */
+int rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const * const);
 /* Start vhost driver session blocking loop. */
 int rte_vhost_driver_session_start(void);
 
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index d07452a..d8ae2fc 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -111,7 +111,7 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
 
 	/* Remove from the data plane. */
 	if (dev->flags & VIRTIO_DEV_RUNNING)
-		notify_ops->destroy_device(dev);
+		notify_destroy_device(dev);
 
 	if (dev->mem) {
 		free_mem_region(dev);
@@ -272,7 +272,7 @@ user_set_vring_kick(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
 
 	if (virtio_is_ready(dev) &&
 		!(dev->flags & VIRTIO_DEV_RUNNING))
-			notify_ops->new_device(dev);
+			notify_new_device(dev);
 }
 
 /*
@@ -288,7 +288,7 @@ user_get_vring_base(struct vhost_device_ctx ctx,
 		return -1;
 	/* We have to stop the queue (virtio) if it is running. */
 	if (dev->flags & VIRTIO_DEV_RUNNING)
-		notify_ops->destroy_device(dev);
+		notify_destroy_device(dev);
 
 	/* Here we are safe to get the last used index */
 	ops->get_vring_base(ctx, state->index, state);
@@ -324,10 +324,7 @@ user_set_vring_enable(struct vhost_device_ctx ctx,
 		"set queue enable: %d to qp idx: %d\n",
 		enable, state->index);
 
-	if (notify_ops->vring_state_changed) {
-		notify_ops->vring_state_changed(dev, base_idx / VIRTIO_QNUM,
-						enable);
-	}
+	notify_vring_state_changed(dev, base_idx / VIRTIO_QNUM, enable);
 
 	dev->virtqueue[base_idx + VIRTIO_RXQ]->enabled = enable;
 	dev->virtqueue[base_idx + VIRTIO_TXQ]->enabled = enable;
@@ -341,7 +338,7 @@ user_destroy_device(struct vhost_device_ctx ctx)
 	struct virtio_net *dev = get_device(ctx);
 
 	if (dev && (dev->flags & VIRTIO_DEV_RUNNING))
-		notify_ops->destroy_device(dev);
+		notify_destroy_device(dev);
 
 	if (dev && dev->mem) {
 		free_mem_region(dev);
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 8364938..dc977b7 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -65,6 +65,8 @@ struct virtio_net_config_ll {
 
 /* device ops to add/remove device to/from data core. */
 struct virtio_net_device_ops const *notify_ops;
+/* device ops for vhost PMD to add/remove device to/from data core. */
+struct virtio_net_device_ops const *pmd_notify_ops;
 /* root address of the linked list of managed virtio devices */
 static struct virtio_net_config_ll *ll_root;
 
@@ -81,6 +83,45 @@ static struct virtio_net_config_ll *ll_root;
 static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
 
 
+int
+notify_new_device(struct virtio_net *dev)
+{
+	if ((pmd_notify_ops != NULL) && (pmd_notify_ops->new_device != NULL)) {
+		int ret = pmd_notify_ops->new_device(dev);
+
+		if (ret != 0)
+			return ret;
+	}
+	if ((notify_ops != NULL) && (notify_ops->new_device != NULL))
+		return notify_ops->new_device(dev);
+
+	return 0;
+}
+
+void
+notify_destroy_device(volatile struct virtio_net *dev)
+{
+	if ((pmd_notify_ops != NULL) && (pmd_notify_ops->destroy_device != NULL))
+		pmd_notify_ops->destroy_device(dev);
+	if ((notify_ops != NULL) && (notify_ops->destroy_device != NULL))
+		notify_ops->destroy_device(dev);
+}
+
+int
+notify_vring_state_changed(struct virtio_net *dev, uint16_t queue_id, int enable)
+{
+	if ((pmd_notify_ops != NULL) && (pmd_notify_ops->vring_state_changed != NULL)) {
+		int ret = pmd_notify_ops->vring_state_changed(dev, queue_id, enable);
+
+		if (ret != 0)
+			return ret;
+	}
+	if ((notify_ops != NULL) && (notify_ops->vring_state_changed != NULL))
+		return notify_ops->vring_state_changed(dev, queue_id, enable);
+
+	return 0;
+}
+
 /*
  * Converts QEMU virtual address to Vhost virtual address. This function is
  * used to convert the ring addresses to our address space.
@@ -393,7 +434,7 @@ destroy_device(struct vhost_device_ctx ctx)
 			 * the function to remove it from the data core.
 			 */
 			if ((ll_dev_cur->dev.flags & VIRTIO_DEV_RUNNING))
-				notify_ops->destroy_device(&(ll_dev_cur->dev));
+				notify_destroy_device(&(ll_dev_cur->dev));
 			ll_dev_cur = rm_config_ll_entry(ll_dev_cur,
 					ll_dev_last);
 		} else {
@@ -451,7 +492,7 @@ reset_owner(struct vhost_device_ctx ctx)
 		return -1;
 
 	if (dev->flags & VIRTIO_DEV_RUNNING)
-		notify_ops->destroy_device(dev);
+		notify_destroy_device(dev);
 
 	cleanup_device(dev, 0);
 	reset_device(dev);
@@ -809,12 +850,12 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 	if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
 		if (((int)dev->virtqueue[VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED) &&
 			((int)dev->virtqueue[VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED)) {
-			return notify_ops->new_device(dev);
+			return notify_new_device(dev);
 		}
 	/* Otherwise we remove it. */
 	} else
 		if (file->fd == VIRTIO_DEV_STOPPED)
-			notify_ops->destroy_device(dev);
+			notify_destroy_device(dev);
 	return 0;
 }
 
@@ -898,3 +939,14 @@ rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const op
 
 	return 0;
 }
+
+/*
+ * Register ops so that we can add/remove device to data core.
+ */
+int
+rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const * const ops)
+{
+	pmd_notify_ops = ops;
+
+	return 0;
+}
diff --git a/lib/librte_vhost/virtio-net.h b/lib/librte_vhost/virtio-net.h
index 75fb57e..0816e71 100644
--- a/lib/librte_vhost/virtio-net.h
+++ b/lib/librte_vhost/virtio-net.h
@@ -37,7 +37,9 @@
 #include "vhost-net.h"
 #include "rte_virtio_net.h"
 
-struct virtio_net_device_ops const *notify_ops;
 struct virtio_net *get_device(struct vhost_device_ctx ctx);
 
+int notify_new_device(struct virtio_net *dev);
+void notify_destroy_device(volatile struct virtio_net *dev);
+int notify_vring_state_changed(struct virtio_net *dev, uint16_t queue_id, int enable);
 #endif
-- 
2.1.4


* [PATCH v5 2/3] vhost: Add VHOST PMD
  2015-11-24  9:00                       ` [PATCH v5 0/3] " Tetsuya Mukawa
  2015-11-24  9:00                         ` [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
@ 2015-11-24  9:00                         ` Tetsuya Mukawa
  2015-12-18  7:45                           ` Yuanhan Liu
  2015-11-24  9:00                         ` [PATCH v5 3/3] vhost: Add helper function to convert port id to virtio device pointer Tetsuya Mukawa
  2015-12-08  1:12                         ` [PATCH v5 0/3] Add VHOST PMD Tetsuya Mukawa
  3 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-24  9:00 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, ann.zhuangyanying

The patch introduces a new PMD. This PMD is implemented as a thin wrapper
of librte_vhost, which means librte_vhost is also needed to compile the PMD.
Vhost messages are handled only while a port is started, so start
a port first and then invoke QEMU.

The PMD has 2 parameters.
 - iface:  The parameter is used to specify a path to connect to a
           virtio-net device.
 - queues: The parameter is used to specify the number of queues the
           virtio-net device has.
           (Default: 1)

Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i

To connect to the above testpmd, here is a QEMU command line example.

$ qemu-system-x86_64 \
        <snip>
        -chardev socket,id=chr0,path=/tmp/sock0 \
        -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
        -device virtio-net-pci,netdev=net0

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
 config/common_linuxapp                      |   6 +
 doc/guides/nics/index.rst                   |   1 +
 doc/guides/rel_notes/release_2_2.rst        |   2 +
 drivers/net/Makefile                        |   4 +
 drivers/net/vhost/Makefile                  |  57 ++
 drivers/net/vhost/rte_eth_vhost.c           | 771 ++++++++++++++++++++++++++++
 drivers/net/vhost/rte_pmd_vhost_version.map |   8 +
 mk/rte.app.mk                               |   8 +-
 8 files changed, 856 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

diff --git a/config/common_linuxapp b/config/common_linuxapp
index f72c46d..0140a8e 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -466,6 +466,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
 CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 
 #
+# Compile vhost PMD
+# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
+#
+CONFIG_RTE_LIBRTE_PMD_VHOST=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 0a0b724..26db9b7 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -48,6 +48,7 @@ Network Interface Controller Drivers
     mlx5
     szedata2
     virtio
+    vhost
     vmxnet3
     pcap_ring
 
diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 8c77768..b6071ab 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -111,6 +111,8 @@ New Features
 
 * **Added vhost-user multiple queue support.**
 
+* **Added vhost PMD.**
+
 * **Added port hotplug support to vmxnet3.**
 
 * **Added port hotplug support to xenvirt.**
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index cddcd57..18d03cf 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -51,5 +51,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
 
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
 include $(RTE_SDK)/mk/rte.sharelib.mk
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
new file mode 100644
index 0000000..8bec47a
--- /dev/null
+++ b/drivers/net/vhost/Makefile
@@ -0,0 +1,57 @@
+#   BSD LICENSE
+#
+#   Copyright (c) 2010-2015 Intel Corporation.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_vhost.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_vhost_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_vhost
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
new file mode 100644
index 0000000..9ef05bc
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -0,0 +1,771 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) 2015 IGEL Co., Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co.,Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <unistd.h>
+#include <pthread.h>
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_dev.h>
+#include <rte_kvargs.h>
+#include <rte_virtio_net.h>
+
+#define ETH_VHOST_IFACE_ARG		"iface"
+#define ETH_VHOST_QUEUES_ARG		"queues"
+
+static const char *drivername = "VHOST PMD";
+
+static const char *valid_arguments[] = {
+	ETH_VHOST_IFACE_ARG,
+	ETH_VHOST_QUEUES_ARG,
+	NULL
+};
+
+static struct ether_addr base_eth_addr = {
+	.addr_bytes = {
+		0x56 /* V */,
+		0x48 /* H */,
+		0x4F /* O */,
+		0x53 /* S */,
+		0x54 /* T */,
+		0x00
+	}
+};
+
+struct vhost_queue {
+	rte_atomic32_t allow_queuing;
+	rte_atomic32_t while_queuing;
+	struct virtio_net *device;
+	struct pmd_internal *internal;
+	struct rte_mempool *mb_pool;
+	uint16_t virtqueue_id;
+	uint64_t rx_pkts;
+	uint64_t tx_pkts;
+	uint64_t missed_pkts;
+	uint64_t rx_bytes;
+	uint64_t tx_bytes;
+};
+
+struct pmd_internal {
+	TAILQ_ENTRY(pmd_internal) next;
+	char *dev_name;
+	char *iface_name;
+	unsigned nb_rx_queues;
+	unsigned nb_tx_queues;
+
+	struct vhost_queue *rx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+	struct vhost_queue *tx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+
+	volatile uint16_t once;
+};
+
+TAILQ_HEAD(pmd_internal_head, pmd_internal);
+static struct pmd_internal_head internals_list =
+	TAILQ_HEAD_INITIALIZER(internals_list);
+
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static rte_atomic16_t nb_started_ports;
+static pthread_t session_th;
+
+static struct rte_eth_link pmd_link = {
+		.link_speed = 10000,
+		.link_duplex = ETH_LINK_FULL_DUPLEX,
+		.link_status = 0
+};
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t i, nb_rx = 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Dequeue packets from guest TX queue */
+	nb_rx = rte_vhost_dequeue_burst(r->device,
+			r->virtqueue_id, r->mb_pool, bufs, nb_bufs);
+
+	r->rx_pkts += nb_rx;
+
+	for (i = 0; likely(i < nb_rx); i++)
+		r->rx_bytes += bufs[i]->pkt_len;
+
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t i, nb_tx = 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Enqueue packets to guest RX queue */
+	nb_tx = rte_vhost_enqueue_burst(r->device,
+			r->virtqueue_id, bufs, nb_bufs);
+
+	r->tx_pkts += nb_tx;
+	r->missed_pkts += nb_bufs - nb_tx;
+
+	for (i = 0; likely(i < nb_tx); i++)
+		r->tx_bytes += bufs[i]->pkt_len;
+
+	for (i = 0; likely(i < nb_tx); i++)
+		rte_pktmbuf_free(bufs[i]);
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
+{
+	return 0;
+}
+
+static inline struct pmd_internal *
+find_internal_resource(char *ifname)
+{
+	int found = 0;
+	struct pmd_internal *internal;
+
+	if (ifname == NULL)
+		return NULL;
+
+	pthread_mutex_lock(&internal_list_lock);
+
+	TAILQ_FOREACH(internal, &internals_list, next) {
+		if (!strcmp(internal->iface_name, ifname)) {
+			found = 1;
+			break;
+		}
+	}
+
+	pthread_mutex_unlock(&internal_list_lock);
+
+	if (!found)
+		return NULL;
+
+	return internal;
+}
+
+static int
+new_device(struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct pmd_internal *internal;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid argument\n");
+		return -1;
+	}
+
+	internal = find_internal_resource(dev->ifname);
+	if (internal == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid device name\n");
+		return -1;
+	}
+
+	eth_dev = rte_eth_dev_allocated(internal->dev_name);
+	if (eth_dev == NULL) {
+		RTE_LOG(INFO, PMD, "Failed to find a ethdev\n");
+		return -1;
+	}
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = internal->rx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = dev;
+		vq->internal = internal;
+		rte_vhost_enable_guest_notification(
+				dev, vq->virtqueue_id, 0);
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = internal->tx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = dev;
+		vq->internal = internal;
+		rte_vhost_enable_guest_notification(
+				dev, vq->virtqueue_id, 0);
+	}
+
+	dev->flags |= VIRTIO_DEV_RUNNING;
+	dev->pmd_priv = eth_dev;
+	eth_dev->data->dev_link.link_status = 1;
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = internal->rx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = internal->tx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+	RTE_LOG(INFO, PMD, "New connection established\n");
+
+	return 0;
+}
+
+static void
+destroy_device(volatile struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct pmd_internal *internal;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid argument\n");
+		return;
+	}
+
+	eth_dev = (struct rte_eth_dev *)dev->pmd_priv;
+	if (eth_dev == NULL) {
+		RTE_LOG(INFO, PMD, "Failed to find an ethdev\n");
+		return;
+	}
+
+	internal = eth_dev->data->dev_private;
+
+	/* Wait until rx/tx_pkt_burst stops accessing vhost device */
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = internal->rx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = internal->tx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+
+	eth_dev->data->dev_link.link_status = 0;
+
+	dev->pmd_priv = NULL;
+	dev->flags &= ~VIRTIO_DEV_RUNNING;
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = internal->rx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = NULL;
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = internal->tx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = NULL;
+	}
+
+	RTE_LOG(INFO, PMD, "Connection closed\n");
+}
+
+static void *
+vhost_driver_session(void *param __rte_unused)
+{
+	static struct virtio_net_device_ops vhost_ops;
+
+	/* set vhost arguments */
+	vhost_ops.new_device = new_device;
+	vhost_ops.destroy_device = destroy_device;
+	if (rte_vhost_driver_pmd_callback_register(&vhost_ops) < 0)
+		rte_panic("Can't register callbacks\n");
+
+	/* start event handling */
+	rte_vhost_driver_session_start();
+
+	pthread_exit(0);
+}
+
+static void
+vhost_driver_session_start(void)
+{
+	int ret;
+
+	ret = pthread_create(&session_th,
+			NULL, vhost_driver_session, NULL);
+	if (ret)
+		rte_panic("Can't create a thread\n");
+}
+
+static void
+vhost_driver_session_stop(void)
+{
+	int ret;
+
+	ret = pthread_cancel(session_th);
+	if (ret)
+		rte_panic("Can't cancel the thread\n");
+
+	ret = pthread_join(session_th, NULL);
+	if (ret)
+		rte_panic("Can't join the thread\n");
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	int ret;
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
+		ret = rte_vhost_driver_register(internal->iface_name);
+		if (ret)
+			return ret;
+	}
+
+	/* We need only one message handling thread */
+	if (rte_atomic16_add_return(&nb_started_ports, 1) == 1)
+		vhost_driver_session_start();
+
+	return 0;
+}
+
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	if (rte_atomic16_cmpset(&internal->once, 1, 0))
+		rte_vhost_driver_unregister(internal->iface_name);
+
+	if (rte_atomic16_sub_return(&nb_started_ports, 1) == 0)
+		vhost_driver_session_stop();
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+	struct vhost_queue *vq;
+
+	rte_free(internal->rx_vhost_queues[rx_queue_id]);
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
+		return -ENOMEM;
+	}
+
+	vq->mb_pool = mb_pool;
+	vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
+	internal->rx_vhost_queues[rx_queue_id] = vq;
+	dev->data->rx_queues[rx_queue_id] = vq;
+	return 0;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+	struct vhost_queue *vq;
+
+	rte_free(internal->tx_vhost_queues[tx_queue_id]);
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for tx queue\n");
+		return -ENOMEM;
+	}
+
+	vq->virtqueue_id = tx_queue_id * VIRTIO_QNUM + VIRTIO_RXQ;
+	internal->tx_vhost_queues[tx_queue_id] = vq;
+	dev->data->tx_queues[tx_queue_id] = vq;
+	return 0;
+}
+
+
+static void
+eth_dev_info(struct rte_eth_dev *dev,
+	     struct rte_eth_dev_info *dev_info)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	dev_info->driver_name = drivername;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = (uint32_t)-1;
+	dev_info->max_rx_queues = (uint16_t)internal->nb_rx_queues;
+	dev_info->max_tx_queues = (uint16_t)internal->nb_tx_queues;
+	dev_info->min_rx_bufsize = 0;
+}
+
+static void
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
+{
+	unsigned i;
+	unsigned long rx_total = 0, tx_total = 0, tx_missed_total = 0;
+	unsigned long rx_total_bytes = 0, tx_total_bytes = 0;
+	const struct pmd_internal *internal = dev->data->dev_private;
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+	     i < internal->nb_rx_queues; i++) {
+		if (internal->rx_vhost_queues[i] == NULL)
+			continue;
+		igb_stats->q_ipackets[i] = internal->rx_vhost_queues[i]->rx_pkts;
+		rx_total += igb_stats->q_ipackets[i];
+
+		igb_stats->q_ibytes[i] = internal->rx_vhost_queues[i]->rx_bytes;
+		rx_total_bytes += igb_stats->q_ibytes[i];
+	}
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+	     i < internal->nb_tx_queues; i++) {
+		if (internal->tx_vhost_queues[i] == NULL)
+			continue;
+		igb_stats->q_opackets[i] = internal->tx_vhost_queues[i]->tx_pkts;
+		tx_missed_total += internal->tx_vhost_queues[i]->missed_pkts;
+		tx_total += igb_stats->q_opackets[i];
+
+		igb_stats->q_obytes[i] = internal->tx_vhost_queues[i]->tx_bytes;
+		tx_total_bytes += igb_stats->q_obytes[i];
+	}
+
+	igb_stats->ipackets = rx_total;
+	igb_stats->opackets = tx_total;
+	igb_stats->imissed = tx_missed_total;
+	igb_stats->ibytes = rx_total_bytes;
+	igb_stats->obytes = tx_total_bytes;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	unsigned i;
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		if (internal->rx_vhost_queues[i] == NULL)
+			continue;
+		internal->rx_vhost_queues[i]->rx_pkts = 0;
+		internal->rx_vhost_queues[i]->rx_bytes = 0;
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		if (internal->tx_vhost_queues[i] == NULL)
+			continue;
+		internal->tx_vhost_queues[i]->tx_pkts = 0;
+		internal->tx_vhost_queues[i]->tx_bytes = 0;
+		internal->tx_vhost_queues[i]->missed_pkts = 0;
+	}
+}
+
+static void
+eth_queue_release(void *q __rte_unused)
+{
+	return;
+}
+
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused)
+{
+	return 0;
+}
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+static int
+eth_dev_vhost_create(const char *name, int index,
+		     char *iface_name,
+		     int16_t queues,
+		     const unsigned numa_node)
+{
+	struct rte_eth_dev_data *data = NULL;
+	struct pmd_internal *internal = NULL;
+	struct rte_eth_dev *eth_dev = NULL;
+	struct ether_addr *eth_addr = NULL;
+
+	RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
+		numa_node);
+
+	/* now do all data allocation - for eth_dev structure, dummy pci driver
+	 * and internal (private) data
+	 */
+	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+	if (data == NULL)
+		goto error;
+
+	internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
+	if (internal == NULL)
+		goto error;
+
+	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
+	if (eth_addr == NULL)
+		goto error;
+	*eth_addr = base_eth_addr;
+	eth_addr->addr_bytes[5] = index;
+
+	/* reserve an ethdev entry */
+	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+	if (eth_dev == NULL)
+		goto error;
+
+	/* now put it all together
+	 * - store queue data in internal,
+	 * - store numa_node info in ethdev data
+	 * - point eth_dev_data to internals
+	 * - and point eth_dev structure to new eth_dev_data structure
+	 */
+	internal->nb_rx_queues = queues;
+	internal->nb_tx_queues = queues;
+	internal->dev_name = strdup(name);
+	if (internal->dev_name == NULL)
+		goto error;
+	internal->iface_name = strdup(iface_name);
+	if (internal->iface_name == NULL) {
+		free(internal->dev_name);
+		goto error;
+	}
+
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_INSERT_TAIL(&internals_list, internal, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
+	data->dev_private = internal;
+	data->port_id = eth_dev->data->port_id;
+	memmove(data->name, eth_dev->data->name, sizeof(data->name));
+	data->nb_rx_queues = queues;
+	data->nb_tx_queues = queues;
+	data->dev_link = pmd_link;
+	data->mac_addrs = eth_addr;
+
+	/* We'll replace the 'data' originally allocated by eth_dev. So the
+	 * vhost PMD resources won't be shared between multi processes.
+	 */
+	eth_dev->data = data;
+	eth_dev->dev_ops = &ops;
+	eth_dev->driver = NULL;
+	eth_dev->data->dev_flags = RTE_ETH_DEV_DETACHABLE;
+	eth_dev->data->kdrv = RTE_KDRV_NONE;
+	eth_dev->data->drv_name = internal->dev_name;
+	eth_dev->data->numa_node = numa_node;
+
+	/* finally assign rx and tx ops */
+	eth_dev->rx_pkt_burst = eth_vhost_rx;
+	eth_dev->tx_pkt_burst = eth_vhost_tx;
+
+	return data->port_id;
+
+error:
+	rte_free(data);
+	rte_free(internal);
+	rte_free(eth_addr);
+
+	return -1;
+}
+
+static inline int
+open_iface(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	const char **iface_name = extra_args;
+
+	if (value == NULL)
+		return -1;
+
+	*iface_name = value;
+
+	return 0;
+}
+
+static inline int
+open_queues(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	uint16_t *q = extra_args;
+
+	if ((value == NULL) || (extra_args == NULL))
+		return -EINVAL;
+
+	*q = (uint16_t)strtoul(value, NULL, 0);
+	if ((*q == USHRT_MAX) && (errno == ERANGE))
+		return -1;
+
+	if (*q > RTE_MAX_QUEUES_PER_PORT)
+		return -1;
+
+	return 0;
+}
+
+static int
+rte_pmd_vhost_devinit(const char *name, const char *params)
+{
+	struct rte_kvargs *kvlist = NULL;
+	int ret = 0;
+	int index;
+	char *iface_name;
+	uint16_t queues;
+
+	RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
+
+	if (strlen(name) < strlen("eth_vhost"))
+		return -1;
+
+	index = strtol(name + strlen("eth_vhost"), NULL, 0);
+	if (errno == ERANGE)
+		return -1;
+
+	kvlist = rte_kvargs_parse(params, valid_arguments);
+	if (kvlist == NULL)
+		return -1;
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
+					 &open_iface, &iface_name);
+		if (ret < 0)
+			goto out_free;
+	} else {
+		ret = -1;
+		goto out_free;
+	}
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
+					 &open_queues, &queues);
+		if (ret < 0)
+			goto out_free;
+
+	} else
+		queues = 1;
+
+	eth_dev_vhost_create(name, index,
+			iface_name, queues, rte_socket_id());
+
+out_free:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static int
+rte_pmd_vhost_devuninit(const char *name)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internal *internal;
+	unsigned int i;
+
+	RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
+
+	if (name == NULL)
+		return -EINVAL;
+
+	/* find an ethdev entry */
+	eth_dev = rte_eth_dev_allocated(name);
+	if (eth_dev == NULL)
+		return -ENODEV;
+
+	internal = eth_dev->data->dev_private;
+
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_REMOVE(&internals_list, internal, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
+	eth_dev_stop(eth_dev);
+
+	if ((internal) && (internal->dev_name))
+		free(internal->dev_name);
+	if ((internal) && (internal->iface_name))
+		free(internal->iface_name);
+
+	rte_free(eth_dev->data->mac_addrs);
+	rte_free(eth_dev->data);
+
+	for (i = 0; i < internal->nb_rx_queues; i++)
+		rte_free(internal->rx_vhost_queues[i]);
+	for (i = 0; i < internal->nb_tx_queues; i++)
+		rte_free(internal->tx_vhost_queues[i]);
+	rte_free(internal);
+
+	rte_eth_dev_release_port(eth_dev);
+
+	return 0;
+}
+
+static struct rte_driver pmd_vhost_drv = {
+	.name = "eth_vhost",
+	.type = PMD_VDEV,
+	.init = rte_pmd_vhost_devinit,
+	.uninit = rte_pmd_vhost_devuninit,
+};
+
+PMD_REGISTER_DRIVER(pmd_vhost_drv);
diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
new file mode 100644
index 0000000..bf0361a
--- /dev/null
+++ b/drivers/net/vhost/rte_pmd_vhost_version.map
@@ -0,0 +1,8 @@
+DPDK_2.2 {
+
+	global:
+
+	rte_eth_vhost_portid2vdev;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 148653e..542df30 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -151,7 +151,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP)       += -lrte_pmd_pcap
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL)       += -lrte_pmd_null
 
-endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
+
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
+endif # $(CONFIG_RTE_BUILD_SHARED_LIB)
 
 endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v5 3/3] vhost: Add helper function to convert port id to virtio device pointer
  2015-11-24  9:00                       ` [PATCH v5 0/3] " Tetsuya Mukawa
  2015-11-24  9:00                         ` [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
  2015-11-24  9:00                         ` [PATCH v5 2/3] " Tetsuya Mukawa
@ 2015-11-24  9:00                         ` Tetsuya Mukawa
  2015-12-17 11:47                           ` Yuanhan Liu
  2015-12-08  1:12                         ` [PATCH v5 0/3] Add VHOST PMD Tetsuya Mukawa
  3 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-11-24  9:00 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, ann.zhuangyanying

This helper function converts a port id to a virtio device pointer. To
use this function, the port must be managed by the vhost PMD. After
getting the virtio device pointer, it can be used for calling vhost
library APIs. However, some library APIs should not be called with the
vhost PMD:

 - rte_vhost_driver_session_start()
 - rte_vhost_driver_unregister()

The above APIs will not work with the vhost PMD.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
 drivers/net/vhost/Makefile        |  5 +++
 drivers/net/vhost/rte_eth_vhost.c | 25 +++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h | 65 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 95 insertions(+)
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h

diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
index 8bec47a..8186a80 100644
--- a/drivers/net/vhost/Makefile
+++ b/drivers/net/vhost/Makefile
@@ -48,6 +48,11 @@ LIBABIVER := 1
 #
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
 
+#
+# Export include files
+#
+SYMLINK-y-include += rte_eth_vhost.h
+
 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
 DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index 9ef05bc..bfe1f18 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -41,6 +41,8 @@
 #include <rte_kvargs.h>
 #include <rte_virtio_net.h>
 
+#include "rte_eth_vhost.h"
+
 #define ETH_VHOST_IFACE_ARG		"iface"
 #define ETH_VHOST_QUEUES_ARG		"queues"
 
@@ -768,4 +770,27 @@ static struct rte_driver pmd_vhost_drv = {
 	.uninit = rte_pmd_vhost_devuninit,
 };
 
+struct virtio_net *
+rte_eth_vhost_portid2vdev(uint16_t port_id)
+{
+	struct rte_eth_dev *eth_dev;
+
+	if (rte_eth_dev_is_valid_port(port_id) == 0)
+		return NULL;
+
+	eth_dev = &rte_eth_devices[port_id];
+	if (strncmp("eth_vhost", eth_dev->data->drv_name,
+				strlen("eth_vhost")) == 0) {
+		struct pmd_internal *internal;
+		struct vhost_queue *vq;
+
+		internal = eth_dev->data->dev_private;
+		vq = internal->rx_vhost_queues[0];
+		if ((vq != NULL) && (vq->device != NULL))
+			return vq->device;
+	}
+
+	return NULL;
+}
+
 PMD_REGISTER_DRIVER(pmd_vhost_drv);
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
new file mode 100644
index 0000000..22a880f
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -0,0 +1,65 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 IGEL Co., Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co., Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_VHOST_H_
+#define _RTE_ETH_VHOST_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_virtio_net.h>
+
+/**
+ * The function converts the specified port_id to a virtio device structure.
+ * The returned device can be used with vhost library APIs.
+ * To use vhost library APIs and the vhost PMD in parallel, the below API
+ * should not be called, because it is called by the vhost PMD itself.
+ * - rte_vhost_driver_session_start()
+ * Once a device is managed by the vhost PMD, the below API should not be called.
+ * - rte_vhost_driver_unregister()
+ * To unregister the device, call the Port Hotplug APIs instead.
+ *
+ * @param port_id
+ *  port number
+ * @return
+ *  virtio net device structure corresponding to the specified port
+ *  NULL will be returned in error cases.
+ */
+struct virtio_net *rte_eth_vhost_portid2vdev(uint16_t port_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 200+ messages in thread

* Re: [PATCH v5 0/3] Add VHOST PMD
  2015-11-24  9:00                       ` [PATCH v5 0/3] " Tetsuya Mukawa
                                           ` (2 preceding siblings ...)
  2015-11-24  9:00                         ` [PATCH v5 3/3] vhost: Add helper function to convert port id to virtio device pointer Tetsuya Mukawa
@ 2015-12-08  1:12                         ` Tetsuya Mukawa
  2015-12-08  2:03                           ` Yuanhan Liu
  3 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-12-08  1:12 UTC (permalink / raw)
  To: dev, yuanhan.liu, huawei.xie

Hi Xie and Yuanhan,

Please let me make sure whether this patch is differed.
If it is differed, I guess I may need to add ABI breakage notice before
releasing DPDK-2.2, because the patches changes virtio_net structure.

Tetsuya,


On 2015/11/24 18:00, Tetsuya Mukawa wrote:
> The patch introduces a new PMD. This PMD is implemented as thin wrapper
> of librte_vhost.
>
> PATCH v5 changes:
>  - Rebase on latest master.
>  - Fix RX/TX routine to count RX/TX bytes.
>  - Fix RX/TX routine not to count as error packets if enqueue/dequeue
>    cannot send all packets.
>  - Fix if-condition checking for multiqueues.
>  - Add "static" to pthread variable.
>  - Fix format.
>  - Change default behavior not to receive queueing event from driver.
>  - Split the patch to separate rte_eth_vhost_portid2vdev().
>
> PATCH v4 changes:
>  - Rebase on latest DPDK tree.
>  - Fix coding style.
>  - Fix code not to invoke multiple messaging handling threads.
>  - Fix code to handle vdev parameters correctly.
>  - Remove needless cast.
>  - Remove needless if-condition before rt_free().
>
> PATCH v3 changes:
>  - Rebase on latest master
>  - Specify correct queue_id in RX/TX function.
>
> PATCH v2 changes:
>  - Remove a below patch that fixes vhost library.
>    The patch was applied as a separate patch.
>    - vhost: fix crash with multiqueue enabled
>  - Fix typos.
>    (Thanks to Thomas, Monjalon)
>  - Rebase on latest tree with above bernard's patches.
>
> PATCH v1 changes:
>  - Support vhost multiple queues.
>  - Rebase on "remove pci driver from vdevs".
>  - Optimize RX/TX functions.
>  - Fix resource leaks.
>  - Fix compile issue.
>  - Add patch to fix vhost library.
>
> RFC PATCH v3 changes:
>  - Optimize performance.
>    In RX/TX functions, change code to access only per core data.
>  - Add below API to allow user to use vhost library APIs for a port managed
>    by vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
>     - rte_eth_vhost_portid2vdev()
>    To support this functionality, vhost library is also changed.
>    Anyway, if users don't use the vhost PMD, they can fully use the vhost library APIs.
>  - Add code to support vhost multiple queues.
>    Actually, multiple queues functionality is not enabled so far.
>
> RFC PATCH v2 changes:
>  - Fix issues reported by checkpatch.pl
>    (Thanks to Stephen Hemminger)
>
>
> Tetsuya Mukawa (3):
>   vhost: Add callback and private data for vhost PMD
>   vhost: Add VHOST PMD
>   vhost: Add helper function to convert port id to virtio device pointer
>
>  config/common_linuxapp                        |   6 +
>  doc/guides/nics/index.rst                     |   1 +
>  doc/guides/rel_notes/release_2_2.rst          |   2 +
>  drivers/net/Makefile                          |   4 +
>  drivers/net/vhost/Makefile                    |  62 ++
>  drivers/net/vhost/rte_eth_vhost.c             | 796 ++++++++++++++++++++++++++
>  drivers/net/vhost/rte_eth_vhost.h             |  65 +++
>  drivers/net/vhost/rte_pmd_vhost_version.map   |   8 +
>  lib/librte_vhost/rte_vhost_version.map        |   6 +
>  lib/librte_vhost/rte_virtio_net.h             |   3 +
>  lib/librte_vhost/vhost_user/virtio-net-user.c |  13 +-
>  lib/librte_vhost/virtio-net.c                 |  60 +-
>  lib/librte_vhost/virtio-net.h                 |   4 +-
>  mk/rte.app.mk                                 |   8 +-
>  14 files changed, 1024 insertions(+), 14 deletions(-)
>  create mode 100644 drivers/net/vhost/Makefile
>  create mode 100644 drivers/net/vhost/rte_eth_vhost.c
>  create mode 100644 drivers/net/vhost/rte_eth_vhost.h
>  create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
>

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v5 0/3] Add VHOST PMD
  2015-12-08  1:12                         ` [PATCH v5 0/3] Add VHOST PMD Tetsuya Mukawa
@ 2015-12-08  2:03                           ` Yuanhan Liu
  2015-12-08  2:10                             ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Yuanhan Liu @ 2015-12-08  2:03 UTC (permalink / raw)
  To: Tetsuya Mukawa, Xie, Huawei; +Cc: dev

On Tue, Dec 08, 2015 at 10:12:52AM +0900, Tetsuya Mukawa wrote:
> Hi Xie and Yuanhan,
> 
> Please let me make sure whether this patch is differed.
> If it is differed, I guess I may need to add ABI breakage notice before

Tetsuya,

What do you mean by "differed"? Do you mean "delayed"?

Per my understanding, it's a bit late for v2.2 (even a few weeks
before). On the other hand, I'm still waiting for comments from
Huawei, for there are still one or two issues need more discussion.

> releasing DPDK-2.2, because the patches changes virtio_net structure.

I had sent a patch (which is just applied by Thomas) for reserving
some spaces for both virtio_net and vhost_virtqueue structure, so
it will not break anything if you simply add few more fields :)

	--yliu

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v5 0/3] Add VHOST PMD
  2015-12-08  2:03                           ` Yuanhan Liu
@ 2015-12-08  2:10                             ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-12-08  2:10 UTC (permalink / raw)
  To: Yuanhan Liu, Xie, Huawei; +Cc: dev

On 2015/12/08 11:03, Yuanhan Liu wrote:
> On Tue, Dec 08, 2015 at 10:12:52AM +0900, Tetsuya Mukawa wrote:
>> Hi Xie and Yuanhan,
>>
>> Please let me make sure whether this patch is differed.
>> If it is differed, I guess I may need to add ABI breakage notice before
> Tetsuya,
>
> What do you mean by "differed"? Do you mean "delayed"?

Hi Yuanhan,

I just guess the patch will not be merged in DPDK-2.2.

> Per my understanding, it's a bit late for v2.2 (even at few weeks
> before). On the other hand, I'm still waiting for comments from
> Huawei, for there are still one or two issues need more discussion.

Yes, I agree with you.

>> releasing DPDK-2.2, because the patches changes virtio_net structure.
> I had sent a patch (which is just applied by Thomas) for reserving
> some spaces for both virtio_net and vhost_virtqueue structure, so
> it will not break anything if you simply add few more fields :)

Sounds great idea!
Thanks for handling virtio things.

Tetsuya,

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-11-24  9:00                         ` [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
@ 2015-12-17 11:42                           ` Yuanhan Liu
  2015-12-18  3:15                             ` Tetsuya Mukawa
  2016-02-02 11:18                           ` [PATCH v6 0/2] Add VHOST PMD Tetsuya Mukawa
                                             ` (5 subsequent siblings)
  6 siblings, 1 reply; 200+ messages in thread
From: Yuanhan Liu @ 2015-12-17 11:42 UTC (permalink / raw)
  To: Tetsuya Mukawa, Xie, Huawei; +Cc: dev, ann.zhuangyanying

On Tue, Nov 24, 2015 at 06:00:01PM +0900, Tetsuya Mukawa wrote:
> The vhost PMD will be a wrapper of vhost library, but some of vhost
> library APIs cannot be mapped to ethdev library APIs.
> Because of this, in some cases, we still need to use vhost library APIs
> for a port created by the vhost PMD.
> 
> Currently, when a virtio device is created and destroyed, vhost library
> will call one of the callback handlers. The vhost PMD needs to use this
> pair of callback handlers to know which virtio devices are actually
> connected.
> Because we can register only one pair of callbacks to the vhost library, if
> the PMD uses it, DPDK applications have no way to know the events.
> 
> This may break legacy DPDK applications that use the vhost library. To prevent
> it, this patch adds one more pair of callbacks to vhost library especially
> for the vhost PMD.
> With the patch, legacy applications can use the vhost PMD even if they need
> additional specific handling for virtio device creation and destruction.
> 
> For example, legacy application can call
> rte_vhost_enable_guest_notification() in callbacks to change setting.

TBH, I never liked it since the beginning. Introducing two callbacks
for one event is a bit messy, and therefore error prone.

I have been thinking about this occasionally over the last few weeks, and
have come up with something: we may introduce another layer of callbacks
based on the vhost PMD itself, via a new API:

	rte_eth_vhost_register_callback().

And we then call those new callback inside the vhost pmd new_device()
and vhost pmd destroy_device() implementations.

And we could have the same callbacks vhost has, but I'm thinking
that new_device() and destroy_device() don't sound like good names
for a PMD driver. Maybe a name like "link_state_changed" is better?

What do you think of that?


On the other hand, I'm still wondering whether it is really necessary to let
the application call vhost functions like rte_vhost_enable_guest_notification()
with the vhost PMD driver?

	--yliu

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v5 3/3] vhost: Add helper function to convert port id to virtio device pointer
  2015-11-24  9:00                         ` [PATCH v5 3/3] vhost: Add helper function to convert port id to virtio device pointer Tetsuya Mukawa
@ 2015-12-17 11:47                           ` Yuanhan Liu
  2015-12-18  3:15                             ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Yuanhan Liu @ 2015-12-17 11:47 UTC (permalink / raw)
  To: Tetsuya Mukawa, Xie, Huawei; +Cc: dev, ann.zhuangyanying

On Tue, Nov 24, 2015 at 06:00:03PM +0900, Tetsuya Mukawa wrote:
> This helper function converts a port id to a virtio device
> pointer. To use this function, the port must be managed by the vhost PMD.
> After getting the virtio device pointer, it can be used for calling vhost
> library APIs.

I'm wondering why that is necessary. I mean, hey, can we simply treat
it as a normal PMD driver, and not consider any vhost lib functions
at all while using the vhost PMD?

	--yliu

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-12-17 11:42                           ` Yuanhan Liu
@ 2015-12-18  3:15                             ` Tetsuya Mukawa
  2015-12-18  3:36                               ` Tetsuya Mukawa
  2015-12-18  4:15                               ` Yuanhan Liu
  0 siblings, 2 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-12-18  3:15 UTC (permalink / raw)
  To: Yuanhan Liu, Xie, Huawei; +Cc: dev, ann.zhuangyanying

On 2015/12/17 20:42, Yuanhan Liu wrote:
> On Tue, Nov 24, 2015 at 06:00:01PM +0900, Tetsuya Mukawa wrote:
>> The vhost PMD will be a wrapper of vhost library, but some of vhost
>> library APIs cannot be mapped to ethdev library APIs.
>> Because of this, in some cases, we still need to use vhost library APIs
>> for a port created by the vhost PMD.
>>
>> Currently, when a virtio device is created and destroyed, vhost library
>> will call one of the callback handlers. The vhost PMD needs to use this
>> pair of callback handlers to know which virtio devices are actually
>> connected.
>> Because we can register only one pair of callbacks to the vhost library, if
>> the PMD uses it, DPDK applications have no way to know the events.
>>
>> This may break legacy DPDK applications that use the vhost library. To prevent
>> it, this patch adds one more pair of callbacks to vhost library especially
>> for the vhost PMD.
>> With the patch, legacy applications can use the vhost PMD even if they need
>> additional specific handling for virtio device creation and destruction.
>>
>> For example, legacy application can call
>> rte_vhost_enable_guest_notification() in callbacks to change setting.
> TBH, I never liked it since the beginning. Introducing two callbacks
> for one event is a bit messy, and therefore error prone.

I agree with you.

> I have been thinking this occasionally last few weeks, and have came
> up something that we may introduce another layer callback based on
> the vhost pmd itself, by a new API:
>
> 	rte_eth_vhost_register_callback().
>
> And we then call those new callback inside the vhost pmd new_device()
> and vhost pmd destroy_device() implementations.
>
> And we could have same callbacks like vhost have, but I'm thinking
> that new_device() and destroy_device() doesn't sound like a good name
> to a PMD driver. Maybe a name like "link_state_changed" is better?
>
> What do you think of that?

Yes, "link_state_changed" would be a good name.

BTW, I thought it was OK for a DPDK app that uses the vhost PMD to call
vhost library APIs directly.
But you probably find that strange. Is this correct?

If so, how about implementing the legacy link status interrupt mechanism
in the vhost PMD?
For example, a DPDK app could register a callback handler as in
"examples/link_status_interrupt".
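To make the idea concrete, here is a rough, non-runnable sketch of what the
application side could look like with the ethdev link status interrupt (API
names as in DPDK 2.x; lsi_event_callback is a hypothetical name, not code
from the patch, and the body is illustrative only):

```c
/* Hedged sketch: an application consuming link status interrupts from a
 * vhost PMD port, in the style of examples/link_status_interrupt.
 * Requires DPDK headers; lsi_event_callback is an illustrative name. */
static void
lsi_event_callback(uint8_t port_id, enum rte_eth_event_type type,
		   void *param)
{
	struct rte_eth_link link;

	RTE_SET_USED(param);
	if (type != RTE_ETH_EVENT_INTR_LSC)
		return;

	rte_eth_link_get_nowait(port_id, &link);
	printf("Port %u link is %s\n", (unsigned)port_id,
	       link.link_status ? "up" : "down");
}

/* At init time, instead of vhost new_device()/destroy_device() hooks: */
rte_eth_dev_callback_register(port_id, RTE_ETH_EVENT_INTR_LSC,
			      lsi_event_callback, NULL);
```

With this, the app sees "link up" when a virtio device connects and "link
down" when it disappears, without touching struct virtio_net.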

Also, if the app doesn't call vhost library APIs directly,
rte_eth_vhost_portid2vdev() will be unnecessary, because the app no
longer needs to handle the virtio device structure.

>
>
> On the other hand, I'm still thinking is that really necessary to let
> the application be able to call vhost functions like rte_vhost_enable_guest_notification()
> with the vhost PMD driver?

The basic concept of my patch is that the vhost PMD provides the
features that the vhost library provides.

How about removing rte_vhost_enable_guest_notification() from the "vhost
library"?
(I am also not sure what its use cases are.)
If we can do this, the vhost PMD doesn't need to take care of it either.
Or, if rte_vhost_enable_guest_notification() will be removed in the
future, the vhost PMD can simply ignore it.


Please let me sum up my thinking about your questions:
 - Change the concept of the patch so that vhost library APIs are not
called directly; these should be wrapped by ethdev APIs.
 - Remove rte_eth_vhost_portid2vdev(), because of the above concept change.
 - Implement the legacy link status change interrupt in the vhost PMD
instead of using our own callback mechanism.
 - Check whether we can remove rte_vhost_enable_guest_notification()
from the vhost library.


Hi Xie,

Do you know the use cases of rte_vhost_enable_guest_notification()?

Thanks,
Tetsuya


* Re: [PATCH v5 3/3] vhost: Add helper function to convert port id to virtio device pointer
  2015-12-17 11:47                           ` Yuanhan Liu
@ 2015-12-18  3:15                             ` Tetsuya Mukawa
  2015-12-18  4:19                               ` Yuanhan Liu
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-12-18  3:15 UTC (permalink / raw)
  To: Yuanhan Liu, Xie, Huawei; +Cc: dev, ann.zhuangyanying

On 2015/12/17 20:47, Yuanhan Liu wrote:
> On Tue, Nov 24, 2015 at 06:00:03PM +0900, Tetsuya Mukawa wrote:
>> This helper function is used to convert port id to virtio device
>> pointer. To use this function, a port should be managed by vhost PMD.
>> After getting virtio device pointer, it can be used for calling vhost
>> library APIs.
> I'm thinking why is that necessary. I mean, hey, can we simply treat
> it as a normal pmd driver, and don't consider any vhost lib functions
> any more while using vhost pmd?

I guess the vhost PMD cannot hide some of the vhost features.
Because of this, we may need to add ethdev APIs to wrap these features.
I described this in more detail in another email. Could you please take
a look at that one as well?

Thanks,
Tetsuya


* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-12-18  3:15                             ` Tetsuya Mukawa
@ 2015-12-18  3:36                               ` Tetsuya Mukawa
  2015-12-18  4:15                               ` Yuanhan Liu
  1 sibling, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-12-18  3:36 UTC (permalink / raw)
  To: Yuanhan Liu, Xie, Huawei; +Cc: dev, ann.zhuangyanying

On 2015/12/18 12:15, Tetsuya Mukawa wrote:
> On 2015/12/17 20:42, Yuanhan Liu wrote:
>> On Tue, Nov 24, 2015 at 06:00:01PM +0900, Tetsuya Mukawa wrote:
>>> The vhost PMD will be a wrapper of vhost library, but some of vhost
>>> library APIs cannot be mapped to ethdev library APIs.
>>> Becasue of this, in some cases, we still need to use vhost library APIs
>>> for a port created by the vhost PMD.
>>>
>>> Currently, when virtio device is created and destroyed, vhost library
>>> will call one of callback handlers. The vhost PMD need to use this
>>> pair of callback handlers to know which virtio devices are connected
>>> actually.
>>> Because we can register only one pair of callbacks to vhost library, if
>>> the PMD use it, DPDK applications cannot have a way to know the events.
>>>
>>> This may break legacy DPDK applications that uses vhost library. To prevent
>>> it, this patch adds one more pair of callbacks to vhost library especially
>>> for the vhost PMD.
>>> With the patch, legacy applications can use the vhost PMD even if they need
>>> additional specific handling for virtio device creation and destruction.
>>>
>>> For example, legacy application can call
>>> rte_vhost_enable_guest_notification() in callbacks to change setting.
>> TBH, I never liked it since the beginning. Introducing two callbacks
>> for one event is a bit messy, and therefore error prone.
> I agree with you.
>
>> I have been thinking this occasionally last few weeks, and have came
>> up something that we may introduce another layer callback based on
>> the vhost pmd itself, by a new API:
>>
>> 	rte_eth_vhost_register_callback().
>>
>> And we then call those new callback inside the vhost pmd new_device()
>> and vhost pmd destroy_device() implementations.
>>
>> And we could have same callbacks like vhost have, but I'm thinking
>> that new_device() and destroy_device() doesn't sound like a good name
>> to a PMD driver. Maybe a name like "link_state_changed" is better?
>>
>> What do you think of that?
> Yes,  "link_state_changed" will be good.
>
> BTW, I thought it was ok that an DPDK app that used vhost PMD called
> vhost library APIs directly.
> But probably you may feel strangeness about it. Is this correct?
>
> If so, how about implementing legacy status interrupt mechanism to vhost
> PMD?
> For example, an DPDK app can register callback handler like
> "examples/link_status_interrupt".

One addition:
In this case, we don't need a special pmd_priv field in the vhost
library.
The vhost PMD will register the callbacks, and inside these callbacks
the legacy interrupt mechanism will be triggered.
The user can then receive the interrupt from ethdev.

Tetsuya

> Also, if the app doesn't call vhost library APIs directly,
> rte_eth_vhost_portid2vdev() will be needless, because the app doesn't
> need to handle virtio device structure anymore.
>
>>
>> On the other hand, I'm still thinking is that really necessary to let
>> the application be able to call vhost functions like rte_vhost_enable_guest_notification()
>> with the vhost PMD driver?
> Basic concept of my patch is that vhost PMD will provides the features
> that vhost library provides.
>
> How about removing rte_vhost_enable_guest_notification() from "vhost
> library"?
> (I also not sure what are use cases)
> If we can do this, vhost PMD also doesn't need to take care of it.
> Or if rte_vhost_enable_guest_notification() will be removed in the
> future, vhost PMD is able to ignore it.
>
>
> Please let me correct up my thinking about your questions.
>  - Change concept of patch not to call vhost library APIs directly.
> These should be wrapped by ethdev APIs.
>  - Remove rte_eth_vhost_portid2vdev(), because of above concept changing.
>  - Implement legacy status changed interrupt to vhost PMD instead of
> using own callback mechanism.
>  - Check if we can remove rte_vhost_enable_guest_notification() from
> vhost library.
>
>
> Hi Xie,
>
> Do you know the use cases of rte_vhost_enable_guest_notification()?
>
> Thanks,
> Tetsuya


* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-12-18  3:15                             ` Tetsuya Mukawa
  2015-12-18  3:36                               ` Tetsuya Mukawa
@ 2015-12-18  4:15                               ` Yuanhan Liu
  2015-12-18  4:28                                 ` Tetsuya Mukawa
  2015-12-18 10:03                                 ` Xie, Huawei
  1 sibling, 2 replies; 200+ messages in thread
From: Yuanhan Liu @ 2015-12-18  4:15 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Fri, Dec 18, 2015 at 12:15:42PM +0900, Tetsuya Mukawa wrote:
> On 2015/12/17 20:42, Yuanhan Liu wrote:
> > On Tue, Nov 24, 2015 at 06:00:01PM +0900, Tetsuya Mukawa wrote:
> >> The vhost PMD will be a wrapper of vhost library, but some of vhost
> >> library APIs cannot be mapped to ethdev library APIs.
> >> Becasue of this, in some cases, we still need to use vhost library APIs
> >> for a port created by the vhost PMD.
> >>
> >> Currently, when virtio device is created and destroyed, vhost library
> >> will call one of callback handlers. The vhost PMD need to use this
> >> pair of callback handlers to know which virtio devices are connected
> >> actually.
> >> Because we can register only one pair of callbacks to vhost library, if
> >> the PMD use it, DPDK applications cannot have a way to know the events.
> >>
> >> This may break legacy DPDK applications that uses vhost library. To prevent
> >> it, this patch adds one more pair of callbacks to vhost library especially
> >> for the vhost PMD.
> >> With the patch, legacy applications can use the vhost PMD even if they need
> >> additional specific handling for virtio device creation and destruction.
> >>
> >> For example, legacy application can call
> >> rte_vhost_enable_guest_notification() in callbacks to change setting.
> > TBH, I never liked it since the beginning. Introducing two callbacks
> > for one event is a bit messy, and therefore error prone.
> 
> I agree with you.
> 
> > I have been thinking this occasionally last few weeks, and have came
> > up something that we may introduce another layer callback based on
> > the vhost pmd itself, by a new API:
> >
> > 	rte_eth_vhost_register_callback().
> >
> > And we then call those new callback inside the vhost pmd new_device()
> > and vhost pmd destroy_device() implementations.
> >
> > And we could have same callbacks like vhost have, but I'm thinking
> > that new_device() and destroy_device() doesn't sound like a good name
> > to a PMD driver. Maybe a name like "link_state_changed" is better?
> >
> > What do you think of that?
> 
> Yes,  "link_state_changed" will be good.
> 
> BTW, I thought it was ok that an DPDK app that used vhost PMD called
> vhost library APIs directly.
> But probably you may feel strangeness about it. Is this correct?

Unluckily, that's true :)

> 
> If so, how about implementing legacy status interrupt mechanism to vhost
> PMD?
> For example, an DPDK app can register callback handler like
> "examples/link_status_interrupt".
> 
> Also, if the app doesn't call vhost library APIs directly,
> rte_eth_vhost_portid2vdev() will be needless, because the app doesn't
> need to handle virtio device structure anymore.
> 
> >
> >
> > On the other hand, I'm still thinking is that really necessary to let
> > the application be able to call vhost functions like rte_vhost_enable_guest_notification()
> > with the vhost PMD driver?
> 
> Basic concept of my patch is that vhost PMD will provides the features
> that vhost library provides.

I don't think that's necessary. Let's just treat it as a normal PMD
driver, having nothing to do with the vhost library.

> How about removing rte_vhost_enable_guest_notification() from "vhost
> library"?
> (I also not sure what are use cases)
> If we can do this, vhost PMD also doesn't need to take care of it.
> Or if rte_vhost_enable_guest_notification() will be removed in the
> future, vhost PMD is able to ignore it.

You could either call it in vhost-pmd (which you have already done), or
ignore it in vhost-pmd, but don't remove it from the vhost library.

> 
> Please let me correct up my thinking about your questions.
>  - Change concept of patch not to call vhost library APIs directly.
> These should be wrapped by ethdev APIs.
>  - Remove rte_eth_vhost_portid2vdev(), because of above concept changing.
>  - Implement legacy status changed interrupt to vhost PMD instead of
> using own callback mechanism.
>  - Check if we can remove rte_vhost_enable_guest_notification() from
> vhost library.

So, how about making it __fairly__ simple as the first step, to get it
merged easily: we don't assume the applications will call any vhost
library functions any more, so that we need neither the callback nor
rte_eth_vhost_portid2vdev(). Again, just let it be a fairly normal
(nothing special) pmd driver.  (UNLESS there is a real must, which
I don't see so far.)

Tetsuya, what do you think of that then?

> 
> Hi Xie,
> 
> Do you know the use cases of rte_vhost_enable_guest_notification()?

Setting the arg to 0 avoids the guest kicking the virtqueue, which
is good for performance, and we should keep it.

	--yliu


* Re: [PATCH v5 3/3] vhost: Add helper function to convert port id to virtio device pointer
  2015-12-18  3:15                             ` Tetsuya Mukawa
@ 2015-12-18  4:19                               ` Yuanhan Liu
  0 siblings, 0 replies; 200+ messages in thread
From: Yuanhan Liu @ 2015-12-18  4:19 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Fri, Dec 18, 2015 at 12:15:49PM +0900, Tetsuya Mukawa wrote:
> On 2015/12/17 20:47, Yuanhan Liu wrote:
> > On Tue, Nov 24, 2015 at 06:00:03PM +0900, Tetsuya Mukawa wrote:
> >> This helper function is used to convert port id to virtio device
> >> pointer. To use this function, a port should be managed by vhost PMD.
> >> After getting virtio device pointer, it can be used for calling vhost
> >> library APIs.
> > I'm thinking why is that necessary. I mean, hey, can we simply treat
> > it as a normal pmd driver, and don't consider any vhost lib functions
> > any more while using vhost pmd?
> 
> I guess vhost PMD cannot  hide some of vhost features.

Sorry, a "guess" cannot convince me to have it.

On the other hand, it does no harm, and brings great benefit, to keep
things simple the first time. __If__ there really is a need for them,
we can add them back later, can't we? ;)

	--yliu

> Because of this, we may need to add ethdev APIs to wraps these features.
> I described more in one more email. Could you please see it also?
> 
> Thanks,
> Tetsuya
> 


* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-12-18  4:15                               ` Yuanhan Liu
@ 2015-12-18  4:28                                 ` Tetsuya Mukawa
  2015-12-18 18:01                                   ` Rich Lane
  2015-12-18 10:03                                 ` Xie, Huawei
  1 sibling, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-12-18  4:28 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, ann.zhuangyanying

On 2015/12/18 13:15, Yuanhan Liu wrote:
> On Fri, Dec 18, 2015 at 12:15:42PM +0900, Tetsuya Mukawa wrote:
>> On 2015/12/17 20:42, Yuanhan Liu wrote:
>>> On Tue, Nov 24, 2015 at 06:00:01PM +0900, Tetsuya Mukawa wrote:
>>>> The vhost PMD will be a wrapper of vhost library, but some of vhost
>>>> library APIs cannot be mapped to ethdev library APIs.
>>>> Becasue of this, in some cases, we still need to use vhost library APIs
>>>> for a port created by the vhost PMD.
>>>>
>>>> Currently, when virtio device is created and destroyed, vhost library
>>>> will call one of callback handlers. The vhost PMD need to use this
>>>> pair of callback handlers to know which virtio devices are connected
>>>> actually.
>>>> Because we can register only one pair of callbacks to vhost library, if
>>>> the PMD use it, DPDK applications cannot have a way to know the events.
>>>>
>>>> This may break legacy DPDK applications that uses vhost library. To prevent
>>>> it, this patch adds one more pair of callbacks to vhost library especially
>>>> for the vhost PMD.
>>>> With the patch, legacy applications can use the vhost PMD even if they need
>>>> additional specific handling for virtio device creation and destruction.
>>>>
>>>> For example, legacy application can call
>>>> rte_vhost_enable_guest_notification() in callbacks to change setting.
>>> TBH, I never liked it since the beginning. Introducing two callbacks
>>> for one event is a bit messy, and therefore error prone.
>> I agree with you.
>>
>>> I have been thinking this occasionally last few weeks, and have came
>>> up something that we may introduce another layer callback based on
>>> the vhost pmd itself, by a new API:
>>>
>>> 	rte_eth_vhost_register_callback().
>>>
>>> And we then call those new callback inside the vhost pmd new_device()
>>> and vhost pmd destroy_device() implementations.
>>>
>>> And we could have same callbacks like vhost have, but I'm thinking
>>> that new_device() and destroy_device() doesn't sound like a good name
>>> to a PMD driver. Maybe a name like "link_state_changed" is better?
>>>
>>> What do you think of that?
>> Yes,  "link_state_changed" will be good.
>>
>> BTW, I thought it was ok that an DPDK app that used vhost PMD called
>> vhost library APIs directly.
>> But probably you may feel strangeness about it. Is this correct?
> Unluckily, that's true :)
>
>> If so, how about implementing legacy status interrupt mechanism to vhost
>> PMD?
>> For example, an DPDK app can register callback handler like
>> "examples/link_status_interrupt".
>>
>> Also, if the app doesn't call vhost library APIs directly,
>> rte_eth_vhost_portid2vdev() will be needless, because the app doesn't
>> need to handle virtio device structure anymore.
>>
>>>
>>> On the other hand, I'm still thinking is that really necessary to let
>>> the application be able to call vhost functions like rte_vhost_enable_guest_notification()
>>> with the vhost PMD driver?
>> Basic concept of my patch is that vhost PMD will provides the features
>> that vhost library provides.
> I don't think that's necessary. Let's just treat it as a normal pmd
> driver, having nothing to do with vhost library.
>
>> How about removing rte_vhost_enable_guest_notification() from "vhost
>> library"?
>> (I also not sure what are use cases)
>> If we can do this, vhost PMD also doesn't need to take care of it.
>> Or if rte_vhost_enable_guest_notification() will be removed in the
>> future, vhost PMD is able to ignore it.
> You could either call it in vhost-pmd (which you already have done that),
> or ignore it in vhost-pmd, but dont' remove it from vhost library.
>
>> Please let me correct up my thinking about your questions.
>>  - Change concept of patch not to call vhost library APIs directly.
>> These should be wrapped by ethdev APIs.
>>  - Remove rte_eth_vhost_portid2vdev(), because of above concept changing.
>>  - Implement legacy status changed interrupt to vhost PMD instead of
>> using own callback mechanism.
>>  - Check if we can remove rte_vhost_enable_guest_notification() from
>> vhost library.
> So, how about making it __fare__ simple as the first step, to get merged
> easily, that we don't assume the applications will call any vhost library
> functions any more, so that we don't need the callback, and we don't need
> the rte_eth_vhost_portid2vdev(), either. Again, just let it be a fare
> normal (nothing special) pmd driver.  (UNLESS, there is a real must, which
> I don't see so far).
>
> Tetsuya, what do you think of that then?

I agree with you, but I will wait a few days, because if someone wants
to use it from the vhost PMD, they will probably provide use cases.
And if there are no use cases, let's do as above.

Thanks,
Tetsuya


* Re: [PATCH v5 2/3] vhost: Add VHOST PMD
  2015-11-24  9:00                         ` [PATCH v5 2/3] " Tetsuya Mukawa
@ 2015-12-18  7:45                           ` Yuanhan Liu
  2015-12-18  9:25                             ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Yuanhan Liu @ 2015-12-18  7:45 UTC (permalink / raw)
  To: Tetsuya Mukawa, Xie, Huawei; +Cc: dev, ann.zhuangyanying, yuanhan.liu

On Tue, Nov 24, 2015 at 06:00:02PM +0900, Tetsuya Mukawa wrote:
> +static uint16_t
> +eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> +{
> +	struct vhost_queue *r = q;
> +	uint16_t i, nb_rx = 0;
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		return 0;
> +
> +	rte_atomic32_set(&r->while_queuing, 1);
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		goto out;

I'm not quite familiar with PMD drivers yet, but as far as I know,
rte_eth_rx/tx_burst() does not provide any guarantee on concurrent
dequeuing/queuing. If that's true, the vhost pmd could (and should)
not do that, to keep the consistency.

On the other hand, rte_vhost_dequeue/enqueue_burst() has had such
support since the beginning, so the above check is redundant.
However, FYI, Huawei has just (internally) proposed to remove
it, as he thinks that's the application's duty.

So, either way, we should not do that.

	--yliu 


* Re: [PATCH v5 2/3] vhost: Add VHOST PMD
  2015-12-18  7:45                           ` Yuanhan Liu
@ 2015-12-18  9:25                             ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-12-18  9:25 UTC (permalink / raw)
  To: Yuanhan Liu, Xie, Huawei; +Cc: dev, ann.zhuangyanying, yuanhan.liu

On 2015/12/18 16:45, Yuanhan Liu wrote:
> On Tue, Nov 24, 2015 at 06:00:02PM +0900, Tetsuya Mukawa wrote:
>> +static uint16_t
>> +eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
>> +{
>> +	struct vhost_queue *r = q;
>> +	uint16_t i, nb_rx = 0;
>> +
>> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
>> +		return 0;
>> +
>> +	rte_atomic32_set(&r->while_queuing, 1);
>> +
>> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
>> +		goto out;
> I'm not quite famililar with PMD driver yet, but as far as I know,
> rte_eth_rx/tx_burst() does not provide any garantee on concurrent
> dequeuing/queuing. If that's true, vhost pmd could (and should)
> not do that, to keep the consistency.

Yes, you are correct, but this check isn't for that purpose.
I guess there is no rule that a DPDK application must not call
rte_eth_rx/tx_burst() when the link status is down.
So the application may call rte_eth_rx/tx_burst() even when the vhost
backend connection isn't established yet.
The above variables are used to avoid calling rte_vhost_dequeue_burst()
in that case.

Tetsuya

> On the other hand, rte_vhost_dequeue/enqueue_burst() already has
> such support since the beginning: above check is redundant then.
> However, FYI, Huawei has just (internally) proposed to remove
> it, as he thinks that's application's duty.
>
> So, in neither way, we should not do that.
>
> 	--yliu 


* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-12-18  4:15                               ` Yuanhan Liu
  2015-12-18  4:28                                 ` Tetsuya Mukawa
@ 2015-12-18 10:03                                 ` Xie, Huawei
  2015-12-21  2:10                                   ` Tetsuya Mukawa
  1 sibling, 1 reply; 200+ messages in thread
From: Xie, Huawei @ 2015-12-18 10:03 UTC (permalink / raw)
  To: Yuanhan Liu, Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On 12/18/2015 12:15 PM, Yuanhan Liu wrote:
> On Fri, Dec 18, 2015 at 12:15:42PM +0900, Tetsuya Mukawa wrote:
>> On 2015/12/17 20:42, Yuanhan Liu wrote:
>>> On Tue, Nov 24, 2015 at 06:00:01PM +0900, Tetsuya Mukawa wrote:
>>>> The vhost PMD will be a wrapper of vhost library, but some of vhost
>>>> library APIs cannot be mapped to ethdev library APIs.
>>>> Becasue of this, in some cases, we still need to use vhost library APIs
>>>> for a port created by the vhost PMD.
>>>>
>>>> Currently, when virtio device is created and destroyed, vhost library
>>>> will call one of callback handlers. The vhost PMD need to use this
>>>> pair of callback handlers to know which virtio devices are connected
>>>> actually.
>>>> Because we can register only one pair of callbacks to vhost library, if
>>>> the PMD use it, DPDK applications cannot have a way to know the events.
>>>>
>>>> This may break legacy DPDK applications that uses vhost library. To prevent
>>>> it, this patch adds one more pair of callbacks to vhost library especially
>>>> for the vhost PMD.
>>>> With the patch, legacy applications can use the vhost PMD even if they need
>>>> additional specific handling for virtio device creation and destruction.
>>>>
>>>> For example, legacy application can call
>>>> rte_vhost_enable_guest_notification() in callbacks to change setting.
>>> TBH, I never liked it since the beginning. Introducing two callbacks
>>> for one event is a bit messy, and therefore error prone.
>> I agree with you.
>>
>>> I have been thinking this occasionally last few weeks, and have came
>>> up something that we may introduce another layer callback based on
>>> the vhost pmd itself, by a new API:
>>>
>>> 	rte_eth_vhost_register_callback().
>>>
>>> And we then call those new callback inside the vhost pmd new_device()
>>> and vhost pmd destroy_device() implementations.
>>>
>>> And we could have same callbacks like vhost have, but I'm thinking
>>> that new_device() and destroy_device() doesn't sound like a good name
>>> to a PMD driver. Maybe a name like "link_state_changed" is better?
>>>
>>> What do you think of that?
>> Yes,  "link_state_changed" will be good.
>>
>> BTW, I thought it was ok that an DPDK app that used vhost PMD called
>> vhost library APIs directly.
>> But probably you may feel strangeness about it. Is this correct?
> Unluckily, that's true :)
>
>> If so, how about implementing legacy status interrupt mechanism to vhost
>> PMD?
>> For example, an DPDK app can register callback handler like
>> "examples/link_status_interrupt".
>>
>> Also, if the app doesn't call vhost library APIs directly,
>> rte_eth_vhost_portid2vdev() will be needless, because the app doesn't
>> need to handle virtio device structure anymore.
>>
>>>
>>> On the other hand, I'm still thinking is that really necessary to let
>>> the application be able to call vhost functions like rte_vhost_enable_guest_notification()
>>> with the vhost PMD driver?
>> Basic concept of my patch is that vhost PMD will provides the features
>> that vhost library provides.
> I don't think that's necessary. Let's just treat it as a normal pmd
> driver, having nothing to do with vhost library.
>
>> How about removing rte_vhost_enable_guest_notification() from "vhost
>> library"?
>> (I also not sure what are use cases)
>> If we can do this, vhost PMD also doesn't need to take care of it.
>> Or if rte_vhost_enable_guest_notification() will be removed in the
>> future, vhost PMD is able to ignore it.
> You could either call it in vhost-pmd (which you already have done that),
> or ignore it in vhost-pmd, but dont' remove it from vhost library.
>
>> Please let me correct up my thinking about your questions.
>>  - Change concept of patch not to call vhost library APIs directly.
>> These should be wrapped by ethdev APIs.
>>  - Remove rte_eth_vhost_portid2vdev(), because of above concept changing.
>>  - Implement legacy status changed interrupt to vhost PMD instead of
>> using own callback mechanism.
>>  - Check if we can remove rte_vhost_enable_guest_notification() from
>> vhost library.
> So, how about making it __fare__ simple as the first step, to get merged
> easily, that we don't assume the applications will call any vhost library
> functions any more, so that we don't need the callback, and we don't need
> the rte_eth_vhost_portid2vdev(), either. Again, just let it be a fare
> normal (nothing special) pmd driver.  (UNLESS, there is a real must, which
> I don't see so far).
>
> Tetsuya, what do you think of that then?
>
>> Hi Xie,
>>
>> Do you know the use cases of rte_vhost_enable_guest_notification()?
If vhost runs in loop mode, it doesn't need to be notified. You have
wrapped vhost as a PMD, which is nice for OVS integration. If we
require that all PMDs can be polled by select/poll, then we could use
this API for the vhost PMD and wait on the kickfd. For physical NICs, we
could wait on the fd for the user-space interrupt.
> Setting the arg to 0 avoids the guest kicking the virtqueue, which
> is good for performance, and we should keep it.
>
> 	--yliu
>



* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-12-18  4:28                                 ` Tetsuya Mukawa
@ 2015-12-18 18:01                                   ` Rich Lane
  2015-12-21  2:10                                     ` Tetsuya Mukawa
  2015-12-22  3:41                                     ` Yuanhan Liu
  0 siblings, 2 replies; 200+ messages in thread
From: Rich Lane @ 2015-12-18 18:01 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

I'm using the vhost callbacks and struct virtio_net with the vhost PMD in a
few ways:

1. new_device/destroy_device: Link state change (will be covered by the
link status interrupt).
2. new_device: Add first queue to datapath.
3. vring_state_changed: Add/remove queue to datapath.
4. destroy_device: Remove all queues (vring_state_changed is not called
when qemu is killed).
5. new_device and struct virtio_net: Determine NUMA node of the VM.

The vring_state_changed callback is necessary because the VM might not be
using the maximum number of RX queues. If I boot Linux in the VM it will
start out using one RX queue, which can be changed with ethtool. The DPDK
app in the host needs to be notified that it can start sending traffic to
the new queue.

The vring_state_changed callback is also useful for guest TX queues to
avoid reading from an inactive queue.

API I'd like to have:

1. Link status interrupt.
2. New queue_state_changed callback. Unlike vring_state_changed this should
cover the first queue at new_device and removal of all queues at
destroy_device.
3. Per-queue or per-device NUMA node info.

On Thu, Dec 17, 2015 at 8:28 PM, Tetsuya Mukawa <mukawa@igel.co.jp> wrote:

> On 2015/12/18 13:15, Yuanhan Liu wrote:
> > On Fri, Dec 18, 2015 at 12:15:42PM +0900, Tetsuya Mukawa wrote:
> >> On 2015/12/17 20:42, Yuanhan Liu wrote:
> >>> On Tue, Nov 24, 2015 at 06:00:01PM +0900, Tetsuya Mukawa wrote:
> >>>> The vhost PMD will be a wrapper of vhost library, but some of vhost
> >>>> library APIs cannot be mapped to ethdev library APIs.
> >>>> Becasue of this, in some cases, we still need to use vhost library
> APIs
> >>>> for a port created by the vhost PMD.
> >>>>
> >>>> Currently, when virtio device is created and destroyed, vhost library
> >>>> will call one of callback handlers. The vhost PMD need to use this
> >>>> pair of callback handlers to know which virtio devices are connected
> >>>> actually.
> >>>> Because we can register only one pair of callbacks to vhost library,
> if
> >>>> the PMD use it, DPDK applications cannot have a way to know the
> events.
> >>>>
> >>>> This may break legacy DPDK applications that uses vhost library. To
> prevent
> >>>> it, this patch adds one more pair of callbacks to vhost library
> especially
> >>>> for the vhost PMD.
> >>>> With the patch, legacy applications can use the vhost PMD even if
> they need
> >>>> additional specific handling for virtio device creation and
> destruction.
> >>>>
> >>>> For example, legacy application can call
> >>>> rte_vhost_enable_guest_notification() in callbacks to change setting.
> >>> TBH, I never liked it since the beginning. Introducing two callbacks
> >>> for one event is a bit messy, and therefore error prone.
> >> I agree with you.
> >>
> >>> I have been thinking this occasionally last few weeks, and have came
> >>> up something that we may introduce another layer callback based on
> >>> the vhost pmd itself, by a new API:
> >>>
> >>>     rte_eth_vhost_register_callback().
> >>>
> >>> And we then call those new callback inside the vhost pmd new_device()
> >>> and vhost pmd destroy_device() implementations.
> >>>
> >>> And we could have same callbacks like vhost have, but I'm thinking
> >>> that new_device() and destroy_device() doesn't sound like a good name
> >>> to a PMD driver. Maybe a name like "link_state_changed" is better?
> >>>
> >>> What do you think of that?
> >> Yes,  "link_state_changed" will be good.
> >>
> >> BTW, I thought it was ok that an DPDK app that used vhost PMD called
> >> vhost library APIs directly.
> >> But probably you may feel strangeness about it. Is this correct?
> > Unluckily, that's true :)
> >
> >> If so, how about implementing legacy status interrupt mechanism to vhost
> >> PMD?
> >> For example, an DPDK app can register callback handler like
> >> "examples/link_status_interrupt".
> >>
> >> Also, if the app doesn't call vhost library APIs directly,
> >> rte_eth_vhost_portid2vdev() will be needless, because the app doesn't
> >> need to handle virtio device structure anymore.
> >>
> >>>
> >>> On the other hand, I'm still thinking is that really necessary to let
> >>> the application be able to call vhost functions like
> rte_vhost_enable_guest_notification()
> >>> with the vhost PMD driver?
> >> Basic concept of my patch is that vhost PMD will provides the features
> >> that vhost library provides.
> > I don't think that's necessary. Let's just treat it as a normal pmd
> > driver, having nothing to do with vhost library.
> >
> >> How about removing rte_vhost_enable_guest_notification() from "vhost
> >> library"?
> >> (I also not sure what are use cases)
> >> If we can do this, vhost PMD also doesn't need to take care of it.
> >> Or if rte_vhost_enable_guest_notification() will be removed in the
> >> future, vhost PMD is able to ignore it.
> > You could either call it in vhost-pmd (which you already have done that),
> > or ignore it in vhost-pmd, but dont' remove it from vhost library.
> >
> >> Please let me correct up my thinking about your questions.
> >>  - Change concept of patch not to call vhost library APIs directly.
> >> These should be wrapped by ethdev APIs.
> >>  - Remove rte_eth_vhost_portid2vdev(), because of above concept
> changing.
> >>  - Implement legacy status changed interrupt to vhost PMD instead of
> >> using own callback mechanism.
> >>  - Check if we can remove rte_vhost_enable_guest_notification() from
> >> vhost library.
> > So, how about making it __fare__ simple as the first step, to get merged
> > easily, that we don't assume the applications will call any vhost library
> > functions any more, so that we don't need the callback, and we don't need
> > the rte_eth_vhost_portid2vdev(), either. Again, just let it be a fare
> > normal (nothing special) pmd driver.  (UNLESS, there is a real must,
> which
> > I don't see so far).
> >
> > Tetsuya, what do you think of that then?
>
> I agree with you. But will wait a few days.
> Because if someone wants to use it from vhost PMD, they probably will
> provides use cases.
> And if there are no use cases, let's do like above.
>
> Thanks,
> Tetsuya
>

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-12-18 10:03                                 ` Xie, Huawei
@ 2015-12-21  2:10                                   ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-12-21  2:10 UTC (permalink / raw)
  To: Xie, Huawei, Yuanhan Liu; +Cc: dev, ann.zhuangyanying

On 2015/12/18 19:03, Xie, Huawei wrote:
> On 12/18/2015 12:15 PM, Yuanhan Liu wrote:
>> On Fri, Dec 18, 2015 at 12:15:42PM +0900, Tetsuya Mukawa wrote:
>>> On 2015/12/17 20:42, Yuanhan Liu wrote:
>>>> On Tue, Nov 24, 2015 at 06:00:01PM +0900, Tetsuya Mukawa wrote:
>>>>> The vhost PMD will be a wrapper of vhost library, but some of vhost
>>>>> library APIs cannot be mapped to ethdev library APIs.
>>>>> Becasue of this, in some cases, we still need to use vhost library APIs
>>>>> for a port created by the vhost PMD.
>>>>>
>>>>> Currently, when virtio device is created and destroyed, vhost library
>>>>> will call one of callback handlers. The vhost PMD need to use this
>>>>> pair of callback handlers to know which virtio devices are connected
>>>>> actually.
>>>>> Because we can register only one pair of callbacks to vhost library, if
>>>>> the PMD use it, DPDK applications cannot have a way to know the events.
>>>>>
>>>>> This may break legacy DPDK applications that uses vhost library. To prevent
>>>>> it, this patch adds one more pair of callbacks to vhost library especially
>>>>> for the vhost PMD.
>>>>> With the patch, legacy applications can use the vhost PMD even if they need
>>>>> additional specific handling for virtio device creation and destruction.
>>>>>
>>>>> For example, legacy application can call
>>>>> rte_vhost_enable_guest_notification() in callbacks to change setting.
>>>> TBH, I never liked it since the beginning. Introducing two callbacks
>>>> for one event is a bit messy, and therefore error prone.
>>> I agree with you.
>>>
>>>> I have been thinking this occasionally last few weeks, and have came
>>>> up something that we may introduce another layer callback based on
>>>> the vhost pmd itself, by a new API:
>>>>
>>>> 	rte_eth_vhost_register_callback().
>>>>
>>>> And we then call those new callback inside the vhost pmd new_device()
>>>> and vhost pmd destroy_device() implementations.
>>>>
>>>> And we could have same callbacks like vhost have, but I'm thinking
>>>> that new_device() and destroy_device() doesn't sound like a good name
>>>> to a PMD driver. Maybe a name like "link_state_changed" is better?
>>>>
>>>> What do you think of that?
>>> Yes,  "link_state_changed" will be good.
>>>
>>> BTW, I thought it was ok that an DPDK app that used vhost PMD called
>>> vhost library APIs directly.
>>> But probably you may feel strangeness about it. Is this correct?
>> Unluckily, that's true :)
>>
>>> If so, how about implementing legacy status interrupt mechanism to vhost
>>> PMD?
>>> For example, an DPDK app can register callback handler like
>>> "examples/link_status_interrupt".
>>>
>>> Also, if the app doesn't call vhost library APIs directly,
>>> rte_eth_vhost_portid2vdev() will be needless, because the app doesn't
>>> need to handle virtio device structure anymore.
>>>
>>>> On the other hand, I'm still thinking is that really necessary to let
>>>> the application be able to call vhost functions like rte_vhost_enable_guest_notification()
>>>> with the vhost PMD driver?
>>> Basic concept of my patch is that vhost PMD will provides the features
>>> that vhost library provides.
>> I don't think that's necessary. Let's just treat it as a normal pmd
>> driver, having nothing to do with vhost library.
>>
>>> How about removing rte_vhost_enable_guest_notification() from "vhost
>>> library"?
>>> (I also not sure what are use cases)
>>> If we can do this, vhost PMD also doesn't need to take care of it.
>>> Or if rte_vhost_enable_guest_notification() will be removed in the
>>> future, vhost PMD is able to ignore it.
>> You could either call it in vhost-pmd (which you already have done that),
>> or ignore it in vhost-pmd, but dont' remove it from vhost library.
>>
>>> Please let me correct up my thinking about your questions.
>>>  - Change concept of patch not to call vhost library APIs directly.
>>> These should be wrapped by ethdev APIs.
>>>  - Remove rte_eth_vhost_portid2vdev(), because of above concept changing.
>>>  - Implement legacy status changed interrupt to vhost PMD instead of
>>> using own callback mechanism.
>>>  - Check if we can remove rte_vhost_enable_guest_notification() from
>>> vhost library.
>> So, how about making it __fare__ simple as the first step, to get merged
>> easily, that we don't assume the applications will call any vhost library
>> functions any more, so that we don't need the callback, and we don't need
>> the rte_eth_vhost_portid2vdev(), either. Again, just let it be a fare
>> normal (nothing special) pmd driver.  (UNLESS, there is a real must, which
>> I don't see so far).
>>
>> Tetsuya, what do you think of that then?
>>
>>> Hi Xie,
>>>
>>> Do you know the use cases of rte_vhost_enable_guest_notification()?
> If vhost runs in loop mode, it doesn't need to be notified. You have
> wrapped vhost as the PMD, which is nice for OVS integration. If we
> require that all PMDs could be polled by select/poll, then we could use
> this API for vhost PMD, and wait on the kickfd. For physical nics, we
> could wait on the fd for user space interrupt.

Thanks for the clarification.
I will ignore the function in the first release of the vhost PMD.

Thanks,
Tetsuya


>> Setting the arg to 0 avoids the guest kicking the virtqueue, which
>> is good for performance, and we should keep it.
>>
>> 	--yliu
>>

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-12-18 18:01                                   ` Rich Lane
@ 2015-12-21  2:10                                     ` Tetsuya Mukawa
  2015-12-22  4:36                                       ` Yuanhan Liu
  2015-12-22  3:41                                     ` Yuanhan Liu
  1 sibling, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-12-21  2:10 UTC (permalink / raw)
  To: Rich Lane, Yuanhan Liu; +Cc: dev, ann.zhuangyanying

On 2015/12/19 3:01, Rich Lane wrote:
> I'm using the vhost callbacks and struct virtio_net with the vhost PMD in a
> few ways:
>
> 1. new_device/destroy_device: Link state change (will be covered by the
> link status interrupt).
> 2. new_device: Add first queue to datapath.
> 3. vring_state_changed: Add/remove queue to datapath.
> 4. destroy_device: Remove all queues (vring_state_changed is not called
> when qemu is killed).
> 5. new_device and struct virtio_net: Determine NUMA node of the VM.
>
> The vring_state_changed callback is necessary because the VM might not be
> using the maximum number of RX queues. If I boot Linux in the VM it will
> start out using one RX queue, which can be changed with ethtool. The DPDK
> app in the host needs to be notified that it can start sending traffic to
> the new queue.
>
> The vring_state_changed callback is also useful for guest TX queues to
> avoid reading from an inactive queue.
>
> API I'd like to have:
>
> 1. Link status interrupt.
> 2. New queue_state_changed callback. Unlike vring_state_changed this should
> cover the first queue at new_device and removal of all queues at
> destroy_device.
> 3. Per-queue or per-device NUMA node info.

Hi Rich and Yuanhan,

As Rich described, some users need more information when the interrupt
comes, and the virtio_net structure contains that information.

I guess it's very similar to interrupt handling for normal hardware.
First an interrupt comes, then the interrupt handler checks the status
register of the device to find out what actually happened.
In the vhost PMD case, reading the status register equals reading the
virtio_net structure.

So how about below specification?

1. The link status interrupt of the vhost PMD occurs when the new_device,
destroy_device and vring_state_changed events happen.
2. The vhost PMD provides a function to let users get the virtio_net
structure of the interrupted port.
   (Probably almost the same as "rte_eth_vhost_portid2vdev" that I described
in "[PATCH v5 3/3] vhost: Add helper function to convert port id to
virtio device pointer")

I guess what kind of information the users need will depend on their
environment, so just providing the virtio_net structure may be good.
What do you think?

Tetsuya,

>
> On Thu, Dec 17, 2015 at 8:28 PM, Tetsuya Mukawa <mukawa@igel.co.jp> wrote:
>
>> On 2015/12/18 13:15, Yuanhan Liu wrote:
>>> On Fri, Dec 18, 2015 at 12:15:42PM +0900, Tetsuya Mukawa wrote:
>>>> On 2015/12/17 20:42, Yuanhan Liu wrote:
>>>>> On Tue, Nov 24, 2015 at 06:00:01PM +0900, Tetsuya Mukawa wrote:
>>>>>> The vhost PMD will be a wrapper of vhost library, but some of vhost
>>>>>> library APIs cannot be mapped to ethdev library APIs.
>>>>>> Becasue of this, in some cases, we still need to use vhost library
>> APIs
>>>>>> for a port created by the vhost PMD.
>>>>>>
>>>>>> Currently, when virtio device is created and destroyed, vhost library
>>>>>> will call one of callback handlers. The vhost PMD need to use this
>>>>>> pair of callback handlers to know which virtio devices are connected
>>>>>> actually.
>>>>>> Because we can register only one pair of callbacks to vhost library,
>> if
>>>>>> the PMD use it, DPDK applications cannot have a way to know the
>> events.
>>>>>> This may break legacy DPDK applications that uses vhost library. To
>> prevent
>>>>>> it, this patch adds one more pair of callbacks to vhost library
>> especially
>>>>>> for the vhost PMD.
>>>>>> With the patch, legacy applications can use the vhost PMD even if
>> they need
>>>>>> additional specific handling for virtio device creation and
>> destruction.
>>>>>> For example, legacy application can call
>>>>>> rte_vhost_enable_guest_notification() in callbacks to change setting.
>>>>> TBH, I never liked it since the beginning. Introducing two callbacks
>>>>> for one event is a bit messy, and therefore error prone.
>>>> I agree with you.
>>>>
>>>>> I have been thinking this occasionally last few weeks, and have came
>>>>> up something that we may introduce another layer callback based on
>>>>> the vhost pmd itself, by a new API:
>>>>>
>>>>>     rte_eth_vhost_register_callback().
>>>>>
>>>>> And we then call those new callback inside the vhost pmd new_device()
>>>>> and vhost pmd destroy_device() implementations.
>>>>>
>>>>> And we could have same callbacks like vhost have, but I'm thinking
>>>>> that new_device() and destroy_device() doesn't sound like a good name
>>>>> to a PMD driver. Maybe a name like "link_state_changed" is better?
>>>>>
>>>>> What do you think of that?
>>>> Yes,  "link_state_changed" will be good.
>>>>
>>>> BTW, I thought it was ok that an DPDK app that used vhost PMD called
>>>> vhost library APIs directly.
>>>> But probably you may feel strangeness about it. Is this correct?
>>> Unluckily, that's true :)
>>>
>>>> If so, how about implementing legacy status interrupt mechanism to vhost
>>>> PMD?
>>>> For example, an DPDK app can register callback handler like
>>>> "examples/link_status_interrupt".
>>>>
>>>> Also, if the app doesn't call vhost library APIs directly,
>>>> rte_eth_vhost_portid2vdev() will be needless, because the app doesn't
>>>> need to handle virtio device structure anymore.
>>>>
>>>>> On the other hand, I'm still thinking is that really necessary to let
>>>>> the application be able to call vhost functions like
>> rte_vhost_enable_guest_notification()
>>>>> with the vhost PMD driver?
>>>> Basic concept of my patch is that vhost PMD will provides the features
>>>> that vhost library provides.
>>> I don't think that's necessary. Let's just treat it as a normal pmd
>>> driver, having nothing to do with vhost library.
>>>
>>>> How about removing rte_vhost_enable_guest_notification() from "vhost
>>>> library"?
>>>> (I also not sure what are use cases)
>>>> If we can do this, vhost PMD also doesn't need to take care of it.
>>>> Or if rte_vhost_enable_guest_notification() will be removed in the
>>>> future, vhost PMD is able to ignore it.
>>> You could either call it in vhost-pmd (which you already have done that),
>>> or ignore it in vhost-pmd, but dont' remove it from vhost library.
>>>
>>>> Please let me correct up my thinking about your questions.
>>>>  - Change concept of patch not to call vhost library APIs directly.
>>>> These should be wrapped by ethdev APIs.
>>>>  - Remove rte_eth_vhost_portid2vdev(), because of above concept
>> changing.
>>>>  - Implement legacy status changed interrupt to vhost PMD instead of
>>>> using own callback mechanism.
>>>>  - Check if we can remove rte_vhost_enable_guest_notification() from
>>>> vhost library.
>>> So, how about making it __fare__ simple as the first step, to get merged
>>> easily, that we don't assume the applications will call any vhost library
>>> functions any more, so that we don't need the callback, and we don't need
>>> the rte_eth_vhost_portid2vdev(), either. Again, just let it be a fare
>>> normal (nothing special) pmd driver.  (UNLESS, there is a real must,
>> which
>>> I don't see so far).
>>>
>>> Tetsuya, what do you think of that then?
>> I agree with you. But will wait a few days.
>> Because if someone wants to use it from vhost PMD, they probably will
>> provides use cases.
>> And if there are no use cases, let's do like above.
>>
>> Thanks,
>> Tetsuya
>>

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-12-18 18:01                                   ` Rich Lane
  2015-12-21  2:10                                     ` Tetsuya Mukawa
@ 2015-12-22  3:41                                     ` Yuanhan Liu
  2015-12-22  4:47                                       ` Rich Lane
  1 sibling, 1 reply; 200+ messages in thread
From: Yuanhan Liu @ 2015-12-22  3:41 UTC (permalink / raw)
  To: Rich Lane; +Cc: dev, ann.zhuangyanying

On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
> I'm using the vhost callbacks and struct virtio_net with the vhost PMD in a few
> ways:

Rich, thanks for the info!

> 
> 1. new_device/destroy_device: Link state change (will be covered by the link
> status interrupt).
> 2. new_device: Add first queue to datapath.

I'm wondering why vring_state_changed() is not used, as it will also be
triggered at the beginning, when the default queue (the first queue) is
enabled.

> 3. vring_state_changed: Add/remove queue to datapath.
> 4. destroy_device: Remove all queues (vring_state_changed is not called when
> qemu is killed).

I had a plan to invoke vring_state_changed() to disable all vrings
when destroy_device() is called.

> 5. new_device and struct virtio_net: Determine NUMA node of the VM.

You can get the 'struct virtio_net' dev from all above callbacks.
 
> 
> The vring_state_changed callback is necessary because the VM might not be using
> the maximum number of RX queues. If I boot Linux in the VM it will start out
> using one RX queue, which can be changed with ethtool. The DPDK app in the host
> needs to be notified that it can start sending traffic to the new queue.
> 
> The vring_state_changed callback is also useful for guest TX queues to avoid
> reading from an inactive queue.
> 
> API I'd like to have:
> 
> 1. Link status interrupt.

To the vhost PMD, new_device()/destroy_device() equals the link status
interrupt, where new_device() is a link up, and destroy_device() is a
link down.


> 2. New queue_state_changed callback. Unlike vring_state_changed this should
> cover the first queue at new_device and removal of all queues at
> destroy_device.

As stated above, vring_state_changed() should be able to do that, except
the one on destroy_device(), which is not done yet.

> 3. Per-queue or per-device NUMA node info.

You can query the NUMA node info implicitly by get_mempolicy(); check
numa_realloc() at lib/librte_vhost/virtio-net.c for reference.

	--yliu
> 
> On Thu, Dec 17, 2015 at 8:28 PM, Tetsuya Mukawa <mukawa@igel.co.jp> wrote:
> 
>     On 2015/12/18 13:15, Yuanhan Liu wrote:
>     > On Fri, Dec 18, 2015 at 12:15:42PM +0900, Tetsuya Mukawa wrote:
>     >> On 2015/12/17 20:42, Yuanhan Liu wrote:
>     >>> On Tue, Nov 24, 2015 at 06:00:01PM +0900, Tetsuya Mukawa wrote:
>     >>>> The vhost PMD will be a wrapper of vhost library, but some of vhost
>     >>>> library APIs cannot be mapped to ethdev library APIs.
>     >>>> Becasue of this, in some cases, we still need to use vhost library
>     APIs
>     >>>> for a port created by the vhost PMD.
>     >>>>
>     >>>> Currently, when virtio device is created and destroyed, vhost library
>     >>>> will call one of callback handlers. The vhost PMD need to use this
>     >>>> pair of callback handlers to know which virtio devices are connected
>     >>>> actually.
>     >>>> Because we can register only one pair of callbacks to vhost library,
>     if
>     >>>> the PMD use it, DPDK applications cannot have a way to know the
>     events.
>     >>>>
>     >>>> This may break legacy DPDK applications that uses vhost library. To
>     prevent
>     >>>> it, this patch adds one more pair of callbacks to vhost library
>     especially
>     >>>> for the vhost PMD.
>     >>>> With the patch, legacy applications can use the vhost PMD even if they
>     need
>     >>>> additional specific handling for virtio device creation and
>     destruction.
>     >>>>
>     >>>> For example, legacy application can call
>     >>>> rte_vhost_enable_guest_notification() in callbacks to change setting.
>     >>> TBH, I never liked it since the beginning. Introducing two callbacks
>     >>> for one event is a bit messy, and therefore error prone.
>     >> I agree with you.
>     >>
>     >>> I have been thinking this occasionally last few weeks, and have came
>     >>> up something that we may introduce another layer callback based on
>     >>> the vhost pmd itself, by a new API:
>     >>>
>     >>>     rte_eth_vhost_register_callback().
>     >>>
>     >>> And we then call those new callback inside the vhost pmd new_device()
>     >>> and vhost pmd destroy_device() implementations.
>     >>>
>     >>> And we could have same callbacks like vhost have, but I'm thinking
>     >>> that new_device() and destroy_device() doesn't sound like a good name
>     >>> to a PMD driver. Maybe a name like "link_state_changed" is better?
>     >>>
>     >>> What do you think of that?
>     >> Yes,  "link_state_changed" will be good.
>     >>
>     >> BTW, I thought it was ok that an DPDK app that used vhost PMD called
>     >> vhost library APIs directly.
>     >> But probably you may feel strangeness about it. Is this correct?
>     > Unluckily, that's true :)
>     >
>     >> If so, how about implementing legacy status interrupt mechanism to vhost
>     >> PMD?
>     >> For example, an DPDK app can register callback handler like
>     >> "examples/link_status_interrupt".
>     >>
>     >> Also, if the app doesn't call vhost library APIs directly,
>     >> rte_eth_vhost_portid2vdev() will be needless, because the app doesn't
>     >> need to handle virtio device structure anymore.
>     >>
>     >>>
>     >>> On the other hand, I'm still thinking is that really necessary to let
>     >>> the application be able to call vhost functions like
>     rte_vhost_enable_guest_notification()
>     >>> with the vhost PMD driver?
>     >> Basic concept of my patch is that vhost PMD will provides the features
>     >> that vhost library provides.
>     > I don't think that's necessary. Let's just treat it as a normal pmd
>     > driver, having nothing to do with vhost library.
>     >
>     >> How about removing rte_vhost_enable_guest_notification() from "vhost
>     >> library"?
>     >> (I also not sure what are use cases)
>     >> If we can do this, vhost PMD also doesn't need to take care of it.
>     >> Or if rte_vhost_enable_guest_notification() will be removed in the
>     >> future, vhost PMD is able to ignore it.
>     > You could either call it in vhost-pmd (which you already have done that),
>     > or ignore it in vhost-pmd, but dont' remove it from vhost library.
>     >
>     >> Please let me correct up my thinking about your questions.
>     >>  - Change concept of patch not to call vhost library APIs directly.
>     >> These should be wrapped by ethdev APIs.
>     >>  - Remove rte_eth_vhost_portid2vdev(), because of above concept
>     changing.
>     >>  - Implement legacy status changed interrupt to vhost PMD instead of
>     >> using own callback mechanism.
>     >>  - Check if we can remove rte_vhost_enable_guest_notification() from
>     >> vhost library.
>     > So, how about making it __fare__ simple as the first step, to get merged
>     > easily, that we don't assume the applications will call any vhost library
>     > functions any more, so that we don't need the callback, and we don't need
>     > the rte_eth_vhost_portid2vdev(), either. Again, just let it be a fare
>     > normal (nothing special) pmd driver.  (UNLESS, there is a real must,
>     which
>     > I don't see so far).
>     >
>     > Tetsuya, what do you think of that then?
> 
>     I agree with you. But will wait a few days.
>     Because if someone wants to use it from vhost PMD, they probably will
>     provides use cases.
>     And if there are no use cases, let's do like above.
> 
>     Thanks,
>     Tetsuya
> 
> 

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-12-21  2:10                                     ` Tetsuya Mukawa
@ 2015-12-22  4:36                                       ` Yuanhan Liu
  0 siblings, 0 replies; 200+ messages in thread
From: Yuanhan Liu @ 2015-12-22  4:36 UTC (permalink / raw)
  To: Tetsuya Mukawa, Xie, Huawei; +Cc: dev, ann.zhuangyanying

On Mon, Dec 21, 2015 at 11:10:10AM +0900, Tetsuya Mukawa wrote:
> On 2015/12/19 3:01, Rich Lane wrote:
> > I'm using the vhost callbacks and struct virtio_net with the vhost PMD in a
> > few ways:
> >
> > 1. new_device/destroy_device: Link state change (will be covered by the
> > link status interrupt).
> > 2. new_device: Add first queue to datapath.
> > 3. vring_state_changed: Add/remove queue to datapath.
> > 4. destroy_device: Remove all queues (vring_state_changed is not called
> > when qemu is killed).
> > 5. new_device and struct virtio_net: Determine NUMA node of the VM.
> >
> > The vring_state_changed callback is necessary because the VM might not be
> > using the maximum number of RX queues. If I boot Linux in the VM it will
> > start out using one RX queue, which can be changed with ethtool. The DPDK
> > app in the host needs to be notified that it can start sending traffic to
> > the new queue.
> >
> > The vring_state_changed callback is also useful for guest TX queues to
> > avoid reading from an inactive queue.
> >
> > API I'd like to have:
> >
> > 1. Link status interrupt.
> > 2. New queue_state_changed callback. Unlike vring_state_changed this should
> > cover the first queue at new_device and removal of all queues at
> > destroy_device.
> > 3. Per-queue or per-device NUMA node info.
> 
> Hi Rich and Yuanhan,
> 
> As Rich described, some users needs more information when the interrupts
> comes.
> And the virtio_net structure contains the information.
> 
> I guess it's very similar to interrupt handling of normal hardware.
> First, a interrupt comes, then an interrupt handler checks status
> register of the device to know actually what was happened.
> In vhost PMD case, reading status register equals reading virtio_net
> structure.
> 
> So how about below specification?
> 
> 1. The link status interrupt of vhost PMD will occurs when new_device,
> destroy_device and vring_state_changed events are happened.
> 2. Vhost PMD provides a function to let the users know virtio_net
> structure of the interrupted port.
>    (Probably almost same as "rte_eth_vhost_portid2vdev" that I described
> in "[PATCH v5 3/3] vhost: Add helper function to convert port id to
> virtio device pointer")

That is one option: wrap everything into the link status interrupt
handler, and let it query the virtio_net structure to know what exactly
happened, then take the proper action.

With that, we could totally get rid of the "two sets of vhost callbacks".
The interface is also clean. However, there is a drawback: it's not
that extensible. What if vhost introduces a new callback, and it
does not literally belong to a link status interrupt event?

The other option is to introduce a new set of callbacks (not based on
vhost, but on the vhost PMD, as I suggested before). Here we could
rename the callbacks to make them look more reasonable for a PMD driver;
say, remove new_device()/destroy_device() and combine them into one
callback: link_status_changed.

The good thing about that is it's extensible; we could easily add a
new callback when there is a new one in vhost. However, it's not as
clean as the first one. Besides that, two sets of callbacks for vhost
always feel weird to me.

Thoughts, comments?

	--yliu

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-12-22  3:41                                     ` Yuanhan Liu
@ 2015-12-22  4:47                                       ` Rich Lane
  2015-12-22  5:47                                         ` Yuanhan Liu
  2015-12-24  3:09                                         ` Tetsuya Mukawa
  0 siblings, 2 replies; 200+ messages in thread
From: Rich Lane @ 2015-12-22  4:47 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, ann.zhuangyanying

On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
wrote:

> On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
> > I'm using the vhost callbacks and struct virtio_net with the vhost PMD
> in a few
> > ways:
>
> Rich, thanks for the info!
>
> >
> > 1. new_device/destroy_device: Link state change (will be covered by the
> link
> > status interrupt).
> > 2. new_device: Add first queue to datapath.
>
> I'm wondering why vring_state_changed() is not used, as it will also be
> triggered at the beginning, when the default queue (the first queue) is
> enabled.
>

Turns out I'd misread the code and it's already using the
vring_state_changed callback for the first queue. Not sure if this is
intentional, but vring_state_changed is called for the first queue
before new_device.


> > 3. vring_state_changed: Add/remove queue to datapath.
> > 4. destroy_device: Remove all queues (vring_state_changed is not called
> when
> > qemu is killed).
>
> I had a plan to invoke vring_state_changed() to disable all vrings
> when destroy_device() is called.
>

That would be good.


> > 5. new_device and struct virtio_net: Determine NUMA node of the VM.
>
> You can get the 'struct virtio_net' dev from all above callbacks.



> 1. Link status interrupt.
>
> To vhost pmd, new_device()/destroy_device() equals to the link status
> interrupt, where new_device() is a link up, and destroy_device() is link
> down().
>
>
> > 2. New queue_state_changed callback. Unlike vring_state_changed this
> should
> > cover the first queue at new_device and removal of all queues at
> > destroy_device.
>
> As stated above, vring_state_changed() should be able to do that, except
> the one on destroy_device(), which is not done yet.
>
> > 3. Per-queue or per-device NUMA node info.
>
> You can query the NUMA node info implicitly by get_mempolicy(); check
> numa_realloc() at lib/librte_vhost/virtio-net.c for reference.
>

Your suggestions are exactly how my application is already working. I was
commenting on the proposed changes to the vhost PMD API. I would prefer to
use RTE_ETH_EVENT_INTR_LSC and rte_eth_dev_socket_id for consistency with
other NIC drivers, instead of these vhost-specific hacks. The queue state
change callback is the one new API that needs to be added because normal
NICs don't have this behavior.

You could add another rte_eth_event_type for the queue state change
callback, and pass the queue ID, RX/TX direction, and enable bit through
cb_arg. The application would never need to touch struct virtio_net.

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-12-22  4:47                                       ` Rich Lane
@ 2015-12-22  5:47                                         ` Yuanhan Liu
  2015-12-22  9:38                                           ` Rich Lane
  2015-12-24  3:09                                         ` Tetsuya Mukawa
  1 sibling, 1 reply; 200+ messages in thread
From: Yuanhan Liu @ 2015-12-22  5:47 UTC (permalink / raw)
  To: Rich Lane; +Cc: dev, ann.zhuangyanying

On Mon, Dec 21, 2015 at 08:47:28PM -0800, Rich Lane wrote:
> On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
> wrote:
> 
>     On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
>     > I'm using the vhost callbacks and struct virtio_net with the vhost PMD in
>     a few
>     > ways:
> 
>     Rich, thanks for the info!
>    
>     >
>     > 1. new_device/destroy_device: Link state change (will be covered by the
>     link
>     > status interrupt).
>     > 2. new_device: Add first queue to datapath.
> 
>     I'm wondering why vring_state_changed() is not used, as it will also be
>     triggered at the beginning, when the default queue (the first queue) is
>     enabled.
> 
> 
> Turns out I'd misread the code and it's already using the vring_state_changed
> callback for the
> first queue. Not sure if this is intentional but vring_state_changed is called
> for the first queue
> before new_device.

Yeah, you were right: we can't count on this vring_state_changed(), because
it's sent before the vring has been initialized on the vhost side. Maybe
we should invoke the vring_state_changed() callback at new_device() as well.

>  
> 
>     > 3. vring_state_changed: Add/remove queue to datapath.
>     > 4. destroy_device: Remove all queues (vring_state_changed is not called
>     when
>     > qemu is killed).
> 
>     I had a plan to invoke vring_state_changed() to disable all vrings
>     when destroy_device() is called.
> 
> 
> That would be good.
>  
> 
>     > 5. new_device and struct virtio_net: Determine NUMA node of the VM.
> 
>     You can get the 'struct virtio_net' dev from all above callbacks.
> 
>      
> 
>     > 1. Link status interrupt.
> 
>     To vhost pmd, new_device()/destroy_device() equals to the link status
>     interrupt, where new_device() is a link up, and destroy_device() is link
>     down().
>    
> 
>     > 2. New queue_state_changed callback. Unlike vring_state_changed this
>     should
>     > cover the first queue at new_device and removal of all queues at
>     > destroy_device.
> 
>     As stated above, vring_state_changed() should be able to do that, except
>     the one on destroy_device(), which is not done yet.
>    
>     > 3. Per-queue or per-device NUMA node info.
> 
>     You can query the NUMA node info implicitly by get_mempolicy(); check
>     numa_realloc() at lib/librte_vhost/virtio-net.c for reference.
> 
> 
> Your suggestions are exactly how my application is already working. I was
> commenting on the
> proposed changes to the vhost PMD API. I would prefer to
> use RTE_ETH_EVENT_INTR_LSC
> and rte_eth_dev_socket_id for consistency with other NIC drivers, instead of
> these vhost-specific
> hacks.

That's a good suggestion.

> The queue state change callback is the one new API that needs to be
> added because
> normal NICs don't have this behavior.

Again I'd ask: will vring_state_changed() be enough once the above issues
are resolved, so that vring_state_changed() is invoked at new_device()/
destroy_device(), and of course, on ethtool changes?

	--yliu

> 
> You could add another rte_eth_event_type for the queue state change callback,
> and pass the
> queue ID, RX/TX direction, and enable bit through cb_arg. The application would
> never need
> to touch struct virtio_net.

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-12-22  5:47                                         ` Yuanhan Liu
@ 2015-12-22  9:38                                           ` Rich Lane
  2015-12-23  2:44                                             ` Yuanhan Liu
  0 siblings, 1 reply; 200+ messages in thread
From: Rich Lane @ 2015-12-22  9:38 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, ann.zhuangyanying

On Mon, Dec 21, 2015 at 9:47 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
wrote:

> On Mon, Dec 21, 2015 at 08:47:28PM -0800, Rich Lane wrote:
> > The queue state change callback is the one new API that needs to be
> > added because
> > normal NICs don't have this behavior.
>
> Again I'd ask, will vring_state_changed() be enough, when above issues
> are resolved: vring_state_changed() will be invoked at new_device()/
> destroy_device(), and of course, ethtool change?


It would be sufficient. It is not a great API though, because it requires
the application to do the conversion from struct virtio_net to a DPDK port
number, and from a virtqueue index to a DPDK queue id and direction. Also,
the current implementation often makes this callback when the vring state
has not actually changed (enabled -> enabled and disabled -> disabled).

If you're asking about using vring_state_changed() _instead_ of the link
status event and rte_eth_dev_socket_id(), then yes, it still works. I'd
only consider that a stopgap until the real ethdev APIs are implemented.

I'd suggest adding RTE_ETH_EVENT_QUEUE_STATE_CHANGE rather than
creating another callback registration API.

Perhaps we could merge the basic PMD, which I think is pretty solid, and
then continue the API discussion with patches to it.

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-12-22  9:38                                           ` Rich Lane
@ 2015-12-23  2:44                                             ` Yuanhan Liu
  2015-12-23 22:00                                               ` Thomas Monjalon
  0 siblings, 1 reply; 200+ messages in thread
From: Yuanhan Liu @ 2015-12-23  2:44 UTC (permalink / raw)
  To: Rich Lane, Thomas Monjalon; +Cc: dev, ann.zhuangyanying

On Tue, Dec 22, 2015 at 01:38:29AM -0800, Rich Lane wrote:
> On Mon, Dec 21, 2015 at 9:47 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
> wrote:
> 
>     On Mon, Dec 21, 2015 at 08:47:28PM -0800, Rich Lane wrote:
>     > The queue state change callback is the one new API that needs to be
>     > added because
>     > normal NICs don't have this behavior.
> 
>     Again I'd ask, will vring_state_changed() be enough, when above issues
>     are resolved: vring_state_changed() will be invoked at new_device()/
>     destroy_device(), and of course, ethtool change?
> 
> 
> It would be sufficient. It is not a great API though, because it requires the
> application to do the conversion from struct virtio_net to a DPDK port number,
> and from a virtqueue index to a DPDK queue id and direction. Also, the current
> implementation often makes this callback when the vring state has not actually
> changed (enabled -> enabled and disabled -> disabled).
> 
> If you're asking about using vring_state_changed() _instead_ of the link status
> event and rte_eth_dev_socket_id(),

No, I like the idea of link status event and rte_eth_dev_socket_id();
I was just wondering why a new API is needed. Both Tetsuya and I
were thinking to leverage the link status event to represent the
queue state change (triggered by vring_state_changed()) as well,
so that we don't need to introduce another eth event. However, I'd
agree that it's better if we could have a new dedicated event.

Thomas, here is some background for you. For the vhost pmd and linux
virtio-net combo, the queues can be dynamically enabled and disabled by
ethtool; therefore, the application wishes to have another eth event, say
RTE_ETH_EVENT_QUEUE_STATE_CHANGE, so that the application can
add/remove the corresponding queues to/from the datapath when that happens.
What do you think of that?

> then yes, it still works. I'd only consider
> that a stopgap until the real ethdev APIs are implemented.
> 
> I'd suggest to add RTE_ETH_EVENT_QUEUE_STATE_CHANGE rather than
> create another callback registration API.
> 
> Perhaps we could merge the basic PMD which I think is pretty solid and then
> continue the API discussion with patches to it.

Perhaps, but let's see how hard it could be for the new eth event
discussion then.

	--yliu

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-12-23  2:44                                             ` Yuanhan Liu
@ 2015-12-23 22:00                                               ` Thomas Monjalon
  2015-12-24  3:51                                                 ` Yuanhan Liu
  0 siblings, 1 reply; 200+ messages in thread
From: Thomas Monjalon @ 2015-12-23 22:00 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, ann.zhuangyanying

2015-12-23 10:44, Yuanhan Liu:
> On Tue, Dec 22, 2015 at 01:38:29AM -0800, Rich Lane wrote:
> > On Mon, Dec 21, 2015 at 9:47 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
> > wrote:
> > 
> >     On Mon, Dec 21, 2015 at 08:47:28PM -0800, Rich Lane wrote:
> >     > The queue state change callback is the one new API that needs to be
> >     > added because
> >     > normal NICs don't have this behavior.
> > 
> >     Again I'd ask, will vring_state_changed() be enough, when above issues
> >     are resolved: vring_state_changed() will be invoked at new_device()/
> >     destroy_device(), and of course, ethtool change?
> > 
> > 
> > It would be sufficient. It is not a great API though, because it requires the
> > application to do the conversion from struct virtio_net to a DPDK port number,
> > and from a virtqueue index to a DPDK queue id and direction. Also, the current
> > implementation often makes this callback when the vring state has not actually
> > changed (enabled -> enabled and disabled -> disabled).
> > 
> > If you're asking about using vring_state_changed() _instead_ of the link status
> > event and rte_eth_dev_socket_id(),
> 
> No, I like the idea of link status event and rte_eth_dev_socket_id();
> I was just wondering why a new API is needed. Both Tetsuya and I
> were thinking to leverage the link status event to represent the
> queue state change (triggered by vring_state_changed()) as well,
> so that we don't need to introduce another eth event. However, I'd
> agree that it's better if we could have a new dedicated event.
> 
> Thomas, here is some background for you. For vhost pmd and linux
> virtio-net combo, the queue can be dynamically changed by ethtool,
> therefore, the application wishes to have another eth event, say
> RTE_ETH_EVENT_QUEUE_STATE_CHANGE, so that the application can
> add/remove corresponding queue to the datapath when that happens.
> What do you think of that?

Yes it is an event. So I don't understand the question.
What would be better than a specific rte_eth_event_type?

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-12-22  4:47                                       ` Rich Lane
  2015-12-22  5:47                                         ` Yuanhan Liu
@ 2015-12-24  3:09                                         ` Tetsuya Mukawa
  2015-12-24  3:54                                           ` Tetsuya Mukawa
                                                             ` (2 more replies)
  1 sibling, 3 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-12-24  3:09 UTC (permalink / raw)
  To: Rich Lane, Yuanhan Liu; +Cc: dev, ann.zhuangyanying

On 2015/12/22 13:47, Rich Lane wrote:
> On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
> wrote:
>
>> On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
>>> I'm using the vhost callbacks and struct virtio_net with the vhost PMD
>> in a few
>>> ways:
>> Rich, thanks for the info!
>>
>>> 1. new_device/destroy_device: Link state change (will be covered by the
>> link
>>> status interrupt).
>>> 2. new_device: Add first queue to datapath.
>> I'm wondering why vring_state_changed() is not used, as it will also be
>> triggered at the beginning, when the default queue (the first queue) is
>> enabled.
>>
> Turns out I'd misread the code and it's already using the
> vring_state_changed callback for the
> first queue. Not sure if this is intentional but vring_state_changed is
> called for the first queue
> before new_device.
>
>
>>> 3. vring_state_changed: Add/remove queue to datapath.
>>> 4. destroy_device: Remove all queues (vring_state_changed is not called
>> when
>>> qemu is killed).
>> I had a plan to invoke vring_state_changed() to disable all vrings
>> when destroy_device() is called.
>>
> That would be good.
>
>
>>> 5. new_device and struct virtio_net: Determine NUMA node of the VM.
>> You can get the 'struct virtio_net' dev from all above callbacks.
>
>
>> 1. Link status interrupt.
>>
>> To vhost pmd, new_device()/destroy_device() equals to the link status
>> interrupt, where new_device() is a link up, and destroy_device() is link
>> down().
>>
>>
>>> 2. New queue_state_changed callback. Unlike vring_state_changed this
>> should
>>> cover the first queue at new_device and removal of all queues at
>>> destroy_device.
>> As stated above, vring_state_changed() should be able to do that, except
>> the one on destroy_device(), which is not done yet.
>>
>>> 3. Per-queue or per-device NUMA node info.
>> You can query the NUMA node info implicitly by get_mempolicy(); check
>> numa_realloc() at lib/librte_vhost/virtio-net.c for reference.
>>
> Your suggestions are exactly how my application is already working. I was
> commenting on the
> proposed changes to the vhost PMD API. I would prefer to
> use RTE_ETH_EVENT_INTR_LSC
> and rte_eth_dev_socket_id for consistency with other NIC drivers, instead
> of these vhost-specific
> hacks. The queue state change callback is the one new API that needs to be
> added because
> normal NICs don't have this behavior.
>
> You could add another rte_eth_event_type for the queue state change
> callback, and pass the
> queue ID, RX/TX direction, and enable bit through cb_arg. 

Hi Rich,

So far, the ethdev layer provides rte_eth_dev_callback_register() for
event handling. A DPDK app can register a callback handler together with
a "callback argument", and the handler is later invoked with that
argument. In any case, the vhost library and the PMD cannot change the
argument.

I guess the callback handler will need to call ethdev APIs to find out
what caused the interrupt. Probably rte_eth_dev_socket_id() to learn the
numa_node, and rte_eth_dev_info_get() to learn the number of queues.
Is this okay for your case?

Thanks,
Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-12-23 22:00                                               ` Thomas Monjalon
@ 2015-12-24  3:51                                                 ` Yuanhan Liu
  2015-12-24  4:07                                                   ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Yuanhan Liu @ 2015-12-24  3:51 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, ann.zhuangyanying

On Wed, Dec 23, 2015 at 11:00:15PM +0100, Thomas Monjalon wrote:
> 2015-12-23 10:44, Yuanhan Liu:
> > On Tue, Dec 22, 2015 at 01:38:29AM -0800, Rich Lane wrote:
> > > On Mon, Dec 21, 2015 at 9:47 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
> > > wrote:
> > > 
> > >     On Mon, Dec 21, 2015 at 08:47:28PM -0800, Rich Lane wrote:
> > >     > The queue state change callback is the one new API that needs to be
> > >     > added because
> > >     > normal NICs don't have this behavior.
> > > 
> > >     Again I'd ask, will vring_state_changed() be enough, when above issues
> > >     are resolved: vring_state_changed() will be invoked at new_device()/
> > >     destroy_device(), and of course, ethtool change?
> > > 
> > > 
> > > It would be sufficient. It is not a great API though, because it requires the
> > > application to do the conversion from struct virtio_net to a DPDK port number,
> > > and from a virtqueue index to a DPDK queue id and direction. Also, the current
> > > implementation often makes this callback when the vring state has not actually
> > > changed (enabled -> enabled and disabled -> disabled).
> > > 
> > > If you're asking about using vring_state_changed() _instead_ of the link status
> > > event and rte_eth_dev_socket_id(),
> > 
> > No, I like the idea of link status event and rte_eth_dev_socket_id();
> > I was just wondering why a new API is needed. Both Tetsuya and I
> > were thinking to leverage the link status event to represent the
> > queue state change (triggered by vring_state_changed()) as well,
> > so that we don't need to introduce another eth event. However, I'd
> > agree that it's better if we could have a new dedicated event.
> > 
> > Thomas, here is some background for you. For vhost pmd and linux
> > virtio-net combo, the queue can be dynamically changed by ethtool,
> > therefore, the application wishes to have another eth event, say
> > RTE_ETH_EVENT_QUEUE_STATE_CHANGE, so that the application can
> > add/remove corresponding queue to the datapath when that happens.
> > What do you think of that?
> 
> Yes it is an event. So I don't understand the question.
> What may be better than a specific rte_eth_event_type?

The alternative is a new set of callbacks, but judging that we already
have a set of callbacks for the vhost library, adding a new set to the
vhost pmd doesn't seem elegant to me.

Therefore, I'd prefer a new eth event.

	--yliu

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-12-24  3:09                                         ` Tetsuya Mukawa
@ 2015-12-24  3:54                                           ` Tetsuya Mukawa
  2015-12-24  4:00                                           ` Yuanhan Liu
  2015-12-24  5:37                                           ` Rich Lane
  2 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-12-24  3:54 UTC (permalink / raw)
  To: Rich Lane, Yuanhan Liu; +Cc: dev, ann.zhuangyanying

On 2015/12/24 12:09, Tetsuya Mukawa wrote:
> On 2015/12/22 13:47, Rich Lane wrote:
>> On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
>> wrote:
>>
>>> On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
>>>> I'm using the vhost callbacks and struct virtio_net with the vhost PMD
>>> in a few
>>>> ways:
>>> Rich, thanks for the info!
>>>
>>>> 1. new_device/destroy_device: Link state change (will be covered by the
>>> link
>>>> status interrupt).
>>>> 2. new_device: Add first queue to datapath.
>>> I'm wondering why vring_state_changed() is not used, as it will also be
>>> triggered at the beginning, when the default queue (the first queue) is
>>> enabled.
>>>
>> Turns out I'd misread the code and it's already using the
>> vring_state_changed callback for the
>> first queue. Not sure if this is intentional but vring_state_changed is
>> called for the first queue
>> before new_device.
>>
>>
>>>> 3. vring_state_changed: Add/remove queue to datapath.
>>>> 4. destroy_device: Remove all queues (vring_state_changed is not called
>>> when
>>>> qemu is killed).
>>> I had a plan to invoke vring_state_changed() to disable all vrings
>>> when destroy_device() is called.
>>>
>> That would be good.
>>
>>
>>>> 5. new_device and struct virtio_net: Determine NUMA node of the VM.
>>> You can get the 'struct virtio_net' dev from all above callbacks.
>>
>>> 1. Link status interrupt.
>>>
>>> To vhost pmd, new_device()/destroy_device() equals to the link status
>>> interrupt, where new_device() is a link up, and destroy_device() is link
>>> down().
>>>
>>>
>>>> 2. New queue_state_changed callback. Unlike vring_state_changed this
>>> should
>>>> cover the first queue at new_device and removal of all queues at
>>>> destroy_device.
>>> As stated above, vring_state_changed() should be able to do that, except
>>> the one on destroy_device(), which is not done yet.
>>>
>>>> 3. Per-queue or per-device NUMA node info.
>>> You can query the NUMA node info implicitly by get_mempolicy(); check
>>> numa_realloc() at lib/librte_vhost/virtio-net.c for reference.
>>>
>> Your suggestions are exactly how my application is already working. I was
>> commenting on the
>> proposed changes to the vhost PMD API. I would prefer to
>> use RTE_ETH_EVENT_INTR_LSC
>> and rte_eth_dev_socket_id for consistency with other NIC drivers, instead
>> of these vhost-specific
>> hacks. The queue state change callback is the one new API that needs to be
>> added because
>> normal NICs don't have this behavior.
>>
>> You could add another rte_eth_event_type for the queue state change
>> callback, and pass the
>> queue ID, RX/TX direction, and enable bit through cb_arg. 
> Hi Rich,
>
> So far, EAL provides rte_eth_dev_callback_register() for event handling.
> DPDK app can register callback handler and "callback argument".
> And EAL will call callback handler with the argument.
> Anyway, vhost library and PMD cannot change the argument.
>
> I guess the callback handler will need to call ethdev APIs to know what
> causes the interrupt.
> Probably rte_eth_dev_socket_id() is to know numa_node, and
> rte_eth_dev_info_get() is to know the number of queues.

It seems rte_eth_dev_info_get() is not enough, because the DPDK application
needs not only the number of queues, but at least also which queues are
enabled or disabled.
I guess the current interrupt mechanism and ethdev APIs cannot provide such
information, because it's something specific to the vhost case.

Tetsuya

> Is this okay for your case?
>
> Thanks,
> Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-12-24  3:09                                         ` Tetsuya Mukawa
  2015-12-24  3:54                                           ` Tetsuya Mukawa
@ 2015-12-24  4:00                                           ` Yuanhan Liu
  2015-12-24  4:23                                             ` Tetsuya Mukawa
  2015-12-24  5:37                                           ` Rich Lane
  2 siblings, 1 reply; 200+ messages in thread
From: Yuanhan Liu @ 2015-12-24  4:00 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Thu, Dec 24, 2015 at 12:09:10PM +0900, Tetsuya Mukawa wrote:
> On 2015/12/22 13:47, Rich Lane wrote:
> > On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
> > wrote:
> >
> >> On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
> >>> I'm using the vhost callbacks and struct virtio_net with the vhost PMD
> >> in a few
> >>> ways:
> >> Rich, thanks for the info!
> >>
> >>> 1. new_device/destroy_device: Link state change (will be covered by the
> >> link
> >>> status interrupt).
> >>> 2. new_device: Add first queue to datapath.
> >> I'm wondering why vring_state_changed() is not used, as it will also be
> >> triggered at the beginning, when the default queue (the first queue) is
> >> enabled.
> >>
> > Turns out I'd misread the code and it's already using the
> > vring_state_changed callback for the
> > first queue. Not sure if this is intentional but vring_state_changed is
> > called for the first queue
> > before new_device.
> >
> >
> >>> 3. vring_state_changed: Add/remove queue to datapath.
> >>> 4. destroy_device: Remove all queues (vring_state_changed is not called
> >> when
> >>> qemu is killed).
> >> I had a plan to invoke vring_state_changed() to disable all vrings
> >> when destroy_device() is called.
> >>
> > That would be good.
> >
> >
> >>> 5. new_device and struct virtio_net: Determine NUMA node of the VM.
> >> You can get the 'struct virtio_net' dev from all above callbacks.
> >
> >
> >> 1. Link status interrupt.
> >>
> >> To vhost pmd, new_device()/destroy_device() equals to the link status
> >> interrupt, where new_device() is a link up, and destroy_device() is link
> >> down().
> >>
> >>
> >>> 2. New queue_state_changed callback. Unlike vring_state_changed this
> >> should
> >>> cover the first queue at new_device and removal of all queues at
> >>> destroy_device.
> >> As stated above, vring_state_changed() should be able to do that, except
> >> the one on destroy_device(), which is not done yet.
> >>
> >>> 3. Per-queue or per-device NUMA node info.
> >> You can query the NUMA node info implicitly by get_mempolicy(); check
> >> numa_realloc() at lib/librte_vhost/virtio-net.c for reference.
> >>
> > Your suggestions are exactly how my application is already working. I was
> > commenting on the
> > proposed changes to the vhost PMD API. I would prefer to
> > use RTE_ETH_EVENT_INTR_LSC
> > and rte_eth_dev_socket_id for consistency with other NIC drivers, instead
> > of these vhost-specific
> > hacks. The queue state change callback is the one new API that needs to be
> > added because
> > normal NICs don't have this behavior.
> >
> > You could add another rte_eth_event_type for the queue state change
> > callback, and pass the
> > queue ID, RX/TX direction, and enable bit through cb_arg. 
> 
> Hi Rich,
> 
> So far, EAL provides rte_eth_dev_callback_register() for event handling.
> DPDK app can register callback handler and "callback argument".
> And EAL will call callback handler with the argument.
> Anyway, vhost library and PMD cannot change the argument.

Yes, the event callback argument is provided by the application, which has
no way to know the info you mentioned above, like queue ID and RX/TX
direction.

For that, we may need to introduce another structure to hold all the above
info, and embed it in the virtio_net struct.

> I guess the callback handler will need to call ethdev APIs to know what
> causes the interrupt.
> Probably rte_eth_dev_socket_id() is to know numa_node, and
> rte_eth_dev_info_get() is to know the number of queues.
> Is this okay for your case?

I don't think that's enough, and I think Rich has already given you all
the info needed for a queue state change interrupt. (That would also
answer your questions in another email.)

	--yliu

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-12-24  3:51                                                 ` Yuanhan Liu
@ 2015-12-24  4:07                                                   ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-12-24  4:07 UTC (permalink / raw)
  To: Yuanhan Liu, Thomas Monjalon, Rich Lane; +Cc: dev, ann.zhuangyanying

On 2015/12/24 12:51, Yuanhan Liu wrote:
> On Wed, Dec 23, 2015 at 11:00:15PM +0100, Thomas Monjalon wrote:
>> 2015-12-23 10:44, Yuanhan Liu:
>>> On Tue, Dec 22, 2015 at 01:38:29AM -0800, Rich Lane wrote:
>>>> On Mon, Dec 21, 2015 at 9:47 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
>>>> wrote:
>>>>
>>>>     On Mon, Dec 21, 2015 at 08:47:28PM -0800, Rich Lane wrote:
>>>>     > The queue state change callback is the one new API that needs to be
>>>>     > added because
>>>>     > normal NICs don't have this behavior.
>>>>
>>>>     Again I'd ask, will vring_state_changed() be enough, when above issues
>>>>     are resolved: vring_state_changed() will be invoked at new_device()/
>>>>     destroy_device(), and of course, ethtool change?
>>>>
>>>>
>>>> It would be sufficient. It is not a great API though, because it requires the
>>>> application to do the conversion from struct virtio_net to a DPDK port number,
>>>> and from a virtqueue index to a DPDK queue id and direction. Also, the current
>>>> implementation often makes this callback when the vring state has not actually
>>>> changed (enabled -> enabled and disabled -> disabled).
>>>>
>>>> If you're asking about using vring_state_changed() _instead_ of the link status
>>>> event and rte_eth_dev_socket_id(),
>>> No, I like the idea of link status event and rte_eth_dev_socket_id();
>>> I was just wondering why a new API is needed. Both Tetsuya and I
>>> were thinking to leverage the link status event to represent the
>>> queue state change (triggered by vring_state_changed()) as well,
>>> so that we don't need to introduce another eth event. However, I'd
>>> agree that it's better if we could have a new dedicated event.
>>>
>>> Thomas, here is some background for you. For vhost pmd and linux
>>> virtio-net combo, the queue can be dynamically changed by ethtool,
>>> therefore, the application wishes to have another eth event, say
>>> RTE_ETH_EVENT_QUEUE_STATE_CHANGE, so that the application can
>>> add/remove corresponding queue to the datapath when that happens.
>>> What do you think of that?
>> Yes it is an event. So I don't understand the question.
>> What may be better than a specific rte_eth_event_type?
> The alternative is a new set of callbacks, but judging that we already
> have a set of callback for vhost libraray, and adding a new set to vhost
> pmd doesn't seem elegant to me.
>
> Therefore, I'd prefer a new eth event.
>
> 	--yliu

I am ok to have one more event type.

BTW, I have questions about numa_node.
I guess "rte_eth_dev_socket_id()" can only return the numa_node of the
specified port.
If multiple queues are used in one device (port), can we say all queues
are always on the same numa_node?

If the answer is no, I am still not sure we can remove "struct
virtio_net" from the DPDK application's callback handling.
I agree we can add RTE_ETH_EVENT_QUEUE_STATE_CHANGE for interrupt
notification.
But the current ethdev APIs may not be able to hide vhost-specific
properties, and then the callback handler needs to handle "struct
virtio_net" directly.

Thanks,
Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-12-24  4:00                                           ` Yuanhan Liu
@ 2015-12-24  4:23                                             ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-12-24  4:23 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, ann.zhuangyanying

On 2015/12/24 13:00, Yuanhan Liu wrote:
> On Thu, Dec 24, 2015 at 12:09:10PM +0900, Tetsuya Mukawa wrote:
>> On 2015/12/22 13:47, Rich Lane wrote:
>>> On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
>>> wrote:
>>>
>>>> On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
>>>>> I'm using the vhost callbacks and struct virtio_net with the vhost PMD
>>>> in a few
>>>>> ways:
>>>> Rich, thanks for the info!
>>>>
>>>>> 1. new_device/destroy_device: Link state change (will be covered by the
>>>> link
>>>>> status interrupt).
>>>>> 2. new_device: Add first queue to datapath.
>>>> I'm wondering why vring_state_changed() is not used, as it will also be
>>>> triggered at the beginning, when the default queue (the first queue) is
>>>> enabled.
>>>>
>>> Turns out I'd misread the code and it's already using the
>>> vring_state_changed callback for the
>>> first queue. Not sure if this is intentional but vring_state_changed is
>>> called for the first queue
>>> before new_device.
>>>
>>>
>>>>> 3. vring_state_changed: Add/remove queue to datapath.
>>>>> 4. destroy_device: Remove all queues (vring_state_changed is not called
>>>> when
>>>>> qemu is killed).
>>>> I had a plan to invoke vring_state_changed() to disable all vrings
>>>> when destroy_device() is called.
>>>>
>>> That would be good.
>>>
>>>
>>>>> 5. new_device and struct virtio_net: Determine NUMA node of the VM.
>>>> You can get the 'struct virtio_net' dev from all above callbacks.
>>>
>>>> 1. Link status interrupt.
>>>>
>>>> To vhost pmd, new_device()/destroy_device() equals to the link status
>>>> interrupt, where new_device() is a link up, and destroy_device() is link
>>>> down().
>>>>
>>>>
>>>>> 2. New queue_state_changed callback. Unlike vring_state_changed this
>>>> should
>>>>> cover the first queue at new_device and removal of all queues at
>>>>> destroy_device.
>>>> As stated above, vring_state_changed() should be able to do that, except
>>>> the one on destroy_device(), which is not done yet.
>>>>
>>>>> 3. Per-queue or per-device NUMA node info.
>>>> You can query the NUMA node info implicitly by get_mempolicy(); check
>>>> numa_realloc() at lib/librte_vhost/virtio-net.c for reference.
>>>>
>>> Your suggestions are exactly how my application is already working. I was
>>> commenting on the
>>> proposed changes to the vhost PMD API. I would prefer to
>>> use RTE_ETH_EVENT_INTR_LSC
>>> and rte_eth_dev_socket_id for consistency with other NIC drivers, instead
>>> of these vhost-specific
>>> hacks. The queue state change callback is the one new API that needs to be
>>> added because
>>> normal NICs don't have this behavior.
>>>
>>> You could add another rte_eth_event_type for the queue state change
>>> callback, and pass the
>>> queue ID, RX/TX direction, and enable bit through cb_arg. 
>> Hi Rich,
>>
>> So far, EAL provides rte_eth_dev_callback_register() for event handling.
>> DPDK app can register callback handler and "callback argument".
>> And EAL will call callback handler with the argument.
>> Anyway, vhost library and PMD cannot change the argument.
> Yes, the event callback argument is provided from application, where it
> has no way to know info you mentioned above, like queue ID, RX/TX direction.
>
> For that, we may need introduce another structure to note all above
> info, and embed it to virtio_net struct.
>
>> I guess the callback handler will need to call ethdev APIs to know what
>> causes the interrupt.
>> Probably rte_eth_dev_socket_id() is to know numa_node, and
>> rte_eth_dev_info_get() is to know the number of queues.
>> Is this okay for your case?
> I don't think that's enough, and I think Rich has already given all info
> needed to a queue change interrupt to you. (That would also answer your
> questions in another email)
>
> 	--yliu

Thanks. Yes, I agree we need to provide such information through
"struct virtio_net".
And the callback handler needs to handle the vhost-specific structure
directly.

Probably it's difficult to remove "struct virtio_net" from the callback
handler of the DPDK app.
This is the point I want to make.

Tetsuya


* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-12-24  3:09                                         ` Tetsuya Mukawa
  2015-12-24  3:54                                           ` Tetsuya Mukawa
  2015-12-24  4:00                                           ` Yuanhan Liu
@ 2015-12-24  5:37                                           ` Rich Lane
  2015-12-24  7:58                                             ` Tetsuya Mukawa
  2 siblings, 1 reply; 200+ messages in thread
From: Rich Lane @ 2015-12-24  5:37 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Wed, Dec 23, 2015 at 7:09 PM, Tetsuya Mukawa <mukawa@igel.co.jp> wrote:

> On 2015/12/22 13:47, Rich Lane wrote:
> > On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu <
> yuanhan.liu@linux.intel.com>
> > wrote:
> >
> >> On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
> >>> I'm using the vhost callbacks and struct virtio_net with the vhost PMD
> >> in a few
> >>> ways:
> >> Rich, thanks for the info!
> >>
> >>> 1. new_device/destroy_device: Link state change (will be covered by the
> >> link
> >>> status interrupt).
> >>> 2. new_device: Add first queue to datapath.
> >> I'm wondering why vring_state_changed() is not used, as it will also be
> >> triggered at the beginning, when the default queue (the first queue) is
> >> enabled.
> >>
> > Turns out I'd misread the code and it's already using the
> > vring_state_changed callback for the
> > first queue. Not sure if this is intentional but vring_state_changed is
> > called for the first queue
> > before new_device.
> >
> >
> >>> 3. vring_state_changed: Add/remove queue to datapath.
> >>> 4. destroy_device: Remove all queues (vring_state_changed is not called
> >> when
> >>> qemu is killed).
> >> I had a plan to invoke vring_state_changed() to disable all vrings
> >> when destroy_device() is called.
> >>
> > That would be good.
> >
> >
> >>> 5. new_device and struct virtio_net: Determine NUMA node of the VM.
> >> You can get the 'struct virtio_net' dev from all above callbacks.
> >
> >
> >> 1. Link status interrupt.
> >>
> >> To vhost pmd, new_device()/destroy_device() equals to the link status
> >> interrupt, where new_device() is a link up, and destroy_device() is link
> >> down().
> >>
> >>
> >>> 2. New queue_state_changed callback. Unlike vring_state_changed this
> >> should
> >>> cover the first queue at new_device and removal of all queues at
> >>> destroy_device.
> >> As stated above, vring_state_changed() should be able to do that, except
> >> the one on destroy_device(), which is not done yet.
> >>
> >>> 3. Per-queue or per-device NUMA node info.
> >> You can query the NUMA node info implicitly by get_mempolicy(); check
> >> numa_realloc() at lib/librte_vhost/virtio-net.c for reference.
> >>
> > Your suggestions are exactly how my application is already working. I was
> > commenting on the
> > proposed changes to the vhost PMD API. I would prefer to
> > use RTE_ETH_EVENT_INTR_LSC
> > and rte_eth_dev_socket_id for consistency with other NIC drivers, instead
> > of these vhost-specific
> > hacks. The queue state change callback is the one new API that needs to
> be
> > added because
> > normal NICs don't have this behavior.
> >
> > You could add another rte_eth_event_type for the queue state change
> > callback, and pass the
> > queue ID, RX/TX direction, and enable bit through cb_arg.
>
> Hi Rich,
>
> So far, EAL provides rte_eth_dev_callback_register() for event handling.
> DPDK app can register callback handler and "callback argument".
> And EAL will call callback handler with the argument.
> Anyway, vhost library and PMD cannot change the argument.
>

You're right, I'd mistakenly thought that the PMD controlled the void *
passed to the callback.

Here's a thought:

    struct rte_eth_vhost_queue_event {
        uint16_t queue_id;
        bool rx;
        bool enable;
    };

    int rte_eth_vhost_get_queue_event(uint8_t port_id, struct
rte_eth_vhost_queue_event *event);

On receiving the ethdev event, the application could repeatedly call
rte_eth_vhost_get_queue_event() to find out what happened.
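A minimal sketch of the drain pattern this implies, with the proposed
rte_eth_vhost_get_queue_event() stubbed out as a small in-memory ring (the
PMD-side event source here is hypothetical, only the struct names follow the
proposal above):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Proposed event record (names follow the struct suggested above). */
struct rte_eth_vhost_queue_event {
	uint16_t queue_id;
	bool rx;
	bool enable;
};

/* Stub standing in for the PMD: a small stack of pending events. */
static struct rte_eth_vhost_queue_event pending[4];
static int npending;

static int
rte_eth_vhost_get_queue_event(uint8_t port_id,
			      struct rte_eth_vhost_queue_event *ev)
{
	(void)port_id;
	if (npending == 0)
		return -1;	/* no more events */
	*ev = pending[--npending];
	return 0;
}

/* Application handler for the proposed queue-state ethdev event:
 * drain every pending event, then act on each one. */
static int
drain_queue_events(uint8_t port_id)
{
	struct rte_eth_vhost_queue_event ev;
	int n = 0;

	while (rte_eth_vhost_get_queue_event(port_id, &ev) == 0) {
		/* e.g. add/remove (ev.queue_id, ev.rx) from the datapath */
		n++;
	}
	return n;
}
```

The key property is that the callback registered via
rte_eth_dev_callback_register() only needs to signal "something changed";
the details are pulled out afterwards.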

An issue with having the application dig into struct virtio_net is that it
can only be safely accessed from
a callback on the vhost thread. A typical application running its control
plane on lcore 0 would need to
copy all the relevant info from struct virtio_net before sending it over.

As you mentioned, queues for a single vhost port could be located on
different NUMA nodes. I think this
is an uncommon scenario but if needed you could add an API to retrieve the
NUMA node for a given port
and queue.


* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-12-24  5:37                                           ` Rich Lane
@ 2015-12-24  7:58                                             ` Tetsuya Mukawa
  2015-12-28 21:59                                               ` Rich Lane
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2015-12-24  7:58 UTC (permalink / raw)
  To: Rich Lane; +Cc: dev, ann.zhuangyanying

On 2015/12/24 14:37, Rich Lane wrote:
> On Wed, Dec 23, 2015 at 7:09 PM, Tetsuya Mukawa <mukawa@igel.co.jp> wrote:
>
>> On 2015/12/22 13:47, Rich Lane wrote:
>>> On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu <
>> yuanhan.liu@linux.intel.com>
>>> wrote:
>>>
>>>> On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
>>>>> I'm using the vhost callbacks and struct virtio_net with the vhost PMD
>>>> in a few
>>>>> ways:
>>>> Rich, thanks for the info!
>>>>
>>>>> 1. new_device/destroy_device: Link state change (will be covered by the
>>>> link
>>>>> status interrupt).
>>>>> 2. new_device: Add first queue to datapath.
>>>> I'm wondering why vring_state_changed() is not used, as it will also be
>>>> triggered at the beginning, when the default queue (the first queue) is
>>>> enabled.
>>>>
>>> Turns out I'd misread the code and it's already using the
>>> vring_state_changed callback for the
>>> first queue. Not sure if this is intentional but vring_state_changed is
>>> called for the first queue
>>> before new_device.
>>>
>>>
>>>>> 3. vring_state_changed: Add/remove queue to datapath.
>>>>> 4. destroy_device: Remove all queues (vring_state_changed is not called
>>>> when
>>>>> qemu is killed).
>>>> I had a plan to invoke vring_state_changed() to disable all vrings
>>>> when destroy_device() is called.
>>>>
>>> That would be good.
>>>
>>>
>>>>> 5. new_device and struct virtio_net: Determine NUMA node of the VM.
>>>> You can get the 'struct virtio_net' dev from all above callbacks.
>>>
>>>> 1. Link status interrupt.
>>>>
>>>> To vhost pmd, new_device()/destroy_device() equals to the link status
>>>> interrupt, where new_device() is a link up, and destroy_device() is link
>>>> down().
>>>>
>>>>
>>>>> 2. New queue_state_changed callback. Unlike vring_state_changed this
>>>> should
>>>>> cover the first queue at new_device and removal of all queues at
>>>>> destroy_device.
>>>> As stated above, vring_state_changed() should be able to do that, except
>>>> the one on destroy_device(), which is not done yet.
>>>>
>>>>> 3. Per-queue or per-device NUMA node info.
>>>> You can query the NUMA node info implicitly by get_mempolicy(); check
>>>> numa_realloc() at lib/librte_vhost/virtio-net.c for reference.
>>>>
>>> Your suggestions are exactly how my application is already working. I was
>>> commenting on the
>>> proposed changes to the vhost PMD API. I would prefer to
>>> use RTE_ETH_EVENT_INTR_LSC
>>> and rte_eth_dev_socket_id for consistency with other NIC drivers, instead
>>> of these vhost-specific
>>> hacks. The queue state change callback is the one new API that needs to
>> be
>>> added because
>>> normal NICs don't have this behavior.
>>>
>>> You could add another rte_eth_event_type for the queue state change
>>> callback, and pass the
>>> queue ID, RX/TX direction, and enable bit through cb_arg.
>> Hi Rich,
>>
>> So far, EAL provides rte_eth_dev_callback_register() for event handling.
>> DPDK app can register callback handler and "callback argument".
>> And EAL will call callback handler with the argument.
>> Anyway, vhost library and PMD cannot change the argument.
>>
> You're right, I'd mistakenly thought that the PMD controlled the void *
> passed to the callback.
>
> Here's a thought:
>
>     struct rte_eth_vhost_queue_event {
>         uint16_t queue_id;
>         bool rx;
>         bool enable;
>     };
>
>     int rte_eth_vhost_get_queue_event(uint8_t port_id, struct
> rte_eth_vhost_queue_event *event);
>
> On receiving the ethdev event the application could repeatedly call
> rte_eth_vhost_get_queue_event
> to find out what happened.

Hi Rich and Yuanhan,

I guess we have 2 implementations here.

1. rte_eth_vhost_get_queue_event() returns each event.
2. rte_eth_vhost_get_queue_status() returns current status of the queues.

I guess option "2" is a more generic manner of handling interrupts from a
device driver.
In the case of option "1", if the DPDK application doesn't call
rte_eth_vhost_get_queue_event(), the vhost PMD needs to keep all events.
This may exhaust memory.

One more example is the current link status interrupt handling.
Actually, the ethdev API just returns the current status of the port.
What do you think?

>
> An issue with having the application dig into struct virtio_net is that it
> can only be safely accessed from
> a callback on the vhost thread.

Here is one example of how to invoke a callback handler registered by a
DPDK application from the PMD.

  _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC);

The above function is called by the interrupt handling thread of the PMDs.

Please check the implementation of the above function.
A callback handler that the DPDK application registers is called in the
"interrupt handling context".
(I mean the interrupt handling thread of the PMD also calls the callback
handler of the DPDK application.)
Anyway, I guess the callback handler of the DPDK application can access
"struct virtio_net" safely.

> A typical application running its control
> plane on lcore 0 would need to
> copy all the relevant info from struct virtio_net before sending it over.

Could you please describe it more?
Sorry, probably I don't understand correctly which restriction makes you
copy data.
(As described above, the callback handler registered by the DPDK application
can safely access "struct virtio_net". Does this solve the copy issue?)

> As you mentioned, queues for a single vhost port could be located on
> different NUMA nodes. I think this
> is an uncommon scenario but if needed you could add an API to retrieve the
> NUMA node for a given port
> and queue.
>

I agree this is very specific to vhost, because in the case of a generic
PCI device, all queues of a port are on the same NUMA node.
Anyway, because it's very specific to vhost, I am not sure we should
add an ethdev API to handle this.

If we handle it via a vhost PMD API, we probably have 2 options here as well.

1. Extend "struct rte_eth_vhost_queue_event", and use
rte_eth_vhost_get_queue_event() like you described.
struct rte_eth_vhost_queue_event
{
        uint16_t queue_id;
        bool rx;
        bool enable;
+      int socket_id;
};

2. rte_eth_vhost_get_queue_status() returns current socket_ids of all
queues.
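A rough sketch of what an option-2 style status query might look like; the
structure name, fields, and function here are hypothetical, with the PMD's
internal per-queue state stubbed as a static array:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_QUEUES 8

/* Hypothetical per-queue status record for an option-2 style API. */
struct vhost_queue_status {
	bool rx_enabled;
	bool tx_enabled;
	int socket_id;	/* NUMA node the queue's memory lives on */
};

/* Stub PMD-side state standing in for the real driver. */
static struct vhost_queue_status queue_status[MAX_QUEUES];

/* Option-2 style call: copy out the current status of all queues, so any
 * application thread can poll it without touching "struct virtio_net".
 * Returns the number of queues copied into 'out'. */
static int
get_queue_status(struct vhost_queue_status *out, int max)
{
	int n = max < MAX_QUEUES ? max : MAX_QUEUES;

	for (int q = 0; q < n; q++)
		out[q] = queue_status[q];
	return n;
}
```

Because the caller gets a snapshot, the socket_id question above is answered
by the same call rather than a separate event.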

Tetsuya


* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-12-24  7:58                                             ` Tetsuya Mukawa
@ 2015-12-28 21:59                                               ` Rich Lane
  2016-01-06  3:56                                                 ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Rich Lane @ 2015-12-28 21:59 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Wed, Dec 23, 2015 at 11:58 PM, Tetsuya Mukawa <mukawa@igel.co.jp> wrote:
>
> Hi Rich and Yuanhan,
>
> I guess we have 2 implementations here.
>
> 1. rte_eth_vhost_get_queue_event() returns each event.
> 2. rte_eth_vhost_get_queue_status() returns current status of the queues.
>
> I guess option "2" is more generic manner to handle interrupts from
> device driver.
> In the case of option "1", if DPDK application doesn't call
> rte_eth_vhost_get_queue_event(), the vhost PMD needs to keep all events.
> This may exhaust memory.
>

Option 1 can be implemented in constant space by only tracking the latest
state of each queue. I pushed a rough implementation to the
vhost-queue-callback branch of https://github.com/rlane/dpdk.
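The constant-space idea can be sketched as follows: keep only the latest
state per (queue, direction) plus a dirty flag, so repeated changes that
arrive before the application polls overwrite each other instead of
accumulating. The names below are illustrative, not the actual code in that
branch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_QUEUES 8

/* Latest known state per (queue, direction); "dirty" marks an
 * unreported change. */
struct queue_state {
	bool enabled;
	bool dirty;
};

static struct queue_state rx_state[MAX_QUEUES];
static struct queue_state tx_state[MAX_QUEUES];

/* Called from the vhost thread on every vring state change. Overwrites
 * the previous state, so memory use stays constant no matter how many
 * events arrive before the application polls. */
static void
record_state_change(uint16_t queue_id, bool rx, bool enable)
{
	struct queue_state *s = rx ? &rx_state[queue_id] : &tx_state[queue_id];

	s->enabled = enable;
	s->dirty = true;
}

/* Application-side poll: report one pending change, or -1 if none. */
static int
get_queue_event(uint16_t *queue_id, bool *rx, bool *enable)
{
	for (uint16_t q = 0; q < MAX_QUEUES; q++) {
		if (rx_state[q].dirty) {
			rx_state[q].dirty = false;
			*queue_id = q; *rx = true; *enable = rx_state[q].enabled;
			return 0;
		}
		if (tx_state[q].dirty) {
			tx_state[q].dirty = false;
			*queue_id = q; *rx = false; *enable = tx_state[q].enabled;
			return 0;
		}
	}
	return -1;
}
```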

> One more example is current link status interrupt handling.
> Actually ethdev API just returns current status of the port.
> What do you think?
>

Option 2 adds a small burden to the application but I'm fine with this as
long as it's
thread-safe (see my comments below).


> > An issue with having the application dig into struct virtio_net is that
> it
> > can only be safely accessed from
> > a callback on the vhost thread.
>
> Here is one of example how to invoke a callback handler registered by
> DPDK application from the PMD.
>
>   _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC);
>
> Above function is called by interrupt handling thread of the PMDs.
>
> Please check implementation of above function.
> A callback handler that DPDK application registers is called in
> "interrupt handling context".
> (I mean the interrupt handling thread of the PMD calls the callback
> handler of DPDK application also.)
> Anyway, I guess the callback handler of DPDK application can access to
> "struct virtio_net" safely.


> > A typical application running its control
> > plane on lcore 0 would need to
> > copy all the relevant info from struct virtio_net before sending it over.
>
> Could you please describe it more?
> Sorry, probably I don't understand correctly which restriction make you
> copy data.
> (As described above, the callback handler registered by DPDK application
> can safely access "to struct virtio_net". Does this solve the copy issue?)
>

The ethdev event callback can safely access struct virtio_net, yes. The
problem is that a real application will likely want to handle queue state
changes as part of its main event loop running on a separate thread. Because
no other thread can safely access struct virtio_net, the callback would need
to copy the queue states out of struct virtio_net into another data
structure before sending it to the main thread.

Instead of having the callback muck around with struct virtio_net, I would
prefer an API that I could call from any thread to get the current queue
states. This also removes struct virtio_net from the PMD's API, which I
think is a win.


> > As you mentioned, queues for a single vhost port could be located on
> > different NUMA nodes. I think this
> > is an uncommon scenario but if needed you could add an API to retrieve
> the
> > NUMA node for a given port
> > and queue.
> >
>
> I agree this is very specific for vhost, because in the case of generic
> PCI device, all queues of a port are on same NUMA node.
> Anyway, because it's very specific for vhost, I am not sure we should
> add ethdev API to handle this.
>
> If we handle it by vhost PMD API, we probably have 2 options also here.
>
> 1. Extend "struct rte_eth_vhost_queue_event , and use
> rte_eth_vhost_get_queue_event() like you described.
> struct rte_eth_vhost_queue_event
> {
>         uint16_t queue_id;
>         bool rx;
>         bool enable;
> +      int socket_id;
> };
>
> 2. rte_eth_vhost_get_queue_status() returns current socket_ids of all
> queues.
>

Up to you, but I think we can skip this for the time being because it would
be unusual
for a guest to place virtqueues for one PCI device on different NUMA nodes.


* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2015-12-28 21:59                                               ` Rich Lane
@ 2016-01-06  3:56                                                 ` Tetsuya Mukawa
  2016-01-06  7:38                                                   ` Yuanhan Liu
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-01-06  3:56 UTC (permalink / raw)
  To: Rich Lane, Yuanhan Liu; +Cc: dev, ann.zhuangyanying

On 2015/12/29 6:59, Rich Lane wrote:
> On Wed, Dec 23, 2015 at 11:58 PM, Tetsuya Mukawa <mukawa@igel.co.jp> wrote:
>> Hi Rich and Yuanhan,
>>
>> I guess we have 2 implementations here.
>>
>> 1. rte_eth_vhost_get_queue_event() returns each event.
>> 2. rte_eth_vhost_get_queue_status() returns current status of the queues.
>>
>> I guess option "2" is more generic manner to handle interrupts from
>> device driver.
>> In the case of option "1", if DPDK application doesn't call
>> rte_eth_vhost_get_queue_event(), the vhost PMD needs to keep all events.
>> This may exhaust memory.
>>
> Option 1 can be implemented in constant space by only tracking the latest
> state of each
> queue. I pushed a rough implementation to https://github.com/rlane/dpdk
> vhost-queue-callback.
>
> One more example is current link status interrupt handling.

Hi Rich,

I appreciate your implementation.
I understand your idea and agree with it.


Hi Yuanhan,

What do you think of his implementation?

Thanks.
Tetsuya


* Re: [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
  2016-01-06  3:56                                                 ` Tetsuya Mukawa
@ 2016-01-06  7:38                                                   ` Yuanhan Liu
  0 siblings, 0 replies; 200+ messages in thread
From: Yuanhan Liu @ 2016-01-06  7:38 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Wed, Jan 06, 2016 at 12:56:58PM +0900, Tetsuya Mukawa wrote:
> On 2015/12/29 6:59, Rich Lane wrote:
> > On Wed, Dec 23, 2015 at 11:58 PM, Tetsuya Mukawa <mukawa@igel.co.jp> wrote:
> >> Hi Rich and Yuanhan,
> >>
> >> I guess we have 2 implementations here.
> >>
> >> 1. rte_eth_vhost_get_queue_event() returns each event.
> >> 2. rte_eth_vhost_get_queue_status() returns current status of the queues.
> >>
> >> I guess option "2" is more generic manner to handle interrupts from
> >> device driver.
> >> In the case of option "1", if DPDK application doesn't call
> >> rte_eth_vhost_get_queue_event(), the vhost PMD needs to keep all events.
> >> This may exhaust memory.
> >>
> > Option 1 can be implemented in constant space by only tracking the latest
> > state of each
> > queue. I pushed a rough implementation to https://github.com/rlane/dpdk
> > vhost-queue-callback.
> >
> > One more example is current link status interrupt handling.
> 
> Hi Rich,
> 
> I appreciate your implementation.
> I can understand what's your idea, and agree with it.
> 
> 
> Hi Yuanhan,
> 
> What do you think his implementation?

At a quick glance, it looks good to me.

	--yliu


* [PATCH v6 0/2] Add VHOST PMD
  2015-11-24  9:00                         ` [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
  2015-12-17 11:42                           ` Yuanhan Liu
@ 2016-02-02 11:18                           ` Tetsuya Mukawa
  2016-02-02 19:52                             ` Rich Lane
  2016-02-02 11:18                           ` [PATCH v6 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
                                             ` (4 subsequent siblings)
  6 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-02-02 11:18 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, ann.zhuangyanying

The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost.

PATCH v6 changes:
 - Remove rte_vhost_driver_pmd_callback_register().
 - Support link status interrupt.
 - Support queue state changed interrupt.
 - Add rte_eth_vhost_get_queue_event().
 - Support NUMA node detection when a new device is connected.

PATCH v5 changes:
 - Rebase on latest master.
 - Fix RX/TX routine to count RX/TX bytes.
 - Fix RX/TX routine not to count packets as errors if enqueue/dequeue
   cannot send all packets.
 - Fix if-condition checking for multiqueues.
 - Add "static" to pthread variable.
 - Fix format.
 - Change default behavior not to receive queueing events from the driver.
 - Split the patch to separate rte_eth_vhost_portid2vdev().

PATCH v4 changes:
 - Rebase on latest DPDK tree.
 - Fix coding style.
 - Fix code not to invoke multiple messaging handling threads.
 - Fix code to handle vdev parameters correctly.
 - Remove needless cast.
 - Remove needless if-condition before rte_free().

PATCH v3 changes:
 - Rebase on latest master.
 - Specify correct queue_id in RX/TX function.

PATCH v2 changes:
 - Remove the below patch that fixes the vhost library.
   The patch was applied as a separate patch.
   - vhost: fix crash with multiqueue enabled
 - Fix typos.
   (Thanks to Thomas Monjalon)
 - Rebase on the latest tree with Bernard's patches above.

PATCH v1 changes:
 - Support vhost multiple queues.
 - Rebase on "remove pci driver from vdevs".
 - Optimize RX/TX functions.
 - Fix resource leaks.
 - Fix compile issue.
 - Add patch to fix vhost library.

RFC PATCH v3 changes:
 - Optimize performance.
   In RX/TX functions, change code to access only per core data.
 - Add the below API to allow users to use vhost library APIs for a port
   managed by the vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
    - rte_eth_vhost_portid2vdev()
   To support this functionality, the vhost library is also changed.
   Anyway, if users don't use the vhost PMD, they can fully use the vhost
   library APIs.
 - Add code to support vhost multiple queues.
   Actually, the multiple queues functionality is not enabled yet.

RFC PATCH v2 changes:
 - Fix issues reported by checkpatch.pl
   (Thanks to Stephen Hemminger)



Tetsuya Mukawa (2):
  ethdev: Add a new event type to notify a queue state changed event
  vhost: Add VHOST PMD

 config/common_linuxapp                      |   6 +
 doc/guides/nics/index.rst                   |   1 +
 doc/guides/rel_notes/release_2_2.rst        |   2 +
 drivers/net/Makefile                        |   4 +
 drivers/net/vhost/Makefile                  |  62 ++
 drivers/net/vhost/rte_eth_vhost.c           | 920 ++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h           |  73 +++
 drivers/net/vhost/rte_pmd_vhost_version.map |  11 +
 lib/librte_ether/rte_ethdev.h               |   2 +
 mk/rte.app.mk                               |   8 +-
 10 files changed, 1088 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

-- 
2.1.4


* [PATCH v6 1/2] ethdev: Add a new event type to notify a queue state changed event
  2015-11-24  9:00                         ` [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
  2015-12-17 11:42                           ` Yuanhan Liu
  2016-02-02 11:18                           ` [PATCH v6 0/2] Add VHOST PMD Tetsuya Mukawa
@ 2016-02-02 11:18                           ` Tetsuya Mukawa
  2016-02-02 11:18                           ` [PATCH v6 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
                                             ` (3 subsequent siblings)
  6 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-02-02 11:18 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, ann.zhuangyanying

This patch adds a below event type.
 - RTE_ETH_EVENT_QUEUE_STATE_CHANGE
This event is used for notifying a queue state changed event.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
 lib/librte_ether/rte_ethdev.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 8710dd7..2fbf42a 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -2661,6 +2661,8 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 enum rte_eth_event_type {
 	RTE_ETH_EVENT_UNKNOWN,  /**< unknown event type */
 	RTE_ETH_EVENT_INTR_LSC, /**< lsc interrupt event */
+	RTE_ETH_EVENT_QUEUE_STATE_CHANGE,
+				/**< queue state changed interrupt */
 	RTE_ETH_EVENT_MAX       /**< max value of this enum */
 };
 
-- 
2.1.4


* [PATCH v6 2/2] vhost: Add VHOST PMD
  2015-11-24  9:00                         ` [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
                                             ` (2 preceding siblings ...)
  2016-02-02 11:18                           ` [PATCH v6 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
@ 2016-02-02 11:18                           ` Tetsuya Mukawa
  2016-02-02 23:43                             ` Ferruh Yigit
  2016-02-04  7:26                           ` [PATCH v7 0/2] " Tetsuya Mukawa
                                             ` (2 subsequent siblings)
  6 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-02-02 11:18 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, ann.zhuangyanying

The patch introduces a new PMD. This PMD is implemented as a thin wrapper
of librte_vhost, which means librte_vhost is also needed to compile the PMD.
Vhost messages are handled only while a port is started, so start a port
first, then invoke QEMU.

The PMD has 2 parameters.
 - iface:  The parameter is used to specify a path to connect to a
           virtio-net device.
 - queues: The parameter is used to specify the number of queues the
           virtio-net device has.
           (Default: 1)

Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i

To connect to the above testpmd, here is a QEMU command example.

$ qemu-system-x86_64 \
        <snip>
        -chardev socket,id=chr0,path=/tmp/sock0 \
        -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
        -device virtio-net-pci,netdev=net0,mq=on

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
 config/common_linuxapp                      |   6 +
 doc/guides/nics/index.rst                   |   1 +
 doc/guides/rel_notes/release_2_2.rst        |   2 +
 drivers/net/Makefile                        |   4 +
 drivers/net/vhost/Makefile                  |  62 ++
 drivers/net/vhost/rte_eth_vhost.c           | 920 ++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h           |  73 +++
 drivers/net/vhost/rte_pmd_vhost_version.map |  11 +
 mk/rte.app.mk                               |   8 +-
 9 files changed, 1086 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 74bc515..357b557 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -514,6 +514,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
 CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 
 #
+# Compile vhost PMD
+# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
+#
+CONFIG_RTE_LIBRTE_PMD_VHOST=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 33c9cea..5819cdb 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -47,6 +47,7 @@ Network Interface Controller Drivers
     nfp
     szedata2
     virtio
+    vhost
     vmxnet3
     pcap_ring
 
diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index bb7d15a..6390b44 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -168,6 +168,8 @@ New Features
 
 * **Added vhost-user multiple queue support.**
 
+* **Added vhost PMD.**
+
 * **Added port hotplug support to vmxnet3.**
 
 * **Added port hotplug support to xenvirt.**
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 6e4497e..4300b93 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -52,5 +52,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
 
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
 include $(RTE_SDK)/mk/rte.sharelib.mk
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
new file mode 100644
index 0000000..f49a69b
--- /dev/null
+++ b/drivers/net/vhost/Makefile
@@ -0,0 +1,62 @@
+#   BSD LICENSE
+#
+#   Copyright (c) 2010-2016 Intel Corporation.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_vhost.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_vhost_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
+
+#
+# Export include files
+#
+SYMLINK-y-include += rte_eth_vhost.h
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_vhost
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
new file mode 100644
index 0000000..94ab8a6
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -0,0 +1,920 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) 2016 IGEL Co., Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co.,Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <unistd.h>
+#include <errno.h>
+#include <limits.h>
+#include <stdlib.h>
+#include <string.h>
+#include <pthread.h>
+#include <stdbool.h>
+#ifdef RTE_LIBRTE_VHOST_NUMA
+#include <numaif.h>
+#endif
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_dev.h>
+#include <rte_kvargs.h>
+#include <rte_virtio_net.h>
+#include <rte_spinlock.h>
+
+#include "rte_eth_vhost.h"
+
+#define ETH_VHOST_IFACE_ARG		"iface"
+#define ETH_VHOST_QUEUES_ARG		"queues"
+
+static const char *drivername = "VHOST PMD";
+
+static const char *valid_arguments[] = {
+	ETH_VHOST_IFACE_ARG,
+	ETH_VHOST_QUEUES_ARG,
+	NULL
+};
+
+static struct ether_addr base_eth_addr = {
+	.addr_bytes = {
+		0x56 /* V */,
+		0x48 /* H */,
+		0x4F /* O */,
+		0x53 /* S */,
+		0x54 /* T */,
+		0x00
+	}
+};
+
+struct vhost_queue {
+	rte_atomic32_t allow_queuing;
+	rte_atomic32_t while_queuing;
+	struct virtio_net *device;
+	struct pmd_internal *internal;
+	struct rte_mempool *mb_pool;
+	uint16_t virtqueue_id;
+	uint64_t rx_pkts;
+	uint64_t tx_pkts;
+	uint64_t missed_pkts;
+	uint64_t rx_bytes;
+	uint64_t tx_bytes;
+};
+
+struct pmd_internal {
+	TAILQ_ENTRY(pmd_internal) next;
+	char *dev_name;
+	char *iface_name;
+	unsigned nb_rx_queues;
+	unsigned nb_tx_queues;
+	uint8_t port_id;
+
+	struct vhost_queue *rx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+	struct vhost_queue *tx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
+
+	volatile uint16_t once;
+};
+
+TAILQ_HEAD(pmd_internal_head, pmd_internal);
+static struct pmd_internal_head internals_list =
+	TAILQ_HEAD_INITIALIZER(internals_list);
+
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static rte_atomic16_t nb_started_ports;
+static pthread_t session_th;
+
+static struct rte_eth_link pmd_link = {
+		.link_speed = 10000,
+		.link_duplex = ETH_LINK_FULL_DUPLEX,
+		.link_status = 0
+};
+
+struct rte_vhost_vring_state {
+	rte_spinlock_t lock;
+
+	bool cur[RTE_MAX_QUEUES_PER_PORT * 2];
+	bool seen[RTE_MAX_QUEUES_PER_PORT * 2];
+	unsigned int index;
+	unsigned int max_vring;
+};
+
+static struct rte_vhost_vring_state *vring_states[RTE_MAX_ETHPORTS];
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t i, nb_rx = 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Dequeue packets from guest TX queue */
+	nb_rx = rte_vhost_dequeue_burst(r->device,
+			r->virtqueue_id, r->mb_pool, bufs, nb_bufs);
+
+	r->rx_pkts += nb_rx;
+
+	for (i = 0; likely(i < nb_rx); i++)
+		r->rx_bytes += bufs[i]->pkt_len;
+
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t i, nb_tx = 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Enqueue packets to guest RX queue */
+	nb_tx = rte_vhost_enqueue_burst(r->device,
+			r->virtqueue_id, bufs, nb_bufs);
+
+	r->tx_pkts += nb_tx;
+	r->missed_pkts += nb_bufs - nb_tx;
+
+	for (i = 0; likely(i < nb_tx); i++)
+		r->tx_bytes += bufs[i]->pkt_len;
+
+	for (i = 0; likely(i < nb_tx); i++)
+		rte_pktmbuf_free(bufs[i]);
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
+{
+	return 0;
+}
+
+static inline struct pmd_internal *
+find_internal_resource(char *ifname)
+{
+	int found = 0;
+	struct pmd_internal *internal;
+
+	if (ifname == NULL)
+		return NULL;
+
+	pthread_mutex_lock(&internal_list_lock);
+
+	TAILQ_FOREACH(internal, &internals_list, next) {
+		if (!strcmp(internal->iface_name, ifname)) {
+			found = 1;
+			break;
+		}
+	}
+
+	pthread_mutex_unlock(&internal_list_lock);
+
+	if (!found)
+		return NULL;
+
+	return internal;
+}
+
+static int
+new_device(struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct pmd_internal *internal;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid argument\n");
+		return -1;
+	}
+
+	internal = find_internal_resource(dev->ifname);
+	if (internal == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid device name\n");
+		return -1;
+	}
+
+	eth_dev = rte_eth_dev_allocated(internal->dev_name);
+	if (eth_dev == NULL) {
+		RTE_LOG(INFO, PMD, "Failed to find an ethdev\n");
+		return -1;
+	}
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = internal->rx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = dev;
+		vq->internal = internal;
+		rte_vhost_enable_guest_notification(
+				dev, vq->virtqueue_id, 0);
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = internal->tx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = dev;
+		vq->internal = internal;
+		rte_vhost_enable_guest_notification(
+				dev, vq->virtqueue_id, 0);
+	}
+
+	dev->flags |= VIRTIO_DEV_RUNNING;
+	dev->priv = eth_dev;
+	eth_dev->data->dev_link.link_status = 1;
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = internal->rx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = internal->tx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+
+	RTE_LOG(INFO, PMD, "New connection established\n");
+
+	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC);
+
+	return 0;
+}
+
+static void
+destroy_device(volatile struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct pmd_internal *internal;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid argument\n");
+		return;
+	}
+
+	eth_dev = (struct rte_eth_dev *)dev->priv;
+	if (eth_dev == NULL) {
+		RTE_LOG(INFO, PMD, "Failed to find an ethdev\n");
+		return;
+	}
+
+	internal = eth_dev->data->dev_private;
+
+	/* Wait until rx/tx_pkt_burst stops accessing vhost device */
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = internal->rx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = internal->tx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+
+	eth_dev->data->dev_link.link_status = 0;
+
+	dev->priv = NULL;
+	dev->flags &= ~VIRTIO_DEV_RUNNING;
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		vq = internal->rx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = NULL;
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		vq = internal->tx_vhost_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = NULL;
+	}
+
+	RTE_LOG(INFO, PMD, "Connection closed\n");
+
+	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC);
+
+}
+
+static int
+vring_state_changed(struct virtio_net *dev, uint16_t vring, int enable)
+{
+	struct rte_vhost_vring_state *state;
+	struct pmd_internal *internal;
+#ifdef RTE_LIBRTE_VHOST_NUMA
+	int newnode, ret;
+#endif
+
+	if (dev == NULL) {
+		RTE_LOG(ERR, PMD, "Invalid argument\n");
+		return -1;
+	}
+
+	internal = find_internal_resource(dev->ifname);
+	state = vring_states[internal->port_id];
+	if (!state) {
+		RTE_LOG(ERR, PMD, "Unused port\n");
+		return -1;
+	}
+
+#ifdef RTE_LIBRTE_VHOST_NUMA
+	ret = get_mempolicy(&newnode, NULL, 0, dev,
+			MPOL_F_NODE | MPOL_F_ADDR);
+	if (ret < 0) {
+		RTE_LOG(ERR, PMD, "Unknown NUMA node\n");
+		return -1;
+	}
+
+	rte_eth_devices[internal->port_id].data->numa_node = newnode;
+#endif
+	rte_spinlock_lock(&state->lock);
+	state->cur[vring] = enable;
+	state->max_vring = RTE_MAX(vring, state->max_vring);
+	rte_spinlock_unlock(&state->lock);
+
+
+	RTE_LOG(INFO, PMD, "vring%u is %s\n",
+			vring, enable ? "enabled" : "disabled");
+
+	_rte_eth_dev_callback_process(&rte_eth_devices[internal->port_id],
+			RTE_ETH_EVENT_QUEUE_STATE_CHANGE);
+
+	return 0;
+}
+
+int
+rte_eth_vhost_get_queue_event(uint8_t port_id,
+		struct rte_eth_vhost_queue_event *event)
+{
+	struct rte_vhost_vring_state *state;
+	unsigned int i;
+	int idx;
+
+	if (port_id >= RTE_MAX_ETHPORTS) {
+		RTE_LOG(ERR, PMD, "Invalid port id\n");
+		return -1;
+	}
+
+	state = vring_states[port_id];
+	if (!state) {
+		RTE_LOG(ERR, PMD, "Unused port\n");
+		return -1;
+	}
+
+	rte_spinlock_lock(&state->lock);
+	for (i = 0; i <= state->max_vring; i++) {
+		idx = state->index++ % (state->max_vring + 1);
+
+		if (state->cur[idx] != state->seen[idx]) {
+			state->seen[idx] = state->cur[idx];
+			event->queue_id = idx / 2;
+			event->rx = idx & 1;
+			event->enable = state->cur[idx];
+			rte_spinlock_unlock(&state->lock);
+			return 0;
+		}
+	}
+	rte_spinlock_unlock(&state->lock);
+
+	return -1;
+}
+
+static void *
+vhost_driver_session(void *param __rte_unused)
+{
+	static struct virtio_net_device_ops vhost_ops;
+
+	/* set vhost arguments */
+	vhost_ops.new_device = new_device;
+	vhost_ops.destroy_device = destroy_device;
+	vhost_ops.vring_state_changed = vring_state_changed;
+	if (rte_vhost_driver_callback_register(&vhost_ops) < 0)
+		rte_panic("Can't register callbacks\n");
+
+	/* start event handling */
+	rte_vhost_driver_session_start();
+
+	pthread_exit(0);
+}
+
+static void
+vhost_driver_session_start(void)
+{
+	int ret;
+
+	ret = pthread_create(&session_th,
+			NULL, vhost_driver_session, NULL);
+	if (ret)
+		rte_panic("Can't create a thread\n");
+}
+
+static void
+vhost_driver_session_stop(void)
+{
+	int ret;
+
+	ret = pthread_cancel(session_th);
+	if (ret)
+		rte_panic("Can't cancel the thread\n");
+
+	ret = pthread_join(session_th, NULL);
+	if (ret)
+		rte_panic("Can't join the thread\n");
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	int ret;
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
+		ret = rte_vhost_driver_register(internal->iface_name);
+		if (ret)
+			return ret;
+	}
+
+	/* We need only one message handling thread */
+	if (rte_atomic16_add_return(&nb_started_ports, 1) == 1)
+		vhost_driver_session_start();
+
+	return 0;
+}
+
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	if (rte_atomic16_cmpset(&internal->once, 1, 0))
+		rte_vhost_driver_unregister(internal->iface_name);
+
+	if (rte_atomic16_sub_return(&nb_started_ports, 1) == 0)
+		vhost_driver_session_stop();
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+	struct vhost_queue *vq;
+
+	rte_free(internal->rx_vhost_queues[rx_queue_id]);
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
+		return -ENOMEM;
+	}
+
+	vq->mb_pool = mb_pool;
+	vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
+	internal->rx_vhost_queues[rx_queue_id] = vq;
+	dev->data->rx_queues[rx_queue_id] = vq;
+	return 0;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+	struct vhost_queue *vq;
+
+	rte_free(internal->tx_vhost_queues[tx_queue_id]);
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for tx queue\n");
+		return -ENOMEM;
+	}
+
+	vq->virtqueue_id = tx_queue_id * VIRTIO_QNUM + VIRTIO_RXQ;
+	internal->tx_vhost_queues[tx_queue_id] = vq;
+	dev->data->tx_queues[tx_queue_id] = vq;
+	return 0;
+}
+
+
+static void
+eth_dev_info(struct rte_eth_dev *dev,
+	     struct rte_eth_dev_info *dev_info)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	dev_info->driver_name = drivername;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = (uint32_t)-1;
+	dev_info->max_rx_queues = (uint16_t)internal->nb_rx_queues;
+	dev_info->max_tx_queues = (uint16_t)internal->nb_tx_queues;
+	dev_info->min_rx_bufsize = 0;
+}
+
+static void
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
+{
+	unsigned i;
+	unsigned long rx_total = 0, tx_total = 0, tx_missed_total = 0;
+	unsigned long rx_total_bytes = 0, tx_total_bytes = 0;
+	const struct pmd_internal *internal = dev->data->dev_private;
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+	     i < internal->nb_rx_queues; i++) {
+		if (internal->rx_vhost_queues[i] == NULL)
+			continue;
+		igb_stats->q_ipackets[i] = internal->rx_vhost_queues[i]->rx_pkts;
+		rx_total += igb_stats->q_ipackets[i];
+
+		igb_stats->q_ibytes[i] = internal->rx_vhost_queues[i]->rx_bytes;
+		rx_total_bytes += igb_stats->q_ibytes[i];
+	}
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+	     i < internal->nb_tx_queues; i++) {
+		if (internal->tx_vhost_queues[i] == NULL)
+			continue;
+		igb_stats->q_opackets[i] = internal->tx_vhost_queues[i]->tx_pkts;
+		tx_missed_total += internal->tx_vhost_queues[i]->missed_pkts;
+		tx_total += igb_stats->q_opackets[i];
+
+		igb_stats->q_obytes[i] = internal->tx_vhost_queues[i]->tx_bytes;
+		tx_total_bytes += igb_stats->q_obytes[i];
+	}
+
+	igb_stats->ipackets = rx_total;
+	igb_stats->opackets = tx_total;
+	igb_stats->imissed = tx_missed_total;
+	igb_stats->ibytes = rx_total_bytes;
+	igb_stats->obytes = tx_total_bytes;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	unsigned i;
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	for (i = 0; i < internal->nb_rx_queues; i++) {
+		if (internal->rx_vhost_queues[i] == NULL)
+			continue;
+		internal->rx_vhost_queues[i]->rx_pkts = 0;
+		internal->rx_vhost_queues[i]->rx_bytes = 0;
+	}
+	for (i = 0; i < internal->nb_tx_queues; i++) {
+		if (internal->tx_vhost_queues[i] == NULL)
+			continue;
+		internal->tx_vhost_queues[i]->tx_pkts = 0;
+		internal->tx_vhost_queues[i]->tx_bytes = 0;
+		internal->tx_vhost_queues[i]->missed_pkts = 0;
+	}
+}
+
+static void
+eth_queue_release(void *q __rte_unused)
+{
+	return;
+}
+
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused)
+{
+	return 0;
+}
+
+/**
+ * Disable features in feature_mask. Returns 0 on success.
+ */
+int
+rte_eth_vhost_feature_disable(uint64_t feature_mask)
+{
+	return rte_vhost_feature_disable(feature_mask);
+}
+
+/**
+ * Enable features in feature_mask. Returns 0 on success.
+ */
+int
+rte_eth_vhost_feature_enable(uint64_t feature_mask)
+{
+	return rte_vhost_feature_enable(feature_mask);
+}
+
+/* Returns currently supported vhost features */
+uint64_t
+rte_eth_vhost_feature_get(void)
+{
+	return rte_vhost_feature_get();
+}
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+static int
+eth_dev_vhost_create(const char *name, int index,
+		     char *iface_name,
+		     int16_t queues,
+		     const unsigned numa_node)
+{
+	struct rte_eth_dev_data *data = NULL;
+	struct pmd_internal *internal = NULL;
+	struct rte_eth_dev *eth_dev = NULL;
+	struct ether_addr *eth_addr = NULL;
+	struct rte_vhost_vring_state *vring_state = NULL;
+
+	RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
+		numa_node);
+
+	/* now do all data allocation - for eth_dev structure, dummy pci driver
+	 * and internal (private) data
+	 */
+	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+	if (data == NULL)
+		goto error;
+
+	internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
+	if (internal == NULL)
+		goto error;
+
+	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
+	if (eth_addr == NULL)
+		goto error;
+	*eth_addr = base_eth_addr;
+	eth_addr->addr_bytes[5] = index;
+
+	/* reserve an ethdev entry */
+	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+	if (eth_dev == NULL)
+		goto error;
+
+	TAILQ_INIT(&(eth_dev->link_intr_cbs));
+
+	/* now put it all together
+	 * - store queue data in internal,
+	 * - store numa_node info in ethdev data
+	 * - point eth_dev_data to internals
+	 * - and point eth_dev structure to new eth_dev_data structure
+	 */
+	internal->nb_rx_queues = queues;
+	internal->nb_tx_queues = queues;
+	internal->dev_name = strdup(name);
+	if (internal->dev_name == NULL)
+		goto error;
+	internal->iface_name = strdup(iface_name);
+	if (internal->iface_name == NULL) {
+		free(internal->dev_name);
+		goto error;
+	}
+	internal->port_id = eth_dev->data->port_id;
+
+	vring_state = rte_zmalloc_socket(name,
+			sizeof(*vring_state), 0, numa_node);
+	if (vring_state == NULL)
+		goto error;
+	rte_spinlock_init(&vring_state->lock);
+	vring_states[eth_dev->data->port_id] = vring_state;
+
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_INSERT_TAIL(&internals_list, internal, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
+	data->dev_private = internal;
+	data->port_id = eth_dev->data->port_id;
+	memmove(data->name, eth_dev->data->name, sizeof(data->name));
+	data->nb_rx_queues = queues;
+	data->nb_tx_queues = queues;
+	data->dev_link = pmd_link;
+	data->mac_addrs = eth_addr;
+
+	/* We'll replace the 'data' originally allocated by eth_dev. So the
+	 * vhost PMD resources won't be shared between multi processes.
+	 */
+	eth_dev->data = data;
+	eth_dev->dev_ops = &ops;
+	eth_dev->driver = NULL;
+	eth_dev->data->dev_flags =
+		RTE_ETH_DEV_DETACHABLE | RTE_ETH_DEV_INTR_LSC;
+	eth_dev->data->kdrv = RTE_KDRV_NONE;
+	eth_dev->data->drv_name = internal->dev_name;
+	eth_dev->data->numa_node = numa_node;
+
+	/* finally assign rx and tx ops */
+	eth_dev->rx_pkt_burst = eth_vhost_rx;
+	eth_dev->tx_pkt_burst = eth_vhost_tx;
+
+	return data->port_id;
+
+error:
+	rte_free(data);
+	rte_free(internal);
+	rte_free(eth_addr);
+
+	return -1;
+}
+
+static inline int
+open_iface(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	const char **iface_name = extra_args;
+
+	if (value == NULL)
+		return -1;
+
+	*iface_name = value;
+
+	return 0;
+}
+
+static inline int
+open_queues(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	uint16_t *q = extra_args;
+
+	if ((value == NULL) || (extra_args == NULL))
+		return -EINVAL;
+
+	*q = (uint16_t)strtoul(value, NULL, 0);
+	if ((*q == USHRT_MAX) && (errno == ERANGE))
+		return -1;
+
+	if (*q > RTE_MAX_QUEUES_PER_PORT)
+		return -1;
+
+	return 0;
+}
+
+static int
+rte_pmd_vhost_devinit(const char *name, const char *params)
+{
+	struct rte_kvargs *kvlist = NULL;
+	int ret = 0;
+	int index;
+	char *iface_name;
+	uint16_t queues;
+
+	RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
+
+	if (strlen(name) < strlen("eth_vhost"))
+		return -1;
+
+	index = strtol(name + strlen("eth_vhost"), NULL, 0);
+	if (errno == ERANGE)
+		return -1;
+
+	kvlist = rte_kvargs_parse(params, valid_arguments);
+	if (kvlist == NULL)
+		return -1;
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
+					 &open_iface, &iface_name);
+		if (ret < 0)
+			goto out_free;
+	} else {
+		ret = -1;
+		goto out_free;
+	}
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
+					 &open_queues, &queues);
+		if (ret < 0)
+			goto out_free;
+
+	} else
+		queues = 1;
+
+	eth_dev_vhost_create(name, index,
+			iface_name, queues, rte_socket_id());
+
+out_free:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static int
+rte_pmd_vhost_devuninit(const char *name)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internal *internal;
+	unsigned int i;
+
+	RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
+
+	if (name == NULL)
+		return -EINVAL;
+
+	/* find an ethdev entry */
+	eth_dev = rte_eth_dev_allocated(name);
+	if (eth_dev == NULL)
+		return -ENODEV;
+
+	internal = eth_dev->data->dev_private;
+
+	rte_free(vring_states[internal->port_id]);
+	vring_states[internal->port_id] = NULL;
+
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_REMOVE(&internals_list, internal, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
+	eth_dev_stop(eth_dev);
+
+	if ((internal) && (internal->dev_name))
+		free(internal->dev_name);
+	if ((internal) && (internal->iface_name))
+		free(internal->iface_name);
+
+	rte_free(eth_dev->data->mac_addrs);
+	rte_free(eth_dev->data);
+
+	for (i = 0; i < internal->nb_rx_queues; i++)
+		rte_free(internal->rx_vhost_queues[i]);
+	for (i = 0; i < internal->nb_tx_queues; i++)
+		rte_free(internal->tx_vhost_queues[i]);
+	rte_free(internal);
+
+	rte_eth_dev_release_port(eth_dev);
+
+	return 0;
+}
+
+static struct rte_driver pmd_vhost_drv = {
+	.name = "eth_vhost",
+	.type = PMD_VDEV,
+	.init = rte_pmd_vhost_devinit,
+	.uninit = rte_pmd_vhost_devuninit,
+};
+
+PMD_REGISTER_DRIVER(pmd_vhost_drv);
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
new file mode 100644
index 0000000..8aa894a
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -0,0 +1,73 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 IGEL Co., Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co., Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_VHOST_H_
+#define _RTE_ETH_VHOST_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+#include <stdbool.h>
+
+#include <rte_virtio_net.h>
+
+/**
+ * Disable features in feature_mask. Returns 0 on success.
+ */
+int rte_eth_vhost_feature_disable(uint64_t feature_mask);
+
+/**
+ *  Enable features in feature_mask. Returns 0 on success.
+ */
+int rte_eth_vhost_feature_enable(uint64_t feature_mask);
+
+/* Returns currently supported vhost features */
+uint64_t rte_eth_vhost_feature_get(void);
+
+struct rte_eth_vhost_queue_event {
+	uint16_t queue_id;
+	bool rx;
+	bool enable;
+};
+
+/* Returns queue events */
+int rte_eth_vhost_get_queue_event(uint8_t port_id,
+		struct rte_eth_vhost_queue_event *event);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
new file mode 100644
index 0000000..3280b0d
--- /dev/null
+++ b/drivers/net/vhost/rte_pmd_vhost_version.map
@@ -0,0 +1,11 @@
+DPDK_2.3 {
+
+	global:
+
+	rte_eth_vhost_feature_disable;
+	rte_eth_vhost_feature_enable;
+	rte_eth_vhost_feature_get;
+	rte_eth_vhost_get_queue_event;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 8ecab41..04f7087 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -159,7 +159,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_QAT)        += -lrte_pmd_qat
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)   += -lrte_pmd_aesni_mb
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)   += -L$(AESNI_MULTI_BUFFER_LIB_PATH) -lIPSec_MB
 
-endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
+
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
+endif # $(CONFIG_RTE_BUILD_SHARED_LIB)
 
 endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 200+ messages in thread

* Re: [PATCH v6 0/2] Add VHOST PMD
  2016-02-02 11:18                           ` [PATCH v6 0/2] Add VHOST PMD Tetsuya Mukawa
@ 2016-02-02 19:52                             ` Rich Lane
  0 siblings, 0 replies; 200+ messages in thread
From: Rich Lane @ 2016-02-02 19:52 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: yuanhan.liu, dev, ann.zhuangyanying

Looks good. I tested the queue state change code in particular and didn't
find any problems.

Reviewed-by: Rich Lane <rlane@bigswitch.com>
Tested-by: Rich Lane <rlane@bigswitch.com>

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v6 2/2] vhost: Add VHOST PMD
  2016-02-02 11:18                           ` [PATCH v6 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
@ 2016-02-02 23:43                             ` Ferruh Yigit
  2016-02-03  2:13                               ` Tetsuya Mukawa
  2016-02-03  7:48                               ` Tetsuya Mukawa
  0 siblings, 2 replies; 200+ messages in thread
From: Ferruh Yigit @ 2016-02-02 23:43 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying, yuanhan.liu

On Tue, Feb 02, 2016 at 08:18:42PM +0900, Tetsuya Mukawa wrote:
> The patch introduces a new PMD. This PMD is implemented as a thin wrapper
> of librte_vhost, which means librte_vhost is also needed to compile the PMD.
> Vhost messages are handled only while a port is started, so start the port
> first, then invoke QEMU.
> 
> The PMD has 2 parameters.
>  - iface:  The parameter is used to specify a path to connect to a
>            virtio-net device.
>  - queues: The parameter is used to specify the number of the queues
>            virtio-net device has.
>            (Default: 1)
> 
> Here is an example.
> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
> 
> To connect above testpmd, here is qemu command example.
> 
> $ qemu-system-x86_64 \
>         <snip>
>         -chardev socket,id=chr0,path=/tmp/sock0 \
>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
>         -device virtio-net-pci,netdev=net0,mq=on
> 
> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> ---
>  config/common_linuxapp                      |   6 +
>  doc/guides/nics/index.rst                   |   1 +
>  doc/guides/rel_notes/release_2_2.rst        |   2 +
>  drivers/net/Makefile                        |   4 +
>  drivers/net/vhost/Makefile                  |  62 ++
>  drivers/net/vhost/rte_eth_vhost.c           | 920 ++++++++++++++++++++++++++++
>  drivers/net/vhost/rte_eth_vhost.h           |  73 +++
>  drivers/net/vhost/rte_pmd_vhost_version.map |  11 +
>  mk/rte.app.mk                               |   8 +-
>  9 files changed, 1086 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/net/vhost/Makefile
>  create mode 100644 drivers/net/vhost/rte_eth_vhost.c
>  create mode 100644 drivers/net/vhost/rte_eth_vhost.h
>  create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
> 
> diff --git a/config/common_linuxapp b/config/common_linuxapp
> index 74bc515..357b557 100644
> --- a/config/common_linuxapp
> +++ b/config/common_linuxapp
> @@ -514,6 +514,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
>  CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
>  
>  #
> +# Compile vhost PMD
> +# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
> +#
> +CONFIG_RTE_LIBRTE_PMD_VHOST=y
> +
> +#
>  #Compile Xen domain0 support
>  #
>  CONFIG_RTE_LIBRTE_XEN_DOM0=n
> diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
> index 33c9cea..5819cdb 100644
> --- a/doc/guides/nics/index.rst
> +++ b/doc/guides/nics/index.rst
> @@ -47,6 +47,7 @@ Network Interface Controller Drivers
>      nfp
>      szedata2
>      virtio
> +    vhost
>      vmxnet3
>      pcap_ring
>  
> diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst

Should this be 2.3 release notes?

> index bb7d15a..6390b44 100644
> --- a/doc/guides/rel_notes/release_2_2.rst
> +++ b/doc/guides/rel_notes/release_2_2.rst
> @@ -168,6 +168,8 @@ New Features
>  
>  * **Added vhost-user multiple queue support.**
>  
> +* **Added vhost PMD.**
> +
>  * **Added port hotplug support to vmxnet3.**
>  
>  * **Added port hotplug support to xenvirt.**
> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> index 6e4497e..4300b93 100644
> --- a/drivers/net/Makefile
> +++ b/drivers/net/Makefile
> @@ -52,5 +52,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
>  DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
>  
> +ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
> +DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
> +endif # $(CONFIG_RTE_LIBRTE_VHOST)
> +

Not directly related to your patch, but since I saw it here:
in the long term, I think we should resolve dependencies like this in our "config tool"/"make system" instead of in the makefiles.
But there is nothing you can do right now, since nothing is available yet to resolve configuration dependencies.

>  include $(RTE_SDK)/mk/rte.sharelib.mk
>  include $(RTE_SDK)/mk/rte.subdir.mk
> diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
> new file mode 100644
> index 0000000..f49a69b
> --- /dev/null
> +++ b/drivers/net/vhost/Makefile
> @@ -0,0 +1,62 @@
> +#   BSD LICENSE
> +#
> +#   Copyright (c) 2010-2016 Intel Corporation.
> +#   All rights reserved.
> +#
> +#   Redistribution and use in source and binary forms, with or without
> +#   modification, are permitted provided that the following conditions
> +#   are met:
> +#
> +#     * Redistributions of source code must retain the above copyright
> +#       notice, this list of conditions and the following disclaimer.
> +#     * Redistributions in binary form must reproduce the above copyright
> +#       notice, this list of conditions and the following disclaimer in
> +#       the documentation and/or other materials provided with the
> +#       distribution.
> +#     * Neither the name of Intel corporation nor the names of its
> +#       contributors may be used to endorse or promote products derived
> +#       from this software without specific prior written permission.
> +#
> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +#
> +# library name
> +#
> +LIB = librte_pmd_vhost.a
> +
> +CFLAGS += -O3
> +CFLAGS += $(WERROR_FLAGS)
> +
> +EXPORT_MAP := rte_pmd_vhost_version.map
> +
> +LIBABIVER := 1
> +
> +#
> +# all source are stored in SRCS-y
> +#
> +SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
> +
> +#
> +# Export include files
> +#
> +SYMLINK-y-include += rte_eth_vhost.h
> +
> +# this lib depends upon:
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_vhost
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
> new file mode 100644
> index 0000000..94ab8a6
> --- /dev/null
> +++ b/drivers/net/vhost/rte_eth_vhost.c
> @@ -0,0 +1,920 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright (c) 2016 IGEL Co., Ltd.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of IGEL Co.,Ltd. nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +#include <unistd.h>
> +#include <pthread.h>
> +#include <stdbool.h>
> +#ifdef RTE_LIBRTE_VHOST_NUMA
> +#include <numaif.h>
> +#endif
> +
> +#include <rte_mbuf.h>
> +#include <rte_ethdev.h>
> +#include <rte_malloc.h>
> +#include <rte_memcpy.h>
> +#include <rte_dev.h>
> +#include <rte_kvargs.h>
> +#include <rte_virtio_net.h>
> +#include <rte_spinlock.h>
> +
> +#include "rte_eth_vhost.h"
> +
> +#define ETH_VHOST_IFACE_ARG		"iface"
> +#define ETH_VHOST_QUEUES_ARG		"queues"
> +
> +static const char *drivername = "VHOST PMD";
> +
> +static const char *valid_arguments[] = {
> +	ETH_VHOST_IFACE_ARG,
> +	ETH_VHOST_QUEUES_ARG,
> +	NULL
> +};
> +
> +static struct ether_addr base_eth_addr = {
> +	.addr_bytes = {
> +		0x56 /* V */,
> +		0x48 /* H */,
> +		0x4F /* O */,
> +		0x53 /* S */,
> +		0x54 /* T */,
> +		0x00
> +	}
> +};
> +
> +struct vhost_queue {
> +	rte_atomic32_t allow_queuing;
> +	rte_atomic32_t while_queuing;
> +	struct virtio_net *device;
> +	struct pmd_internal *internal;
> +	struct rte_mempool *mb_pool;
> +	uint16_t virtqueue_id;
> +	uint64_t rx_pkts;
> +	uint64_t tx_pkts;
> +	uint64_t missed_pkts;
> +	uint64_t rx_bytes;
> +	uint64_t tx_bytes;
> +};
> +
> +struct pmd_internal {
> +	TAILQ_ENTRY(pmd_internal) next;
> +	char *dev_name;
> +	char *iface_name;
> +	unsigned nb_rx_queues;
> +	unsigned nb_tx_queues;
> +	uint8_t port_id;

nb_rx_queues, nb_tx_queues and port_id are already present in rte_eth_dev_data; there is no reason to create new copies instead of using the existing fields.
But you may need to keep a list of eth devices instead of internals_list for this update, not sure.
please check: http://dpdk.org/dev/patchwork/patch/10284/

> +
> +	struct vhost_queue *rx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];
> +	struct vhost_queue *tx_vhost_queues[RTE_MAX_QUEUES_PER_PORT];

If you use pointer arrays, these also duplicate data->rx_queues/tx_queues,
but you may prefer to keep a vhost_queue array here to avoid dynamic memory allocation.

> +
> +	volatile uint16_t once;
> +};
> +
> +TAILQ_HEAD(pmd_internal_head, pmd_internal);
> +static struct pmd_internal_head internals_list =
> +	TAILQ_HEAD_INITIALIZER(internals_list);
> +
> +static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
> +
> +static rte_atomic16_t nb_started_ports;
> +static pthread_t session_th;
> +
> +static struct rte_eth_link pmd_link = {
> +		.link_speed = 10000,
> +		.link_duplex = ETH_LINK_FULL_DUPLEX,
> +		.link_status = 0
> +};
> +
> +struct rte_vhost_vring_state {
> +	rte_spinlock_t lock;
> +
> +	bool cur[RTE_MAX_QUEUES_PER_PORT * 2];
> +	bool seen[RTE_MAX_QUEUES_PER_PORT * 2];
> +	unsigned int index;
> +	unsigned int max_vring;
> +};
> +
> +static struct rte_vhost_vring_state *vring_states[RTE_MAX_ETHPORTS];
> +
> +static uint16_t
> +eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> +{
> +	struct vhost_queue *r = q;
> +	uint16_t i, nb_rx = 0;
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		return 0;
> +
> +	rte_atomic32_set(&r->while_queuing, 1);
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		goto out;
> +
> +	/* Dequeue packets from guest TX queue */
> +	nb_rx = rte_vhost_dequeue_burst(r->device,
> +			r->virtqueue_id, r->mb_pool, bufs, nb_bufs);
> +
> +	r->rx_pkts += nb_rx;
> +
> +	for (i = 0; likely(i < nb_rx); i++)
> +		r->rx_bytes += bufs[i]->pkt_len;
> +
> +out:
> +	rte_atomic32_set(&r->while_queuing, 0);
> +
> +	return nb_rx;
> +}
> +
> +static uint16_t
> +eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> +{
> +	struct vhost_queue *r = q;
> +	uint16_t i, nb_tx = 0;
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		return 0;
> +
> +	rte_atomic32_set(&r->while_queuing, 1);
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		goto out;
> +
> +	/* Enqueue packets to guest RX queue */
> +	nb_tx = rte_vhost_enqueue_burst(r->device,
> +			r->virtqueue_id, bufs, nb_bufs);
> +
> +	r->tx_pkts += nb_tx;
> +	r->missed_pkts += nb_bufs - nb_tx;
> +
> +	for (i = 0; likely(i < nb_tx); i++)
> +		r->tx_bytes += bufs[i]->pkt_len;
> +
> +	for (i = 0; likely(i < nb_tx); i++)
> +		rte_pktmbuf_free(bufs[i]);
> +out:
> +	rte_atomic32_set(&r->while_queuing, 0);
> +
> +	return nb_tx;
> +}
> +
> +static int
> +eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
> +{
> +	return 0;
> +}
> +
> +static inline struct pmd_internal *
> +find_internal_resource(char *ifname)
> +{
> +	int found = 0;
> +	struct pmd_internal *internal;
> +
> +	if (ifname == NULL)
> +		return NULL;
> +
> +	pthread_mutex_lock(&internal_list_lock);
> +
> +	TAILQ_FOREACH(internal, &internals_list, next) {
> +		if (!strcmp(internal->iface_name, ifname)) {
> +			found = 1;
> +			break;
> +		}
> +	}
> +
> +	pthread_mutex_unlock(&internal_list_lock);
> +
> +	if (!found)
> +		return NULL;
> +
> +	return internal;
> +}
> +
> +static int
> +new_device(struct virtio_net *dev)
> +{
> +	struct rte_eth_dev *eth_dev;
> +	struct pmd_internal *internal;
> +	struct vhost_queue *vq;
> +	unsigned i;
> +
> +	if (dev == NULL) {
> +		RTE_LOG(INFO, PMD, "Invalid argument\n");
> +		return -1;
> +	}
> +
> +	internal = find_internal_resource(dev->ifname);
> +	if (internal == NULL) {
> +		RTE_LOG(INFO, PMD, "Invalid device name\n");
> +		return -1;
> +	}
> +
> +	eth_dev = rte_eth_dev_allocated(internal->dev_name);
> +	if (eth_dev == NULL) {
> +		RTE_LOG(INFO, PMD, "Failed to find a ethdev\n");
> +		return -1;
> +	}
> +
> +	for (i = 0; i < internal->nb_rx_queues; i++) {
> +		vq = internal->rx_vhost_queues[i];
> +		if (vq == NULL)
> +			continue;
> +		vq->device = dev;
> +		vq->internal = internal;
> +		rte_vhost_enable_guest_notification(
> +				dev, vq->virtqueue_id, 0);

syntax: no line wrap required here

> +	}
> +	for (i = 0; i < internal->nb_tx_queues; i++) {
> +		vq = internal->tx_vhost_queues[i];
> +		if (vq == NULL)
> +			continue;
> +		vq->device = dev;
> +		vq->internal = internal;
> +		rte_vhost_enable_guest_notification(
> +				dev, vq->virtqueue_id, 0);

syntax: no line wrap required here

> +	}
> +
> +	dev->flags |= VIRTIO_DEV_RUNNING;
> +	dev->priv = eth_dev;
> +	eth_dev->data->dev_link.link_status = 1;
> +
> +	for (i = 0; i < internal->nb_rx_queues; i++) {
> +		vq = internal->rx_vhost_queues[i];
> +		if (vq == NULL)
> +			continue;
> +		rte_atomic32_set(&vq->allow_queuing, 1);
> +	}
> +	for (i = 0; i < internal->nb_tx_queues; i++) {
> +		vq = internal->tx_vhost_queues[i];
> +		if (vq == NULL)
> +			continue;
> +		rte_atomic32_set(&vq->allow_queuing, 1);
> +	}
> +
> +	RTE_LOG(INFO, PMD, "New connection established\n");
> +
> +	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC);
> +
> +	return 0;
> +}
> +
> +static void
> +destroy_device(volatile struct virtio_net *dev)
> +{
> +	struct rte_eth_dev *eth_dev;
> +	struct pmd_internal *internal;
> +	struct vhost_queue *vq;
> +	unsigned i;
> +
> +	if (dev == NULL) {
> +		RTE_LOG(INFO, PMD, "Invalid argument\n");
> +		return;
> +	}
> +
> +	eth_dev = (struct rte_eth_dev *)dev->priv;
> +	if (eth_dev == NULL) {
> +		RTE_LOG(INFO, PMD, "Failed to find a ethdev\n");
> +		return;
> +	}
> +
> +	internal = eth_dev->data->dev_private;
> +
> +	/* Wait until rx/tx_pkt_burst stops accessing vhost device */
> +	for (i = 0; i < internal->nb_rx_queues; i++) {
> +		vq = internal->rx_vhost_queues[i];
> +		if (vq == NULL)
> +			continue;
> +		rte_atomic32_set(&vq->allow_queuing, 0);
> +		while (rte_atomic32_read(&vq->while_queuing))
> +			rte_pause();
> +	}
> +	for (i = 0; i < internal->nb_tx_queues; i++) {
> +		vq = internal->tx_vhost_queues[i];
> +		if (vq == NULL)
> +			continue;
> +		rte_atomic32_set(&vq->allow_queuing, 0);
> +		while (rte_atomic32_read(&vq->while_queuing))
> +			rte_pause();
> +	}
> +
> +	eth_dev->data->dev_link.link_status = 0;
> +
> +	dev->priv = NULL;
> +	dev->flags &= ~VIRTIO_DEV_RUNNING;
> +
> +	for (i = 0; i < internal->nb_rx_queues; i++) {
> +		vq = internal->rx_vhost_queues[i];
> +		if (vq == NULL)
> +			continue;
> +		vq->device = NULL;
> +	}
> +	for (i = 0; i < internal->nb_tx_queues; i++) {
> +		vq = internal->tx_vhost_queues[i];
> +		if (vq == NULL)
> +			continue;
> +		vq->device = NULL;
> +	}
> +
> +	RTE_LOG(INFO, PMD, "Connection closed\n");
> +
> +	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC);
> +
> +}
> +
> +static int
> +vring_state_changed(struct virtio_net *dev, uint16_t vring, int enable)
> +{
> +	struct rte_vhost_vring_state *state;
> +	struct pmd_internal *internal;
> +#ifdef RTE_LIBRTE_VHOST_NUMA
> +	int newnode, ret;
> +#endif
> +
> +	if (dev == NULL) {
> +		RTE_LOG(ERR, PMD, "Invalid argument\n");
> +		return -1;
> +	}
> +
> +	internal = find_internal_resource(dev->ifname);

Can find_internal_resource() return NULL here?

> +	state = vring_states[internal->port_id];
> +	if (!state) {
> +		RTE_LOG(ERR, PMD, "Unused port\n");
> +		return -1;
> +	}
> +
> +#ifdef RTE_LIBRTE_VHOST_NUMA
> +	ret  = get_mempolicy(&newnode, NULL, 0, dev,
> +			MPOL_F_NODE | MPOL_F_ADDR);
> +	if (ret < 0) {
> +		RTE_LOG(ERR, PMD, "Unknow numa node\n");
> +		return -1;
> +	}
> +
> +	rte_eth_devices[internal->port_id].data->numa_node = newnode;

Doesn't dev->priv already hold the eth_dev? Do we need to access it as rte_eth_devices[...]?
The same applies above: instead of find_internal_resource(), can't we use dev->priv->data->dev_private?

> +#endif
> +	rte_spinlock_lock(&state->lock);
> +	state->cur[vring] = enable;
> +	state->max_vring = RTE_MAX(vring, state->max_vring);
> +	rte_spinlock_unlock(&state->lock);
> +
> +
> +	RTE_LOG(INFO, PMD, "vring%u is %s\n",
> +			vring, enable ? "enabled" : "disabled");
> +
> +	_rte_eth_dev_callback_process(&rte_eth_devices[internal->port_id],
> +			RTE_ETH_EVENT_QUEUE_STATE_CHANGE);
> +
> +	return 0;
> +}
> +
> +int
> +rte_eth_vhost_get_queue_event(uint8_t port_id,
> +		struct rte_eth_vhost_queue_event *event)
> +{
> +	struct rte_vhost_vring_state *state;
> +	unsigned int i;
> +	int idx;
> +
> +	if (port_id >= RTE_MAX_ETHPORTS) {
> +		RTE_LOG(ERR, PMD, "Invalid port id\n");
> +		return -1;
> +	}
> +
> +	state = vring_states[port_id];
> +	if (!state) {
> +		RTE_LOG(ERR, PMD, "Unused port\n");
> +		return -1;
> +	}
> +
> +	rte_spinlock_lock(&state->lock);
> +	for (i = 0; i <= state->max_vring; i++) {
> +		idx = state->index++ % (state->max_vring + 1);
> +
> +		if (state->cur[idx] != state->seen[idx]) {
> +			state->seen[idx] = state->cur[idx];
> +			event->queue_id = idx / 2;
> +			event->rx = idx & 1;
> +			event->enable = state->cur[idx];
> +			rte_spinlock_unlock(&state->lock);
> +			return 0;
> +		}
> +	}
> +	rte_spinlock_unlock(&state->lock);
> +
> +	return -1;
> +}
> +
> +static void *
> +vhost_driver_session(void *param __rte_unused)
> +{
> +	static struct virtio_net_device_ops vhost_ops;
> +
> +	/* set vhost arguments */
> +	vhost_ops.new_device = new_device;
> +	vhost_ops.destroy_device = destroy_device;
> +	vhost_ops.vring_state_changed = vring_state_changed;
> +	if (rte_vhost_driver_callback_register(&vhost_ops) < 0)
> +		rte_panic("Can't register callbacks\n");
> +
> +	/* start event handling */
> +	rte_vhost_driver_session_start();
> +
> +	pthread_exit(0);
> +}
> +
> +static void
> +vhost_driver_session_start(void)
> +{
> +	int ret;
> +
> +	ret = pthread_create(&session_th,
> +			NULL, vhost_driver_session, NULL);
> +	if (ret)
> +		rte_panic("Can't create a thread\n");
> +}
> +
> +static void
> +vhost_driver_session_stop(void)
> +{
> +	int ret;
> +
> +	ret = pthread_cancel(session_th);
> +	if (ret)
> +		rte_panic("Can't cancel the thread\n");
> +
> +	ret = pthread_join(session_th, NULL);
> +	if (ret)
> +		rte_panic("Can't join the thread\n");
> +}
> +
> +static int
> +eth_dev_start(struct rte_eth_dev *dev)
> +{
> +	int ret;
> +	struct pmd_internal *internal = dev->data->dev_private;
> +
> +	if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
> +		ret = rte_vhost_driver_register(internal->iface_name);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	/* We need only one message handling thread */
> +	if (rte_atomic16_add_return(&nb_started_ports, 1) == 1)
> +		vhost_driver_session_start();
> +
> +	return 0;
> +}
> +
> +static void
> +eth_dev_stop(struct rte_eth_dev *dev)
> +{
> +	struct pmd_internal *internal = dev->data->dev_private;
> +
> +	if (rte_atomic16_cmpset(&internal->once, 1, 0))
> +		rte_vhost_driver_unregister(internal->iface_name);
> +
> +	if (rte_atomic16_sub_return(&nb_started_ports, 1) == 0)
> +		vhost_driver_session_stop();
> +}
> +
> +static int
> +eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
> +		   uint16_t nb_rx_desc __rte_unused,
> +		   unsigned int socket_id,
> +		   const struct rte_eth_rxconf *rx_conf __rte_unused,
> +		   struct rte_mempool *mb_pool)
> +{
> +	struct pmd_internal *internal = dev->data->dev_private;
> +	struct vhost_queue *vq;
> +
> +	rte_free(internal->rx_vhost_queues[rx_queue_id]);

Why free here? Initially this value should already be NULL.
If queue_setup can be called multiple times, does it make more sense to free in the rx/tx_queue_release() functions?

> +
> +	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
> +			RTE_CACHE_LINE_SIZE, socket_id);
> +	if (vq == NULL) {
> +		RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
> +		return -ENOMEM;
> +	}

Other virtual PMDs keep static queue storage in the internals struct to avoid the allocate/free, at the cost of memory consumption.
Just another option if you prefer it.

> +
> +	vq->mb_pool = mb_pool;
> +	vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
> +	internal->rx_vhost_queues[rx_queue_id] = vq;
> +	dev->data->rx_queues[rx_queue_id] = vq;

data->rx_queues and internal->rx_vhost_queues are duplicates here, and data->rx_queues is not used at all.
You can remove internal->rx_vhost_queues and use only data->rx_queues. The same is valid for tx_queues.

> +	return 0;
> +}
> +
> +static int
> +eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
> +		   uint16_t nb_tx_desc __rte_unused,
> +		   unsigned int socket_id,
> +		   const struct rte_eth_txconf *tx_conf __rte_unused)
> +{
> +	struct pmd_internal *internal = dev->data->dev_private;
> +	struct vhost_queue *vq;
> +
> +	rte_free(internal->tx_vhost_queues[tx_queue_id]);
> +
> +	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
> +			RTE_CACHE_LINE_SIZE, socket_id);
> +	if (vq == NULL) {
> +		RTE_LOG(ERR, PMD, "Failed to allocate memory for tx queue\n");
> +		return -ENOMEM;
> +	}
> +
> +	vq->virtqueue_id = tx_queue_id * VIRTIO_QNUM + VIRTIO_RXQ;
> +	internal->tx_vhost_queues[tx_queue_id] = vq;
> +	dev->data->tx_queues[tx_queue_id] = vq;
> +	return 0;
> +}
> +
> +
> +static void
> +eth_dev_info(struct rte_eth_dev *dev,
> +	     struct rte_eth_dev_info *dev_info)
> +{
> +	struct pmd_internal *internal = dev->data->dev_private;
> +
> +	dev_info->driver_name = drivername;
> +	dev_info->max_mac_addrs = 1;
> +	dev_info->max_rx_pktlen = (uint32_t)-1;
> +	dev_info->max_rx_queues = (uint16_t)internal->nb_rx_queues;
> +	dev_info->max_tx_queues = (uint16_t)internal->nb_tx_queues;

again, internal->nb_rx_queues and data->nb_rx_queues are duplicates

> +	dev_info->min_rx_bufsize = 0;
> +}
> +
> +static void
> +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)

Why is the parameter named igb_stats — historical?

> +{
> +	unsigned i;
> +	unsigned long rx_total = 0, tx_total = 0, tx_missed_total = 0;
> +	unsigned long rx_total_bytes = 0, tx_total_bytes = 0;
> +	const struct pmd_internal *internal = dev->data->dev_private;
> +
> +	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
> +	     i < internal->nb_rx_queues; i++) {
> +		if (internal->rx_vhost_queues[i] == NULL)
> +			continue;
> +		igb_stats->q_ipackets[i] = internal->rx_vhost_queues[i]->rx_pkts;
> +		rx_total += igb_stats->q_ipackets[i];
> +
> +		igb_stats->q_ibytes[i] = internal->rx_vhost_queues[i]->rx_bytes;
> +		rx_total_bytes += igb_stats->q_ibytes[i];
> +	}
> +
> +	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
> +	     i < internal->nb_tx_queues; i++) {
> +		if (internal->tx_vhost_queues[i] == NULL)
> +			continue;
> +		igb_stats->q_opackets[i] = internal->tx_vhost_queues[i]->tx_pkts;
> +		tx_missed_total += internal->tx_vhost_queues[i]->missed_pkts;
> +		tx_total += igb_stats->q_opackets[i];
> +
> +		igb_stats->q_obytes[i] = internal->tx_vhost_queues[i]->tx_bytes;
> +		tx_total_bytes += igb_stats->q_obytes[i];
> +	}
> +
> +	igb_stats->ipackets = rx_total;
> +	igb_stats->opackets = tx_total;
> +	igb_stats->imissed = tx_missed_total;
> +	igb_stats->ibytes = rx_total_bytes;
> +	igb_stats->obytes = tx_total_bytes;
> +}
> +
> +static void
> +eth_stats_reset(struct rte_eth_dev *dev)
> +{
> +	unsigned i;
> +	struct pmd_internal *internal = dev->data->dev_private;
> +
> +	for (i = 0; i < internal->nb_rx_queues; i++) {
> +		if (internal->rx_vhost_queues[i] == NULL)
> +			continue;
> +		internal->rx_vhost_queues[i]->rx_pkts = 0;
> +		internal->rx_vhost_queues[i]->rx_bytes = 0;
> +	}
> +	for (i = 0; i < internal->nb_tx_queues; i++) {
> +		if (internal->tx_vhost_queues[i] == NULL)
> +			continue;
> +		internal->tx_vhost_queues[i]->tx_pkts = 0;
> +		internal->tx_vhost_queues[i]->tx_bytes = 0;
> +		internal->tx_vhost_queues[i]->missed_pkts = 0;
> +	}
> +}
> +
> +static void
> +eth_queue_release(void *q __rte_unused)
> +{
> +	return;
> +}
> +
> +static int
> +eth_link_update(struct rte_eth_dev *dev __rte_unused,
> +		int wait_to_complete __rte_unused)
> +{
> +	return 0;
> +}
> +
> +/**
> + * Disable features in feature_mask. Returns 0 on success.
> + */
> +int
> +rte_eth_vhost_feature_disable(uint64_t feature_mask)
> +{
> +	return rte_vhost_feature_disable(feature_mask);
> +}
> +
> +/**
> + * Enable features in feature_mask. Returns 0 on success.
> + */
> +int
> +rte_eth_vhost_feature_enable(uint64_t feature_mask)
> +{
> +	return rte_vhost_feature_enable(feature_mask);
> +}
> +
> +/* Returns currently supported vhost features */
> +uint64_t
> +rte_eth_vhost_feature_get(void)
> +{
> +	return rte_vhost_feature_get();
> +}
> +
> +static const struct eth_dev_ops ops = {
> +	.dev_start = eth_dev_start,
> +	.dev_stop = eth_dev_stop,
> +	.dev_configure = eth_dev_configure,
> +	.dev_infos_get = eth_dev_info,
> +	.rx_queue_setup = eth_rx_queue_setup,
> +	.tx_queue_setup = eth_tx_queue_setup,
> +	.rx_queue_release = eth_queue_release,
> +	.tx_queue_release = eth_queue_release,
> +	.link_update = eth_link_update,
> +	.stats_get = eth_stats_get,
> +	.stats_reset = eth_stats_reset,
> +};
> +
> +static int
> +eth_dev_vhost_create(const char *name, int index,
> +		     char *iface_name,
> +		     int16_t queues,
> +		     const unsigned numa_node)
> +{
> +	struct rte_eth_dev_data *data = NULL;
> +	struct pmd_internal *internal = NULL;
> +	struct rte_eth_dev *eth_dev = NULL;
> +	struct ether_addr *eth_addr = NULL;
> +	struct rte_vhost_vring_state *vring_state = NULL;
> +
> +	RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
> +		numa_node);
> +
> +	/* now do all data allocation - for eth_dev structure, dummy pci driver
> +	 * and internal (private) data
> +	 */
> +	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
> +	if (data == NULL)
> +		goto error;
> +
> +	internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
> +	if (internal == NULL)
> +		goto error;
> +
> +	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
> +	if (eth_addr == NULL)
> +		goto error;
> +	*eth_addr = base_eth_addr;
> +	eth_addr->addr_bytes[5] = index;
> +
> +	/* reserve an ethdev entry */
> +	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
> +	if (eth_dev == NULL)
> +		goto error;
> +
> +	TAILQ_INIT(&(eth_dev->link_intr_cbs));
> +
> +	/* now put it all together
> +	 * - store queue data in internal,
> +	 * - store numa_node info in ethdev data
> +	 * - point eth_dev_data to internals
> +	 * - and point eth_dev structure to new eth_dev_data structure
> +	 */
> +	internal->nb_rx_queues = queues;
> +	internal->nb_tx_queues = queues;
> +	internal->dev_name = strdup(name);
> +	if (internal->dev_name == NULL)
> +		goto error;

eth_dev has been successfully allocated at this point; do we need to release it in the error case?

> +	internal->iface_name = strdup(iface_name);
> +	if (internal->iface_name == NULL) {
> +		free(internal->dev_name);
> +		goto error;
> +	}
> +	internal->port_id = eth_dev->data->port_id;
> +
> +	vring_state = rte_zmalloc_socket(name,
> +			sizeof(*vring_state), 0, numa_node);
> +	if (vring_state == NULL)

dev_name and iface_name also need to be freed here.

> +		goto error;
> +	rte_spinlock_init(&vring_state->lock);
> +	vring_states[eth_dev->data->port_id] = vring_state;
> +
> +	pthread_mutex_lock(&internal_list_lock);
> +	TAILQ_INSERT_TAIL(&internals_list, internal, next);
> +	pthread_mutex_unlock(&internal_list_lock);
> +
> +	data->dev_private = internal;
> +	data->port_id = eth_dev->data->port_id;
> +	memmove(data->name, eth_dev->data->name, sizeof(data->name));
> +	data->nb_rx_queues = queues;
> +	data->nb_tx_queues = queues;
> +	data->dev_link = pmd_link;
> +	data->mac_addrs = eth_addr;
> +
> +	/* We'll replace the 'data' originally allocated by eth_dev. So the
> +	 * vhost PMD resources won't be shared between multi processes.
> +	 */
> +	eth_dev->data = data;
> +	eth_dev->dev_ops = &ops;
> +	eth_dev->driver = NULL;
> +	eth_dev->data->dev_flags =
> +		RTE_ETH_DEV_DETACHABLE | RTE_ETH_DEV_INTR_LSC;
> +	eth_dev->data->kdrv = RTE_KDRV_NONE;
> +	eth_dev->data->drv_name = internal->dev_name;
> +	eth_dev->data->numa_node = numa_node;

Cosmetics but you can access as data->xxx instead of eth_dev->data->xxx

> +
> +	/* finally assign rx and tx ops */
> +	eth_dev->rx_pkt_burst = eth_vhost_rx;
> +	eth_dev->tx_pkt_burst = eth_vhost_tx;
> +
> +	return data->port_id;
> +
> +error:
> +	rte_free(data);
> +	rte_free(internal);
> +	rte_free(eth_addr);
> +
> +	return -1;
> +}
> +
> +static inline int
> +open_iface(const char *key __rte_unused, const char *value, void *extra_args)
> +{
> +	const char **iface_name = extra_args;
> +
> +	if (value == NULL)
> +		return -1;
> +
> +	*iface_name = value;
> +
> +	return 0;
> +}
> +
> +static inline int
> +open_queues(const char *key __rte_unused, const char *value, void *extra_args)
> +{
> +	uint16_t *q = extra_args;
> +
> +	if ((value == NULL) || (extra_args == NULL))
> +		return -EINVAL;
> +
> +	*q = (uint16_t)strtoul(value, NULL, 0);
> +	if ((*q == USHRT_MAX) && (errno == ERANGE))
> +		return -1;
> +
> +	if (*q > RTE_MAX_QUEUES_PER_PORT)
> +		return -1;
> +
> +	return 0;
> +}
> +
> +static int
> +rte_pmd_vhost_devinit(const char *name, const char *params)
> +{
> +	struct rte_kvargs *kvlist = NULL;
> +	int ret = 0;
> +	int index;
> +	char *iface_name;
> +	uint16_t queues;
> +
> +	RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
> +
> +	if (strlen(name) < strlen("eth_vhost"))
> +		return -1;

No need for this check; rte_eal_vdev_init() already checks the name, and this function is called only if there is a match.

> +
> +	index = strtol(name + strlen("eth_vhost"), NULL, 0);
> +	if (errno == ERANGE)
> +		return -1;

Does the device name have to contain an integer? Is "eth_vhostA" a valid name? If so, does it make sense to use port_id instead of index?

> +
> +	kvlist = rte_kvargs_parse(params, valid_arguments);
> +	if (kvlist == NULL)
> +		return -1;
> +
> +	if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
> +		ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
> +					 &open_iface, &iface_name);
> +		if (ret < 0)
> +			goto out_free;
> +	} else {
> +		ret = -1;
> +		goto out_free;
> +	}
> +
> +	if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
> +		ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
> +					 &open_queues, &queues);
> +		if (ret < 0)
> +			goto out_free;
> +
> +	} else
> +		queues = 1;
> +
> +	eth_dev_vhost_create(name, index,
> +			iface_name, queues, rte_socket_id());

syntax: no line wrap required here

> +
> +out_free:
> +	rte_kvargs_free(kvlist);
> +	return ret;
> +}
> +
> +static int
> +rte_pmd_vhost_devuninit(const char *name)
> +{
> +	struct rte_eth_dev *eth_dev = NULL;
> +	struct pmd_internal *internal;
> +	unsigned int i;
> +
> +	RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
> +
> +	if (name == NULL)
> +		return -EINVAL;

This check is not required, already done in rte_eal_vdev_uninit()

> +
> +	/* find an ethdev entry */
> +	eth_dev = rte_eth_dev_allocated(name);
> +	if (eth_dev == NULL)
> +		return -ENODEV;
> +
> +	internal = eth_dev->data->dev_private;
> +
> +	rte_free(vring_states[internal->port_id]);
> +	vring_states[internal->port_id] = NULL;
> +
> +	pthread_mutex_lock(&internal_list_lock);
> +	TAILQ_REMOVE(&internals_list, internal, next);
> +	pthread_mutex_unlock(&internal_list_lock);
> +
> +	eth_dev_stop(eth_dev);
> +
> +	if ((internal) && (internal->dev_name))

If "internal" can be NULL, the internal->port_id dereference above will already crash; if it can't be NULL, there is no need to check here.

> +		free(internal->dev_name);
> +	if ((internal) && (internal->iface_name))

Do we need the parentheses around internal->iface_name (and around internal, if that check stays)?

> +		free(internal->iface_name);
> +
> +	rte_free(eth_dev->data->mac_addrs);
> +	rte_free(eth_dev->data);
> +
> +	for (i = 0; i < internal->nb_rx_queues; i++)
> +		rte_free(internal->rx_vhost_queues[i]);
> +	for (i = 0; i < internal->nb_tx_queues; i++)
> +		rte_free(internal->tx_vhost_queues[i]);
> +	rte_free(internal);
> +
> +	rte_eth_dev_release_port(eth_dev);
> +
> +	return 0;
> +}
> +
> +static struct rte_driver pmd_vhost_drv = {
> +	.name = "eth_vhost",
> +	.type = PMD_VDEV,
> +	.init = rte_pmd_vhost_devinit,
> +	.uninit = rte_pmd_vhost_devuninit,
> +};
> +
> +PMD_REGISTER_DRIVER(pmd_vhost_drv);
> diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
> new file mode 100644
> index 0000000..8aa894a
> --- /dev/null
> +++ b/drivers/net/vhost/rte_eth_vhost.h
> @@ -0,0 +1,73 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2016 IGEL Co., Ltd.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of IGEL Co., Ltd. nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _RTE_ETH_VHOST_H_
> +#define _RTE_ETH_VHOST_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <stdint.h>
> +#include <stdbool.h>
> +
> +#include <rte_virtio_net.h>
> +
> +/**
> + * Disable features in feature_mask. Returns 0 on success.
> + */
> +int rte_eth_vhost_feature_disable(uint64_t feature_mask);
> +
> +/**
> + *  Enable features in feature_mask. Returns 0 on success.
> + */
> +int rte_eth_vhost_feature_enable(uint64_t feature_mask);
> +
> +/* Returns currently supported vhost features */

This can also be commented in doxygen style.
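For reference, the doxygen form requested above would look roughly like the following (the static stub body is only to keep the fragment self-contained; the real function queries librte_vhost):

```c
#include <stdint.h>

/**
 * Returns currently supported vhost features.
 *
 * @return
 *   A mask of the currently supported vhost features.
 */
static uint64_t
rte_eth_vhost_feature_get_example(void)
{
	/* Stub value for illustration only. */
	return 0;
}
```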

> +uint64_t rte_eth_vhost_feature_get(void);
> +
> +struct rte_eth_vhost_queue_event {
> +	uint16_t queue_id;
> +	bool rx;
> +	bool enable;
> +};
> +
> +/* Returns queue events */
> +int rte_eth_vhost_get_queue_event(uint8_t port_id,
> +		struct rte_eth_vhost_queue_event *event);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif
> diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
> new file mode 100644
> index 0000000..3280b0d
> --- /dev/null
> +++ b/drivers/net/vhost/rte_pmd_vhost_version.map
> @@ -0,0 +1,11 @@
> +DPDK_2.3 {
> +
> +	global:
> +
> +	rte_eth_vhost_feature_disable;
> +	rte_eth_vhost_feature_enable;
> +	rte_eth_vhost_feature_get;
> +	rte_eth_vhost_get_queue_event;
> +
> +	local: *;
> +};
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index 8ecab41..04f7087 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -159,7 +159,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_QAT)        += -lrte_pmd_qat
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)   += -lrte_pmd_aesni_mb
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)   += -L$(AESNI_MULTI_BUFFER_LIB_PATH) -lIPSec_MB
>  
> -endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
> +ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
> +
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
> +
> +endif # ! $(CONFIG_RTE_LIBRTE_VHOST)
> +
> +endif # $(CONFIG_RTE_BUILD_SHARED_LIB)

I guess the "!" in the comment is to say this if block is for SHARED_LIB==n; if so, we shouldn't update the comment to remove the "!".
And the comment on the line you have added should be: "endif # $(CONFIG_RTE_LIBRTE_VHOST)"
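The convention being discussed — a "!" in the endif comment marks a block that applies when the option is disabled — would give the following shape (a sketch of the requested fix, not the applied patch):

```make
# Inner block is built when CONFIG_RTE_LIBRTE_VHOST == y, so no "!"
# in its endif comment.
ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
endif # $(CONFIG_RTE_LIBRTE_VHOST)

# Outer block (opened earlier in the file) applies when
# CONFIG_RTE_BUILD_SHARED_LIB == n, so its "!" should stay.
endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
```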

>  
>  endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS
>  
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v6 2/2] vhost: Add VHOST PMD
  2016-02-02 23:43                             ` Ferruh Yigit
@ 2016-02-03  2:13                               ` Tetsuya Mukawa
  2016-02-03  7:48                               ` Tetsuya Mukawa
  1 sibling, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-02-03  2:13 UTC (permalink / raw)
  To: ferruh.yigit; +Cc: dev, ann.zhuangyanying, yuanhan.liu

On 2016/02/03 8:43, Ferruh Yigit wrote:
> On Tue, Feb 02, 2016 at 08:18:42PM +0900, Tetsuya Mukawa wrote:
>> diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
>> index 33c9cea..5819cdb 100644
>> --- a/doc/guides/nics/index.rst
>> +++ b/doc/guides/nics/index.rst
>> @@ -47,6 +47,7 @@ Network Interface Controller Drivers
>>      nfp
>>      szedata2
>>      virtio
>> +    vhost
>>      vmxnet3
>>      pcap_ring
>>  
>> diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
> Should this be 2.3 release notes?

Hi Ferruh,

Thanks for checking.
Yes, it should be. I will fix it.

>> +
>> +struct pmd_internal {
>> +	TAILQ_ENTRY(pmd_internal) next;
>> +	char *dev_name;
>> +	char *iface_name;
>> +	unsigned nb_rx_queues;
>> +	unsigned nb_tx_queues;
>> +	uint8_t port_id;
> nb_rx_queues, nb_tx_queues and port_id are duplicated from rte_eth_dev_data; there is no reason to create new ones instead of using those.
> But you may need to keep a list of eth devices instead of internals_list for this update, not sure.
> please check: http://dpdk.org/dev/patchwork/patch/10284/

It seems I wrongly understood how to use queues in a virtual PMD, and
duplicated some values.
Sure, I will follow the above patch and fix all the same issues in my patch.
I will also check the null PMD fix.

>> +	for (i = 0; i < internal->nb_rx_queues; i++) {
>> +		vq = internal->rx_vhost_queues[i];
>> +		if (vq == NULL)
>> +			continue;
>> +		vq->device = dev;
>> +		vq->internal = internal;
>> +		rte_vhost_enable_guest_notification(
>> +				dev, vq->virtqueue_id, 0);
> syntax: no line wrap required here

Will fix.

>
>> +	}
>> +	for (i = 0; i < internal->nb_tx_queues; i++) {
>> +		vq = internal->tx_vhost_queues[i];
>> +		if (vq == NULL)
>> +			continue;
>> +		vq->device = dev;
>> +		vq->internal = internal;
>> +		rte_vhost_enable_guest_notification(
>> +				dev, vq->virtqueue_id, 0);
> syntax: no line wrap required here

Will fix it also.

>> +
>> +static int
>> +vring_state_changed(struct virtio_net *dev, uint16_t vring, int enable)
>> +{
>> +	struct rte_vhost_vring_state *state;
>> +	struct pmd_internal *internal;
>> +#ifdef RTE_LIBRTE_VHOST_NUMA
>> +	int newnode, ret;
>> +#endif
>> +
>> +	if (dev == NULL) {
>> +		RTE_LOG(ERR, PMD, "Invalid argument\n");
>> +		return -1;
>> +	}
>> +
>> +	internal = find_internal_resource(dev->ifname);
> Can find_internal_resource() return NULL here?

Will add null pointer checking here.

>> +	state = vring_states[internal->port_id];
>> +	if (!state) {
>> +		RTE_LOG(ERR, PMD, "Unused port\n");
>> +		return -1;
>> +	}
>> +
>> +#ifdef RTE_LIBRTE_VHOST_NUMA
>> +	ret  = get_mempolicy(&newnode, NULL, 0, dev,
>> +			MPOL_F_NODE | MPOL_F_ADDR);
>> +	if (ret < 0) {
>> +		RTE_LOG(ERR, PMD, "Unknow numa node\n");
>> +		return -1;
>> +	}
>> +
>> +	rte_eth_devices[internal->port_id].data->numa_node = newnode;
> Doesn't dev->priv already have the eth_dev? Do we need to access the eth_dev as rte_eth_devices[...]?
> Same for the above: instead of find_internal_resource(), can't we use dev->priv->data->dev_private?

'dev->priv' will be filled in new_device(), and this event will come
before new_device() is called.
Because of this, the function accesses rte_eth_devices[].

>
>> +static int
>> +eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
>> +		   uint16_t nb_rx_desc __rte_unused,
>> +		   unsigned int socket_id,
>> +		   const struct rte_eth_rxconf *rx_conf __rte_unused,
>> +		   struct rte_mempool *mb_pool)
>> +{
>> +	struct pmd_internal *internal = dev->data->dev_private;
>> +	struct vhost_queue *vq;
>> +
>> +	rte_free(internal->rx_vhost_queues[rx_queue_id]);
> Why free here? Initially this value should already be NULL.
> If it is possible to call queue_setup multiple times, does it make sense to free in the rx/tx_queue_release() functions?

Yes, you are right.
I forgot to delete debug code.
Will remove it.

>> +
>> +	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
>> +			RTE_CACHE_LINE_SIZE, socket_id);
>> +	if (vq == NULL) {
>> +		RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
>> +		return -ENOMEM;
>> +	}
> Other vPMDs use static memory for the queues in the internals struct, avoiding allocate/free at the cost of memory consumption.
> Just another option if you prefer.

This is because queues may be on different NUMA nodes in the vhost PMD case.

>> +	dev_info->min_rx_bufsize = 0;
>> +}
>> +
>> +static void
>> +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
> Why is the name igb_stats? Historical?

Yeah, it came from copy and paste.
I will fix it and the issues below that you pointed out.
I appreciate your careful review!

Tetsuya

>> +static int
>> +eth_dev_vhost_create(const char *name, int index,
>> +		     char *iface_name,
>> +		     int16_t queues,
>> +		     const unsigned numa_node)
>> +{
>> +	struct rte_eth_dev_data *data = NULL;
>> +	struct pmd_internal *internal = NULL;
>> +	struct rte_eth_dev *eth_dev = NULL;
>> +	struct ether_addr *eth_addr = NULL;
>> +	struct rte_vhost_vring_state *vring_state = NULL;
>> +
>> +	RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
>> +		numa_node);
>> +
>> +	/* now do all data allocation - for eth_dev structure, dummy pci driver
>> +	 * and internal (private) data
>> +	 */
>> +	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
>> +	if (data == NULL)
>> +		goto error;
>> +
>> +	internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
>> +	if (internal == NULL)
>> +		goto error;
>> +
>> +	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
>> +	if (eth_addr == NULL)
>> +		goto error;
>> +	*eth_addr = base_eth_addr;
>> +	eth_addr->addr_bytes[5] = index;
>> +
>> +	/* reserve an ethdev entry */
>> +	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
>> +	if (eth_dev == NULL)
>> +		goto error;
>> +
>> +	TAILQ_INIT(&(eth_dev->link_intr_cbs));
>> +
>> +	/* now put it all together
>> +	 * - store queue data in internal,
>> +	 * - store numa_node info in ethdev data
>> +	 * - point eth_dev_data to internals
>> +	 * - and point eth_dev structure to new eth_dev_data structure
>> +	 */
>> +	internal->nb_rx_queues = queues;
>> +	internal->nb_tx_queues = queues;
>> +	internal->dev_name = strdup(name);
>> +	if (internal->dev_name == NULL)
>> +		goto error;
> eth_dev was successfully allocated; do we need to do something in the error case?
>
>> +	internal->iface_name = strdup(iface_name);
>> +	if (internal->iface_name == NULL) {
>> +		free(internal->dev_name);
>> +		goto error;
>> +	}
>> +	internal->port_id = eth_dev->data->port_id;
>> +
>> +	vring_state = rte_zmalloc_socket(name,
>> +			sizeof(*vring_state), 0, numa_node);
>> +	if (vring_state == NULL)
> free dev_name & iface_name.
>
>> +		goto error;
>> +	rte_spinlock_init(&vring_state->lock);
>> +	vring_states[eth_dev->data->port_id] = vring_state;
>> +
>> +	pthread_mutex_lock(&internal_list_lock);
>> +	TAILQ_INSERT_TAIL(&internals_list, internal, next);
>> +	pthread_mutex_unlock(&internal_list_lock);
>> +
>> +	data->dev_private = internal;
>> +	data->port_id = eth_dev->data->port_id;
>> +	memmove(data->name, eth_dev->data->name, sizeof(data->name));
>> +	data->nb_rx_queues = queues;
>> +	data->nb_tx_queues = queues;
>> +	data->dev_link = pmd_link;
>> +	data->mac_addrs = eth_addr;
>> +
>> +	/* We'll replace the 'data' originally allocated by eth_dev. So the
>> +	 * vhost PMD resources won't be shared between multi processes.
>> +	 */
>> +	eth_dev->data = data;
>> +	eth_dev->dev_ops = &ops;
>> +	eth_dev->driver = NULL;
>> +	eth_dev->data->dev_flags =
>> +		RTE_ETH_DEV_DETACHABLE | RTE_ETH_DEV_INTR_LSC;
>> +	eth_dev->data->kdrv = RTE_KDRV_NONE;
>> +	eth_dev->data->drv_name = internal->dev_name;
>> +	eth_dev->data->numa_node = numa_node;
> Cosmetic, but you can access these as data->xxx instead of eth_dev->data->xxx
>
>> +static int
>> +rte_pmd_vhost_devinit(const char *name, const char *params)
>> +{
>> +	struct rte_kvargs *kvlist = NULL;
>> +	int ret = 0;
>> +	int index;
>> +	char *iface_name;
>> +	uint16_t queues;
>> +
>> +	RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
>> +
>> +	if (strlen(name) < strlen("eth_vhost"))
>> +		return -1;
> No need to do this check; rte_eal_vdev_init() already checks the name, and this function is called only if there is a match.
>
>> +
>> +	index = strtol(name + strlen("eth_vhost"), NULL, 0);
>> +	if (errno == ERANGE)
>> +		return -1;
> Does the device name have to contain an integer, and is "eth_vhostA" a valid name? If so, does it make sense to use port_id instead of index?
>
>> +
>> +	kvlist = rte_kvargs_parse(params, valid_arguments);
>> +	if (kvlist == NULL)
>> +		return -1;
>> +
>> +	if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
>> +		ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
>> +					 &open_iface, &iface_name);
>> +		if (ret < 0)
>> +			goto out_free;
>> +	} else {
>> +		ret = -1;
>> +		goto out_free;
>> +	}
>> +
>> +	if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
>> +		ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
>> +					 &open_queues, &queues);
>> +		if (ret < 0)
>> +			goto out_free;
>> +
>> +	} else
>> +		queues = 1;
>> +
>> +	eth_dev_vhost_create(name, index,
>> +			iface_name, queues, rte_socket_id());
> syntax: no line wrap required here
>
>> +
>> +out_free:
>> +	rte_kvargs_free(kvlist);
>> +	return ret;
>> +}
>> +
>> +static int
>> +rte_pmd_vhost_devuninit(const char *name)
>> +{
>> +	struct rte_eth_dev *eth_dev = NULL;
>> +	struct pmd_internal *internal;
>> +	unsigned int i;
>> +
>> +	RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
>> +
>> +	if (name == NULL)
>> +		return -EINVAL;
> This check is not required, already done in rte_eal_vdev_uninit()
>
>> +
>> +	/* find an ethdev entry */
>> +	eth_dev = rte_eth_dev_allocated(name);
>> +	if (eth_dev == NULL)
>> +		return -ENODEV;
>> +
>> +	internal = eth_dev->data->dev_private;
>> +
>> +	rte_free(vring_states[internal->port_id]);
>> +	vring_states[internal->port_id] = NULL;
>> +
>> +	pthread_mutex_lock(&internal_list_lock);
>> +	TAILQ_REMOVE(&internals_list, internal, next);
>> +	pthread_mutex_unlock(&internal_list_lock);
>> +
>> +	eth_dev_stop(eth_dev);
>> +
>> +	if ((internal) && (internal->dev_name))
> If "internal" can be NULL, the internal->port_id dereference above will crash; if it can't be NULL, there is no need to check it here.
>
>> +		free(internal->dev_name);
>> +	if ((internal) && (internal->iface_name))
> Do we need the parentheses around internal->iface_name (and around internal, if it stays)?
>
>> +		free(internal->iface_name);
>> +
>> +	rte_free(eth_dev->data->mac_addrs);
>> +	rte_free(eth_dev->data);
>> +
>> +	for (i = 0; i < internal->nb_rx_queues; i++)
>> +		rte_free(internal->rx_vhost_queues[i]);
>> +	for (i = 0; i < internal->nb_tx_queues; i++)
>> +		rte_free(internal->tx_vhost_queues[i]);
>> +	rte_free(internal);
>> +
>> +	rte_eth_dev_release_port(eth_dev);
>> +
>> +	return 0;
>> +}
>> +
>> +static struct rte_driver pmd_vhost_drv = {
>> +	.name = "eth_vhost",
>> +	.type = PMD_VDEV,
>> +	.init = rte_pmd_vhost_devinit,
>> +	.uninit = rte_pmd_vhost_devuninit,
>> +};
>> +
>> +PMD_REGISTER_DRIVER(pmd_vhost_drv);
>> diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
>> new file mode 100644
>> index 0000000..8aa894a
>> --- /dev/null
>> +++ b/drivers/net/vhost/rte_eth_vhost.h
>> +/**
>> + * Disable features in feature_mask. Returns 0 on success.
>> + */
>> +int rte_eth_vhost_feature_disable(uint64_t feature_mask);
>> +
>> +/**
>> + *  Enable features in feature_mask. Returns 0 on success.
>> + */
>> +int rte_eth_vhost_feature_enable(uint64_t feature_mask);
>> +
>> +/* Returns currently supported vhost features */
> This can also be commented in doxygen style.
>
>> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
>> index 8ecab41..04f7087 100644
>> --- a/mk/rte.app.mk
>> +++ b/mk/rte.app.mk
>> @@ -159,7 +159,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_QAT)        += -lrte_pmd_qat
>>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)   += -lrte_pmd_aesni_mb
>>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)   += -L$(AESNI_MULTI_BUFFER_LIB_PATH) -lIPSec_MB
>>  
>> -endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
>> +ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
>> +
>> +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
>> +
>> +endif # ! $(CONFIG_RTE_LIBRTE_VHOST)
>> +
>> +endif # $(CONFIG_RTE_BUILD_SHARED_LIB)
> I guess the "!" in the comment is to say this if block is for SHARED_LIB==n; if so, we shouldn't update the comment to remove the "!".
> And the comment on the line you have added should be: "endif # $(CONFIG_RTE_LIBRTE_VHOST)"
>
>>  
>>  endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS
>>  
>> -- 
>> 2.1.4
>>

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v6 2/2] vhost: Add VHOST PMD
  2016-02-02 23:43                             ` Ferruh Yigit
  2016-02-03  2:13                               ` Tetsuya Mukawa
@ 2016-02-03  7:48                               ` Tetsuya Mukawa
  2016-02-03  9:24                                 ` Ferruh Yigit
  1 sibling, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-02-03  7:48 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev, ann.zhuangyanying, yuanhan.liu

On 2016/02/03 8:43, Ferruh Yigit wrote:
> On Tue, Feb 02, 2016 at 08:18:42PM +0900, Tetsuya Mukawa wrote:
>> +
>> +	/* find an ethdev entry */
>> +	eth_dev = rte_eth_dev_allocated(name);
>> +	if (eth_dev == NULL)
>> +		return -ENODEV;
>> +
>> +	internal = eth_dev->data->dev_private;
>> +
>> +	rte_free(vring_states[internal->port_id]);
>> +	vring_states[internal->port_id] = NULL;
>> +
>> +	pthread_mutex_lock(&internal_list_lock);
>> +	TAILQ_REMOVE(&internals_list, internal, next);
>> +	pthread_mutex_unlock(&internal_list_lock);
>> +
>> +	eth_dev_stop(eth_dev);
>> +
>> +	if ((internal) && (internal->dev_name))
> If "internal" can be NULL, the internal->port_id dereference above will crash; if it can't be NULL, there is no need to check it here.
>
>

Hi Ferruh,

I guess that if internal is NULL, "internal->dev_name" will not be accessed.
So it may be OK to keep the above code.

Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v6 2/2] vhost: Add VHOST PMD
  2016-02-03  7:48                               ` Tetsuya Mukawa
@ 2016-02-03  9:24                                 ` Ferruh Yigit
  2016-02-03  9:35                                   ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Ferruh Yigit @ 2016-02-03  9:24 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying, yuanhan.liu

On Wed, Feb 03, 2016 at 04:48:17PM +0900, Tetsuya Mukawa wrote:
> On 2016/02/03 8:43, Ferruh Yigit wrote:
> > On Tue, Feb 02, 2016 at 08:18:42PM +0900, Tetsuya Mukawa wrote:
> >> +
> >> +	/* find an ethdev entry */
> >> +	eth_dev = rte_eth_dev_allocated(name);
> >> +	if (eth_dev == NULL)
> >> +		return -ENODEV;
> >> +
> >> +	internal = eth_dev->data->dev_private;
> >> +
> >> +	rte_free(vring_states[internal->port_id]);
> >> +	vring_states[internal->port_id] = NULL;
> >> +
> >> +	pthread_mutex_lock(&internal_list_lock);
> >> +	TAILQ_REMOVE(&internals_list, internal, next);
> >> +	pthread_mutex_unlock(&internal_list_lock);
> >> +
> >> +	eth_dev_stop(eth_dev);
> >> +
> >> +	if ((internal) && (internal->dev_name))
> > If "internal" can be NULL, the internal->port_id dereference above will crash; if it can't be NULL, there is no need to check it here.
> >
> >
> 
> Hi Ferruh,
Hi Tetsuya,

> 
> I guess that if internal is NULL, "internal->dev_name" will not be accessed.
Sure.

> So it may be OK to keep the above code.
> 
But I mean that 8 or 9 lines above there is an access to internal->port_id; either the internal NULL check should come before that access, or it should be removed completely.

Thanks,
ferruh

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v6 2/2] vhost: Add VHOST PMD
  2016-02-03  9:24                                 ` Ferruh Yigit
@ 2016-02-03  9:35                                   ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-02-03  9:35 UTC (permalink / raw)
  To: dev, yuanhan.liu, ann.zhuangyanying

On 2016/02/03 18:24, Ferruh Yigit wrote:
> On Wed, Feb 03, 2016 at 04:48:17PM +0900, Tetsuya Mukawa wrote:
>> On 2016/02/03 8:43, Ferruh Yigit wrote:
>>> On Tue, Feb 02, 2016 at 08:18:42PM +0900, Tetsuya Mukawa wrote:
>>>> +
>>>> +	/* find an ethdev entry */
>>>> +	eth_dev = rte_eth_dev_allocated(name);
>>>> +	if (eth_dev == NULL)
>>>> +		return -ENODEV;
>>>> +
>>>> +	internal = eth_dev->data->dev_private;
>>>> +
>>>> +	rte_free(vring_states[internal->port_id]);
>>>> +	vring_states[internal->port_id] = NULL;
>>>> +
>>>> +	pthread_mutex_lock(&internal_list_lock);
>>>> +	TAILQ_REMOVE(&internals_list, internal, next);
>>>> +	pthread_mutex_unlock(&internal_list_lock);
>>>> +
>>>> +	eth_dev_stop(eth_dev);
>>>> +
>>>> +	if ((internal) && (internal->dev_name))
>>> If "internal" can be NULL, the internal->port_id dereference above will crash; if it can't be NULL, there is no need to check it here.
>>>
>>>
>> Hi Ferruh,
> Hi Tetsuya,
>
>> I guess if internal is NULL, "internal->dev_name" will not be accessed.
> Sure.
>
>> So it may be ok to stay above code.
>>
> But I mean 8,9 lines above there is an access to internal->port_id, either internal NULL check should be before that access or removed completely.

I've got your point.
Thanks!

Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH v7 0/2] Add VHOST PMD
  2015-11-24  9:00                         ` [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
                                             ` (3 preceding siblings ...)
  2016-02-02 11:18                           ` [PATCH v6 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
@ 2016-02-04  7:26                           ` Tetsuya Mukawa
  2016-02-04  7:26                           ` [PATCH v7 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
  2016-02-04  7:26                           ` [PATCH v7 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  6 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-02-04  7:26 UTC (permalink / raw)
  To: dev, ferruh.yigit; +Cc: ann.zhuangyanying, yuanhan.liu

The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost.

PATCH v7 changes:
 - Remove needless parenthesis.
 - Add release note.
 - Remove needless line wraps.
 - Add null pointer check in vring_state_changed().
 - Free queue memory in eth_queue_release().
 - Fix wrong variable name.
 - Fix error handling code of eth_dev_vhost_create() and
   rte_pmd_vhost_devuninit().
 - Remove needless null checking from rte_pmd_vhost_devinit/devuninit().
 - Use port id to create mac address.
 - Add doxygen style comments in "rte_eth_vhost.h".
 - Fix wrong comment in "mk/rte.app.mk".

PATCH v6 changes:
 - Remove rte_vhost_driver_pmd_callback_register().
 - Support link status interrupt.
 - Support queue state changed interrupt.
 - Add rte_eth_vhost_get_queue_event().
 - Support numa node detection when new device is connected.

PATCH v5 changes:
 - Rebase on latest master.
 - Fix RX/TX routine to count RX/TX bytes.
 - Fix RX/TX routine not to count as error packets if enqueue/dequeue
   cannot send all packets.
 - Fix if-condition checking for multiqueues.
 - Add "static" to pthread variable.
 - Fix format.
 - Change default behavior not to receive queueing event from driver.
 - Split the patch to separate rte_eth_vhost_portid2vdev().

PATCH v4 changes:
 - Rebase on latest DPDK tree.
 - Fix coding style.
 - Fix code not to invoke multiple messaging handling threads.
 - Fix code to handle vdev parameters correctly.
 - Remove needless cast.
 - Remove needless if-condition before rte_free().

PATCH v3 changes:
 - Rebase on latest master.
 - Specify correct queue_id in RX/TX function.

PATCH v2 changes:
 - Remove a below patch that fixes vhost library.
   The patch was applied as a separate patch.
   - vhost: fix crash with multiqueue enabled
 - Fix typos.
   (Thanks to Thomas Monjalon)
 - Rebase on latest tree with above bernard's patches.

PATCH v1 changes:
 - Support vhost multiple queues.
 - Rebase on "remove pci driver from vdevs".
 - Optimize RX/TX functions.
 - Fix resource leaks.
 - Fix compile issue.
 - Add patch to fix vhost library.

RFC PATCH v3 changes:
 - Optimize performance.
   In RX/TX functions, change code to access only per core data.
 - Add the below API to allow users to use vhost library APIs for a port managed
   by the vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
    - rte_eth_vhost_portid2vdev()
   To support this functionality, vhost library is also changed.
   Anyway, if users don't use the vhost PMD, they can still fully use the vhost library APIs.
 - Add code to support vhost multiple queues.
   Actually, multiple queues functionality is not enabled so far.

RFC PATCH v2 changes:
 - Fix issues reported by checkpatch.pl
   (Thanks to Stephen Hemminger)



Tetsuya Mukawa (2):
  ethdev: Add a new event type to notify a queue state changed event
  vhost: Add VHOST PMD

 config/common_linuxapp                      |   6 +
 doc/guides/nics/index.rst                   |   1 +
 doc/guides/rel_notes/release_2_3.rst        |   3 +
 drivers/net/Makefile                        |   4 +
 drivers/net/vhost/Makefile                  |  62 ++
 drivers/net/vhost/rte_eth_vhost.c           | 898 ++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h           | 109 ++++
 drivers/net/vhost/rte_pmd_vhost_version.map |  11 +
 lib/librte_ether/rte_ethdev.h               |   2 +
 mk/rte.app.mk                               |   6 +
 10 files changed, 1102 insertions(+)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

-- 
2.1.4

^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH v7 1/2] ethdev: Add a new event type to notify a queue state changed event
  2015-11-24  9:00                         ` [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
                                             ` (4 preceding siblings ...)
  2016-02-04  7:26                           ` [PATCH v7 0/2] " Tetsuya Mukawa
@ 2016-02-04  7:26                           ` Tetsuya Mukawa
  2016-02-04  7:26                           ` [PATCH v7 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  6 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-02-04  7:26 UTC (permalink / raw)
  To: dev, ferruh.yigit; +Cc: ann.zhuangyanying, yuanhan.liu

This patch adds a below event type.
 - RTE_ETH_EVENT_QUEUE_STATE_CHANGE
This event is used for notifying a queue state changed event.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
 lib/librte_ether/rte_ethdev.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 8710dd7..2fbf42a 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -2661,6 +2661,8 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 enum rte_eth_event_type {
 	RTE_ETH_EVENT_UNKNOWN,  /**< unknown event type */
 	RTE_ETH_EVENT_INTR_LSC, /**< lsc interrupt event */
+	RTE_ETH_EVENT_QUEUE_STATE_CHANGE,
+				/**< queue state changed interrupt */
 	RTE_ETH_EVENT_MAX       /**< max value of this enum */
 };
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v7 2/2] vhost: Add VHOST PMD
  2015-11-24  9:00                         ` [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
                                             ` (5 preceding siblings ...)
  2016-02-04  7:26                           ` [PATCH v7 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
@ 2016-02-04  7:26                           ` Tetsuya Mukawa
  2016-02-04 11:17                             ` Ferruh Yigit
                                               ` (3 more replies)
  6 siblings, 4 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-02-04  7:26 UTC (permalink / raw)
  To: dev, ferruh.yigit; +Cc: ann.zhuangyanying, yuanhan.liu

The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost. It means librte_vhost is also needed to compile the PMD.
The vhost messages will be handled only when a port is started. So start
a port first, then invoke QEMU.

The PMD has 2 parameters.
 - iface:  The parameter is used to specify a path to connect to a
           virtio-net device.
 - queues: The parameter is used to specify the number of the queues
           virtio-net device has.
           (Default: 1)

Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i

To connect above testpmd, here is qemu command example.

$ qemu-system-x86_64 \
        <snip>
        -chardev socket,id=chr0,path=/tmp/sock0 \
        -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
        -device virtio-net-pci,netdev=net0,mq=on

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
 config/common_linuxapp                      |   6 +
 doc/guides/nics/index.rst                   |   1 +
 doc/guides/rel_notes/release_2_3.rst        |   3 +
 drivers/net/Makefile                        |   4 +
 drivers/net/vhost/Makefile                  |  62 ++
 drivers/net/vhost/rte_eth_vhost.c           | 898 ++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h           | 109 ++++
 drivers/net/vhost/rte_pmd_vhost_version.map |  11 +
 mk/rte.app.mk                               |   6 +
 9 files changed, 1100 insertions(+)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 74bc515..357b557 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -514,6 +514,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
 CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 
 #
+# Compile vhost PMD
+# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
+#
+CONFIG_RTE_LIBRTE_PMD_VHOST=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 33c9cea..5819cdb 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -47,6 +47,7 @@ Network Interface Controller Drivers
     nfp
     szedata2
     virtio
+    vhost
     vmxnet3
     pcap_ring
 
diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst
index 99de186..21a38c7 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -4,6 +4,9 @@ DPDK Release 2.3
 New Features
 ------------
 
+* **Added vhost PMD.**
+
+  Added a virtual PMD that wraps librte_vhost.
 
 Resolved Issues
 ---------------
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 6e4497e..4300b93 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -52,5 +52,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
 
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
 include $(RTE_SDK)/mk/rte.sharelib.mk
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
new file mode 100644
index 0000000..f49a69b
--- /dev/null
+++ b/drivers/net/vhost/Makefile
@@ -0,0 +1,62 @@
+#   BSD LICENSE
+#
+#   Copyright (c) 2010-2016 Intel Corporation.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_vhost.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_vhost_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
+
+#
+# Export include files
+#
+SYMLINK-y-include += rte_eth_vhost.h
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_vhost
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
new file mode 100644
index 0000000..b2305c2
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -0,0 +1,898 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) 2016 IGEL Co., Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co.,Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <unistd.h>
+#include <pthread.h>
+#include <stdbool.h>
+#ifdef RTE_LIBRTE_VHOST_NUMA
+#include <numaif.h>
+#endif
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_dev.h>
+#include <rte_kvargs.h>
+#include <rte_virtio_net.h>
+#include <rte_spinlock.h>
+
+#include "rte_eth_vhost.h"
+
+#define ETH_VHOST_IFACE_ARG		"iface"
+#define ETH_VHOST_QUEUES_ARG		"queues"
+
+static const char *drivername = "VHOST PMD";
+
+static const char *valid_arguments[] = {
+	ETH_VHOST_IFACE_ARG,
+	ETH_VHOST_QUEUES_ARG,
+	NULL
+};
+
+static struct ether_addr base_eth_addr = {
+	.addr_bytes = {
+		0x56 /* V */,
+		0x48 /* H */,
+		0x4F /* O */,
+		0x53 /* S */,
+		0x54 /* T */,
+		0x00
+	}
+};
+
+struct vhost_queue {
+	rte_atomic32_t allow_queuing;
+	rte_atomic32_t while_queuing;
+	struct virtio_net *device;
+	struct pmd_internal *internal;
+	struct rte_mempool *mb_pool;
+	uint16_t virtqueue_id;
+	uint64_t rx_pkts;
+	uint64_t tx_pkts;
+	uint64_t missed_pkts;
+	uint64_t rx_bytes;
+	uint64_t tx_bytes;
+};
+
+struct pmd_internal {
+	TAILQ_ENTRY(pmd_internal) next;
+	char *dev_name;
+	char *iface_name;
+	uint8_t port_id;
+
+	volatile uint16_t once;
+};
+
+TAILQ_HEAD(pmd_internal_head, pmd_internal);
+static struct pmd_internal_head internals_list =
+	TAILQ_HEAD_INITIALIZER(internals_list);
+
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static rte_atomic16_t nb_started_ports;
+static pthread_t session_th;
+
+static struct rte_eth_link pmd_link = {
+		.link_speed = 10000,
+		.link_duplex = ETH_LINK_FULL_DUPLEX,
+		.link_status = 0
+};
+
+struct rte_vhost_vring_state {
+	rte_spinlock_t lock;
+
+	bool cur[RTE_MAX_QUEUES_PER_PORT * 2];
+	bool seen[RTE_MAX_QUEUES_PER_PORT * 2];
+	unsigned int index;
+	unsigned int max_vring;
+};
+
+static struct rte_vhost_vring_state *vring_states[RTE_MAX_ETHPORTS];
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t i, nb_rx = 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Dequeue packets from guest TX queue */
+	nb_rx = rte_vhost_dequeue_burst(r->device,
+			r->virtqueue_id, r->mb_pool, bufs, nb_bufs);
+
+	r->rx_pkts += nb_rx;
+
+	for (i = 0; likely(i < nb_rx); i++)
+		r->rx_bytes += bufs[i]->pkt_len;
+
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t i, nb_tx = 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Enqueue packets to guest RX queue */
+	nb_tx = rte_vhost_enqueue_burst(r->device,
+			r->virtqueue_id, bufs, nb_bufs);
+
+	r->tx_pkts += nb_tx;
+	r->missed_pkts += nb_bufs - nb_tx;
+
+	for (i = 0; likely(i < nb_tx); i++)
+		r->tx_bytes += bufs[i]->pkt_len;
+
+	for (i = 0; likely(i < nb_tx); i++)
+		rte_pktmbuf_free(bufs[i]);
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
+{
+	return 0;
+}
+
+static inline struct pmd_internal *
+find_internal_resource(char *ifname)
+{
+	int found = 0;
+	struct pmd_internal *internal;
+
+	if (ifname == NULL)
+		return NULL;
+
+	pthread_mutex_lock(&internal_list_lock);
+
+	TAILQ_FOREACH(internal, &internals_list, next) {
+		if (!strcmp(internal->iface_name, ifname)) {
+			found = 1;
+			break;
+		}
+	}
+
+	pthread_mutex_unlock(&internal_list_lock);
+
+	if (!found)
+		return NULL;
+
+	return internal;
+}
+
+static int
+new_device(struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct pmd_internal *internal;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid argument\n");
+		return -1;
+	}
+
+	internal = find_internal_resource(dev->ifname);
+	if (internal == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid device name\n");
+		return -1;
+	}
+
+	eth_dev = rte_eth_dev_allocated(internal->dev_name);
+	if (eth_dev == NULL) {
+		RTE_LOG(INFO, PMD, "Failed to find an ethdev\n");
+		return -1;
+	}
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = dev;
+		vq->internal = internal;
+		rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = dev;
+		vq->internal = internal;
+		rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
+	}
+
+	dev->flags |= VIRTIO_DEV_RUNNING;
+	dev->priv = eth_dev;
+	eth_dev->data->dev_link.link_status = 1;
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+
+	RTE_LOG(INFO, PMD, "New connection established\n");
+
+	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC);
+
+	return 0;
+}
+
+static void
+destroy_device(volatile struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid argument\n");
+		return;
+	}
+
+	eth_dev = (struct rte_eth_dev *)dev->priv;
+	if (eth_dev == NULL) {
+		RTE_LOG(INFO, PMD, "Failed to find an ethdev\n");
+		return;
+	}
+
+	/* Wait until rx/tx_pkt_burst stops accessing vhost device */
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+
+	eth_dev->data->dev_link.link_status = 0;
+
+	dev->priv = NULL;
+	dev->flags &= ~VIRTIO_DEV_RUNNING;
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = NULL;
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = NULL;
+	}
+
+	RTE_LOG(INFO, PMD, "Connection closed\n");
+
+	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC);
+}
+
+static int
+vring_state_changed(struct virtio_net *dev, uint16_t vring, int enable)
+{
+	struct rte_vhost_vring_state *state;
+	struct pmd_internal *internal;
+#ifdef RTE_LIBRTE_VHOST_NUMA
+	int newnode, ret;
+#endif
+
+	if (dev == NULL) {
+		RTE_LOG(ERR, PMD, "Invalid argument\n");
+		return -1;
+	}
+
+	internal = find_internal_resource(dev->ifname);
+	if (internal == NULL) {
+		RTE_LOG(ERR, PMD, "Invalid interface name: %s\n", dev->ifname);
+		return -1;
+	}
+
+	state = vring_states[internal->port_id];
+	if (!state) {
+		RTE_LOG(ERR, PMD, "Unused port id: %d\n", internal->port_id);
+		return -1;
+	}
+
+#ifdef RTE_LIBRTE_VHOST_NUMA
+	ret = get_mempolicy(&newnode, NULL, 0, dev,
+			MPOL_F_NODE | MPOL_F_ADDR);
+	if (ret < 0) {
+		RTE_LOG(ERR, PMD, "Unknown NUMA node\n");
+		return -1;
+	}
+
+	rte_eth_devices[internal->port_id].data->numa_node = newnode;
+#endif
+	rte_spinlock_lock(&state->lock);
+	state->cur[vring] = enable;
+	state->max_vring = RTE_MAX(vring, state->max_vring);
+	rte_spinlock_unlock(&state->lock);
+
+	RTE_LOG(INFO, PMD, "vring%u is %s\n",
+			vring, enable ? "enabled" : "disabled");
+
+	_rte_eth_dev_callback_process(&rte_eth_devices[internal->port_id],
+			RTE_ETH_EVENT_QUEUE_STATE_CHANGE);
+
+	return 0;
+}
+
+int
+rte_eth_vhost_get_queue_event(uint8_t port_id,
+		struct rte_eth_vhost_queue_event *event)
+{
+	struct rte_vhost_vring_state *state;
+	unsigned int i;
+	int idx;
+
+	if (port_id >= RTE_MAX_ETHPORTS) {
+		RTE_LOG(ERR, PMD, "Invalid port id\n");
+		return -1;
+	}
+
+	state = vring_states[port_id];
+	if (!state) {
+		RTE_LOG(ERR, PMD, "Unused port\n");
+		return -1;
+	}
+
+	rte_spinlock_lock(&state->lock);
+	for (i = 0; i <= state->max_vring; i++) {
+		idx = state->index++ % (state->max_vring + 1);
+
+		if (state->cur[idx] != state->seen[idx]) {
+			state->seen[idx] = state->cur[idx];
+			event->queue_id = idx / 2;
+			event->rx = idx & 1;
+			event->enable = state->cur[idx];
+			rte_spinlock_unlock(&state->lock);
+			return 0;
+		}
+	}
+	rte_spinlock_unlock(&state->lock);
+
+	return -1;
+}
+
+static void *
+vhost_driver_session(void *param __rte_unused)
+{
+	static struct virtio_net_device_ops vhost_ops;
+
+	/* set vhost arguments */
+	vhost_ops.new_device = new_device;
+	vhost_ops.destroy_device = destroy_device;
+	vhost_ops.vring_state_changed = vring_state_changed;
+	if (rte_vhost_driver_callback_register(&vhost_ops) < 0)
+		rte_panic("Can't register callbacks\n");
+
+	/* start event handling */
+	rte_vhost_driver_session_start();
+
+	pthread_exit(0);
+}
+
+static void
+vhost_driver_session_start(void)
+{
+	int ret;
+
+	ret = pthread_create(&session_th,
+			NULL, vhost_driver_session, NULL);
+	if (ret)
+		rte_panic("Can't create a thread\n");
+}
+
+static void
+vhost_driver_session_stop(void)
+{
+	int ret;
+
+	ret = pthread_cancel(session_th);
+	if (ret)
+		rte_panic("Can't cancel the thread\n");
+
+	ret = pthread_join(session_th, NULL);
+	if (ret)
+		rte_panic("Can't join the thread\n");
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	int ret;
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
+		ret = rte_vhost_driver_register(internal->iface_name);
+		if (ret)
+			return ret;
+	}
+
+	/* We need only one message handling thread */
+	if (rte_atomic16_add_return(&nb_started_ports, 1) == 1)
+		vhost_driver_session_start();
+
+	return 0;
+}
+
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	if (rte_atomic16_cmpset(&internal->once, 1, 0))
+		rte_vhost_driver_unregister(internal->iface_name);
+
+	if (rte_atomic16_sub_return(&nb_started_ports, 1) == 0)
+		vhost_driver_session_stop();
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct vhost_queue *vq;
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
+		return -ENOMEM;
+	}
+
+	vq->mb_pool = mb_pool;
+	vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
+	dev->data->rx_queues[rx_queue_id] = vq;
+	return 0;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct vhost_queue *vq;
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for tx queue\n");
+		return -ENOMEM;
+	}
+
+	vq->virtqueue_id = tx_queue_id * VIRTIO_QNUM + VIRTIO_RXQ;
+	dev->data->tx_queues[tx_queue_id] = vq;
+	return 0;
+}
+
+static void
+eth_dev_info(struct rte_eth_dev *dev,
+	     struct rte_eth_dev_info *dev_info)
+{
+	dev_info->driver_name = drivername;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = (uint32_t)-1;
+	dev_info->max_rx_queues = (uint16_t)dev->data->nb_rx_queues;
+	dev_info->max_tx_queues = (uint16_t)dev->data->nb_tx_queues;
+	dev_info->min_rx_bufsize = 0;
+}
+
+static void
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	unsigned i;
+	unsigned long rx_total = 0, tx_total = 0, tx_missed_total = 0;
+	unsigned long rx_total_bytes = 0, tx_total_bytes = 0;
+	struct vhost_queue *vq;
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+	     i < dev->data->nb_rx_queues; i++) {
+		if (dev->data->rx_queues[i] == NULL)
+			continue;
+		vq = dev->data->rx_queues[i];
+		stats->q_ipackets[i] = vq->rx_pkts;
+		rx_total += stats->q_ipackets[i];
+
+		stats->q_ibytes[i] = vq->rx_bytes;
+		rx_total_bytes += stats->q_ibytes[i];
+	}
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+	     i < dev->data->nb_tx_queues; i++) {
+		if (dev->data->tx_queues[i] == NULL)
+			continue;
+		vq = dev->data->tx_queues[i];
+		stats->q_opackets[i] = vq->tx_pkts;
+		tx_missed_total += vq->missed_pkts;
+		tx_total += stats->q_opackets[i];
+
+		stats->q_obytes[i] = vq->tx_bytes;
+		tx_total_bytes += stats->q_obytes[i];
+	}
+
+	stats->ipackets = rx_total;
+	stats->opackets = tx_total;
+	stats->imissed = tx_missed_total;
+	stats->ibytes = rx_total_bytes;
+	stats->obytes = tx_total_bytes;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	struct vhost_queue *vq;
+	unsigned i;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		if (dev->data->rx_queues[i] == NULL)
+			continue;
+		vq = dev->data->rx_queues[i];
+		vq->rx_pkts = 0;
+		vq->rx_bytes = 0;
+	}
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		if (dev->data->tx_queues[i] == NULL)
+			continue;
+		vq = dev->data->tx_queues[i];
+		vq->tx_pkts = 0;
+		vq->tx_bytes = 0;
+		vq->missed_pkts = 0;
+	}
+}
+
+static void
+eth_queue_release(void *q)
+{
+	rte_free(q);
+}
+
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused)
+{
+	return 0;
+}
+
+/**
+ * Disable features in feature_mask. Returns 0 on success.
+ */
+int
+rte_eth_vhost_feature_disable(uint64_t feature_mask)
+{
+	return rte_vhost_feature_disable(feature_mask);
+}
+
+/**
+ * Enable features in feature_mask. Returns 0 on success.
+ */
+int
+rte_eth_vhost_feature_enable(uint64_t feature_mask)
+{
+	return rte_vhost_feature_enable(feature_mask);
+}
+
+/* Returns currently supported vhost features */
+uint64_t
+rte_eth_vhost_feature_get(void)
+{
+	return rte_vhost_feature_get();
+}
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+static int
+eth_dev_vhost_create(const char *name, char *iface_name, int16_t queues,
+		     const unsigned numa_node)
+{
+	struct rte_eth_dev_data *data = NULL;
+	struct pmd_internal *internal = NULL;
+	struct rte_eth_dev *eth_dev = NULL;
+	struct ether_addr *eth_addr = NULL;
+	struct rte_vhost_vring_state *vring_state = NULL;
+
+	RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
+		numa_node);
+
+	/* now do all data allocation - for eth_dev structure, dummy pci driver
+	 * and internal (private) data
+	 */
+	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+	if (data == NULL)
+		goto error;
+
+	internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
+	if (internal == NULL)
+		goto error;
+
+	/* reserve an ethdev entry */
+	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+	if (eth_dev == NULL)
+		goto error;
+
+	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
+	if (eth_addr == NULL)
+		goto error;
+	*eth_addr = base_eth_addr;
+	eth_addr->addr_bytes[5] = eth_dev->data->port_id;
+
+	vring_state = rte_zmalloc_socket(name,
+			sizeof(*vring_state), 0, numa_node);
+	if (vring_state == NULL)
+		goto error;
+
+	TAILQ_INIT(&eth_dev->link_intr_cbs);
+
+	/* now put it all together
+	 * - store queue data in internal,
+	 * - store numa_node info in ethdev data
+	 * - point eth_dev_data to internals
+	 * - and point eth_dev structure to new eth_dev_data structure
+	 */
+	internal->dev_name = strdup(name);
+	if (internal->dev_name == NULL)
+		goto error;
+	internal->iface_name = strdup(iface_name);
+	if (internal->iface_name == NULL)
+		goto error;
+	internal->port_id = eth_dev->data->port_id;
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_INSERT_TAIL(&internals_list, internal, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
+	rte_spinlock_init(&vring_state->lock);
+	vring_states[eth_dev->data->port_id] = vring_state;
+
+	data->dev_private = internal;
+	data->port_id = eth_dev->data->port_id;
+	memmove(data->name, eth_dev->data->name, sizeof(data->name));
+	data->nb_rx_queues = queues;
+	data->nb_tx_queues = queues;
+	data->dev_link = pmd_link;
+	data->mac_addrs = eth_addr;
+
+	/* We'll replace the 'data' originally allocated by eth_dev. So the
+	 * vhost PMD resources won't be shared between multiple processes.
+	 */
+	eth_dev->data = data;
+	eth_dev->dev_ops = &ops;
+	eth_dev->driver = NULL;
+	data->dev_flags =
+		RTE_ETH_DEV_DETACHABLE | RTE_ETH_DEV_INTR_LSC;
+	data->kdrv = RTE_KDRV_NONE;
+	data->drv_name = internal->dev_name;
+	data->numa_node = numa_node;
+
+	/* finally assign rx and tx ops */
+	eth_dev->rx_pkt_burst = eth_vhost_rx;
+	eth_dev->tx_pkt_burst = eth_vhost_tx;
+
+	return data->port_id;
+
+error:
+	if (internal && internal->dev_name)
+		free(internal->dev_name);
+	rte_free(vring_state);
+	rte_free(eth_addr);
+	if (eth_dev)
+		rte_eth_dev_release_port(eth_dev);
+	rte_free(internal);
+	rte_free(data);
+
+	return -1;
+}
+
+static inline int
+open_iface(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	const char **iface_name = extra_args;
+
+	if (value == NULL)
+		return -1;
+
+	*iface_name = value;
+
+	return 0;
+}
+
+static inline int
+open_queues(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	uint16_t *q = extra_args;
+
+	if ((value == NULL) || (extra_args == NULL))
+		return -EINVAL;
+
+	*q = (uint16_t)strtoul(value, NULL, 0);
+	if ((*q == USHRT_MAX) && (errno == ERANGE))
+		return -1;
+
+	if (*q > RTE_MAX_QUEUES_PER_PORT)
+		return -1;
+
+	return 0;
+}
+
+static int
+rte_pmd_vhost_devinit(const char *name, const char *params)
+{
+	struct rte_kvargs *kvlist = NULL;
+	int ret = 0;
+	char *iface_name;
+	uint16_t queues;
+
+	RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
+
+	kvlist = rte_kvargs_parse(params, valid_arguments);
+	if (kvlist == NULL)
+		return -1;
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
+					 &open_iface, &iface_name);
+		if (ret < 0)
+			goto out_free;
+	} else {
+		ret = -1;
+		goto out_free;
+	}
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
+					 &open_queues, &queues);
+		if (ret < 0)
+			goto out_free;
+
+	} else
+		queues = 1;
+
+	eth_dev_vhost_create(name, iface_name, queues, rte_socket_id());
+
+out_free:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static int
+rte_pmd_vhost_devuninit(const char *name)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internal *internal;
+	unsigned int i;
+
+	RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
+
+	/* find an ethdev entry */
+	eth_dev = rte_eth_dev_allocated(name);
+	if (eth_dev == NULL)
+		return -ENODEV;
+
+	internal = eth_dev->data->dev_private;
+	if (internal == NULL)
+		return -ENODEV;
+
+	rte_free(vring_states[internal->port_id]);
+	vring_states[internal->port_id] = NULL;
+
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_REMOVE(&internals_list, internal, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
+	eth_dev_stop(eth_dev);
+
+	if (internal->dev_name)
+		free(internal->dev_name);
+	if (internal->iface_name)
+		free(internal->iface_name);
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++)
+		rte_free(eth_dev->data->rx_queues[i]);
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
+		rte_free(eth_dev->data->tx_queues[i]);
+
+	rte_free(eth_dev->data->mac_addrs);
+	rte_free(eth_dev->data);
+
+	rte_free(internal);
+
+	rte_eth_dev_release_port(eth_dev);
+
+	return 0;
+}
+
+static struct rte_driver pmd_vhost_drv = {
+	.name = "eth_vhost",
+	.type = PMD_VDEV,
+	.init = rte_pmd_vhost_devinit,
+	.uninit = rte_pmd_vhost_devuninit,
+};
+
+PMD_REGISTER_DRIVER(pmd_vhost_drv);
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
new file mode 100644
index 0000000..e78cb74
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -0,0 +1,109 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 IGEL Co., Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co., Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_VHOST_H_
+#define _RTE_ETH_VHOST_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+#include <stdbool.h>
+
+#include <rte_virtio_net.h>
+
+/**
+ * Disable features in feature_mask.
+ *
+ * @param feature_mask
+ *  Vhost features defined in "linux/virtio_net.h".
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_eth_vhost_feature_disable(uint64_t feature_mask);
+
+/**
+ * Enable features in feature_mask.
+ *
+ * @param feature_mask
+ *  Vhost features defined in "linux/virtio_net.h".
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_eth_vhost_feature_enable(uint64_t feature_mask);
+
+/**
+ * Returns currently supported vhost features.
+ *
+ * @return
+ *  Vhost features defined in "linux/virtio_net.h".
+ */
+uint64_t rte_eth_vhost_feature_get(void);
+
+/*
+ * Event description.
+ */
+struct rte_eth_vhost_queue_event {
+	uint16_t queue_id;
+	bool rx;
+	bool enable;
+};
+
+/**
+ * Get queue events from the specified port.
+ * If a callback for the event below is registered via
+ * rte_eth_dev_callback_register(), this function describes what changed:
+ *  - RTE_ETH_EVENT_QUEUE_STATE_CHANGE
+ * Multiple events may trigger only a single callback invocation, so keep
+ * calling this function until it returns a negative value.
+ *
+ * @param port_id
+ *  Port id.
+ * @param event
+ *  Pointer to a rte_eth_vhost_queue_event structure.
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_eth_vhost_get_queue_event(uint8_t port_id,
+		struct rte_eth_vhost_queue_event *event);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
new file mode 100644
index 0000000..3280b0d
--- /dev/null
+++ b/drivers/net/vhost/rte_pmd_vhost_version.map
@@ -0,0 +1,11 @@
+DPDK_2.3 {
+
+	global:
+
+	rte_eth_vhost_feature_disable;
+	rte_eth_vhost_feature_enable;
+	rte_eth_vhost_feature_get;
+	rte_eth_vhost_get_queue_event;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 8ecab41..fca210d 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -159,6 +159,12 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_QAT)        += -lrte_pmd_qat
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)   += -lrte_pmd_aesni_mb
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)   += -L$(AESNI_MULTI_BUFFER_LIB_PATH) -lIPSec_MB
 
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
+
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
 endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
 
 endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS
-- 
2.1.4


* Re: [PATCH v7 2/2] vhost: Add VHOST PMD
  2016-02-04  7:26                           ` [PATCH v7 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
@ 2016-02-04 11:17                             ` Ferruh Yigit
  2016-02-05  6:28                               ` Tetsuya Mukawa
  2016-02-05 11:28                             ` [PATCH v8 0/2] " Tetsuya Mukawa
                                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 200+ messages in thread
From: Ferruh Yigit @ 2016-02-04 11:17 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying, yuanhan.liu

On Thu, Feb 04, 2016 at 04:26:31PM +0900, Tetsuya Mukawa wrote:

Hi Tetsuya,

> The patch introduces a new PMD. This PMD is implemented as thin wrapper
> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
> The vhost messages will be handled only when a port is started. So start
> a port first, then invoke QEMU.
> 
> The PMD has 2 parameters.
>  - iface:  The parameter is used to specify a path to connect to a
>            virtio-net device.
>  - queues: The parameter is used to specify the number of the queues
>            virtio-net device has.
>            (Default: 1)
> 
> Here is an example.
> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
> 
> To connect above testpmd, here is qemu command example.
> 
> $ qemu-system-x86_64 \
>         <snip>
>         -chardev socket,id=chr0,path=/tmp/sock0 \
>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
>         -device virtio-net-pci,netdev=net0,mq=on
> 
> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>

Please find some more comments, mostly minor nits,

please feel free to add my ack for next version of this patch:
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

<...>
> diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
> new file mode 100644
> index 0000000..b2305c2
> --- /dev/null
> +++ b/drivers/net/vhost/rte_eth_vhost.c
<...>
> +
> +struct pmd_internal {
> +	TAILQ_ENTRY(pmd_internal) next;
> +	char *dev_name;
> +	char *iface_name;
> +	uint8_t port_id;

You can also get rid of port_id too, if you keep list of rte_eth_dev.
But this is not so important, keep as it is if you want to.

> +
> +	volatile uint16_t once;
> +};
> +

<...>
> +
> +static int
> +new_device(struct virtio_net *dev)
> +{
<...>
> +
> +	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
> +		vq = eth_dev->data->rx_queues[i];
> +		if (vq == NULL)

can vq be NULL? It is allocated in rx/tx_queue_setup() and there is already a NULL check there?

> +			continue;
> +		vq->device = dev;
> +		vq->internal = internal;
> +		rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
> +	}
> +	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
> +		vq = eth_dev->data->tx_queues[i];
> +		if (vq == NULL)
> +			continue;
> +		vq->device = dev;
> +		vq->internal = internal;
> +		rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
> +	}
> +
> +	dev->flags |= VIRTIO_DEV_RUNNING;
> +	dev->priv = eth_dev;
> +	eth_dev->data->dev_link.link_status = 1;
> +
> +	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
> +		vq = eth_dev->data->rx_queues[i];
> +		if (vq == NULL)
> +			continue;
> +		rte_atomic32_set(&vq->allow_queuing, 1);
> +	}
> +	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
> +		vq = eth_dev->data->tx_queues[i];
> +		if (vq == NULL)
> +			continue;
> +		rte_atomic32_set(&vq->allow_queuing, 1);
> +	}
> +
> +	RTE_LOG(INFO, PMD, "New connection established\n");
> +
> +	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC);
> +
> +	return 0;
> +}
> +

<...>
> +
> +static int
> +vring_state_changed(struct virtio_net *dev, uint16_t vring, int enable)
> +{
> +	struct rte_vhost_vring_state *state;
> +	struct pmd_internal *internal;
> +#ifdef RTE_LIBRTE_VHOST_NUMA
> +	int newnode, ret;
> +#endif
> +
> +	if (dev == NULL) {
> +		RTE_LOG(ERR, PMD, "Invalid argument\n");
> +		return -1;
> +	}
> +
> +	internal = find_internal_resource(dev->ifname);
> +	if (internal == NULL) {
> +		RTE_LOG(ERR, PMD, "Invalid interface name: %s\n", dev->ifname);
> +		return -1;
> +	}
> +
> +	state = vring_states[internal->port_id];
> +	if (!state) {
> +		RTE_LOG(ERR, PMD, "Unused port id: %d\n", internal->port_id);
> +		return -1;
> +	}
> +
> +#ifdef RTE_LIBRTE_VHOST_NUMA
> +	ret  = get_mempolicy(&newnode, NULL, 0, dev,
> +			MPOL_F_NODE | MPOL_F_ADDR);
> +	if (ret < 0) {
> +		RTE_LOG(ERR, PMD, "Unknow numa node\n");
> +		return -1;
> +	}
> +
> +	rte_eth_devices[internal->port_id].data->numa_node = newnode;

If you prefer to keep a list of devices instead of a list of internal data, you can avoid accessing the global device array here

> +#endif
> +	rte_spinlock_lock(&state->lock);
> +	state->cur[vring] = enable;
> +	state->max_vring = RTE_MAX(vring, state->max_vring);
> +	rte_spinlock_unlock(&state->lock);
> +
> +	RTE_LOG(INFO, PMD, "vring%u is %s\n",
> +			vring, enable ? "enabled" : "disabled");
> +
> +	_rte_eth_dev_callback_process(&rte_eth_devices[internal->port_id],
> +			RTE_ETH_EVENT_QUEUE_STATE_CHANGE);
> +
> +	return 0;
> +}
> +
<...>
> +
> +static void *
> +vhost_driver_session(void *param __rte_unused)
> +{
> +	static struct virtio_net_device_ops vhost_ops;
> +
> +	/* set vhost arguments */
> +	vhost_ops.new_device = new_device;
> +	vhost_ops.destroy_device = destroy_device;
> +	vhost_ops.vring_state_changed = vring_state_changed;
> +	if (rte_vhost_driver_callback_register(&vhost_ops) < 0)
> +		rte_panic("Can't register callbacks\n");
> +
> +	/* start event handling */
> +	rte_vhost_driver_session_start();
> +
> +	pthread_exit(0);

Do we need pthread_exit()? I think a plain "return" does the same thing in this context.

> +}
> +
> +static void
> +vhost_driver_session_start(void)
> +{
> +	int ret;
> +
> +	ret = pthread_create(&session_th,
> +			NULL, vhost_driver_session, NULL);
> +	if (ret)
> +		rte_panic("Can't create a thread\n");

rte_panic() terminates the whole process. Since we are in a driver, it would be better to return some kind of error and let the application decide whether to terminate:
an application can be using multiple PMDs, and may prefer not to terminate just because one PMD is not working.

> +}
> +
<...>
> +
> +static int
> +eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
> +		   uint16_t nb_rx_desc __rte_unused,
> +		   unsigned int socket_id,
> +		   const struct rte_eth_rxconf *rx_conf __rte_unused,
> +		   struct rte_mempool *mb_pool)
> +{
> +	struct vhost_queue *vq;
> +
> +	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
> +			RTE_CACHE_LINE_SIZE, socket_id);
> +	if (vq == NULL) {
> +		RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
> +		return -ENOMEM;
> +	}
> +
> +	vq->mb_pool = mb_pool;
> +	vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
> +	dev->data->rx_queues[rx_queue_id] = vq;
> +	dev->data->rx_queues[rx_queue_id] = vq;

duplicated line?

> +	return 0;
> +}
> +
> +static int
> +eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
> +		   uint16_t nb_tx_desc __rte_unused,
> +		   unsigned int socket_id,
> +		   const struct rte_eth_txconf *tx_conf __rte_unused)
> +{
> +	struct vhost_queue *vq;
> +
> +	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
> +			RTE_CACHE_LINE_SIZE, socket_id);
> +	if (vq == NULL) {
> +		RTE_LOG(ERR, PMD, "Failed to allocate memory for tx queue\n");
> +		return -ENOMEM;
> +	}
> +
> +	vq->virtqueue_id = tx_queue_id * VIRTIO_QNUM + VIRTIO_RXQ;
> +	dev->data->tx_queues[tx_queue_id] = vq;
> +	dev->data->tx_queues[tx_queue_id] = vq;

duplicated line?

> +	return 0;
> +}
> +
> +static void
> +eth_dev_info(struct rte_eth_dev *dev,
> +	     struct rte_eth_dev_info *dev_info)
> +{
> +	dev_info->driver_name = drivername;
> +	dev_info->max_mac_addrs = 1;
> +	dev_info->max_rx_pktlen = (uint32_t)-1;
> +	dev_info->max_rx_queues = (uint16_t)dev->data->nb_rx_queues;
> +	dev_info->max_tx_queues = (uint16_t)dev->data->nb_tx_queues;

no need for the (uint16_t) cast here

> +	dev_info->min_rx_bufsize = 0;
> +}
> +
> +static void
> +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
> +{
> +	unsigned i;
> +	unsigned long rx_total = 0, tx_total = 0, tx_missed_total = 0;
> +	unsigned long rx_total_bytes = 0, tx_total_bytes = 0;
> +	struct vhost_queue *vq;
> +
> +	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
> +	     i < dev->data->nb_rx_queues; i++) {

syntax: I think the guidelines suggest two tabs here, instead of a mixture of tabs and spaces.

> +		if (dev->data->rx_queues[i] == NULL)
> +			continue;
> +		vq = dev->data->rx_queues[i];
> +		stats->q_ipackets[i] = vq->rx_pkts;
> +		rx_total += stats->q_ipackets[i];
> +
> +		stats->q_ibytes[i] = vq->rx_bytes;
> +		rx_total_bytes += stats->q_ibytes[i];
> +	}
> +
> +	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
> +	     i < dev->data->nb_tx_queues; i++) {
> +		if (dev->data->tx_queues[i] == NULL)

more queue NULL checks here; I am not sure these are necessary

> +			continue;
> +		vq = dev->data->tx_queues[i];
> +		stats->q_opackets[i] = vq->tx_pkts;
> +		tx_missed_total += vq->missed_pkts;
> +		tx_total += stats->q_opackets[i];
> +
> +		stats->q_obytes[i] = vq->tx_bytes;
> +		tx_total_bytes += stats->q_obytes[i];
> +	}
> +
> +	stats->ipackets = rx_total;
> +	stats->opackets = tx_total;
> +	stats->imissed = tx_missed_total;
> +	stats->ibytes = rx_total_bytes;
> +	stats->obytes = tx_total_bytes;
> +}
> +
<...>
> +
> +static inline int
> +open_queues(const char *key __rte_unused, const char *value, void *extra_args)
> +{
> +	uint16_t *q = extra_args;
> +
> +	if ((value == NULL) || (extra_args == NULL))

syntax: the extra parentheses can be removed

> +		return -EINVAL;
> +
> +	*q = (uint16_t)strtoul(value, NULL, 0);
> +	if ((*q == USHRT_MAX) && (errno == ERANGE))
same here
> +		return -1;
> +
> +	if (*q > RTE_MAX_QUEUES_PER_PORT)
> +		return -1;
> +
> +	return 0;
> +}
> +
<...>
> +
> +static int
> +rte_pmd_vhost_devuninit(const char *name)
> +{
> +	struct rte_eth_dev *eth_dev = NULL;
> +	struct pmd_internal *internal;
> +	unsigned int i;
> +
> +	RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
> +
> +	/* find an ethdev entry */
> +	eth_dev = rte_eth_dev_allocated(name);
> +	if (eth_dev == NULL)
> +		return -ENODEV;
> +
> +	internal = eth_dev->data->dev_private;
> +	if (internal == NULL)
> +		return -ENODEV;
> +
> +	rte_free(vring_states[internal->port_id]);
> +	vring_states[internal->port_id] = NULL;
> +
> +	pthread_mutex_lock(&internal_list_lock);
> +	TAILQ_REMOVE(&internals_list, internal, next);
> +	pthread_mutex_unlock(&internal_list_lock);
> +
> +	eth_dev_stop(eth_dev);
> +
> +	if (internal->dev_name)

no need for a NULL check before free()

> +		free(internal->dev_name);
> +	if (internal->iface_name)
> +		free(internal->iface_name);
> +
> +	for (i = 0; i < eth_dev->data->nb_rx_queues; i++)
> +		rte_free(eth_dev->data->rx_queues[i]);
> +	for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
> +		rte_free(eth_dev->data->tx_queues[i]);
> +
> +	rte_free(eth_dev->data->mac_addrs);
> +	rte_free(eth_dev->data);
> +
> +	rte_free(internal);
> +
> +	rte_eth_dev_release_port(eth_dev);
> +
> +	return 0;
> +}
> +
<...>

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 2/2] vhost: Add VHOST PMD
  2016-02-04 11:17                             ` Ferruh Yigit
@ 2016-02-05  6:28                               ` Tetsuya Mukawa
  2016-02-05  6:35                                 ` Yuanhan Liu
  2016-02-08  9:42                                 ` Ferruh Yigit
  0 siblings, 2 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-02-05  6:28 UTC (permalink / raw)
  To: dev, Ferruh Yigit; +Cc: ann.zhuangyanying, yuanhan.liu

On 2016/02/04 20:17, Ferruh Yigit wrote:
> On Thu, Feb 04, 2016 at 04:26:31PM +0900, Tetsuya Mukawa wrote:
>
> Hi Tetsuya,
>
>> The patch introduces a new PMD. This PMD is implemented as thin wrapper
>> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
>> The vhost messages will be handled only when a port is started. So start
>> a port first, then invoke QEMU.
>>
>> The PMD has 2 parameters.
>>  - iface:  The parameter is used to specify a path to connect to a
>>            virtio-net device.
>>  - queues: The parameter is used to specify the number of the queues
>>            virtio-net device has.
>>            (Default: 1)
>>
>> Here is an example.
>> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
>>
>> To connect above testpmd, here is qemu command example.
>>
>> $ qemu-system-x86_64 \
>>         <snip>
>>         -chardev socket,id=chr0,path=/tmp/sock0 \
>>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
>>         -device virtio-net-pci,netdev=net0,mq=on
>>
>> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> Please find some more comments, mostly minor nits,
>
> please feel free to add my ack for next version of this patch:
> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
>
> <...>
>> diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
>> new file mode 100644
>> index 0000000..b2305c2
>> --- /dev/null
>> +++ b/drivers/net/vhost/rte_eth_vhost.c
> <...>
>> +
>> +struct pmd_internal {
>> +	TAILQ_ENTRY(pmd_internal) next;
>> +	char *dev_name;
>> +	char *iface_name;
>> +	uint8_t port_id;
> You can also get rid of port_id too, if you keep list of rte_eth_dev.
> But this is not so important, keep as it is if you want to.

Thank you so much for checking, and for the good suggestions.
I will follow your comments, except for the points below.

>> +
>> +	volatile uint16_t once;
>> +};
>> +
> <...>
>> +
>> +static int
>> +new_device(struct virtio_net *dev)
>> +{
> <...>
>> +
>> +	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
>> +		vq = eth_dev->data->rx_queues[i];
>> +		if (vq == NULL)
> can vq be NULL? It is allocated in rx/tx_queue_setup() and there is already a NULL check there?

I suspect a user may not set up all the queues.
In that case, we need the above check.

>
>> +			continue;
>> +		vq->device = dev;
>> +		vq->internal = internal;
>> +		rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
>> +	}
>> +	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
>> +		vq = eth_dev->data->tx_queues[i];
>> +		if (vq == NULL)
>> +			continue;

Same here.

>> +		vq->device = dev;
>> +		vq->internal = internal;
>> +		rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
>> +	}
>> +
>> +	dev->flags |= VIRTIO_DEV_RUNNING;
>> +	dev->priv = eth_dev;
>> +	eth_dev->data->dev_link.link_status = 1;
>> +
>> +	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
>> +		vq = eth_dev->data->rx_queues[i];
>> +		if (vq == NULL)
>> +			continue;

But we can remove this.

>> +		rte_atomic32_set(&vq->allow_queuing, 1);
>> +	}
>> +	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
>> +		vq = eth_dev->data->tx_queues[i];
>> +		if (vq == NULL)
>> +			continue;

And this.

> <...>
>> +		if (dev->data->rx_queues[i] == NULL)
>> +			continue;
>> +		vq = dev->data->rx_queues[i];
>> +		stats->q_ipackets[i] = vq->rx_pkts;
>> +		rx_total += stats->q_ipackets[i];
>> +
>> +		stats->q_ibytes[i] = vq->rx_bytes;
>> +		rx_total_bytes += stats->q_ibytes[i];
>> +	}
>> +
>> +	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
>> +	     i < dev->data->nb_tx_queues; i++) {
>> +		if (dev->data->tx_queues[i] == NULL)
> more queue NULL checks here; I am not sure these are necessary

Same here; in case a user doesn't set up all the queues, I will leave this.

Anyway, I will apply the fixes below.
 - Manage ether devices list instead of internal structures list.
 - Remove needless NULL checking.
 - Replace "pthread_exit" with "return NULL".
 - Replace rte_panic with RTE_LOG, and add error handling.
 - Remove duplicated lines.
 - Remove needless casting.
 - Follow coding style.
 - Remove needless parenthesis.

And keep the following:
 - some NULL checking before accessing a queue.

Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 2/2] vhost: Add VHOST PMD
  2016-02-05  6:28                               ` Tetsuya Mukawa
@ 2016-02-05  6:35                                 ` Yuanhan Liu
  2016-02-05  7:10                                   ` Tetsuya Mukawa
  2016-02-08  9:42                                 ` Ferruh Yigit
  1 sibling, 1 reply; 200+ messages in thread
From: Yuanhan Liu @ 2016-02-05  6:35 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying, yuanhan.liu

On Fri, Feb 05, 2016 at 03:28:37PM +0900, Tetsuya Mukawa wrote:
> On 2016/02/04 20:17, Ferruh Yigit wrote:
> > On Thu, Feb 04, 2016 at 04:26:31PM +0900, Tetsuya Mukawa wrote:
> >
> > Hi Tetsuya,
> >
> >> The patch introduces a new PMD. This PMD is implemented as thin wrapper
> >> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
> >> The vhost messages will be handled only when a port is started. So start
> >> a port first, then invoke QEMU.
> >>
> >> The PMD has 2 parameters.
> >>  - iface:  The parameter is used to specify a path to connect to a
> >>            virtio-net device.
> >>  - queues: The parameter is used to specify the number of the queues
> >>            virtio-net device has.
> >>            (Default: 1)
> >>
> >> Here is an example.
> >> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
> >>
> >> To connect above testpmd, here is qemu command example.
> >>
> >> $ qemu-system-x86_64 \
> >>         <snip>
> >>         -chardev socket,id=chr0,path=/tmp/sock0 \
> >>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
> >>         -device virtio-net-pci,netdev=net0,mq=on
> >>
> >> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> > Please find some more comments, mostly minor nits,
> >
> > please feel free to add my ack for next version of this patch:
> > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> >
> > <...>
> >> diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
> >> new file mode 100644
> >> index 0000000..b2305c2
> >> --- /dev/null
> >> +++ b/drivers/net/vhost/rte_eth_vhost.c
> > <...>
> >> +
> >> +struct pmd_internal {
> >> +	TAILQ_ENTRY(pmd_internal) next;
> >> +	char *dev_name;
> >> +	char *iface_name;
> >> +	uint8_t port_id;
> > You can also get rid of port_id too, if you keep list of rte_eth_dev.
> > But this is not so important, keep as it is if you want to.
> 
> Thank you so much for checking, and for the good suggestions.
> I will follow your comments, except for the points below.

You might need to update the MAINTAINERS file as well.

	--yliu

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 2/2] vhost: Add VHOST PMD
  2016-02-05  6:35                                 ` Yuanhan Liu
@ 2016-02-05  7:10                                   ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-02-05  7:10 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, ann.zhuangyanying, yuanhan.liu

On 2016/02/05 15:35, Yuanhan Liu wrote:
> On Fri, Feb 05, 2016 at 03:28:37PM +0900, Tetsuya Mukawa wrote:
>> On 2016/02/04 20:17, Ferruh Yigit wrote:
>>> On Thu, Feb 04, 2016 at 04:26:31PM +0900, Tetsuya Mukawa wrote:
>>>
>>> Hi Tetsuya,
>>>
>>>> The patch introduces a new PMD. This PMD is implemented as thin wrapper
>>>> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
>>>> The vhost messages will be handled only when a port is started. So start
>>>> a port first, then invoke QEMU.
>>>>
>>>> The PMD has 2 parameters.
>>>>  - iface:  The parameter is used to specify a path to connect to a
>>>>            virtio-net device.
>>>>  - queues: The parameter is used to specify the number of the queues
>>>>            virtio-net device has.
>>>>            (Default: 1)
>>>>
>>>> Here is an example.
>>>> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
>>>>
>>>> To connect above testpmd, here is qemu command example.
>>>>
>>>> $ qemu-system-x86_64 \
>>>>         <snip>
>>>>         -chardev socket,id=chr0,path=/tmp/sock0 \
>>>>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
>>>>         -device virtio-net-pci,netdev=net0,mq=on
>>>>
>>>> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
>>> Please find some more comments, mostly minor nits,
>>>
>>> please feel free to add my ack for next version of this patch:
>>> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
>>>
>>> <...>
>>>> diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
>>>> new file mode 100644
>>>> index 0000000..b2305c2
>>>> --- /dev/null
>>>> +++ b/drivers/net/vhost/rte_eth_vhost.c
>>> <...>
>>>> +
>>>> +struct pmd_internal {
>>>> +	TAILQ_ENTRY(pmd_internal) next;
>>>> +	char *dev_name;
>>>> +	char *iface_name;
>>>> +	uint8_t port_id;
>>> You can also get rid of port_id too, if you keep list of rte_eth_dev.
>>> But this is not so important, keep as it is if you want to.
>> Thank you so much for checking, and for the good suggestions.
>> I will follow your comments, except for the points below.
> You might need update the MAINTAINERS file as well.
>
> 	--yliu

Sure thanks!

^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH v8 0/2] Add VHOST PMD
  2016-02-04  7:26                           ` [PATCH v7 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  2016-02-04 11:17                             ` Ferruh Yigit
@ 2016-02-05 11:28                             ` Tetsuya Mukawa
  2016-02-05 11:28                             ` [PATCH v8 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
  2016-02-05 11:28                             ` [PATCH v8 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  3 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-02-05 11:28 UTC (permalink / raw)
  To: dev, ferruh.yigit; +Cc: ann.zhuangyanying, yuanhan.liu

The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost.

PATCH v8 changes:
 - Manage ether devices list instead of internal structures list.
 - Remove needless NULL checking.
 - Replace "pthread_exit" with "return NULL".
 - Replace rte_panic with RTE_LOG, and add error handling.
 - Remove duplicated lines.
 - Remove needless casting.
 - Follow coding style.
 - Remove needless parenthesis.

PATCH v7 changes:
 - Remove needless parenthesis.
 - Add release note.
 - Remove needless line wraps.
 - Add null pointer check in vring_state_changed().
 - Free queue memory in eth_queue_release().
 - Fix wrong variable name.
 - Fix error handling code of eth_dev_vhost_create() and
   rte_pmd_vhost_devuninit().
 - Remove needless null checking from rte_pmd_vhost_devinit/devuninit().
 - Use port id to create mac address.
 - Add doxygen style comments in "rte_eth_vhost.h".
 - Fix wrong comment in "mk/rte.app.mk".

PATCH v6 changes:
 - Remove rte_vhost_driver_pmd_callback_register().
 - Support link status interrupt.
 - Support queue state changed interrupt.
 - Add rte_eth_vhost_get_queue_event().
 - Support numa node detection when new device is connected.

PATCH v5 changes:
 - Rebase on latest master.
 - Fix RX/TX routine to count RX/TX bytes.
 - Fix RX/TX routine not to count as error packets if enqueue/dequeue
   cannot send all packets.
 - Fix if-condition checking for multiqueues.
 - Add "static" to pthread variable.
 - Fix format.
 - Change default behavior not to receive queueing event from driver.
 - Split the patch to separate rte_eth_vhost_portid2vdev().

PATCH v4 changes:
 - Rebase on latest DPDK tree.
 - Fix coding style.
 - Fix code not to invoke multiple message handling threads.
 - Fix code to handle vdev parameters correctly.
 - Remove needless cast.
 - Remove needless if-condition before rte_free().

PATCH v3 changes:
 - Rebase on latest master.
 - Specify correct queue_id in RX/TX function.

PATCH v2 changes:
 - Remove a below patch that fixes vhost library.
   The patch was applied as a separate patch.
   - vhost: fix crash with multiqueue enabled
 - Fix typos.
   (Thanks to Thomas Monjalon)
 - Rebase on latest tree with above bernard's patches.

PATCH v1 changes:
 - Support vhost multiple queues.
 - Rebase on "remove pci driver from vdevs".
 - Optimize RX/TX functions.
 - Fix resource leaks.
 - Fix compile issue.
 - Add patch to fix vhost library.

RFC PATCH v3 changes:
 - Optimize performance.
   In RX/TX functions, change code to access only per core data.
 - Add the below API to allow users to use vhost library APIs for a port
   managed by the vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
    - rte_eth_vhost_portid2vdev()
   To support this functionality, the vhost library is also changed.
   Anyway, if users don't use the vhost PMD, they can fully use the vhost
   library APIs.
 - Add code to support vhost multiple queues.
   Actually, the multiple queues functionality is not enabled yet.

RFC PATCH v2 changes:
 - Fix issues reported by checkpatch.pl
   (Thanks to Stephen Hemminger)


Tetsuya Mukawa (2):
  ethdev: Add a new event type to notify a queue state changed event
  vhost: Add VHOST PMD

 MAINTAINERS                                 |   4 +
 config/common_linuxapp                      |   6 +
 doc/guides/nics/index.rst                   |   1 +
 doc/guides/rel_notes/release_2_3.rst        |   4 +
 drivers/net/Makefile                        |   4 +
 drivers/net/vhost/Makefile                  |  62 ++
 drivers/net/vhost/rte_eth_vhost.c           | 907 ++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h           | 109 ++++
 drivers/net/vhost/rte_pmd_vhost_version.map |  11 +
 lib/librte_ether/rte_ethdev.h               |   2 +
 mk/rte.app.mk                               |   6 +
 11 files changed, 1116 insertions(+)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

-- 
2.1.4

^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH v8 1/2] ethdev: Add a new event type to notify a queue state changed event
  2016-02-04  7:26                           ` [PATCH v7 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  2016-02-04 11:17                             ` Ferruh Yigit
  2016-02-05 11:28                             ` [PATCH v8 0/2] " Tetsuya Mukawa
@ 2016-02-05 11:28                             ` Tetsuya Mukawa
  2016-02-06  4:57                               ` Yuanhan Liu
  2016-02-05 11:28                             ` [PATCH v8 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  3 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-02-05 11:28 UTC (permalink / raw)
  To: dev, ferruh.yigit; +Cc: ann.zhuangyanying, yuanhan.liu

This patch adds a below event type.
 - RTE_ETH_EVENT_QUEUE_STATE_CHANGE
This event is used for notifying a queue state changed event.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 lib/librte_ether/rte_ethdev.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 8710dd7..2fbf42a 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -2661,6 +2661,8 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 enum rte_eth_event_type {
 	RTE_ETH_EVENT_UNKNOWN,  /**< unknown event type */
 	RTE_ETH_EVENT_INTR_LSC, /**< lsc interrupt event */
+	RTE_ETH_EVENT_QUEUE_STATE_CHANGE,
+				/**< queue state changed interrupt */
 	RTE_ETH_EVENT_MAX       /**< max value of this enum */
 };
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v8 2/2] vhost: Add VHOST PMD
  2016-02-04  7:26                           ` [PATCH v7 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
                                               ` (2 preceding siblings ...)
  2016-02-05 11:28                             ` [PATCH v8 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
@ 2016-02-05 11:28                             ` Tetsuya Mukawa
  2016-02-06  5:12                               ` Yuanhan Liu
                                                 ` (3 more replies)
  3 siblings, 4 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-02-05 11:28 UTC (permalink / raw)
  To: dev, ferruh.yigit; +Cc: ann.zhuangyanying, yuanhan.liu

The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost. It means librte_vhost is also needed to compile the PMD.
The vhost messages will be handled only when a port is started. So start
a port first, then invoke QEMU.

The PMD has 2 parameters.
 - iface:  The parameter is used to specify a path to connect to a
           virtio-net device.
 - queues: The parameter is used to specify the number of the queues
           virtio-net device has.
           (Default: 1)

Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i

To connect above testpmd, here is qemu command example.

$ qemu-system-x86_64 \
        <snip>
        -chardev socket,id=chr0,path=/tmp/sock0 \
        -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
        -device virtio-net-pci,netdev=net0,mq=on

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 MAINTAINERS                                 |   4 +
 config/common_linuxapp                      |   6 +
 doc/guides/nics/index.rst                   |   1 +
 doc/guides/rel_notes/release_2_3.rst        |   4 +
 drivers/net/Makefile                        |   4 +
 drivers/net/vhost/Makefile                  |  62 ++
 drivers/net/vhost/rte_eth_vhost.c           | 907 ++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h           | 109 ++++
 drivers/net/vhost/rte_pmd_vhost_version.map |  11 +
 mk/rte.app.mk                               |   6 +
 10 files changed, 1114 insertions(+)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index b90aeea..a44ce9d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -348,6 +348,10 @@ Null PMD
 M: Tetsuya Mukawa <mukawa@igel.co.jp>
 F: drivers/net/null/
 
+Vhost PMD
+M: Tetsuya Mukawa <mukawa@igel.co.jp>
+F: drivers/net/vhost/
+
 Intel AES-NI Multi-Buffer
 M: Declan Doherty <declan.doherty@intel.com>
 F: drivers/crypto/aesni_mb/
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 74bc515..357b557 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -514,6 +514,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
 CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 
 #
+# Compile vhost PMD
+# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
+#
+CONFIG_RTE_LIBRTE_PMD_VHOST=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 33c9cea..5819cdb 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -47,6 +47,7 @@ Network Interface Controller Drivers
     nfp
     szedata2
     virtio
+    vhost
     vmxnet3
     pcap_ring
 
diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst
index 7945694..d43b7ad 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -39,6 +39,10 @@ This section should contain new features added in this release. Sample format:
 
   Enabled virtio 1.0 support for virtio pmd driver.
 
+* **Added vhost PMD.**
+
+  Added virtual PMD that wraps librte_vhost.
+
 
 Resolved Issues
 ---------------
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 6e4497e..4300b93 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -52,5 +52,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
 
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
 include $(RTE_SDK)/mk/rte.sharelib.mk
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
new file mode 100644
index 0000000..f49a69b
--- /dev/null
+++ b/drivers/net/vhost/Makefile
@@ -0,0 +1,62 @@
+#   BSD LICENSE
+#
+#   Copyright (c) 2010-2016 Intel Corporation.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_vhost.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_vhost_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
+
+#
+# Export include files
+#
+SYMLINK-y-include += rte_eth_vhost.h
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_vhost
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
new file mode 100644
index 0000000..f2095c3
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -0,0 +1,907 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) 2016 IGEL Co., Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co.,Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <unistd.h>
+#include <pthread.h>
+#include <stdbool.h>
+#ifdef RTE_LIBRTE_VHOST_NUMA
+#include <numaif.h>
+#endif
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_dev.h>
+#include <rte_kvargs.h>
+#include <rte_virtio_net.h>
+#include <rte_spinlock.h>
+
+#include "rte_eth_vhost.h"
+
+#define ETH_VHOST_IFACE_ARG		"iface"
+#define ETH_VHOST_QUEUES_ARG		"queues"
+
+static const char *drivername = "VHOST PMD";
+
+static const char *valid_arguments[] = {
+	ETH_VHOST_IFACE_ARG,
+	ETH_VHOST_QUEUES_ARG,
+	NULL
+};
+
+static struct ether_addr base_eth_addr = {
+	.addr_bytes = {
+		0x56 /* V */,
+		0x48 /* H */,
+		0x4F /* O */,
+		0x53 /* S */,
+		0x54 /* T */,
+		0x00
+	}
+};
+
+struct vhost_queue {
+	rte_atomic32_t allow_queuing;
+	rte_atomic32_t while_queuing;
+	struct virtio_net *device;
+	struct pmd_internal *internal;
+	struct rte_mempool *mb_pool;
+	uint16_t virtqueue_id;
+	uint64_t rx_pkts;
+	uint64_t tx_pkts;
+	uint64_t missed_pkts;
+	uint64_t rx_bytes;
+	uint64_t tx_bytes;
+};
+
+struct pmd_internal {
+	char *dev_name;
+	char *iface_name;
+	volatile uint16_t once;
+};
+
+struct internal_list {
+	TAILQ_ENTRY(internal_list) next;
+	struct rte_eth_dev *eth_dev;
+};
+
+TAILQ_HEAD(internal_list_head, internal_list);
+static struct internal_list_head internal_list =
+	TAILQ_HEAD_INITIALIZER(internal_list);
+
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static rte_atomic16_t nb_started_ports;
+static pthread_t session_th;
+
+static struct rte_eth_link pmd_link = {
+		.link_speed = 10000,
+		.link_duplex = ETH_LINK_FULL_DUPLEX,
+		.link_status = 0
+};
+
+struct rte_vhost_vring_state {
+	rte_spinlock_t lock;
+
+	bool cur[RTE_MAX_QUEUES_PER_PORT * 2];
+	bool seen[RTE_MAX_QUEUES_PER_PORT * 2];
+	unsigned int index;
+	unsigned int max_vring;
+};
+
+static struct rte_vhost_vring_state *vring_states[RTE_MAX_ETHPORTS];
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t i, nb_rx = 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Dequeue packets from guest TX queue */
+	nb_rx = rte_vhost_dequeue_burst(r->device,
+			r->virtqueue_id, r->mb_pool, bufs, nb_bufs);
+
+	r->rx_pkts += nb_rx;
+
+	for (i = 0; likely(i < nb_rx); i++)
+		r->rx_bytes += bufs[i]->pkt_len;
+
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t i, nb_tx = 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Enqueue packets to guest RX queue */
+	nb_tx = rte_vhost_enqueue_burst(r->device,
+			r->virtqueue_id, bufs, nb_bufs);
+
+	r->tx_pkts += nb_tx;
+	r->missed_pkts += nb_bufs - nb_tx;
+
+	for (i = 0; likely(i < nb_tx); i++)
+		r->tx_bytes += bufs[i]->pkt_len;
+
+	for (i = 0; likely(i < nb_tx); i++)
+		rte_pktmbuf_free(bufs[i]);
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
+{
+	return 0;
+}
+
+static inline struct internal_list *
+find_internal_resource(char *ifname)
+{
+	int found = 0;
+	struct internal_list *list;
+	struct pmd_internal *internal;
+
+	if (ifname == NULL)
+		return NULL;
+
+	pthread_mutex_lock(&internal_list_lock);
+
+	TAILQ_FOREACH(list, &internal_list, next) {
+		internal = list->eth_dev->data->dev_private;
+		if (!strcmp(internal->iface_name, ifname)) {
+			found = 1;
+			break;
+		}
+	}
+
+	pthread_mutex_unlock(&internal_list_lock);
+
+	if (!found)
+		return NULL;
+
+	return list;
+}
+
+static int
+new_device(struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct internal_list *list;
+	struct pmd_internal *internal;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid argument\n");
+		return -1;
+	}
+
+	list = find_internal_resource(dev->ifname);
+	if (list == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid device name\n");
+		return -1;
+	}
+
+	eth_dev = list->eth_dev;
+	internal = eth_dev->data->dev_private;
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = dev;
+		vq->internal = internal;
+		rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = dev;
+		vq->internal = internal;
+		rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
+	}
+
+	dev->flags |= VIRTIO_DEV_RUNNING;
+	dev->priv = eth_dev;
+	eth_dev->data->dev_link.link_status = 1;
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+
+	RTE_LOG(INFO, PMD, "New connection established\n");
+
+	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC);
+
+	return 0;
+}
+
+static void
+destroy_device(volatile struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid argument\n");
+		return;
+	}
+
+	eth_dev = (struct rte_eth_dev *)dev->priv;
+	if (eth_dev == NULL) {
+		RTE_LOG(INFO, PMD, "Failed to find an ethdev\n");
+		return;
+	}
+
+	/* Wait until rx/tx_pkt_burst stops accessing vhost device */
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+
+	eth_dev->data->dev_link.link_status = 0;
+
+	dev->priv = NULL;
+	dev->flags &= ~VIRTIO_DEV_RUNNING;
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = NULL;
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = NULL;
+	}
+
+	RTE_LOG(INFO, PMD, "Connection closed\n");
+
+	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC);
+}
+
+static int
+vring_state_changed(struct virtio_net *dev, uint16_t vring, int enable)
+{
+	struct rte_vhost_vring_state *state;
+	struct rte_eth_dev *eth_dev;
+	struct internal_list *list;
+#ifdef RTE_LIBRTE_VHOST_NUMA
+	int newnode, ret;
+#endif
+
+	if (dev == NULL) {
+		RTE_LOG(ERR, PMD, "Invalid argument\n");
+		return -1;
+	}
+
+	list = find_internal_resource(dev->ifname);
+	if (list == NULL) {
+		RTE_LOG(ERR, PMD, "Invalid interface name: %s\n", dev->ifname);
+		return -1;
+	}
+
+	eth_dev = list->eth_dev;
+	/* won't be NULL */
+	state = vring_states[eth_dev->data->port_id];
+
+#ifdef RTE_LIBRTE_VHOST_NUMA
+	ret = get_mempolicy(&newnode, NULL, 0, dev,
+			MPOL_F_NODE | MPOL_F_ADDR);
+	if (ret < 0) {
+		RTE_LOG(ERR, PMD, "Unknown NUMA node\n");
+		return -1;
+	}
+
+	eth_dev->data->numa_node = newnode;
+#endif
+	rte_spinlock_lock(&state->lock);
+	state->cur[vring] = enable;
+	state->max_vring = RTE_MAX(vring, state->max_vring);
+	rte_spinlock_unlock(&state->lock);
+
+	RTE_LOG(INFO, PMD, "vring%u is %s\n",
+			vring, enable ? "enabled" : "disabled");
+
+	_rte_eth_dev_callback_process(eth_dev,
+			RTE_ETH_EVENT_QUEUE_STATE_CHANGE);
+
+	return 0;
+}
+
+int
+rte_eth_vhost_get_queue_event(uint8_t port_id,
+		struct rte_eth_vhost_queue_event *event)
+{
+	struct rte_vhost_vring_state *state;
+	unsigned int i;
+	int idx;
+
+	if (port_id >= RTE_MAX_ETHPORTS) {
+		RTE_LOG(ERR, PMD, "Invalid port id\n");
+		return -1;
+	}
+
+	state = vring_states[port_id];
+	if (!state) {
+		RTE_LOG(ERR, PMD, "Unused port\n");
+		return -1;
+	}
+
+	rte_spinlock_lock(&state->lock);
+	for (i = 0; i <= state->max_vring; i++) {
+		idx = state->index++ % (state->max_vring + 1);
+
+		if (state->cur[idx] != state->seen[idx]) {
+			state->seen[idx] = state->cur[idx];
+			event->queue_id = idx / 2;
+			event->rx = idx & 1;
+			event->enable = state->cur[idx];
+			rte_spinlock_unlock(&state->lock);
+			return 0;
+		}
+	}
+	rte_spinlock_unlock(&state->lock);
+
+	return -1;
+}
+
+static void *
+vhost_driver_session(void *param __rte_unused)
+{
+	static struct virtio_net_device_ops vhost_ops;
+
+	/* set vhost arguments */
+	vhost_ops.new_device = new_device;
+	vhost_ops.destroy_device = destroy_device;
+	vhost_ops.vring_state_changed = vring_state_changed;
+	if (rte_vhost_driver_callback_register(&vhost_ops) < 0)
+		RTE_LOG(ERR, PMD, "Can't register callbacks\n");
+
+	/* start event handling */
+	rte_vhost_driver_session_start();
+
+	return NULL;
+}
+
+static int
+vhost_driver_session_start(void)
+{
+	int ret;
+
+	ret = pthread_create(&session_th,
+			NULL, vhost_driver_session, NULL);
+	if (ret)
+		RTE_LOG(ERR, PMD, "Can't create a thread\n");
+
+	return ret;
+}
+
+static void
+vhost_driver_session_stop(void)
+{
+	int ret;
+
+	ret = pthread_cancel(session_th);
+	if (ret)
+		RTE_LOG(ERR, PMD, "Can't cancel the thread\n");
+
+	ret = pthread_join(session_th, NULL);
+	if (ret)
+		RTE_LOG(ERR, PMD, "Can't join the thread\n");
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+	int ret = 0;
+
+	if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
+		ret = rte_vhost_driver_register(internal->iface_name);
+		if (ret)
+			return ret;
+	}
+
+	/* We need only one message handling thread */
+	if (rte_atomic16_add_return(&nb_started_ports, 1) == 1)
+		ret = vhost_driver_session_start();
+
+	return ret;
+}
+
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	if (rte_atomic16_cmpset(&internal->once, 1, 0))
+		rte_vhost_driver_unregister(internal->iface_name);
+
+	if (rte_atomic16_sub_return(&nb_started_ports, 1) == 0)
+		vhost_driver_session_stop();
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct vhost_queue *vq;
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
+		return -ENOMEM;
+	}
+
+	vq->mb_pool = mb_pool;
+	vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
+	dev->data->rx_queues[rx_queue_id] = vq;
+
+	return 0;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct vhost_queue *vq;
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for tx queue\n");
+		return -ENOMEM;
+	}
+
+	vq->virtqueue_id = tx_queue_id * VIRTIO_QNUM + VIRTIO_RXQ;
+	dev->data->tx_queues[tx_queue_id] = vq;
+
+	return 0;
+}
+
+static void
+eth_dev_info(struct rte_eth_dev *dev,
+	     struct rte_eth_dev_info *dev_info)
+{
+	dev_info->driver_name = drivername;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = (uint32_t)-1;
+	dev_info->max_rx_queues = dev->data->nb_rx_queues;
+	dev_info->max_tx_queues = dev->data->nb_tx_queues;
+	dev_info->min_rx_bufsize = 0;
+}
+
+static void
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	unsigned i;
+	unsigned long rx_total = 0, tx_total = 0, tx_missed_total = 0;
+	unsigned long rx_total_bytes = 0, tx_total_bytes = 0;
+	struct vhost_queue *vq;
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+			i < dev->data->nb_rx_queues; i++) {
+		if (dev->data->rx_queues[i] == NULL)
+			continue;
+		vq = dev->data->rx_queues[i];
+		stats->q_ipackets[i] = vq->rx_pkts;
+		rx_total += stats->q_ipackets[i];
+
+		stats->q_ibytes[i] = vq->rx_bytes;
+		rx_total_bytes += stats->q_ibytes[i];
+	}
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+			i < dev->data->nb_tx_queues; i++) {
+		if (dev->data->tx_queues[i] == NULL)
+			continue;
+		vq = dev->data->tx_queues[i];
+		stats->q_opackets[i] = vq->tx_pkts;
+		tx_missed_total += vq->missed_pkts;
+		tx_total += stats->q_opackets[i];
+
+		stats->q_obytes[i] = vq->tx_bytes;
+		tx_total_bytes += stats->q_obytes[i];
+	}
+
+	stats->ipackets = rx_total;
+	stats->opackets = tx_total;
+	stats->imissed = tx_missed_total;
+	stats->ibytes = rx_total_bytes;
+	stats->obytes = tx_total_bytes;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	struct vhost_queue *vq;
+	unsigned i;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		if (dev->data->rx_queues[i] == NULL)
+			continue;
+		vq = dev->data->rx_queues[i];
+		vq->rx_pkts = 0;
+		vq->rx_bytes = 0;
+	}
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		if (dev->data->tx_queues[i] == NULL)
+			continue;
+		vq = dev->data->tx_queues[i];
+		vq->tx_pkts = 0;
+		vq->tx_bytes = 0;
+		vq->missed_pkts = 0;
+	}
+}
+
+static void
+eth_queue_release(void *q)
+{
+	rte_free(q);
+}
+
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused)
+{
+	return 0;
+}
+
+/**
+ * Disable features in feature_mask. Returns 0 on success.
+ */
+int
+rte_eth_vhost_feature_disable(uint64_t feature_mask)
+{
+	return rte_vhost_feature_disable(feature_mask);
+}
+
+/**
+ * Enable features in feature_mask. Returns 0 on success.
+ */
+int
+rte_eth_vhost_feature_enable(uint64_t feature_mask)
+{
+	return rte_vhost_feature_enable(feature_mask);
+}
+
+/* Returns currently supported vhost features */
+uint64_t
+rte_eth_vhost_feature_get(void)
+{
+	return rte_vhost_feature_get();
+}
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+static int
+eth_dev_vhost_create(const char *name, char *iface_name, int16_t queues,
+		     const unsigned numa_node)
+{
+	struct rte_eth_dev_data *data = NULL;
+	struct pmd_internal *internal = NULL;
+	struct rte_eth_dev *eth_dev = NULL;
+	struct ether_addr *eth_addr = NULL;
+	struct rte_vhost_vring_state *vring_state = NULL;
+	struct internal_list *list = NULL;
+
+	RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
+		numa_node);
+
+	/* now do all data allocation - for eth_dev structure, dummy pci driver
+	 * and internal (private) data
+	 */
+	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+	if (data == NULL)
+		goto error;
+
+	internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
+	if (internal == NULL)
+		goto error;
+
+	list = rte_zmalloc_socket(name, sizeof(*list), 0, numa_node);
+	if (list == NULL)
+		goto error;
+
+	/* reserve an ethdev entry */
+	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+	if (eth_dev == NULL)
+		goto error;
+
+	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
+	if (eth_addr == NULL)
+		goto error;
+	*eth_addr = base_eth_addr;
+	eth_addr->addr_bytes[5] = eth_dev->data->port_id;
+
+	vring_state = rte_zmalloc_socket(name,
+			sizeof(*vring_state), 0, numa_node);
+	if (vring_state == NULL)
+		goto error;
+
+	TAILQ_INIT(&eth_dev->link_intr_cbs);
+
+	/* now put it all together
+	 * - store queue data in internal,
+	 * - store numa_node info in ethdev data
+	 * - point eth_dev_data to internals
+	 * - and point eth_dev structure to new eth_dev_data structure
+	 */
+	internal->dev_name = strdup(name);
+	if (internal->dev_name == NULL)
+		goto error;
+	internal->iface_name = strdup(iface_name);
+	if (internal->iface_name == NULL)
+		goto error;
+
+	list->eth_dev = eth_dev;
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_INSERT_TAIL(&internal_list, list, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
+	rte_spinlock_init(&vring_state->lock);
+	vring_states[eth_dev->data->port_id] = vring_state;
+
+	data->dev_private = internal;
+	data->port_id = eth_dev->data->port_id;
+	memmove(data->name, eth_dev->data->name, sizeof(data->name));
+	data->nb_rx_queues = queues;
+	data->nb_tx_queues = queues;
+	data->dev_link = pmd_link;
+	data->mac_addrs = eth_addr;
+
+	/* We'll replace the 'data' originally allocated by eth_dev. So the
+	 * vhost PMD resources won't be shared between multiple processes.
+	 */
+	eth_dev->data = data;
+	eth_dev->dev_ops = &ops;
+	eth_dev->driver = NULL;
+	data->dev_flags =
+		RTE_ETH_DEV_DETACHABLE | RTE_ETH_DEV_INTR_LSC;
+	data->kdrv = RTE_KDRV_NONE;
+	data->drv_name = internal->dev_name;
+	data->numa_node = numa_node;
+
+	/* finally assign rx and tx ops */
+	eth_dev->rx_pkt_burst = eth_vhost_rx;
+	eth_dev->tx_pkt_burst = eth_vhost_tx;
+
+	return data->port_id;
+
+error:
+	if (internal)
+		free(internal->dev_name);
+	rte_free(vring_state);
+	rte_free(eth_addr);
+	if (eth_dev)
+		rte_eth_dev_release_port(eth_dev);
+	rte_free(internal);
+	rte_free(list);
+	rte_free(data);
+
+	return -1;
+}
+
+static inline int
+open_iface(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	const char **iface_name = extra_args;
+
+	if (value == NULL)
+		return -1;
+
+	*iface_name = value;
+
+	return 0;
+}
+
+static inline int
+open_queues(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	uint16_t *q = extra_args;
+
+	if (value == NULL || extra_args == NULL)
+		return -EINVAL;
+
+	*q = (uint16_t)strtoul(value, NULL, 0);
+	if (*q == USHRT_MAX && errno == ERANGE)
+		return -1;
+
+	if (*q > RTE_MAX_QUEUES_PER_PORT)
+		return -1;
+
+	return 0;
+}
+
+static int
+rte_pmd_vhost_devinit(const char *name, const char *params)
+{
+	struct rte_kvargs *kvlist = NULL;
+	int ret = 0;
+	char *iface_name;
+	uint16_t queues;
+
+	RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
+
+	kvlist = rte_kvargs_parse(params, valid_arguments);
+	if (kvlist == NULL)
+		return -1;
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
+					 &open_iface, &iface_name);
+		if (ret < 0)
+			goto out_free;
+	} else {
+		ret = -1;
+		goto out_free;
+	}
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
+					 &open_queues, &queues);
+		if (ret < 0)
+			goto out_free;
+
+	} else
+		queues = 1;
+
+	eth_dev_vhost_create(name, iface_name, queues, rte_socket_id());
+
+out_free:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static int
+rte_pmd_vhost_devuninit(const char *name)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internal *internal;
+	struct internal_list *list;
+	unsigned int i;
+
+	RTE_LOG(INFO, PMD, "Un-initializing pmd_vhost for %s\n", name);
+
+	/* find an ethdev entry */
+	eth_dev = rte_eth_dev_allocated(name);
+	if (eth_dev == NULL)
+		return -ENODEV;
+
+	internal = eth_dev->data->dev_private;
+	if (internal == NULL)
+		return -ENODEV;
+
+	list = find_internal_resource(internal->iface_name);
+	if (list == NULL)
+		return -ENODEV;
+
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_REMOVE(&internal_list, list, next);
+	pthread_mutex_unlock(&internal_list_lock);
+	rte_free(list);
+
+	eth_dev_stop(eth_dev);
+
+	rte_free(vring_states[eth_dev->data->port_id]);
+	vring_states[eth_dev->data->port_id] = NULL;
+
+	free(internal->dev_name);
+	free(internal->iface_name);
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++)
+		rte_free(eth_dev->data->rx_queues[i]);
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
+		rte_free(eth_dev->data->tx_queues[i]);
+
+	rte_free(eth_dev->data->mac_addrs);
+	rte_free(eth_dev->data);
+	rte_free(internal);
+
+	rte_eth_dev_release_port(eth_dev);
+
+	return 0;
+}
+
+static struct rte_driver pmd_vhost_drv = {
+	.name = "eth_vhost",
+	.type = PMD_VDEV,
+	.init = rte_pmd_vhost_devinit,
+	.uninit = rte_pmd_vhost_devuninit,
+};
+
+PMD_REGISTER_DRIVER(pmd_vhost_drv);
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
new file mode 100644
index 0000000..e78cb74
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -0,0 +1,109 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 IGEL Co., Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co., Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_VHOST_H_
+#define _RTE_ETH_VHOST_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+#include <stdbool.h>
+
+#include <rte_virtio_net.h>
+
+/**
+ * Disable features in feature_mask.
+ *
+ * @param feature_mask
+ *  Vhost features defined in "linux/virtio_net.h".
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_eth_vhost_feature_disable(uint64_t feature_mask);
+
+/**
+ * Enable features in feature_mask.
+ *
+ * @param feature_mask
+ *  Vhost features defined in "linux/virtio_net.h".
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_eth_vhost_feature_enable(uint64_t feature_mask);
+
+/**
+ * Returns currently supported vhost features.
+ *
+ * @return
+ *  Vhost features defined in "linux/virtio_net.h".
+ */
+uint64_t rte_eth_vhost_feature_get(void);
+
+/*
+ * Event description.
+ */
+struct rte_eth_vhost_queue_event {
+	uint16_t queue_id;
+	bool rx;
+	bool enable;
+};
+
+/**
+ * Get queue events from the specified port.
+ * If a callback for the event below is registered via
+ * rte_eth_dev_callback_register(), this function reports what changed:
+ *  - RTE_ETH_EVENT_QUEUE_STATE_CHANGE
+ * Multiple state changes may be coalesced into a single callback invocation,
+ * so keep calling this function until it returns a negative value.
+ *
+ * @param port_id
+ *  Port id.
+ * @param event
+ *  Pointer to a rte_eth_vhost_queue_event structure.
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_eth_vhost_get_queue_event(uint8_t port_id,
+		struct rte_eth_vhost_queue_event *event);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
new file mode 100644
index 0000000..3280b0d
--- /dev/null
+++ b/drivers/net/vhost/rte_pmd_vhost_version.map
@@ -0,0 +1,11 @@
+DPDK_2.3 {
+
+	global:
+
+	rte_eth_vhost_feature_disable;
+	rte_eth_vhost_feature_enable;
+	rte_eth_vhost_feature_get;
+	rte_eth_vhost_get_queue_event;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 8ecab41..fca210d 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -159,6 +159,12 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_QAT)        += -lrte_pmd_qat
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)   += -lrte_pmd_aesni_mb
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)   += -L$(AESNI_MULTI_BUFFER_LIB_PATH) -lIPSec_MB
 
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
+
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
 endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
 
 endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS
-- 
2.1.4


* Re: [PATCH v8 1/2] ethdev: Add a new event type to notify a queue state changed event
  2016-02-05 11:28                             ` [PATCH v8 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
@ 2016-02-06  4:57                               ` Yuanhan Liu
  0 siblings, 0 replies; 200+ messages in thread
From: Yuanhan Liu @ 2016-02-06  4:57 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Fri, Feb 05, 2016 at 08:28:12PM +0900, Tetsuya Mukawa wrote:
> This patch adds a below event type.
>  - RTE_ETH_EVENT_QUEUE_STATE_CHANGE
> This event is used for notifying a queue state changed event.
> 
> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
>  lib/librte_ether/rte_ethdev.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 8710dd7..2fbf42a 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -2661,6 +2661,8 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
>  enum rte_eth_event_type {
>  	RTE_ETH_EVENT_UNKNOWN,  /**< unknown event type */
>  	RTE_ETH_EVENT_INTR_LSC, /**< lsc interrupt event */
> +	RTE_ETH_EVENT_QUEUE_STATE_CHANGE,
> +				/**< queue state changed interrupt */

Though it's defined to be a generic event, you might need to note in
the comment that vhost-pmd is the only user so far.

Anyway, it looks good to me:

Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

	--yliu


* Re: [PATCH v8 2/2] vhost: Add VHOST PMD
  2016-02-05 11:28                             ` [PATCH v8 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
@ 2016-02-06  5:12                               ` Yuanhan Liu
  2016-02-09  9:38                               ` [PATCH v9 0/2] " Tetsuya Mukawa
                                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 200+ messages in thread
From: Yuanhan Liu @ 2016-02-06  5:12 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Fri, Feb 05, 2016 at 08:28:13PM +0900, Tetsuya Mukawa wrote:
> The patch introduces a new PMD. This PMD is implemented as thin wrapper
> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
> The vhost messages will be handled only when a port is started. So start
> a port first, then invoke QEMU.
> 
> The PMD has 2 parameters.
>  - iface:  The parameter is used to specify a path to connect to a
>            virtio-net device.
>  - queues: The parameter is used to specify the number of the queues
>            virtio-net device has.
>            (Default: 1)
> 
> Here is an example.
> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
> 
> To connect above testpmd, here is qemu command example.
> 
> $ qemu-system-x86_64 \
>         <snip>
>         -chardev socket,id=chr0,path=/tmp/sock0 \
>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
>         -device virtio-net-pci,netdev=net0,mq=on
> 
> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

Thanks.

	--yliu


* Re: [PATCH v7 2/2] vhost: Add VHOST PMD
  2016-02-05  6:28                               ` Tetsuya Mukawa
  2016-02-05  6:35                                 ` Yuanhan Liu
@ 2016-02-08  9:42                                 ` Ferruh Yigit
  2016-02-09  1:54                                   ` Tetsuya Mukawa
  1 sibling, 1 reply; 200+ messages in thread
From: Ferruh Yigit @ 2016-02-08  9:42 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying, yuanhan.liu

On Fri, Feb 05, 2016 at 03:28:37PM +0900, Tetsuya Mukawa wrote:
> On 2016/02/04 20:17, Ferruh Yigit wrote:
> > On Thu, Feb 04, 2016 at 04:26:31PM +0900, Tetsuya Mukawa wrote:
> >
> > Hi Tetsuya,
> >
> >> The patch introduces a new PMD. This PMD is implemented as thin wrapper
> >> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
> >> The vhost messages will be handled only when a port is started. So start
> >> a port first, then invoke QEMU.
> >>
> >> The PMD has 2 parameters.
> >>  - iface:  The parameter is used to specify a path to connect to a
> >>            virtio-net device.
> >>  - queues: The parameter is used to specify the number of the queues
> >>            virtio-net device has.
> >>            (Default: 1)
> >>
> >> Here is an example.
> >> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
> >>
> >> To connect above testpmd, here is qemu command example.
> >>
> >> $ qemu-system-x86_64 \
> >>         <snip>
> >>         -chardev socket,id=chr0,path=/tmp/sock0 \
> >>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
> >>         -device virtio-net-pci,netdev=net0,mq=on
> >>
> >> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> > Please find some more comments, mostly minor nits,
> >
> > please feel free to add my ack for next version of this patch:
> > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> >
> > <...>
> >> diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
> >> new file mode 100644
> >> index 0000000..b2305c2
> >> --- /dev/null
> >> +++ b/drivers/net/vhost/rte_eth_vhost.c
> > <...>
> >> +
> >> +struct pmd_internal {
> >> +	TAILQ_ENTRY(pmd_internal) next;
> >> +	char *dev_name;
> >> +	char *iface_name;
> >> +	uint8_t port_id;
> > You can also get rid of port_id too, if you keep list of rte_eth_dev.
> > But this is not so important, keep as it is if you want to.
> 
> Thank you so much for checking and good suggestions.
> I will follow your comments, except for the items below.
> 
> >> +
> >> +	volatile uint16_t once;
> >> +};
> >> +
> > <...>
> >> +
> >> +static int
> >> +new_device(struct virtio_net *dev)
> >> +{
> > <...>
> >> +
> >> +	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
> >> +		vq = eth_dev->data->rx_queues[i];
> >> +		if (vq == NULL)
> > Can vq be NULL? It is allocated in rx/tx_queue_setup(), and there is already a NULL check there.
> 
> I suspect a user may not set up all queues.
> In that case, we need the check above.
> 
> >
> >> +			continue;
> >> +		vq->device = dev;
> >> +		vq->internal = internal;
> >> +		rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
> >> +	}
> >> +	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
> >> +		vq = eth_dev->data->tx_queues[i];
> >> +		if (vq == NULL)
> >> +			continue;
> 
> Same here.
> 
> >> +		vq->device = dev;
> >> +		vq->internal = internal;
> >> +		rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
> >> +	}
> >> +
> >> +	dev->flags |= VIRTIO_DEV_RUNNING;
> >> +	dev->priv = eth_dev;
> >> +	eth_dev->data->dev_link.link_status = 1;
> >> +
> >> +	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
> >> +		vq = eth_dev->data->rx_queues[i];
> >> +		if (vq == NULL)
> >> +			continue;
> 
> But we can remove this.

If vq can be NULL in the loop above because the user did not set up the queue, it will be NULL here too, won't it?
Why can we remove the NULL check here?

> 
> >> +		rte_atomic32_set(&vq->allow_queuing, 1);
> >> +	}
> >> +	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
> >> +		vq = eth_dev->data->tx_queues[i];
> >> +		if (vq == NULL)
> >> +			continue;
> 
> And this.
> 
> > <...>
> >> +		if (dev->data->rx_queues[i] == NULL)
> >> +			continue;
> >> +		vq = dev->data->rx_queues[i];
> >> +		stats->q_ipackets[i] = vq->rx_pkts;
> >> +		rx_total += stats->q_ipackets[i];
> >> +
> >> +		stats->q_ibytes[i] = vq->rx_bytes;
> >> +		rx_total_bytes += stats->q_ibytes[i];
> >> +	}
> >> +
> >> +	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
> >> +	     i < dev->data->nb_tx_queues; i++) {
> >> +		if (dev->data->tx_queues[i] == NULL)
> > More queue NULL checks here; I am not sure if these are necessary.
> 
> Same here; in case the user doesn't set up all queues, I will leave this.
> 
> Anyway, I will fix below code.
>  - Manage ether devices list instead of internal structures list.
>  - Remove needless NULL checking.
>  - Replace "pthread_exit" with "return NULL".
>  - Replace rte_panic with RTE_LOG, and add error handling.
>  - Remove duplicated lines.
>  - Remove needless casting.
>  - Follow coding style.
>  - Remove needless parenthesis.
> 
> And leave below.
>  - some NULL checking before accessing a queue.
> 
> Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 2/2] vhost: Add VHOST PMD
  2016-02-08  9:42                                 ` Ferruh Yigit
@ 2016-02-09  1:54                                   ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-02-09  1:54 UTC (permalink / raw)
  To: dev, yuanhan.liu, ann.zhuangyanying

On 2016/02/08 18:42, Ferruh Yigit wrote:
> On Fri, Feb 05, 2016 at 03:28:37PM +0900, Tetsuya Mukawa wrote:
>> On 2016/02/04 20:17, Ferruh Yigit wrote:
>>> On Thu, Feb 04, 2016 at 04:26:31PM +0900, Tetsuya Mukawa wrote:
>>>
>>> Hi Tetsuya,
>>>
>>>> The patch introduces a new PMD. This PMD is implemented as thin wrapper
>>>> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
>>>> The vhost messages will be handled only when a port is started. So start
>>>> a port first, then invoke QEMU.
>>>>
>>>> The PMD has 2 parameters.
>>>>  - iface:  The parameter is used to specify a path to connect to a
>>>>            virtio-net device.
>>>>  - queues: The parameter is used to specify the number of the queues
>>>>            virtio-net device has.
>>>>            (Default: 1)
>>>>
>>>> Here is an example.
>>>> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
>>>>
>>>> To connect above testpmd, here is qemu command example.
>>>>
>>>> $ qemu-system-x86_64 \
>>>>         <snip>
>>>>         -chardev socket,id=chr0,path=/tmp/sock0 \
>>>>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
>>>>         -device virtio-net-pci,netdev=net0,mq=on
>>>>
>>>> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
>>> Please find some more comments, mostly minor nits,
>>>
>>> please feel free to add my ack for next version of this patch:
>>> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
>>>
>>> <...>
>>>> diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
>>>> new file mode 100644
>>>> index 0000000..b2305c2
>>>> --- /dev/null
>>>> +++ b/drivers/net/vhost/rte_eth_vhost.c
>>> <...>
>>>> +
>>>> +struct pmd_internal {
>>>> +	TAILQ_ENTRY(pmd_internal) next;
>>>> +	char *dev_name;
>>>> +	char *iface_name;
>>>> +	uint8_t port_id;
>>> You can also get rid of port_id too, if you keep list of rte_eth_dev.
>>> But this is not so important, keep as it is if you want to.
>> Thank you so much for checking and good suggestions.
>> I will follow your comments, except for the items below.
>>
>>>> +
>>>> +	volatile uint16_t once;
>>>> +};
>>>> +
>>> <...>
>>>> +
>>>> +static int
>>>> +new_device(struct virtio_net *dev)
>>>> +{
>>> <...>
>>>> +
>>>> +	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
>>>> +		vq = eth_dev->data->rx_queues[i];
>>>> +		if (vq == NULL)
>>> can vq be NULL? It is allocated in rx/tx_queue_setup() and there is already a NULL check there?
>> I suspect a user may not set up all queues.
>> In that case, we need the check above.
>>
>>>> +			continue;
>>>> +		vq->device = dev;
>>>> +		vq->internal = internal;
>>>> +		rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
>>>> +	}
>>>> +	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
>>>> +		vq = eth_dev->data->tx_queues[i];
>>>> +		if (vq == NULL)
>>>> +			continue;
>> Same here.
>>
>>>> +		vq->device = dev;
>>>> +		vq->internal = internal;
>>>> +		rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
>>>> +	}
>>>> +
>>>> +	dev->flags |= VIRTIO_DEV_RUNNING;
>>>> +	dev->priv = eth_dev;
>>>> +	eth_dev->data->dev_link.link_status = 1;
>>>> +
>>>> +	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
>>>> +		vq = eth_dev->data->rx_queues[i];
>>>> +		if (vq == NULL)
>>>> +			continue;
>> But we can remove this.
> If vq can be NULL in the loop above because the user did not set up the queue, it will be NULL here too, won't it?
> Why can we remove the NULL check here?

Yes, you are right.
Will fix it and submit again.

Thanks,
Tetsuya


>>>> +		rte_atomic32_set(&vq->allow_queuing, 1);
>>>> +	}
>>>> +	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
>>>> +		vq = eth_dev->data->tx_queues[i];
>>>> +		if (vq == NULL)
>>>> +			continue;
>> And this.
>>
>>> <...>
>>>> +		if (dev->data->rx_queues[i] == NULL)
>>>> +			continue;
>>>> +		vq = dev->data->rx_queues[i];
>>>> +		stats->q_ipackets[i] = vq->rx_pkts;
>>>> +		rx_total += stats->q_ipackets[i];
>>>> +
>>>> +		stats->q_ibytes[i] = vq->rx_bytes;
>>>> +		rx_total_bytes += stats->q_ibytes[i];
>>>> +	}
>>>> +
>>>> +	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
>>>> +	     i < dev->data->nb_tx_queues; i++) {
>>>> +		if (dev->data->tx_queues[i] == NULL)
>>> More queue NULL checks here; I am not sure if these are necessary.
>> Same here; in case the user doesn't set up all queues, I will leave this.
>>
>> Anyway, I will fix below code.
>>  - Manage ether devices list instead of internal structures list.
>>  - Remove needless NULL checking.
>>  - Replace "pthread_exit" with "return NULL".
>>  - Replace rte_panic with RTE_LOG, and add error handling.
>>  - Remove duplicated lines.
>>  - Remove needless casting.
>>  - Follow coding style.
>>  - Remove needless parenthesis.
>>
>> And leave below.
>>  - some NULL checking before accessing a queue.
>>
>> Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH v9 0/2] Add VHOST PMD
  2016-02-05 11:28                             ` [PATCH v8 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  2016-02-06  5:12                               ` Yuanhan Liu
@ 2016-02-09  9:38                               ` Tetsuya Mukawa
  2016-02-24  2:45                                 ` Qiu, Michael
  2016-02-09  9:38                               ` [PATCH v9 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
  2016-02-09  9:38                               ` [PATCH v9 " Tetsuya Mukawa
  3 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-02-09  9:38 UTC (permalink / raw)
  To: dev, ferruh.yigit; +Cc: ann.zhuangyanying, yuanhan.liu

The patch introduces a new PMD. This PMD is implemented as a thin wrapper
around librte_vhost.


PATCH v9 changes:
 - Fix a null pointer access issue introduced in the v8 patch.

PATCH v8 changes:
 - Manage ether devices list instead of internal structures list.
 - Remove needless NULL checking.
 - Replace "pthread_exit" with "return NULL".
 - Replace rte_panic with RTE_LOG, and add error handling.
 - Remove duplicated lines.
 - Remove needless casting.
 - Follow coding style.
 - Remove needless parenthesis.

PATCH v7 changes:
 - Remove needless parenthesis.
 - Add release note.
 - Remove needless line wraps.
 - Add null pointer check in vring_state_changed().
 - Free queue memory in eth_queue_release().
 - Fix wrong variable name.
 - Fix error handling code of eth_dev_vhost_create() and
   rte_pmd_vhost_devuninit().
 - Remove needless null checking from rte_pmd_vhost_devinit/devuninit().
 - Use port id to create mac address.
 - Add doxygen style comments in "rte_eth_vhost.h".
 - Fix wrong comment in "mk/rte.app.mk".

PATCH v6 changes:
 - Remove rte_vhost_driver_pmd_callback_register().
 - Support link status interrupt.
 - Support queue state changed interrupt.
 - Add rte_eth_vhost_get_queue_event().
 - Support numa node detection when new device is connected.

PATCH v5 changes:
 - Rebase on latest master.
 - Fix RX/TX routine to count RX/TX bytes.
 - Fix RX/TX routine not to count packets as errors if enqueue/dequeue
   cannot send all of them.
 - Fix if-condition checking for multiqueues.
 - Add "static" to pthread variable.
 - Fix format.
 - Change default behavior not to receive queueing events from the driver.
 - Split the patch to separate rte_eth_vhost_portid2vdev().

PATCH v4 changes:
 - Rebase on latest DPDK tree.
 - Fix coding style.
 - Fix code not to invoke multiple message handling threads.
 - Fix code to handle vdev parameters correctly.
 - Remove needless cast.
 - Remove needless if-condition before rte_free().

PATCH v3 changes:
 - Rebase on latest master.
 - Specify correct queue_id in RX/TX function.

PATCH v2 changes:
 - Remove a below patch that fixes vhost library.
   The patch was applied as a separate patch.
   - vhost: fix crash with multiqueue enabled
 - Fix typos.
   (Thanks to Thomas Monjalon)
 - Rebase on the latest tree with Bernard's patches above.

PATCH v1 changes:
 - Support vhost multiple queues.
 - Rebase on "remove pci driver from vdevs".
 - Optimize RX/TX functions.
 - Fix resource leaks.
 - Fix compile issue.
 - Add patch to fix vhost library.

RFC PATCH v3 changes:
 - Optimize performance.
   In RX/TX functions, change code to access only per-core data.
 - Add the below API to allow users to use vhost library APIs for a port
   managed by the vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
    - rte_eth_vhost_portid2vdev()
   To support this functionality, the vhost library is also changed.
   If users don't use the vhost PMD, they can still fully use the vhost library APIs.
 - Add code to support vhost multiple queues.
   The multiple queues functionality is not actually enabled yet.

RFC PATCH v2 changes:
 - Fix issues reported by checkpatch.pl
   (Thanks to Stephen Hemminger)


Tetsuya Mukawa (2):
  ethdev: Add a new event type to notify a queue state changed event
  vhost: Add VHOST PMD

 MAINTAINERS                                 |   4 +
 config/common_linuxapp                      |   6 +
 doc/guides/nics/index.rst                   |   1 +
 doc/guides/rel_notes/release_2_3.rst        |   4 +
 drivers/net/Makefile                        |   4 +
 drivers/net/vhost/Makefile                  |  62 ++
 drivers/net/vhost/rte_eth_vhost.c           | 911 ++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h           | 109 ++++
 drivers/net/vhost/rte_pmd_vhost_version.map |  11 +
 lib/librte_ether/rte_ethdev.h               |   2 +
 mk/rte.app.mk                               |   6 +
 11 files changed, 1120 insertions(+)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

-- 
2.1.4

^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH v9 1/2] ethdev: Add a new event type to notify a queue state changed event
  2016-02-05 11:28                             ` [PATCH v8 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  2016-02-06  5:12                               ` Yuanhan Liu
  2016-02-09  9:38                               ` [PATCH v9 0/2] " Tetsuya Mukawa
@ 2016-02-09  9:38                               ` Tetsuya Mukawa
  2016-03-04  4:17                                 ` [PATCH v10 0/2] Add VHOST PMD Tetsuya Mukawa
                                                   ` (2 more replies)
  2016-02-09  9:38                               ` [PATCH v9 " Tetsuya Mukawa
  3 siblings, 3 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-02-09  9:38 UTC (permalink / raw)
  To: dev, ferruh.yigit; +Cc: ann.zhuangyanying, yuanhan.liu

This patch adds the below event type.
 - RTE_ETH_EVENT_QUEUE_STATE_CHANGE
This event is used to notify that a queue's state has changed.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 lib/librte_ether/rte_ethdev.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 8710dd7..2fbf42a 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -2661,6 +2661,8 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 enum rte_eth_event_type {
 	RTE_ETH_EVENT_UNKNOWN,  /**< unknown event type */
 	RTE_ETH_EVENT_INTR_LSC, /**< lsc interrupt event */
+	RTE_ETH_EVENT_QUEUE_STATE_CHANGE,
+				/**< queue state changed interrupt */
 	RTE_ETH_EVENT_MAX       /**< max value of this enum */
 };
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v9 2/2] vhost: Add VHOST PMD
  2016-02-05 11:28                             ` [PATCH v8 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
                                                 ` (2 preceding siblings ...)
  2016-02-09  9:38                               ` [PATCH v9 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
@ 2016-02-09  9:38                               ` Tetsuya Mukawa
  3 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-02-09  9:38 UTC (permalink / raw)
  To: dev, ferruh.yigit; +Cc: ann.zhuangyanying, yuanhan.liu

The patch introduces a new PMD. This PMD is implemented as a thin wrapper
around librte_vhost, which means librte_vhost is also needed to compile the
PMD. The vhost messages are handled only when a port is started, so start
a port first, then invoke QEMU.

The PMD has 2 parameters.
 - iface:  The parameter specifies a path to connect to a
           virtio-net device.
 - queues: The parameter specifies the number of queues the
           virtio-net device has.
           (Default: 1)

Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i

To connect to the above testpmd instance, here is an example QEMU command.

$ qemu-system-x86_64 \
        <snip>
        -chardev socket,id=chr0,path=/tmp/sock0 \
        -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
        -device virtio-net-pci,netdev=net0,mq=on

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 MAINTAINERS                                 |   4 +
 config/common_linuxapp                      |   6 +
 doc/guides/nics/index.rst                   |   1 +
 doc/guides/rel_notes/release_2_3.rst        |   4 +
 drivers/net/Makefile                        |   4 +
 drivers/net/vhost/Makefile                  |  62 ++
 drivers/net/vhost/rte_eth_vhost.c           | 911 ++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h           | 109 ++++
 drivers/net/vhost/rte_pmd_vhost_version.map |  11 +
 mk/rte.app.mk                               |   6 +
 10 files changed, 1118 insertions(+)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index b90aeea..a44ce9d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -348,6 +348,10 @@ Null PMD
 M: Tetsuya Mukawa <mukawa@igel.co.jp>
 F: drivers/net/null/
 
+Vhost PMD
+M: Tetsuya Mukawa <mukawa@igel.co.jp>
+F: drivers/net/vhost/
+
 Intel AES-NI Multi-Buffer
 M: Declan Doherty <declan.doherty@intel.com>
 F: drivers/crypto/aesni_mb/
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 74bc515..357b557 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -514,6 +514,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
 CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 
 #
+# Compile vhost PMD
+# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
+#
+CONFIG_RTE_LIBRTE_PMD_VHOST=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 33c9cea..5819cdb 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -47,6 +47,7 @@ Network Interface Controller Drivers
     nfp
     szedata2
     virtio
+    vhost
     vmxnet3
     pcap_ring
 
diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst
index 7945694..d43b7ad 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -39,6 +39,10 @@ This section should contain new features added in this release. Sample format:
 
   Enabled virtio 1.0 support for virtio pmd driver.
 
+* **Added vhost PMD.**
+
+  Added virtual PMD that wraps librte_vhost.
+
 
 Resolved Issues
 ---------------
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 6e4497e..4300b93 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -52,5 +52,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
 
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
 include $(RTE_SDK)/mk/rte.sharelib.mk
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
new file mode 100644
index 0000000..f49a69b
--- /dev/null
+++ b/drivers/net/vhost/Makefile
@@ -0,0 +1,62 @@
+#   BSD LICENSE
+#
+#   Copyright (c) 2010-2016 Intel Corporation.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_vhost.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_vhost_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
+
+#
+# Export include files
+#
+SYMLINK-y-include += rte_eth_vhost.h
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_vhost
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
new file mode 100644
index 0000000..409b385
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -0,0 +1,911 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) 2016 IGEL Co., Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co.,Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <unistd.h>
+#include <pthread.h>
+#include <stdbool.h>
+#ifdef RTE_LIBRTE_VHOST_NUMA
+#include <numaif.h>
+#endif
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_dev.h>
+#include <rte_kvargs.h>
+#include <rte_virtio_net.h>
+#include <rte_spinlock.h>
+
+#include "rte_eth_vhost.h"
+
+#define ETH_VHOST_IFACE_ARG		"iface"
+#define ETH_VHOST_QUEUES_ARG		"queues"
+
+static const char *drivername = "VHOST PMD";
+
+static const char *valid_arguments[] = {
+	ETH_VHOST_IFACE_ARG,
+	ETH_VHOST_QUEUES_ARG,
+	NULL
+};
+
+static struct ether_addr base_eth_addr = {
+	.addr_bytes = {
+		0x56 /* V */,
+		0x48 /* H */,
+		0x4F /* O */,
+		0x53 /* S */,
+		0x54 /* T */,
+		0x00
+	}
+};
+
+struct vhost_queue {
+	rte_atomic32_t allow_queuing;
+	rte_atomic32_t while_queuing;
+	struct virtio_net *device;
+	struct pmd_internal *internal;
+	struct rte_mempool *mb_pool;
+	uint16_t virtqueue_id;
+	uint64_t rx_pkts;
+	uint64_t tx_pkts;
+	uint64_t missed_pkts;
+	uint64_t rx_bytes;
+	uint64_t tx_bytes;
+};
+
+struct pmd_internal {
+	char *dev_name;
+	char *iface_name;
+	volatile uint16_t once;
+};
+
+struct internal_list {
+	TAILQ_ENTRY(internal_list) next;
+	struct rte_eth_dev *eth_dev;
+};
+
+TAILQ_HEAD(internal_list_head, internal_list);
+static struct internal_list_head internal_list =
+	TAILQ_HEAD_INITIALIZER(internal_list);
+
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static rte_atomic16_t nb_started_ports;
+static pthread_t session_th;
+
+static struct rte_eth_link pmd_link = {
+		.link_speed = 10000,
+		.link_duplex = ETH_LINK_FULL_DUPLEX,
+		.link_status = 0
+};
+
+struct rte_vhost_vring_state {
+	rte_spinlock_t lock;
+
+	bool cur[RTE_MAX_QUEUES_PER_PORT * 2];
+	bool seen[RTE_MAX_QUEUES_PER_PORT * 2];
+	unsigned int index;
+	unsigned int max_vring;
+};
+
+static struct rte_vhost_vring_state *vring_states[RTE_MAX_ETHPORTS];
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t i, nb_rx = 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Dequeue packets from guest TX queue */
+	nb_rx = rte_vhost_dequeue_burst(r->device,
+			r->virtqueue_id, r->mb_pool, bufs, nb_bufs);
+
+	r->rx_pkts += nb_rx;
+
+	for (i = 0; likely(i < nb_rx); i++)
+		r->rx_bytes += bufs[i]->pkt_len;
+
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t i, nb_tx = 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Enqueue packets to guest RX queue */
+	nb_tx = rte_vhost_enqueue_burst(r->device,
+			r->virtqueue_id, bufs, nb_bufs);
+
+	r->tx_pkts += nb_tx;
+	r->missed_pkts += nb_bufs - nb_tx;
+
+	for (i = 0; likely(i < nb_tx); i++)
+		r->tx_bytes += bufs[i]->pkt_len;
+
+	for (i = 0; likely(i < nb_tx); i++)
+		rte_pktmbuf_free(bufs[i]);
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
+{
+	return 0;
+}
+
+static inline struct internal_list *
+find_internal_resource(char *ifname)
+{
+	int found = 0;
+	struct internal_list *list;
+	struct pmd_internal *internal;
+
+	if (ifname == NULL)
+		return NULL;
+
+	pthread_mutex_lock(&internal_list_lock);
+
+	TAILQ_FOREACH(list, &internal_list, next) {
+		internal = list->eth_dev->data->dev_private;
+		if (!strcmp(internal->iface_name, ifname)) {
+			found = 1;
+			break;
+		}
+	}
+
+	pthread_mutex_unlock(&internal_list_lock);
+
+	if (!found)
+		return NULL;
+
+	return list;
+}
+
+static int
+new_device(struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct internal_list *list;
+	struct pmd_internal *internal;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid argument\n");
+		return -1;
+	}
+
+	list = find_internal_resource(dev->ifname);
+	if (list == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid device name\n");
+		return -1;
+	}
+
+	eth_dev = list->eth_dev;
+	internal = eth_dev->data->dev_private;
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = dev;
+		vq->internal = internal;
+		rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = dev;
+		vq->internal = internal;
+		rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
+	}
+
+	dev->flags |= VIRTIO_DEV_RUNNING;
+	dev->priv = eth_dev;
+	eth_dev->data->dev_link.link_status = 1;
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+
+	RTE_LOG(INFO, PMD, "New connection established\n");
+
+	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC);
+
+	return 0;
+}
+
+static void
+destroy_device(volatile struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid argument\n");
+		return;
+	}
+
+	eth_dev = (struct rte_eth_dev *)dev->priv;
+	if (eth_dev == NULL) {
+		RTE_LOG(INFO, PMD, "Failed to find an ethdev\n");
+		return;
+	}
+
+	/* Wait until rx/tx_pkt_burst stops accessing vhost device */
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+
+	eth_dev->data->dev_link.link_status = 0;
+
+	dev->priv = NULL;
+	dev->flags &= ~VIRTIO_DEV_RUNNING;
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = NULL;
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = NULL;
+	}
+
+	RTE_LOG(INFO, PMD, "Connection closed\n");
+
+	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC);
+}
+
+static int
+vring_state_changed(struct virtio_net *dev, uint16_t vring, int enable)
+{
+	struct rte_vhost_vring_state *state;
+	struct rte_eth_dev *eth_dev;
+	struct internal_list *list;
+#ifdef RTE_LIBRTE_VHOST_NUMA
+	int newnode, ret;
+#endif
+
+	if (dev == NULL) {
+		RTE_LOG(ERR, PMD, "Invalid argument\n");
+		return -1;
+	}
+
+	list = find_internal_resource(dev->ifname);
+	if (list == NULL) {
+		RTE_LOG(ERR, PMD, "Invalid interface name: %s\n", dev->ifname);
+		return -1;
+	}
+
+	eth_dev = list->eth_dev;
+	/* won't be NULL */
+	state = vring_states[eth_dev->data->port_id];
+
+#ifdef RTE_LIBRTE_VHOST_NUMA
+	ret = get_mempolicy(&newnode, NULL, 0, dev,
+			MPOL_F_NODE | MPOL_F_ADDR);
+	if (ret < 0) {
+		RTE_LOG(ERR, PMD, "Unknown NUMA node\n");
+		return -1;
+	}
+
+	eth_dev->data->numa_node = newnode;
+#endif
+	rte_spinlock_lock(&state->lock);
+	state->cur[vring] = enable;
+	state->max_vring = RTE_MAX(vring, state->max_vring);
+	rte_spinlock_unlock(&state->lock);
+
+	RTE_LOG(INFO, PMD, "vring%u is %s\n",
+			vring, enable ? "enabled" : "disabled");
+
+	_rte_eth_dev_callback_process(eth_dev,
+			RTE_ETH_EVENT_QUEUE_STATE_CHANGE);
+
+	return 0;
+}
+
+int
+rte_eth_vhost_get_queue_event(uint8_t port_id,
+		struct rte_eth_vhost_queue_event *event)
+{
+	struct rte_vhost_vring_state *state;
+	unsigned int i;
+	int idx;
+
+	if (port_id >= RTE_MAX_ETHPORTS) {
+		RTE_LOG(ERR, PMD, "Invalid port id\n");
+		return -1;
+	}
+
+	state = vring_states[port_id];
+	if (!state) {
+		RTE_LOG(ERR, PMD, "Unused port\n");
+		return -1;
+	}
+
+	rte_spinlock_lock(&state->lock);
+	for (i = 0; i <= state->max_vring; i++) {
+		idx = state->index++ % (state->max_vring + 1);
+
+		if (state->cur[idx] != state->seen[idx]) {
+			state->seen[idx] = state->cur[idx];
+			event->queue_id = idx / 2;
+			event->rx = idx & 1;
+			event->enable = state->cur[idx];
+			rte_spinlock_unlock(&state->lock);
+			return 0;
+		}
+	}
+	rte_spinlock_unlock(&state->lock);
+
+	return -1;
+}
+
+static void *
+vhost_driver_session(void *param __rte_unused)
+{
+	static struct virtio_net_device_ops vhost_ops;
+
+	/* set vhost arguments */
+	vhost_ops.new_device = new_device;
+	vhost_ops.destroy_device = destroy_device;
+	vhost_ops.vring_state_changed = vring_state_changed;
+	if (rte_vhost_driver_callback_register(&vhost_ops) < 0)
+		RTE_LOG(ERR, PMD, "Can't register callbacks\n");
+
+	/* start event handling */
+	rte_vhost_driver_session_start();
+
+	return NULL;
+}
+
+static int
+vhost_driver_session_start(void)
+{
+	int ret;
+
+	ret = pthread_create(&session_th,
+			NULL, vhost_driver_session, NULL);
+	if (ret)
+		RTE_LOG(ERR, PMD, "Can't create a thread\n");
+
+	return ret;
+}
+
+static void
+vhost_driver_session_stop(void)
+{
+	int ret;
+
+	ret = pthread_cancel(session_th);
+	if (ret)
+		RTE_LOG(ERR, PMD, "Can't cancel the thread\n");
+
+	ret = pthread_join(session_th, NULL);
+	if (ret)
+		RTE_LOG(ERR, PMD, "Can't join the thread\n");
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+	int ret = 0;
+
+	if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
+		ret = rte_vhost_driver_register(internal->iface_name);
+		if (ret)
+			return ret;
+	}
+
+	/* We need only one message handling thread */
+	if (rte_atomic16_add_return(&nb_started_ports, 1) == 1)
+		ret = vhost_driver_session_start();
+
+	return ret;
+}
+
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	if (rte_atomic16_cmpset(&internal->once, 1, 0))
+		rte_vhost_driver_unregister(internal->iface_name);
+
+	if (rte_atomic16_sub_return(&nb_started_ports, 1) == 0)
+		vhost_driver_session_stop();
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct vhost_queue *vq;
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
+		return -ENOMEM;
+	}
+
+	vq->mb_pool = mb_pool;
+	vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
+	dev->data->rx_queues[rx_queue_id] = vq;
+
+	return 0;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct vhost_queue *vq;
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for tx queue\n");
+		return -ENOMEM;
+	}
+
+	vq->virtqueue_id = tx_queue_id * VIRTIO_QNUM + VIRTIO_RXQ;
+	dev->data->tx_queues[tx_queue_id] = vq;
+
+	return 0;
+}
+
+static void
+eth_dev_info(struct rte_eth_dev *dev,
+	     struct rte_eth_dev_info *dev_info)
+{
+	dev_info->driver_name = drivername;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = (uint32_t)-1;
+	dev_info->max_rx_queues = dev->data->nb_rx_queues;
+	dev_info->max_tx_queues = dev->data->nb_tx_queues;
+	dev_info->min_rx_bufsize = 0;
+}
+
+static void
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	unsigned i;
+	unsigned long rx_total = 0, tx_total = 0, tx_missed_total = 0;
+	unsigned long rx_total_bytes = 0, tx_total_bytes = 0;
+	struct vhost_queue *vq;
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+			i < dev->data->nb_rx_queues; i++) {
+		if (dev->data->rx_queues[i] == NULL)
+			continue;
+		vq = dev->data->rx_queues[i];
+		stats->q_ipackets[i] = vq->rx_pkts;
+		rx_total += stats->q_ipackets[i];
+
+		stats->q_ibytes[i] = vq->rx_bytes;
+		rx_total_bytes += stats->q_ibytes[i];
+	}
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+			i < dev->data->nb_tx_queues; i++) {
+		if (dev->data->tx_queues[i] == NULL)
+			continue;
+		vq = dev->data->tx_queues[i];
+		stats->q_opackets[i] = vq->tx_pkts;
+		tx_missed_total += vq->missed_pkts;
+		tx_total += stats->q_opackets[i];
+
+		stats->q_obytes[i] = vq->tx_bytes;
+		tx_total_bytes += stats->q_obytes[i];
+	}
+
+	stats->ipackets = rx_total;
+	stats->opackets = tx_total;
+	stats->imissed = tx_missed_total;
+	stats->ibytes = rx_total_bytes;
+	stats->obytes = tx_total_bytes;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	struct vhost_queue *vq;
+	unsigned i;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		if (dev->data->rx_queues[i] == NULL)
+			continue;
+		vq = dev->data->rx_queues[i];
+		vq->rx_pkts = 0;
+		vq->rx_bytes = 0;
+	}
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		if (dev->data->tx_queues[i] == NULL)
+			continue;
+		vq = dev->data->tx_queues[i];
+		vq->tx_pkts = 0;
+		vq->tx_bytes = 0;
+		vq->missed_pkts = 0;
+	}
+}
+
+static void
+eth_queue_release(void *q)
+{
+	rte_free(q);
+}
+
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused)
+{
+	return 0;
+}
+
+/**
+ * Disable features in feature_mask. Returns 0 on success.
+ */
+int
+rte_eth_vhost_feature_disable(uint64_t feature_mask)
+{
+	return rte_vhost_feature_disable(feature_mask);
+}
+
+/**
+ * Enable features in feature_mask. Returns 0 on success.
+ */
+int
+rte_eth_vhost_feature_enable(uint64_t feature_mask)
+{
+	return rte_vhost_feature_enable(feature_mask);
+}
+
+/* Returns currently supported vhost features */
+uint64_t
+rte_eth_vhost_feature_get(void)
+{
+	return rte_vhost_feature_get();
+}
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+static int
+eth_dev_vhost_create(const char *name, char *iface_name, int16_t queues,
+		     const unsigned numa_node)
+{
+	struct rte_eth_dev_data *data = NULL;
+	struct pmd_internal *internal = NULL;
+	struct rte_eth_dev *eth_dev = NULL;
+	struct ether_addr *eth_addr = NULL;
+	struct rte_vhost_vring_state *vring_state = NULL;
+	struct internal_list *list = NULL;
+
+	RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
+		numa_node);
+
+	/* now do all data allocation - for eth_dev structure, dummy pci driver
+	 * and internal (private) data
+	 */
+	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+	if (data == NULL)
+		goto error;
+
+	internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
+	if (internal == NULL)
+		goto error;
+
+	list = rte_zmalloc_socket(name, sizeof(*list), 0, numa_node);
+	if (list == NULL)
+		goto error;
+
+	/* reserve an ethdev entry */
+	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+	if (eth_dev == NULL)
+		goto error;
+
+	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
+	if (eth_addr == NULL)
+		goto error;
+	*eth_addr = base_eth_addr;
+	eth_addr->addr_bytes[5] = eth_dev->data->port_id;
+
+	vring_state = rte_zmalloc_socket(name,
+			sizeof(*vring_state), 0, numa_node);
+	if (vring_state == NULL)
+		goto error;
+
+	TAILQ_INIT(&eth_dev->link_intr_cbs);
+
+	/* now put it all together
+	 * - store queue data in internal,
+	 * - store numa_node info in ethdev data
+	 * - point eth_dev_data to internals
+	 * - and point eth_dev structure to new eth_dev_data structure
+	 */
+	internal->dev_name = strdup(name);
+	if (internal->dev_name == NULL)
+		goto error;
+	internal->iface_name = strdup(iface_name);
+	if (internal->iface_name == NULL)
+		goto error;
+
+	list->eth_dev = eth_dev;
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_INSERT_TAIL(&internal_list, list, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
+	rte_spinlock_init(&vring_state->lock);
+	vring_states[eth_dev->data->port_id] = vring_state;
+
+	data->dev_private = internal;
+	data->port_id = eth_dev->data->port_id;
+	memmove(data->name, eth_dev->data->name, sizeof(data->name));
+	data->nb_rx_queues = queues;
+	data->nb_tx_queues = queues;
+	data->dev_link = pmd_link;
+	data->mac_addrs = eth_addr;
+
+	/* We'll replace the 'data' originally allocated by eth_dev. So the
+	 * vhost PMD resources won't be shared between multi processes.
+	 */
+	eth_dev->data = data;
+	eth_dev->dev_ops = &ops;
+	eth_dev->driver = NULL;
+	data->dev_flags =
+		RTE_ETH_DEV_DETACHABLE | RTE_ETH_DEV_INTR_LSC;
+	data->kdrv = RTE_KDRV_NONE;
+	data->drv_name = internal->dev_name;
+	data->numa_node = numa_node;
+
+	/* finally assign rx and tx ops */
+	eth_dev->rx_pkt_burst = eth_vhost_rx;
+	eth_dev->tx_pkt_burst = eth_vhost_tx;
+
+	return data->port_id;
+
+error:
+	if (internal)
+		free(internal->dev_name);
+	rte_free(vring_state);
+	rte_free(eth_addr);
+	if (eth_dev)
+		rte_eth_dev_release_port(eth_dev);
+	rte_free(internal);
+	rte_free(list);
+	rte_free(data);
+
+	return -1;
+}
+
+static inline int
+open_iface(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	const char **iface_name = extra_args;
+
+	if (value == NULL)
+		return -1;
+
+	*iface_name = value;
+
+	return 0;
+}
+
+static inline int
+open_queues(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	uint16_t *q = extra_args;
+
+	if (value == NULL || extra_args == NULL)
+		return -EINVAL;
+
+	*q = (uint16_t)strtoul(value, NULL, 0);
+	if (*q == USHRT_MAX && errno == ERANGE)
+		return -1;
+
+	if (*q > RTE_MAX_QUEUES_PER_PORT)
+		return -1;
+
+	return 0;
+}
+
+static int
+rte_pmd_vhost_devinit(const char *name, const char *params)
+{
+	struct rte_kvargs *kvlist = NULL;
+	int ret = 0;
+	char *iface_name;
+	uint16_t queues;
+
+	RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
+
+	kvlist = rte_kvargs_parse(params, valid_arguments);
+	if (kvlist == NULL)
+		return -1;
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
+					 &open_iface, &iface_name);
+		if (ret < 0)
+			goto out_free;
+	} else {
+		ret = -1;
+		goto out_free;
+	}
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
+					 &open_queues, &queues);
+		if (ret < 0)
+			goto out_free;
+
+	} else
+		queues = 1;
+
+	if (eth_dev_vhost_create(name, iface_name, queues,
+			rte_socket_id()) < 0)
+		ret = -1;
+
+out_free:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static int
+rte_pmd_vhost_devuninit(const char *name)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internal *internal;
+	struct internal_list *list;
+	unsigned int i;
+
+	RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
+
+	/* find an ethdev entry */
+	eth_dev = rte_eth_dev_allocated(name);
+	if (eth_dev == NULL)
+		return -ENODEV;
+
+	internal = eth_dev->data->dev_private;
+	if (internal == NULL)
+		return -ENODEV;
+
+	list = find_internal_resource(internal->iface_name);
+	if (list == NULL)
+		return -ENODEV;
+
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_REMOVE(&internal_list, list, next);
+	pthread_mutex_unlock(&internal_list_lock);
+	rte_free(list);
+
+	eth_dev_stop(eth_dev);
+
+	rte_free(vring_states[eth_dev->data->port_id]);
+	vring_states[eth_dev->data->port_id] = NULL;
+
+	free(internal->dev_name);
+	free(internal->iface_name);
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++)
+		rte_free(eth_dev->data->rx_queues[i]);
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
+		rte_free(eth_dev->data->tx_queues[i]);
+
+	rte_free(eth_dev->data->mac_addrs);
+	rte_free(eth_dev->data);
+	rte_free(internal);
+
+	rte_eth_dev_release_port(eth_dev);
+
+	return 0;
+}
+
+static struct rte_driver pmd_vhost_drv = {
+	.name = "eth_vhost",
+	.type = PMD_VDEV,
+	.init = rte_pmd_vhost_devinit,
+	.uninit = rte_pmd_vhost_devuninit,
+};
+
+PMD_REGISTER_DRIVER(pmd_vhost_drv);
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
new file mode 100644
index 0000000..e78cb74
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -0,0 +1,109 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 IGEL Co., Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co., Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_VHOST_H_
+#define _RTE_ETH_VHOST_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+#include <stdbool.h>
+
+#include <rte_virtio_net.h>
+
+/**
+ * Disable features in feature_mask.
+ *
+ * @param feature_mask
+ *  Vhost features defined in "linux/virtio_net.h".
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_eth_vhost_feature_disable(uint64_t feature_mask);
+
+/**
+ * Enable features in feature_mask.
+ *
+ * @param feature_mask
+ *  Vhost features defined in "linux/virtio_net.h".
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_eth_vhost_feature_enable(uint64_t feature_mask);
+
+/**
+ * Returns currently supported vhost features.
+ *
+ * @return
+ *  Vhost features defined in "linux/virtio_net.h".
+ */
+uint64_t rte_eth_vhost_feature_get(void);
+
+/*
+ * Event description.
+ */
+struct rte_eth_vhost_queue_event {
+	uint16_t queue_id;
+	bool rx;
+	bool enable;
+};
+
+/**
+ * Get queue events from the specified port.
+ * If a callback for the below event is registered by
+ * rte_eth_dev_callback_register(), this function will describe what was
+ * changed:
+ *  - RTE_ETH_EVENT_QUEUE_STATE_CHANGE
+ * Multiple state changes may be coalesced into a single callback invocation,
+ * so keep calling this function until it returns a negative value.
+ *
+ * @param port_id
+ *  Port id.
+ * @param event
+ *  Pointer to a rte_eth_vhost_queue_event structure.
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_eth_vhost_get_queue_event(uint8_t port_id,
+		struct rte_eth_vhost_queue_event *event);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
new file mode 100644
index 0000000..3280b0d
--- /dev/null
+++ b/drivers/net/vhost/rte_pmd_vhost_version.map
@@ -0,0 +1,11 @@
+DPDK_2.3 {
+
+	global:
+
+	rte_eth_vhost_feature_disable;
+	rte_eth_vhost_feature_enable;
+	rte_eth_vhost_feature_get;
+	rte_eth_vhost_get_queue_event;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 8ecab41..fca210d 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -159,6 +159,12 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_QAT)        += -lrte_pmd_qat
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)   += -lrte_pmd_aesni_mb
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)   += -L$(AESNI_MULTI_BUFFER_LIB_PATH) -lIPSec_MB
 
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
+
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
 endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
 
 endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 200+ messages in thread

* Re: [PATCH v9 0/2] Add VHOST PMD
  2016-02-09  9:38                               ` [PATCH v9 0/2] " Tetsuya Mukawa
@ 2016-02-24  2:45                                 ` Qiu, Michael
  2016-02-24  5:09                                   ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Qiu, Michael @ 2016-02-24  2:45 UTC (permalink / raw)
  To: Tetsuya Mukawa, dev, Yigit, Ferruh; +Cc: ann.zhuangyanying, Liu, Yuanhan

Hi, Tetsuya

When I applied your v6 patch, I could reach 9.5 Mpps with 64B packets.

But when I apply v9, I only get 8.4 Mpps. Could you figure out why there
is a performance drop?

Thanks,
Michael
On 2/9/2016 5:38 PM, Tetsuya Mukawa wrote:
> The patch introduces a new PMD. This PMD is implemented as thin wrapper
> of librte_vhost.
>
>
> PATCH v9 changes:
>  - Fix a null pointer access issue implemented in v8 patch.
>
> PATCH v8 changes:
>  - Manage ether devices list instead of internal structures list.
>  - Remove needless NULL checking.
>  - Replace "pthread_exit" with "return NULL".
>  - Replace rte_panic with RTE_LOG, also add error handling.
>  - Remove duplicated lines.
>  - Remove needless casting.
>  - Follow coding style.
>  - Remove needless parenthesis.
>
> PATCH v7 changes:
>  - Remove needless parenthesis.
>  - Add release note.
>  - Remove needless line wraps.
>  - Add null pointer check in vring_state_changed().
>  - Free queue memory in eth_queue_release().
>  - Fix wrong variable name.
>  - Fix error handling code of eth_dev_vhost_create() and
>    rte_pmd_vhost_devuninit().
>  - Remove needless null checking from rte_pmd_vhost_devinit/devuninit().
>  - Use port id to create mac address.
>  - Add doxygen style comments in "rte_eth_vhost.h".
>  - Fix wrong comment in "mk/rte.app.mk".
>
> PATCH v6 changes:
>  - Remove rte_vhost_driver_pmd_callback_register().
>  - Support link status interrupt.
>  - Support queue state changed interrupt.
>  - Add rte_eth_vhost_get_queue_event().
>  - Support numa node detection when new device is connected.
>
> PATCH v5 changes:
>  - Rebase on latest master.
>  - Fix RX/TX routine to count RX/TX bytes.
>  - Fix RX/TX routine not to count as error packets if enqueue/dequeue
>    cannot send all packets.
>  - Fix if-condition checking for multiqueues.
>  - Add "static" to pthread variable.
>  - Fix format.
>  - Change default behavior not to receive queueing event from driver.
>  - Split the patch to separate rte_eth_vhost_portid2vdev().
>
> PATCH v4 changes:
>  - Rebase on latest DPDK tree.
>  - Fix cording style.
>  - Fix code not to invoke multiple messaging handling threads.
>  - Fix code to handle vdev parameters correctly.
>  - Remove needless cast.
>  - Remove needless if-condition before rte_free().
>
> PATCH v3 changes:
>  - Rebase on latest master.
>  - Specify correct queue_id in RX/TX function.
>
> PATCH v2 changes:
>  - Remove a below patch that fixes vhost library.
>    The patch was applied as a separate patch.
>    - vhost: fix crash with multiqueue enabled
>  - Fix typos.
>    (Thanks to Thomas, Monjalon)
>  - Rebase on latest tree with above bernard's patches.
>
> PATCH v1 changes:
>  - Support vhost multiple queues.
>  - Rebase on "remove pci driver from vdevs".
>  - Optimize RX/TX functions.
>  - Fix resource leaks.
>  - Fix compile issue.
>  - Add patch to fix vhost library.
>
> RFC PATCH v3 changes:
>  - Optimize performance.
>    In RX/TX functions, change code to access only per core data.
>  - Add below API to allow user to use vhost library APIs for a port managed
>    by vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
>     - rte_eth_vhost_portid2vdev()
>    To support this functionality, vhost library is also changed.
>    Anyway, if users don't use the vhost PMD, they can fully use the
>    vhost library APIs.
>  - Add code to support vhost multiple queues.
>    Actually, multiple queues functionality is not enabled so far.
>
> RFC PATCH v2 changes:
>  - Fix issues reported by checkpatch.pl
>    (Thanks to Stephen Hemminger)
>
>
> Tetsuya Mukawa (2):
>   ethdev: Add a new event type to notify a queue state changed event
>   vhost: Add VHOST PMD
>
>  MAINTAINERS                                 |   4 +
>  config/common_linuxapp                      |   6 +
>  doc/guides/nics/index.rst                   |   1 +
>  doc/guides/rel_notes/release_2_3.rst        |   4 +
>  drivers/net/Makefile                        |   4 +
>  drivers/net/vhost/Makefile                  |  62 ++
>  drivers/net/vhost/rte_eth_vhost.c           | 911 ++++++++++++++++++++++++++++
>  drivers/net/vhost/rte_eth_vhost.h           | 109 ++++
>  drivers/net/vhost/rte_pmd_vhost_version.map |  11 +
>  lib/librte_ether/rte_ethdev.h               |   2 +
>  mk/rte.app.mk                               |   6 +
>  11 files changed, 1120 insertions(+)
>  create mode 100644 drivers/net/vhost/Makefile
>  create mode 100644 drivers/net/vhost/rte_eth_vhost.c
>  create mode 100644 drivers/net/vhost/rte_eth_vhost.h
>  create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
>


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v9 0/2] Add VHOST PMD
  2016-02-24  2:45                                 ` Qiu, Michael
@ 2016-02-24  5:09                                   ` Tetsuya Mukawa
  2016-02-25  7:51                                     ` Qiu, Michael
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-02-24  5:09 UTC (permalink / raw)
  To: Qiu, Michael, dev; +Cc: ann.zhuangyanying, Liu, Yuanhan

On 2016/02/24 11:45, Qiu, Michael wrote:
> Hi,  Tetsuya
>
> When I applied your v6 patch, I could reach 9.5 Mpps with 64B packets.
>
> But when I apply v9, I only get 8.4 Mpps. Could you figure out why there
> is a performance drop?

Hi Michael,

Thanks for checking it.
I tried to reproduce it, but I don't see the drop in my environment.
(My CPU is a Xeon E5-2697 v2, and the performance of both the v6 and v9
patches is almost 5.9 Mpps.)
Did you use exactly the same code except for the vhost PMD?

Thanks,
Tetsuya

> Thanks,
> Michael
> On 2/9/2016 5:38 PM, Tetsuya Mukawa wrote:
>> The patch introduces a new PMD. This PMD is implemented as thin wrapper
>> of librte_vhost.
>>
>>
>> PATCH v9 changes:
>>  - Fix a null pointer access issue implemented in v8 patch.
>>
>> PATCH v8 changes:
>>  - Manage ether devices list instead of internal structures list.
>>  - Remove needless NULL checking.
>>  - Replace "pthread_exit" to "return NULL".
>>  - Replace rte_panic to RTE_LOG, also add error handling.
>>  - Remove duplicated lines.
>>  - Remove needless casting.
>>  - Follow coding style.
>>  - Remove needless parenthesis.
>>
>> PATCH v7 changes:
>>  - Remove needless parenthesis.
>>  - Add release note.
>>  - Remove needless line wraps.
>>  - Add null pointer check in vring_state_changed().
>>  - Free queue memory in eth_queue_release().
>>  - Fix wrong variable name.
>>  - Fix error handling code of eth_dev_vhost_create() and
>>    rte_pmd_vhost_devuninit().
>>  - Remove needless null checking from rte_pmd_vhost_devinit/devuninit().
>>  - Use port id to create mac address.
>>  - Add doxygen style comments in "rte_eth_vhost.h".
>>  - Fix wrong comment in "mk/rte.app.mk".
>>
>> PATCH v6 changes:
>>  - Remove rte_vhost_driver_pmd_callback_register().
>>  - Support link status interrupt.
>>  - Support queue state changed interrupt.
>>  - Add rte_eth_vhost_get_queue_event().
>>  - Support numa node detection when new device is connected.
>>
>> PATCH v5 changes:
>>  - Rebase on latest master.
>>  - Fix RX/TX routine to count RX/TX bytes.
>>  - Fix RX/TX routine not to count as error packets if enqueue/dequeue
>>    cannot send all packets.
>>  - Fix if-condition checking for multiqueues.
>>  - Add "static" to pthread variable.
>>  - Fix format.
>>  - Change default behavior not to receive queueing event from driver.
>>  - Split the patch to separate rte_eth_vhost_portid2vdev().
>>
>> PATCH v4 changes:
>>  - Rebase on latest DPDK tree.
>>  - Fix coding style.
>>  - Fix code not to invoke multiple messaging handling threads.
>>  - Fix code to handle vdev parameters correctly.
>>  - Remove needless cast.
>>  - Remove needless if-condition before rte_free().
>>
>> PATCH v3 changes:
>>  - Rebase on latest master.
>>  - Specify correct queue_id in RX/TX function.
>>
>> PATCH v2 changes:
>>  - Remove a below patch that fixes vhost library.
>>    The patch was applied as a separate patch.
>>    - vhost: fix crash with multiqueue enabled
>>  - Fix typos.
>>    (Thanks to Thomas, Monjalon)
>>  - Rebase on latest tree with above bernard's patches.
>>
>> PATCH v1 changes:
>>  - Support vhost multiple queues.
>>  - Rebase on "remove pci driver from vdevs".
>>  - Optimize RX/TX functions.
>>  - Fix resource leaks.
>>  - Fix compile issue.
>>  - Add patch to fix vhost library.
>>
>> RFC PATCH v3 changes:
>>  - Optimize performance.
>>    In RX/TX functions, change code to access only per core data.
>>  - Add below API to allow user to use vhost library APIs for a port managed
>>    by vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
>>     - rte_eth_vhost_portid2vdev()
>>    To support this functionality, vhost library is also changed.
>>    Anyway, if users don't use the vhost PMD, they can fully use the
>>    vhost library APIs.
>>  - Add code to support vhost multiple queues.
>>    Actually, multiple queues functionality is not enabled so far.
>>
>> RFC PATCH v2 changes:
>>  - Fix issues reported by checkpatch.pl
>>    (Thanks to Stephen Hemminger)
>>
>>
>> Tetsuya Mukawa (2):
>>   ethdev: Add a new event type to notify a queue state changed event
>>   vhost: Add VHOST PMD
>>
>>  MAINTAINERS                                 |   4 +
>>  config/common_linuxapp                      |   6 +
>>  doc/guides/nics/index.rst                   |   1 +
>>  doc/guides/rel_notes/release_2_3.rst        |   4 +
>>  drivers/net/Makefile                        |   4 +
>>  drivers/net/vhost/Makefile                  |  62 ++
>>  drivers/net/vhost/rte_eth_vhost.c           | 911 ++++++++++++++++++++++++++++
>>  drivers/net/vhost/rte_eth_vhost.h           | 109 ++++
>>  drivers/net/vhost/rte_pmd_vhost_version.map |  11 +
>>  lib/librte_ether/rte_ethdev.h               |   2 +
>>  mk/rte.app.mk                               |   6 +
>>  11 files changed, 1120 insertions(+)
>>  create mode 100644 drivers/net/vhost/Makefile
>>  create mode 100644 drivers/net/vhost/rte_eth_vhost.c
>>  create mode 100644 drivers/net/vhost/rte_eth_vhost.h
>>  create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
>>

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v9 0/2] Add VHOST PMD
  2016-02-24  5:09                                   ` Tetsuya Mukawa
@ 2016-02-25  7:51                                     ` Qiu, Michael
  2016-02-26  4:29                                       ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Qiu, Michael @ 2016-02-25  7:51 UTC (permalink / raw)
  To: Tetsuya Mukawa, dev; +Cc: ann.zhuangyanying, Liu, Yuanhan

On 2/24/2016 1:10 PM, Tetsuya Mukawa wrote:
> On 2016/02/24 11:45, Qiu, Michael wrote:
>> Hi,  Tetsuya
>>
>> When I applied your v6 patch, I could reach 9.5 Mpps with 64B packets.
>>
>> But when I apply v9, I only get 8.4 Mpps. Could you figure out why there
>> is a performance drop?
> Hi Michael,
>
> Thanks for checking it.
> I tried to reproduce it, but I don't see the drop in my environment.
> (My CPU is a Xeon E5-2697 v2, and the performance of both the v6 and v9
> patches is almost 5.9 Mpps.)
> Did you use exactly the same code except for the vhost PMD?

Yes, exactly the same code and the same platform; the only difference is
the version of the vhost PMD.

BTW, I have set mergeable buffers off on the frontend.

Thanks,
Michael
>
> Thanks,
> Tetsuya
>
>> Thanks,
>> Michael
>> On 2/9/2016 5:38 PM, Tetsuya Mukawa wrote:
>>> The patch introduces a new PMD. This PMD is implemented as thin wrapper
>>> of librte_vhost.
>>>
>>>
>>> PATCH v9 changes:
>>>  - Fix a null pointer access issue implemented in v8 patch.
>>>
>>> PATCH v8 changes:
>>>  - Manage ether devices list instead of internal structures list.
>>>  - Remove needless NULL checking.
>>>  - Replace "pthread_exit" to "return NULL".
>>>  - Replace rte_panic to RTE_LOG, also add error handling.
>>>  - Remove duplicated lines.
>>>  - Remove needless casting.
>>>  - Follow coding style.
>>>  - Remove needless parenthesis.
>>>
>>> PATCH v7 changes:
>>>  - Remove needless parenthesis.
>>>  - Add release note.
>>>  - Remove needless line wraps.
>>>  - Add null pointer check in vring_state_changed().
>>>  - Free queue memory in eth_queue_release().
>>>  - Fix wrong variable name.
>>>  - Fix error handling code of eth_dev_vhost_create() and
>>>    rte_pmd_vhost_devuninit().
>>>  - Remove needless null checking from rte_pmd_vhost_devinit/devuninit().
>>>  - Use port id to create mac address.
>>>  - Add doxygen style comments in "rte_eth_vhost.h".
>>>  - Fix wrong comment in "mk/rte.app.mk".
>>>
>>> PATCH v6 changes:
>>>  - Remove rte_vhost_driver_pmd_callback_register().
>>>  - Support link status interrupt.
>>>  - Support queue state changed interrupt.
>>>  - Add rte_eth_vhost_get_queue_event().
>>>  - Support numa node detection when new device is connected.
>>>
>>> PATCH v5 changes:
>>>  - Rebase on latest master.
>>>  - Fix RX/TX routine to count RX/TX bytes.
>>>  - Fix RX/TX routine not to count as error packets if enqueue/dequeue
>>>    cannot send all packets.
>>>  - Fix if-condition checking for multiqueues.
>>>  - Add "static" to pthread variable.
>>>  - Fix format.
>>>  - Change default behavior not to receive queueing event from driver.
>>>  - Split the patch to separate rte_eth_vhost_portid2vdev().
>>>
>>> PATCH v4 changes:
>>>  - Rebase on latest DPDK tree.
>>>  - Fix coding style.
>>>  - Fix code not to invoke multiple messaging handling threads.
>>>  - Fix code to handle vdev parameters correctly.
>>>  - Remove needless cast.
>>>  - Remove needless if-condition before rte_free().
>>>
>>> PATCH v3 changes:
>>>  - Rebase on latest master.
>>>  - Specify correct queue_id in RX/TX function.
>>>
>>> PATCH v2 changes:
>>>  - Remove a below patch that fixes vhost library.
>>>    The patch was applied as a separate patch.
>>>    - vhost: fix crash with multiqueue enabled
>>>  - Fix typos.
>>>    (Thanks to Thomas Monjalon)
>>>  - Rebase on latest tree with Bernard's patches above.
>>>
>>> PATCH v1 changes:
>>>  - Support vhost multiple queues.
>>>  - Rebase on "remove pci driver from vdevs".
>>>  - Optimize RX/TX functions.
>>>  - Fix resource leaks.
>>>  - Fix compile issue.
>>>  - Add patch to fix vhost library.
>>>
>>> RFC PATCH v3 changes:
>>>  - Optimize performance.
>>>    In RX/TX functions, change code to access only per core data.
>>>  - Add below API to allow user to use vhost library APIs for a port managed
>>>    by vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
>>>     - rte_eth_vhost_portid2vdev()
>>>    To support this functionality, vhost library is also changed.
>>>    Anyway, if users don't use the vhost PMD, they can still fully use the vhost library APIs.
>>>  - Add code to support vhost multiple queues.
>>>    Actually, multiple queues functionality is not enabled so far.
>>>
>>> RFC PATCH v2 changes:
>>>  - Fix issues reported by checkpatch.pl
>>>    (Thanks to Stephen Hemminger)
>>>
>>>
>>> Tetsuya Mukawa (2):
>>>   ethdev: Add a new event type to notify a queue state changed event
>>>   vhost: Add VHOST PMD
>>>
>>>  MAINTAINERS                                 |   4 +
>>>  config/common_linuxapp                      |   6 +
>>>  doc/guides/nics/index.rst                   |   1 +
>>>  doc/guides/rel_notes/release_2_3.rst        |   4 +
>>>  drivers/net/Makefile                        |   4 +
>>>  drivers/net/vhost/Makefile                  |  62 ++
>>>  drivers/net/vhost/rte_eth_vhost.c           | 911 ++++++++++++++++++++++++++++
>>>  drivers/net/vhost/rte_eth_vhost.h           | 109 ++++
>>>  drivers/net/vhost/rte_pmd_vhost_version.map |  11 +
>>>  lib/librte_ether/rte_ethdev.h               |   2 +
>>>  mk/rte.app.mk                               |   6 +
>>>  11 files changed, 1120 insertions(+)
>>>  create mode 100644 drivers/net/vhost/Makefile
>>>  create mode 100644 drivers/net/vhost/rte_eth_vhost.c
>>>  create mode 100644 drivers/net/vhost/rte_eth_vhost.h
>>>  create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
>>>
>


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v9 0/2] Add VHOST PMD
  2016-02-25  7:51                                     ` Qiu, Michael
@ 2016-02-26  4:29                                       ` Tetsuya Mukawa
  2016-02-26  8:35                                         ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-02-26  4:29 UTC (permalink / raw)
  To: Qiu, Michael, dev; +Cc: ann.zhuangyanying, Liu, Yuanhan

On 2016/02/25 16:51, Qiu, Michael wrote:
> On 2/24/2016 1:10 PM, Tetsuya Mukawa wrote:
>> On 2016/02/24 11:45, Qiu, Michael wrote:
>>> Hi,  Tetsuya
>>>
>>> When I applied your v6 patch, I could reach 9.5Mpps with 64B packet.
>>>
>>> But when I apply v9 I get only 8.4 Mpps; could you figure out why there is a
>>> performance drop?
>> Hi Michael,
>>
>> Thanks for checking it.
>> I tried to reproduce it, but I don't see the drop in my environment.
>> (My CPU is a Xeon E5-2697 v2, and the performance of both the v6 and v9
>> patches is almost 5.9 Mpps.)
>> Did you use exactly the same code except for the vhost PMD?
> Yes, exactly the same code and the same platform; the only difference is the
> version of the vhost PMD.
>
> BTW, I have set the frontend mergeable off.

I have checked the below cases.
 - Case1: Disable the mergeable feature in the virtio-net PMD.
 - Case2: Disable the mergeable feature in the virtio-net PMD and use the
'--txqflags=0xf01' option to use simple ring deployment.
In both cases, I still cannot see the drop.

Anyway, I will send a few patch series to determine the cause of the drop.
So, could you please apply them and check the performance to determine
which one causes the drop?
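
For reference, the two cases could be expressed roughly as the command lines
below. This is only a sketch: the socket path, chardev/netdev ids, core mask,
and memory-channel count are assumptions, not values taken from this thread.

```shell
# Case 1: mergeable RX buffers disabled on the virtio-net frontend
# (set on the QEMU device; path and ids are hypothetical)
QEMU_NET_ARGS="-chardev socket,id=chr0,path=/tmp/sock0 \
 -netdev type=vhost-user,id=net0,chardev=chr0 \
 -device virtio-net-pci,netdev=net0,mrg_rxbuf=off"

# Case 2: additionally run guest testpmd with --txqflags=0xf01 so the
# virtio-net PMD can use the simple ring deployment
GUEST_TESTPMD="./testpmd -c 0x3 -n 4 -- -i --txqflags=0xf01"

echo "$QEMU_NET_ARGS"
echo "$GUEST_TESTPMD"
```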

Thanks,
Tetsuya

> Thanks,
> Michael
>> Thanks,
>> Tetsuya
>>
>>> Thanks,
>>> Michael
>>> On 2/9/2016 5:38 PM, Tetsuya Mukawa wrote:
>>>> The patch introduces a new PMD. This PMD is implemented as thin wrapper
>>>> of librte_vhost.
>>>>
>>>>
>>>> PATCH v9 changes:
>>>>  - Fix a null pointer access issue introduced in the v8 patch.
>>>>
>>>> PATCH v8 changes:
>>>>  - Manage ether devices list instead of internal structures list.
>>>>  - Remove needless NULL checking.
>>>>  - Replace "pthread_exit" with "return NULL".
>>>>  - Replace rte_panic with RTE_LOG, and also add error handling.
>>>>  - Remove duplicated lines.
>>>>  - Remove needless casting.
>>>>  - Follow coding style.
>>>>  - Remove needless parenthesis.
>>>>
>>>> PATCH v7 changes:
>>>>  - Remove needless parenthesis.
>>>>  - Add release note.
>>>>  - Remove needless line wraps.
>>>>  - Add null pointer check in vring_state_changed().
>>>>  - Free queue memory in eth_queue_release().
>>>>  - Fix wrong variable name.
>>>>  - Fix error handling code of eth_dev_vhost_create() and
>>>>    rte_pmd_vhost_devuninit().
>>>>  - Remove needless null checking from rte_pmd_vhost_devinit/devuninit().
>>>>  - Use port id to create mac address.
>>>>  - Add doxygen style comments in "rte_eth_vhost.h".
>>>>  - Fix wrong comment in "mk/rte.app.mk".
>>>>
>>>> PATCH v6 changes:
>>>>  - Remove rte_vhost_driver_pmd_callback_register().
>>>>  - Support link status interrupt.
>>>>  - Support queue state changed interrupt.
>>>>  - Add rte_eth_vhost_get_queue_event().
>>>>  - Support numa node detection when new device is connected.
>>>>
>>>> PATCH v5 changes:
>>>>  - Rebase on latest master.
>>>>  - Fix RX/TX routine to count RX/TX bytes.
>>>>  - Fix RX/TX routine not to count as error packets if enqueue/dequeue
>>>>    cannot send all packets.
>>>>  - Fix if-condition checking for multiqueues.
>>>>  - Add "static" to pthread variable.
>>>>  - Fix format.
>>>>  - Change default behavior not to receive queueing event from driver.
>>>>  - Split the patch to separate rte_eth_vhost_portid2vdev().
>>>>
>>>> PATCH v4 changes:
>>>>  - Rebase on latest DPDK tree.
>>>>  - Fix coding style.
>>>>  - Fix code not to invoke multiple messaging handling threads.
>>>>  - Fix code to handle vdev parameters correctly.
>>>>  - Remove needless cast.
>>>>  - Remove needless if-condition before rte_free().
>>>>
>>>> PATCH v3 changes:
>>>>  - Rebase on latest master.
>>>>  - Specify correct queue_id in RX/TX function.
>>>>
>>>> PATCH v2 changes:
>>>>  - Remove a below patch that fixes vhost library.
>>>>    The patch was applied as a separate patch.
>>>>    - vhost: fix crash with multiqueue enabled
>>>>  - Fix typos.
>>>>    (Thanks to Thomas Monjalon)
>>>>  - Rebase on latest tree with Bernard's patches above.
>>>>
>>>> PATCH v1 changes:
>>>>  - Support vhost multiple queues.
>>>>  - Rebase on "remove pci driver from vdevs".
>>>>  - Optimize RX/TX functions.
>>>>  - Fix resource leaks.
>>>>  - Fix compile issue.
>>>>  - Add patch to fix vhost library.
>>>>
>>>> RFC PATCH v3 changes:
>>>>  - Optimize performance.
>>>>    In RX/TX functions, change code to access only per core data.
>>>>  - Add below API to allow user to use vhost library APIs for a port managed
>>>>    by vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
>>>>     - rte_eth_vhost_portid2vdev()
>>>>    To support this functionality, vhost library is also changed.
>>>>    Anyway, if users don't use the vhost PMD, they can still fully use the vhost library APIs.
>>>>  - Add code to support vhost multiple queues.
>>>>    Actually, multiple queues functionality is not enabled so far.
>>>>
>>>> RFC PATCH v2 changes:
>>>>  - Fix issues reported by checkpatch.pl
>>>>    (Thanks to Stephen Hemminger)
>>>>
>>>>
>>>> Tetsuya Mukawa (2):
>>>>   ethdev: Add a new event type to notify a queue state changed event
>>>>   vhost: Add VHOST PMD
>>>>
>>>>  MAINTAINERS                                 |   4 +
>>>>  config/common_linuxapp                      |   6 +
>>>>  doc/guides/nics/index.rst                   |   1 +
>>>>  doc/guides/rel_notes/release_2_3.rst        |   4 +
>>>>  drivers/net/Makefile                        |   4 +
>>>>  drivers/net/vhost/Makefile                  |  62 ++
>>>>  drivers/net/vhost/rte_eth_vhost.c           | 911 ++++++++++++++++++++++++++++
>>>>  drivers/net/vhost/rte_eth_vhost.h           | 109 ++++
>>>>  drivers/net/vhost/rte_pmd_vhost_version.map |  11 +
>>>>  lib/librte_ether/rte_ethdev.h               |   2 +
>>>>  mk/rte.app.mk                               |   6 +
>>>>  11 files changed, 1120 insertions(+)
>>>>  create mode 100644 drivers/net/vhost/Makefile
>>>>  create mode 100644 drivers/net/vhost/rte_eth_vhost.c
>>>>  create mode 100644 drivers/net/vhost/rte_eth_vhost.h
>>>>  create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
>>>>

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v9 0/2] Add VHOST PMD
  2016-02-26  4:29                                       ` Tetsuya Mukawa
@ 2016-02-26  8:35                                         ` Tetsuya Mukawa
  2016-03-01  2:00                                           ` Qiu, Michael
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-02-26  8:35 UTC (permalink / raw)
  To: Qiu, Michael, dev; +Cc: ann.zhuangyanying, Liu, Yuanhan

On 2016/02/26 13:29, Tetsuya Mukawa wrote:
> On 2016/02/25 16:51, Qiu, Michael wrote:
>> On 2/24/2016 1:10 PM, Tetsuya Mukawa wrote:
>>> On 2016/02/24 11:45, Qiu, Michael wrote:
>>>> Hi,  Tetsuya
>>>>
>>>> When I applied your v6 patch, I could reach 9.5Mpps with 64B packet.
>>>>
>>>> But when I apply v9 I get only 8.4 Mpps; could you figure out why there is a
>>>> performance drop?
>>> Hi Michael,
>>>
>>> Thanks for checking it.
>>> I tried to reproduce it, but I don't see the drop in my environment.
>>> (My CPU is a Xeon E5-2697 v2, and the performance of both the v6 and v9
>>> patches is almost 5.9 Mpps.)
>>> Did you use exactly the same code except for the vhost PMD?
>> Yes, exactly the same code and the same platform; the only difference is the
>> version of the vhost PMD.
>>
>> BTW, I have set the frontend mergeable off.
> I have checked the below cases.
>  - Case1: Disable the mergeable feature in the virtio-net PMD.
>  - Case2: Disable the mergeable feature in the virtio-net PMD and use the
> '--txqflags=0xf01' option to use simple ring deployment.
> In both cases, I still cannot see the drop.
>
> Anyway, I will send a few patch series to determine the cause of the drop.
> So, could you please apply them and check the performance to determine
> which one causes the drop?

Hi Michael,

I may have found what causes the drop.
Could you please restart testpmd on the guest when you see the drop, then
check the performance again?

I guess the drop occurs only the first time testpmd on the guest and the
host are connected.
Here are rough steps.

1. Start testpmd on host
2. Start QEMU
3. Start testpmd on guest

Then you will see the drop.
Probably, if testpmd on the guest is restarted, you won't see the drop
again.

4. Type 'quit' on guest.
5. Start testpmd on guest again.

If so, I guess the drop is caused by queue notification.
Could you please let me know whether your issue matches the above case?
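
The rough steps above might look like the following sketch on the command
line. The binary paths, vhost socket path, vdev string, and EAL options are
assumptions for illustration, not values taken from this thread.

```shell
# Step 1: host testpmd with one vhost PMD port (the 'eth_vhost0' vdev name
# follows this patch series; the socket path is hypothetical)
HOST_CMD="./testpmd -c 0x3 -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i"

# Step 2: QEMU with a vhost-user backed virtio-net device on that socket
QEMU_CMD="qemu-system-x86_64 -m 1024 \
 -chardev socket,id=chr0,path=/tmp/sock0 \
 -netdev type=vhost-user,id=net0,chardev=chr0 \
 -device virtio-net-pci,netdev=net0"

# Steps 3-5: start testpmd in the guest, quit it, and start it again
GUEST_CMD="./testpmd -c 0x3 -n 4 -- -i"

echo "$HOST_CMD"; echo "$QEMU_CMD"; echo "$GUEST_CMD"
```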

Thanks,
Tetsuya

> Thanks,
> Tetsuya
>
>> Thanks,
>> Michael
>>> Thanks,
>>> Tetsuya
>>>
>>>> Thanks,
>>>> Michael
>>>> On 2/9/2016 5:38 PM, Tetsuya Mukawa wrote:
>>>>> The patch introduces a new PMD. This PMD is implemented as thin wrapper
>>>>> of librte_vhost.
>>>>>
>>>>>
>>>>> PATCH v9 changes:
>>>>>  - Fix a null pointer access issue introduced in the v8 patch.
>>>>>
>>>>> PATCH v8 changes:
>>>>>  - Manage ether devices list instead of internal structures list.
>>>>>  - Remove needless NULL checking.
>>>>>  - Replace "pthread_exit" with "return NULL".
>>>>>  - Replace rte_panic with RTE_LOG, and also add error handling.
>>>>>  - Remove duplicated lines.
>>>>>  - Remove needless casting.
>>>>>  - Follow coding style.
>>>>>  - Remove needless parenthesis.
>>>>>
>>>>> PATCH v7 changes:
>>>>>  - Remove needless parenthesis.
>>>>>  - Add release note.
>>>>>  - Remove needless line wraps.
>>>>>  - Add null pointer check in vring_state_changed().
>>>>>  - Free queue memory in eth_queue_release().
>>>>>  - Fix wrong variable name.
>>>>>  - Fix error handling code of eth_dev_vhost_create() and
>>>>>    rte_pmd_vhost_devuninit().
>>>>>  - Remove needless null checking from rte_pmd_vhost_devinit/devuninit().
>>>>>  - Use port id to create mac address.
>>>>>  - Add doxygen style comments in "rte_eth_vhost.h".
>>>>>  - Fix wrong comment in "mk/rte.app.mk".
>>>>>
>>>>> PATCH v6 changes:
>>>>>  - Remove rte_vhost_driver_pmd_callback_register().
>>>>>  - Support link status interrupt.
>>>>>  - Support queue state changed interrupt.
>>>>>  - Add rte_eth_vhost_get_queue_event().
>>>>>  - Support numa node detection when new device is connected.
>>>>>
>>>>> PATCH v5 changes:
>>>>>  - Rebase on latest master.
>>>>>  - Fix RX/TX routine to count RX/TX bytes.
>>>>>  - Fix RX/TX routine not to count as error packets if enqueue/dequeue
>>>>>    cannot send all packets.
>>>>>  - Fix if-condition checking for multiqueues.
>>>>>  - Add "static" to pthread variable.
>>>>>  - Fix format.
>>>>>  - Change default behavior not to receive queueing event from driver.
>>>>>  - Split the patch to separate rte_eth_vhost_portid2vdev().
>>>>>
>>>>> PATCH v4 changes:
>>>>>  - Rebase on latest DPDK tree.
>>>>>  - Fix coding style.
>>>>>  - Fix code not to invoke multiple messaging handling threads.
>>>>>  - Fix code to handle vdev parameters correctly.
>>>>>  - Remove needless cast.
>>>>>  - Remove needless if-condition before rte_free().
>>>>>
>>>>> PATCH v3 changes:
>>>>>  - Rebase on latest master.
>>>>>  - Specify correct queue_id in RX/TX function.
>>>>>
>>>>> PATCH v2 changes:
>>>>>  - Remove a below patch that fixes vhost library.
>>>>>    The patch was applied as a separate patch.
>>>>>    - vhost: fix crash with multiqueue enabled
>>>>>  - Fix typos.
>>>>>    (Thanks to Thomas Monjalon)
>>>>>  - Rebase on latest tree with Bernard's patches above.
>>>>>
>>>>> PATCH v1 changes:
>>>>>  - Support vhost multiple queues.
>>>>>  - Rebase on "remove pci driver from vdevs".
>>>>>  - Optimize RX/TX functions.
>>>>>  - Fix resource leaks.
>>>>>  - Fix compile issue.
>>>>>  - Add patch to fix vhost library.
>>>>>
>>>>> RFC PATCH v3 changes:
>>>>>  - Optimize performance.
>>>>>    In RX/TX functions, change code to access only per core data.
>>>>>  - Add below API to allow user to use vhost library APIs for a port managed
>>>>>    by vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
>>>>>     - rte_eth_vhost_portid2vdev()
>>>>>    To support this functionality, vhost library is also changed.
>>>>>    Anyway, if users don't use the vhost PMD, they can still fully use the vhost library APIs.
>>>>>  - Add code to support vhost multiple queues.
>>>>>    Actually, multiple queues functionality is not enabled so far.
>>>>>
>>>>> RFC PATCH v2 changes:
>>>>>  - Fix issues reported by checkpatch.pl
>>>>>    (Thanks to Stephen Hemminger)
>>>>>
>>>>>
>>>>> Tetsuya Mukawa (2):
>>>>>   ethdev: Add a new event type to notify a queue state changed event
>>>>>   vhost: Add VHOST PMD
>>>>>
>>>>>  MAINTAINERS                                 |   4 +
>>>>>  config/common_linuxapp                      |   6 +
>>>>>  doc/guides/nics/index.rst                   |   1 +
>>>>>  doc/guides/rel_notes/release_2_3.rst        |   4 +
>>>>>  drivers/net/Makefile                        |   4 +
>>>>>  drivers/net/vhost/Makefile                  |  62 ++
>>>>>  drivers/net/vhost/rte_eth_vhost.c           | 911 ++++++++++++++++++++++++++++
>>>>>  drivers/net/vhost/rte_eth_vhost.h           | 109 ++++
>>>>>  drivers/net/vhost/rte_pmd_vhost_version.map |  11 +
>>>>>  lib/librte_ether/rte_ethdev.h               |   2 +
>>>>>  mk/rte.app.mk                               |   6 +
>>>>>  11 files changed, 1120 insertions(+)
>>>>>  create mode 100644 drivers/net/vhost/Makefile
>>>>>  create mode 100644 drivers/net/vhost/rte_eth_vhost.c
>>>>>  create mode 100644 drivers/net/vhost/rte_eth_vhost.h
>>>>>  create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
>>>>>

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v9 0/2] Add VHOST PMD
  2016-02-26  8:35                                         ` Tetsuya Mukawa
@ 2016-03-01  2:00                                           ` Qiu, Michael
  2016-03-01  2:19                                             ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Qiu, Michael @ 2016-03-01  2:00 UTC (permalink / raw)
  To: Tetsuya Mukawa, dev; +Cc: ann.zhuangyanying, Liu, Yuanhan

On 2/26/2016 4:36 PM, Tetsuya Mukawa wrote:
> On 2016/02/26 13:29, Tetsuya Mukawa wrote:
>> On 2016/02/25 16:51, Qiu, Michael wrote:
>>> On 2/24/2016 1:10 PM, Tetsuya Mukawa wrote:
>>>> On 2016/02/24 11:45, Qiu, Michael wrote:
>>>>> Hi,  Tetsuya
>>>>>
>>>>> When I applied your v6 patch, I could reach 9.5Mpps with 64B packet.
>>>>>
>>>>> But when I apply v9 I get only 8.4 Mpps; could you figure out why there is a
>>>>> performance drop?
>>>> Hi Michael,
>>>>
>>>> Thanks for checking it.
>>>> I tried to reproduce it, but I don't see the drop in my environment.
>>>> (My CPU is a Xeon E5-2697 v2, and the performance of both the v6 and v9
>>>> patches is almost 5.9 Mpps.)
>>>> Did you use exactly the same code except for the vhost PMD?
>>> Yes, exactly the same code and the same platform; the only difference is the
>>> version of the vhost PMD.
>>>
>>> BTW, I have set the frontend mergeable off.
>> I have checked the below cases.
>>  - Case1: Disable the mergeable feature in the virtio-net PMD.
>>  - Case2: Disable the mergeable feature in the virtio-net PMD and use the
>> '--txqflags=0xf01' option to use simple ring deployment.
>> In both cases, I still cannot see the drop.
>>
>> Anyway, I will send a few patch series to determine the cause of the drop.
>> So, could you please apply them and check the performance to determine
>> which one causes the drop?
> Hi Michael,
>
> I may have found what causes the drop.
> Could you please restart testpmd on the guest when you see the drop, then
> check the performance again?
>
> I guess the drop occurs only the first time testpmd on the guest and the
> host are connected.
> Here are rough steps.
>
> 1. Start testpmd on host
> 2. Start QEMU
> 3. Start testpmd on guest
>
> Then you will see the drop.
> Probably, if testpmd on the guest is restarted, you won't see the drop
> again.
>
> 4. Type 'quit' on guest.
> 5. Start testpmd on guest again.

OK, I will help to test it today.

Thanks,
Michael
> If so, I guess the drop is caused by queue notification.
> Could you please let me know whether your issue matches the above case?
>
> Thanks,
> Tetsuya
>
>> Thanks,
>> Tetsuya
>>
>>> Thanks,
>>> Michael
>>>> Thanks,
>>>> Tetsuya
>>>>
>>>>> Thanks,
>>>>> Michael
>>>>> On 2/9/2016 5:38 PM, Tetsuya Mukawa wrote:
>>>>>> The patch introduces a new PMD. This PMD is implemented as thin wrapper
>>>>>> of librte_vhost.
>>>>>>
>>>>>>
>>>>>> PATCH v9 changes:
>>>>>>  - Fix a null pointer access issue introduced in the v8 patch.
>>>>>>
>>>>>> PATCH v8 changes:
>>>>>>  - Manage ether devices list instead of internal structures list.
>>>>>>  - Remove needless NULL checking.
>>>>>>  - Replace "pthread_exit" with "return NULL".
>>>>>>  - Replace rte_panic with RTE_LOG, and also add error handling.
>>>>>>  - Remove duplicated lines.
>>>>>>  - Remove needless casting.
>>>>>>  - Follow coding style.
>>>>>>  - Remove needless parenthesis.
>>>>>>
>>>>>> PATCH v7 changes:
>>>>>>  - Remove needless parenthesis.
>>>>>>  - Add release note.
>>>>>>  - Remove needless line wraps.
>>>>>>  - Add null pointer check in vring_state_changed().
>>>>>>  - Free queue memory in eth_queue_release().
>>>>>>  - Fix wrong variable name.
>>>>>>  - Fix error handling code of eth_dev_vhost_create() and
>>>>>>    rte_pmd_vhost_devuninit().
>>>>>>  - Remove needless null checking from rte_pmd_vhost_devinit/devuninit().
>>>>>>  - Use port id to create mac address.
>>>>>>  - Add doxygen style comments in "rte_eth_vhost.h".
>>>>>>  - Fix wrong comment in "mk/rte.app.mk".
>>>>>>
>>>>>> PATCH v6 changes:
>>>>>>  - Remove rte_vhost_driver_pmd_callback_register().
>>>>>>  - Support link status interrupt.
>>>>>>  - Support queue state changed interrupt.
>>>>>>  - Add rte_eth_vhost_get_queue_event().
>>>>>>  - Support numa node detection when new device is connected.
>>>>>>
>>>>>> PATCH v5 changes:
>>>>>>  - Rebase on latest master.
>>>>>>  - Fix RX/TX routine to count RX/TX bytes.
>>>>>>  - Fix RX/TX routine not to count as error packets if enqueue/dequeue
>>>>>>    cannot send all packets.
>>>>>>  - Fix if-condition checking for multiqueues.
>>>>>>  - Add "static" to pthread variable.
>>>>>>  - Fix format.
>>>>>>  - Change default behavior not to receive queueing event from driver.
>>>>>>  - Split the patch to separate rte_eth_vhost_portid2vdev().
>>>>>>
>>>>>> PATCH v4 changes:
>>>>>>  - Rebase on latest DPDK tree.
>>>>>>  - Fix coding style.
>>>>>>  - Fix code not to invoke multiple messaging handling threads.
>>>>>>  - Fix code to handle vdev parameters correctly.
>>>>>>  - Remove needless cast.
>>>>>>  - Remove needless if-condition before rte_free().
>>>>>>
>>>>>> PATCH v3 changes:
>>>>>>  - Rebase on latest master.
>>>>>>  - Specify correct queue_id in RX/TX function.
>>>>>>
>>>>>> PATCH v2 changes:
>>>>>>  - Remove a below patch that fixes vhost library.
>>>>>>    The patch was applied as a separate patch.
>>>>>>    - vhost: fix crash with multiqueue enabled
>>>>>>  - Fix typos.
>>>>>>    (Thanks to Thomas Monjalon)
>>>>>>  - Rebase on latest tree with Bernard's patches above.
>>>>>>
>>>>>> PATCH v1 changes:
>>>>>>  - Support vhost multiple queues.
>>>>>>  - Rebase on "remove pci driver from vdevs".
>>>>>>  - Optimize RX/TX functions.
>>>>>>  - Fix resource leaks.
>>>>>>  - Fix compile issue.
>>>>>>  - Add patch to fix vhost library.
>>>>>>
>>>>>> RFC PATCH v3 changes:
>>>>>>  - Optimize performance.
>>>>>>    In RX/TX functions, change code to access only per core data.
>>>>>>  - Add below API to allow user to use vhost library APIs for a port managed
>>>>>>    by vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
>>>>>>     - rte_eth_vhost_portid2vdev()
>>>>>>    To support this functionality, vhost library is also changed.
>>>>>>    Anyway, if users don't use the vhost PMD, they can still fully use the vhost library APIs.
>>>>>>  - Add code to support vhost multiple queues.
>>>>>>    Actually, multiple queues functionality is not enabled so far.
>>>>>>
>>>>>> RFC PATCH v2 changes:
>>>>>>  - Fix issues reported by checkpatch.pl
>>>>>>    (Thanks to Stephen Hemminger)
>>>>>>
>>>>>>
>>>>>> Tetsuya Mukawa (2):
>>>>>>   ethdev: Add a new event type to notify a queue state changed event
>>>>>>   vhost: Add VHOST PMD
>>>>>>
>>>>>>  MAINTAINERS                                 |   4 +
>>>>>>  config/common_linuxapp                      |   6 +
>>>>>>  doc/guides/nics/index.rst                   |   1 +
>>>>>>  doc/guides/rel_notes/release_2_3.rst        |   4 +
>>>>>>  drivers/net/Makefile                        |   4 +
>>>>>>  drivers/net/vhost/Makefile                  |  62 ++
>>>>>>  drivers/net/vhost/rte_eth_vhost.c           | 911 ++++++++++++++++++++++++++++
>>>>>>  drivers/net/vhost/rte_eth_vhost.h           | 109 ++++
>>>>>>  drivers/net/vhost/rte_pmd_vhost_version.map |  11 +
>>>>>>  lib/librte_ether/rte_ethdev.h               |   2 +
>>>>>>  mk/rte.app.mk                               |   6 +
>>>>>>  11 files changed, 1120 insertions(+)
>>>>>>  create mode 100644 drivers/net/vhost/Makefile
>>>>>>  create mode 100644 drivers/net/vhost/rte_eth_vhost.c
>>>>>>  create mode 100644 drivers/net/vhost/rte_eth_vhost.h
>>>>>>  create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
>>>>>>
>


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v9 0/2] Add VHOST PMD
  2016-03-01  2:00                                           ` Qiu, Michael
@ 2016-03-01  2:19                                             ` Tetsuya Mukawa
  2016-03-02  2:24                                               ` Qiu, Michael
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-03-01  2:19 UTC (permalink / raw)
  To: Qiu, Michael, dev; +Cc: ann.zhuangyanying, Liu, Yuanhan

On 2016/03/01 11:00, Qiu, Michael wrote:
> On 2/26/2016 4:36 PM, Tetsuya Mukawa wrote:
>> On 2016/02/26 13:29, Tetsuya Mukawa wrote:
>>> On 2016/02/25 16:51, Qiu, Michael wrote:
>>>> On 2/24/2016 1:10 PM, Tetsuya Mukawa wrote:
>>>>> On 2016/02/24 11:45, Qiu, Michael wrote:
>>>>>> Hi,  Tetsuya
>>>>>>
>>>>>> When I applied your v6 patch, I could reach 9.5Mpps with 64B packet.
>>>>>>
>>>>>> But when I apply v9 I get only 8.4 Mpps; could you figure out why there is a
>>>>>> performance drop?
>>>>> Hi Michael,
>>>>>
>>>>> Thanks for checking it.
>>>>> I tried to reproduce it, but I don't see the drop in my environment.
>>>>> (My CPU is a Xeon E5-2697 v2, and the performance of both the v6 and v9
>>>>> patches is almost 5.9 Mpps.)
>>>>> Did you use exactly the same code except for the vhost PMD?
>>>> Yes, exactly the same code and the same platform; the only difference is the
>>>> version of the vhost PMD.
>>>>
>>>> BTW, I have set the frontend mergeable off.
>>> I have checked the below cases.
>>>  - Case1: Disable the mergeable feature in the virtio-net PMD.
>>>  - Case2: Disable the mergeable feature in the virtio-net PMD and use the
>>> '--txqflags=0xf01' option to use simple ring deployment.
>>> In both cases, I still cannot see the drop.
>>>
>>> Anyway, I will send a few patch series to determine the cause of the drop.
>>> So, could you please apply them and check the performance to determine
>>> which one causes the drop?
>> Hi Michael,
>>
>> I may have found what causes the drop.
>> Could you please restart testpmd on the guest when you see the drop, then
>> check the performance again?
>>
>> I guess the drop occurs only the first time testpmd on the guest and the
>> host are connected.
>> Here are rough steps.
>>
>> 1. Start testpmd on host
>> 2. Start QEMU
>> 3. Start testpmd on guest
>>
>> Then you will see the drop.
>> Probably, if testpmd on the guest is restarted, you won't see the drop
>> again.
>>
>> 4. Type 'quit' on guest.
>> 5. Start testpmd on guest again.

Hi Michael,

I am sorry, the above was caused by my misconfiguration.
So please ignore it.
If you have time today, could you please check the v7 and v8 performance?

Thanks,
Tetsuya

> OK, I will help to tested today.
>
> Thanks,
> Michael
>> If so, I guess the drop is caused by queue notification.
>> Could you please let me know whether your issue matches the above case?
>>
>> Thanks,
>> Tetsuya
>>
>>> Thanks,
>>> Tetsuya
>>>
>>>> Thanks,
>>>> Michael
>>>>> Thanks,
>>>>> Tetsuya
>>>>>
>>>>>> Thanks,
>>>>>> Michael
>>>>>> On 2/9/2016 5:38 PM, Tetsuya Mukawa wrote:
>>>>>>> The patch introduces a new PMD. This PMD is implemented as thin wrapper
>>>>>>> of librte_vhost.
>>>>>>>
>>>>>>>
>>>>>>> PATCH v9 changes:
>>>>>>>  - Fix a null pointer access issue introduced in the v8 patch.
>>>>>>>
>>>>>>> PATCH v8 changes:
>>>>>>>  - Manage ether devices list instead of internal structures list.
>>>>>>>  - Remove needless NULL checking.
>>>>>>>  - Replace "pthread_exit" with "return NULL".
>>>>>>>  - Replace rte_panic with RTE_LOG, and also add error handling.
>>>>>>>  - Remove duplicated lines.
>>>>>>>  - Remove needless casting.
>>>>>>>  - Follow coding style.
>>>>>>>  - Remove needless parenthesis.
>>>>>>>
>>>>>>> PATCH v7 changes:
>>>>>>>  - Remove needless parenthesis.
>>>>>>>  - Add release note.
>>>>>>>  - Remove needless line wraps.
>>>>>>>  - Add null pointer check in vring_state_changed().
>>>>>>>  - Free queue memory in eth_queue_release().
>>>>>>>  - Fix wrong variable name.
>>>>>>>  - Fix error handling code of eth_dev_vhost_create() and
>>>>>>>    rte_pmd_vhost_devuninit().
>>>>>>>  - Remove needless null checking from rte_pmd_vhost_devinit/devuninit().
>>>>>>>  - Use port id to create mac address.
>>>>>>>  - Add doxygen style comments in "rte_eth_vhost.h".
>>>>>>>  - Fix wrong comment in "mk/rte.app.mk".
>>>>>>>
>>>>>>> PATCH v6 changes:
>>>>>>>  - Remove rte_vhost_driver_pmd_callback_register().
>>>>>>>  - Support link status interrupt.
>>>>>>>  - Support queue state changed interrupt.
>>>>>>>  - Add rte_eth_vhost_get_queue_event().
>>>>>>>  - Support numa node detection when new device is connected.
>>>>>>>
>>>>>>> PATCH v5 changes:
>>>>>>>  - Rebase on latest master.
>>>>>>>  - Fix RX/TX routine to count RX/TX bytes.
>>>>>>>  - Fix RX/TX routine not to count as error packets if enqueue/dequeue
>>>>>>>    cannot send all packets.
>>>>>>>  - Fix if-condition checking for multiqueues.
>>>>>>>  - Add "static" to pthread variable.
>>>>>>>  - Fix format.
>>>>>>>  - Change default behavior not to receive queueing event from driver.
>>>>>>>  - Split the patch to separate rte_eth_vhost_portid2vdev().
>>>>>>>
>>>>>>> PATCH v4 changes:
>>>>>>>  - Rebase on latest DPDK tree.
>>>>>>>  - Fix coding style.
>>>>>>>  - Fix code not to invoke multiple messaging handling threads.
>>>>>>>  - Fix code to handle vdev parameters correctly.
>>>>>>>  - Remove needless cast.
>>>>>>>  - Remove needless if-condition before rte_free().
>>>>>>>
>>>>>>> PATCH v3 changes:
>>>>>>>  - Rebase on latest master.
>>>>>>>  - Specify correct queue_id in RX/TX function.
>>>>>>>
>>>>>>> PATCH v2 changes:
>>>>>>>  - Remove a below patch that fixes vhost library.
>>>>>>>    The patch was applied as a separate patch.
>>>>>>>    - vhost: fix crash with multiqueue enabled
>>>>>>>  - Fix typos.
>>>>>>>    (Thanks to Thomas Monjalon)
>>>>>>>  - Rebase on latest tree with Bernard's patches above.
>>>>>>>
>>>>>>> PATCH v1 changes:
>>>>>>>  - Support vhost multiple queues.
>>>>>>>  - Rebase on "remove pci driver from vdevs".
>>>>>>>  - Optimize RX/TX functions.
>>>>>>>  - Fix resource leaks.
>>>>>>>  - Fix compile issue.
>>>>>>>  - Add patch to fix vhost library.
>>>>>>>
>>>>>>> RFC PATCH v3 changes:
>>>>>>>  - Optimize performance.
>>>>>>>    In RX/TX functions, change code to access only per core data.
>>>>>>>  - Add below API to allow user to use vhost library APIs for a port managed
>>>>>>>    by vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
>>>>>>>     - rte_eth_vhost_portid2vdev()
>>>>>>>    To support this functionality, vhost library is also changed.
>>>>>>>    Anyway, if users don't use the vhost PMD, they can still fully use the vhost library APIs.
>>>>>>>  - Add code to support vhost multiple queues.
>>>>>>>    Actually, multiple queues functionality is not enabled so far.
>>>>>>>
>>>>>>> RFC PATCH v2 changes:
>>>>>>>  - Fix issues reported by checkpatch.pl
>>>>>>>    (Thanks to Stephen Hemminger)
>>>>>>>
>>>>>>>
>>>>>>> Tetsuya Mukawa (2):
>>>>>>>   ethdev: Add a new event type to notify a queue state changed event
>>>>>>>   vhost: Add VHOST PMD
>>>>>>>
>>>>>>>  MAINTAINERS                                 |   4 +
>>>>>>>  config/common_linuxapp                      |   6 +
>>>>>>>  doc/guides/nics/index.rst                   |   1 +
>>>>>>>  doc/guides/rel_notes/release_2_3.rst        |   4 +
>>>>>>>  drivers/net/Makefile                        |   4 +
>>>>>>>  drivers/net/vhost/Makefile                  |  62 ++
>>>>>>>  drivers/net/vhost/rte_eth_vhost.c           | 911 ++++++++++++++++++++++++++++
>>>>>>>  drivers/net/vhost/rte_eth_vhost.h           | 109 ++++
>>>>>>>  drivers/net/vhost/rte_pmd_vhost_version.map |  11 +
>>>>>>>  lib/librte_ether/rte_ethdev.h               |   2 +
>>>>>>>  mk/rte.app.mk                               |   6 +
>>>>>>>  11 files changed, 1120 insertions(+)
>>>>>>>  create mode 100644 drivers/net/vhost/Makefile
>>>>>>>  create mode 100644 drivers/net/vhost/rte_eth_vhost.c
>>>>>>>  create mode 100644 drivers/net/vhost/rte_eth_vhost.h
>>>>>>>  create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
>>>>>>>

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v9 0/2] Add VHOST PMD
  2016-03-01  2:19                                             ` Tetsuya Mukawa
@ 2016-03-02  2:24                                               ` Qiu, Michael
  2016-03-04  1:12                                                 ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Qiu, Michael @ 2016-03-02  2:24 UTC (permalink / raw)
  To: Tetsuya Mukawa, dev; +Cc: ann.zhuangyanying, Liu, Yuanhan

On 3/1/2016 10:19 AM, Tetsuya Mukawa wrote:
> On 2016/03/01 11:00, Qiu, Michael wrote:
>> On 2/26/2016 4:36 PM, Tetsuya Mukawa wrote:
>>> On 2016/02/26 13:29, Tetsuya Mukawa wrote:
>>>>

[...]

>>>>
>>>> BTW, I have set the frontend mergeable off.
>>>> I have checked below cases.
>>>>  - Case1: Disable mergeable feature in virtio-net PMD.
>>>>  - Case2: Disable mergeable feature in virtio-net PMD and use
>>>> '--txqflags=0xf01' option to use simple ring deploying.
>>>> In both cases, I still cannot see the drop.
>>>>
>>>> Anyway, I will send a few patch series to determine the cause of the drop.
>>>> So, could you please apply them and check the performance to determine
>>>> which causes the drop?
>>> Hi Michael,
>>>
>>> I may find what causes the drop.
>>> Could you please restart testpmd on guest when you see the drop, then
>>> check performance again?
>>>
>>> I guess the drop will occur only the first time testpmd on the guest and
>>> host are connected.
>>> Here are rough steps.
>>>
>>> 1. Start testpmd on host
>>> 2. Start QEMU
>>> 3. Start testpmd on guest
>>>
>>> Then you will see the drop.
>>> Probably, if testpmd on guest is restarted, then you don't see the drop
>>> again.
>>>
>>> 4. Type 'quit' on guest.
>>> 5. Start testpmd on guest again.
> Hi Michael,
>
> I am sorry that the above was caused by my misconfiguration.
> So please ignore it.
> If you have time today, could you please check v7 and v8 performance?

Hi, Tetsuya

I have tried the qemu case but it seems it does not make any difference;
maybe my configuration is wrong.

What I used to test is the container case from Jianfeng. And I made a
mistake: V6 was compiled with GCC 5.3 but V9 with GCC 4.8. After using
the same compiler, the performance is almost the same.

Thanks,
Michael


> Thanks,
> Tetsuya
>
>> OK, I will help to test today.
>>
>> Thanks,
>> Michael
>


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v9 0/2] Add VHOST PMD
  2016-03-02  2:24                                               ` Qiu, Michael
@ 2016-03-04  1:12                                                 ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-03-04  1:12 UTC (permalink / raw)
  To: Qiu, Michael, dev; +Cc: ann.zhuangyanying, Liu, Yuanhan

On 2016/03/02 11:24, Qiu, Michael wrote:
> On 3/1/2016 10:19 AM, Tetsuya Mukawa wrote:
>> On 2016/03/01 11:00, Qiu, Michael wrote:
>>> On 2/26/2016 4:36 PM, Tetsuya Mukawa wrote:
>>>> On 2016/02/26 13:29, Tetsuya Mukawa wrote:
> [...]
>
>>>>> BTW, I have set the frontend mergeable off.
>>>>> I have checked below cases.
>>>>>  - Case1: Disable mergeable feature in virtio-net PMD.
>>>>>  - Case2: Disable mergeable feature in virtio-net PMD and use
>>>>> '--txqflags=0xf01' option to use simple ring deploying.
>>>>> In both cases, I still cannot see the drop.
>>>>>
>>>>> Anyway, I will send a few patch series to determine the cause of the drop.
>>>>> So, could you please apply them and check the performance to determine
>>>>> which causes the drop?
>>>> Hi Michael,
>>>>
>>>> I may find what causes the drop.
>>>> Could you please restart testpmd on guest when you see the drop, then
>>>> check performance again?
>>>>
>>>> I guess the drop will occur only the first time testpmd on the guest and
>>>> host are connected.
>>>> Here are rough steps.
>>>>
>>>> 1. Start testpmd on host
>>>> 2. Start QEMU
>>>> 3. Start testpmd on guest
>>>>
>>>> Then you will see the drop.
>>>> Probably, if testpmd on guest is restarted, then you don't see the drop
>>>> again.
>>>>
>>>> 4. Type 'quit' on guest.
>>>> 5. Start testpmd on guest again.
>> Hi Michael,
>>
>> I am sorry that the above was caused by my misconfiguration.
>> So please ignore it.
>> If you have time today, could you please check v7 and v8 performance?
> Hi, Tetsuya
>
> I have tried the qemu case but it seems it does not make any difference;
> maybe my configuration is wrong.
>
> What I used to test is the container case from Jianfeng. And I made a
> mistake: V6 was compiled with GCC 5.3 but V9 with GCC 4.8. After using
> the same compiler, the performance is almost the same.


Hi Michael,

Sorry for the late reply, and thanks for checking.

Thanks,
Tetsuya

> Thanks,
> Michael
>
>
>> Thanks,
>> Tetsuya
>>
>>> OK, I will help to test today.
>>>
>>> Thanks,
>>> Michael

^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH v10 0/2] Add VHOST PMD
  2016-02-09  9:38                               ` [PATCH v9 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
@ 2016-03-04  4:17                                 ` Tetsuya Mukawa
  2016-03-04  4:17                                 ` [PATCH v10 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
  2016-03-04  4:17                                 ` [PATCH v10 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  2 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-03-04  4:17 UTC (permalink / raw)
  To: dev; +Cc: ann.zhuangyanying, yuanhan.liu

The patch introduces a new PMD. This PMD is implemented as a thin wrapper
of librte_vhost.


PATCH v10 changes:
 - Rebase on latest master.
 - Fix DPDK version number (2.3 to 16.04).
 - Set the port id in the mbuf while receiving packets.

PATCH v9 changes:
 - Fix a null pointer access issue implemented in v8 patch.

PATCH v8 changes:
 - Manage ether devices list instead of internal structures list.
 - Remove needless NULL checking.
 - Replace "pthread_exit" to "return NULL".
 - Replace rte_panic to RTE_LOG, also add error handling.
 - Remove duplicated lines.
 - Remove needless casting.
 - Follow coding style.
 - Remove needless parenthesis.

PATCH v7 changes:
 - Remove needless parenthesis.
 - Add release note.
 - Remove needless line wraps.
 - Add null pointer check in vring_state_changed().
 - Free queue memory in eth_queue_release().
 - Fix wrong variable name.
 - Fix error handling code of eth_dev_vhost_create() and
   rte_pmd_vhost_devuninit().
 - Remove needless null checking from rte_pmd_vhost_devinit/devuninit().
 - Use port id to create mac address.
 - Add doxygen style comments in "rte_eth_vhost.h".
 - Fix wrong comment in "mk/rte.app.mk".

PATCH v6 changes:
 - Remove rte_vhost_driver_pmd_callback_register().
 - Support link status interrupt.
 - Support queue state changed interrupt.
 - Add rte_eth_vhost_get_queue_event().
 - Support numa node detection when new device is connected.

PATCH v5 changes:
 - Rebase on latest master.
 - Fix RX/TX routine to count RX/TX bytes.
 - Fix RX/TX routine not to count as error packets if enqueue/dequeue
   cannot send all packets.
 - Fix if-condition checking for multiqueues.
 - Add "static" to pthread variable.
 - Fix format.
 - Change default behavior not to receive queueing event from driver.
 - Split the patch to separate rte_eth_vhost_portid2vdev().

PATCH v4 changes:
 - Rebase on latest DPDK tree.
 - Fix coding style.
 - Fix code not to invoke multiple messaging handling threads.
 - Fix code to handle vdev parameters correctly.
 - Remove needless cast.
 - Remove needless if-condition before rte_free().

PATCH v3 changes:
 - Rebase on latest master.
 - Specify correct queue_id in RX/TX function.

PATCH v2 changes:
 - Remove a below patch that fixes vhost library.
   The patch was applied as a separate patch.
   - vhost: fix crash with multiqueue enabled
 - Fix typos.
   (Thanks to Thomas Monjalon)
 - Rebase on latest tree with above Bernard's patches.

PATCH v1 changes:
 - Support vhost multiple queues.
 - Rebase on "remove pci driver from vdevs".
 - Optimize RX/TX functions.
 - Fix resource leaks.
 - Fix compile issue.
 - Add patch to fix vhost library.

RFC PATCH v3 changes:
 - Optimize performance.
   In RX/TX functions, change code to access only per-core data.
 - Add below API to allow user to use vhost library APIs for a port managed
   by vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
    - rte_eth_vhost_portid2vdev()
   To support this functionality, vhost library is also changed.
   Anyway, if users don't use the vhost PMD, they can fully use vhost library APIs.
 - Add code to support vhost multiple queues.
   Actually, multiple queues functionality is not enabled so far.

RFC PATCH v2 changes:
 - Fix issues reported by checkpatch.pl
   (Thanks to Stephen Hemminger)
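
As background for the RX/TX fixes above: the PMD guards every burst with a
pair of per-queue flags so that a disconnecting vhost device can wait out
in-flight bursts before tearing the queue down. Below is a minimal
standalone sketch of that protocol, not the real driver code: the names
mirror struct vhost_queue, but standard C11 atomics stand in for
rte_atomic32_t and the burst body is a placeholder.

```c
/* Standalone model of the allow_queuing/while_queuing guard used by
 * eth_vhost_rx()/eth_vhost_tx() and destroy_device(). Illustrative only. */
#include <assert.h>
#include <stdatomic.h>

struct vhost_queue_model {
	atomic_int allow_queuing;	/* cleared when the device goes away */
	atomic_int while_queuing;	/* 1 while a burst is in progress */
};

/* Mirrors the entry/exit checks of the RX/TX burst functions. */
static int burst_model(struct vhost_queue_model *q)
{
	int nb = 0;

	if (atomic_load(&q->allow_queuing) == 0)
		return 0;
	atomic_store(&q->while_queuing, 1);
	/* Re-check: the device may have been destroyed between the first
	 * check and setting while_queuing. */
	if (atomic_load(&q->allow_queuing) != 0)
		nb = 32;	/* placeholder for a real dequeue/enqueue */
	atomic_store(&q->while_queuing, 0);
	return nb;
}

/* Mirrors destroy_device(): forbid new bursts, then wait for the
 * in-flight one (if any) to drain. */
static void stop_queue_model(struct vhost_queue_model *q)
{
	atomic_store(&q->allow_queuing, 0);
	while (atomic_load(&q->while_queuing))
		;	/* rte_pause() in the real PMD */
}
```

The double check of allow_queuing is what makes it safe for
destroy_device() to free the virtio device once while_queuing drops to 0.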



Tetsuya Mukawa (2):
  ethdev: Add a new event type to notify a queue state changed event
  vhost: Add VHOST PMD

 MAINTAINERS                                 |   4 +
 config/common_linuxapp                      |   6 +
 doc/guides/nics/index.rst                   |   1 +
 doc/guides/rel_notes/release_16_04.rst      |   4 +
 drivers/net/Makefile                        |   1 +
 drivers/net/vhost/Makefile                  |  62 ++
 drivers/net/vhost/rte_eth_vhost.c           | 916 ++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h           | 109 ++++
 drivers/net/vhost/rte_pmd_vhost_version.map |  10 +
 lib/librte_ether/rte_ethdev.h               |   2 +
 mk/rte.app.mk                               |   6 +
 11 files changed, 1121 insertions(+)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

-- 
2.1.4

^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH v10 1/2] ethdev: Add a new event type to notify a queue state changed event
  2016-02-09  9:38                               ` [PATCH v9 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
  2016-03-04  4:17                                 ` [PATCH v10 0/2] Add VHOST PMD Tetsuya Mukawa
@ 2016-03-04  4:17                                 ` Tetsuya Mukawa
  2016-03-04  4:17                                 ` [PATCH v10 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  2 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-03-04  4:17 UTC (permalink / raw)
  To: dev; +Cc: ann.zhuangyanying, yuanhan.liu

This patch adds the below event type.
 - RTE_ETH_EVENT_QUEUE_STATE_CHANGE
This event is used for notifying a queue state change.
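
For illustration, here is a rough standalone model of how an application
callback for the new event type slots into ethdev-style dispatch. The
types and helper names below are hypothetical simplifications, not the
real rte_ethdev API (which uses rte_eth_dev_callback_register() and an
internal callback list).

```c
/* Simplified model of per-event callback registration and dispatch. */
#include <assert.h>
#include <stddef.h>

enum eth_event_model {
	EV_UNKNOWN,
	EV_INTR_LSC,		/* link state change */
	EV_QUEUE_STATE_CHANGE,	/* the event type added by this patch */
	EV_MAX
};

typedef void (*eth_cb_fn)(int port_id, enum eth_event_model ev, void *arg);

static eth_cb_fn callbacks[EV_MAX];

static void cb_register(enum eth_event_model ev, eth_cb_fn fn)
{
	callbacks[ev] = fn;
}

/* Driver side: invoke the callback registered for this event, if any. */
static int cb_process(int port_id, enum eth_event_model ev)
{
	if (callbacks[ev] == NULL)
		return -1;
	callbacks[ev](port_id, ev, NULL);
	return 0;
}

static int last_event = -1;

static void on_queue_state(int port_id, enum eth_event_model ev, void *arg)
{
	(void)port_id;
	(void)arg;
	last_event = (int)ev;
}
```

In the vhost PMD below, vring_state_changed() plays the cb_process() role,
raising RTE_ETH_EVENT_QUEUE_STATE_CHANGE toward the application.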

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 lib/librte_ether/rte_ethdev.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index d53e362..7817d4a 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -2661,6 +2661,8 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 enum rte_eth_event_type {
 	RTE_ETH_EVENT_UNKNOWN,  /**< unknown event type */
 	RTE_ETH_EVENT_INTR_LSC, /**< lsc interrupt event */
+	RTE_ETH_EVENT_QUEUE_STATE_CHANGE,
+				/**< queue state changed interrupt */
 	RTE_ETH_EVENT_MAX       /**< max value of this enum */
 };
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v10 2/2] vhost: Add VHOST PMD
  2016-02-09  9:38                               ` [PATCH v9 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
  2016-03-04  4:17                                 ` [PATCH v10 0/2] Add VHOST PMD Tetsuya Mukawa
  2016-03-04  4:17                                 ` [PATCH v10 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
@ 2016-03-04  4:17                                 ` Tetsuya Mukawa
  2016-03-04  8:39                                   ` Yuanhan Liu
                                                     ` (3 more replies)
  2 siblings, 4 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-03-04  4:17 UTC (permalink / raw)
  To: dev; +Cc: ann.zhuangyanying, yuanhan.liu

The patch introduces a new PMD. This PMD is implemented as a thin wrapper
of librte_vhost, which means librte_vhost is also needed to compile the PMD.
The vhost messages will be handled only when a port is started. So start
a port first, then invoke QEMU.

The PMD has 2 parameters.
 - iface:  The parameter is used to specify a path to connect to a
           virtio-net device.
 - queues: The parameter is used to specify the number of queues the
           virtio-net device has.
           (Default: 1)

Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i

To connect to the above testpmd, here is a QEMU command example.

$ qemu-system-x86_64 \
        <snip>
        -chardev socket,id=chr0,path=/tmp/sock0 \
        -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
        -device virtio-net-pci,netdev=net0,mq=on
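
Once QEMU connects, per-queue enable/disable notifications can be drained
with rte_eth_vhost_get_queue_event() (exported by rte_eth_vhost.h). Below
is a standalone sketch of just its cur[]/seen[] edge-detection scan, with
the per-port state array and spinlock omitted and names marked as a model;
it is not the driver code itself. Vring index idx maps to queue_id = idx / 2
and the rx flag = idx & 1.

```c
/* Standalone model of the scan loop in rte_eth_vhost_get_queue_event(). */
#include <assert.h>
#include <stdbool.h>

#define MAX_VRING_MODEL 8

struct vring_state_model {
	bool cur[MAX_VRING_MODEL];	/* latest state from vhost */
	bool seen[MAX_VRING_MODEL];	/* state already reported */
	unsigned int index;		/* round-robin scan position */
	unsigned int max_vring;
};

struct queue_event_model {
	unsigned int queue_id;
	int rx;		/* which direction the vring serves */
	int enable;
};

/* Returns 0 and fills *ev for the first vring whose state changed
 * since the last call; -1 when there is nothing new to report. */
static int get_queue_event_model(struct vring_state_model *s,
				 struct queue_event_model *ev)
{
	unsigned int i;

	for (i = 0; i <= s->max_vring; i++) {
		unsigned int idx = s->index++ % (s->max_vring + 1);

		if (s->cur[idx] != s->seen[idx]) {
			s->seen[idx] = s->cur[idx];
			ev->queue_id = idx / 2;
			ev->rx = idx & 1;
			ev->enable = s->cur[idx];
			return 0;
		}
	}
	return -1;
}
```

An application would loop on this after receiving the
RTE_ETH_EVENT_QUEUE_STATE_CHANGE callback until it returns -1.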

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 MAINTAINERS                                 |   4 +
 config/common_linuxapp                      |   6 +
 doc/guides/nics/index.rst                   |   1 +
 doc/guides/rel_notes/release_16_04.rst      |   4 +
 drivers/net/Makefile                        |   1 +
 drivers/net/vhost/Makefile                  |  62 ++
 drivers/net/vhost/rte_eth_vhost.c           | 916 ++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h           | 109 ++++
 drivers/net/vhost/rte_pmd_vhost_version.map |  10 +
 mk/rte.app.mk                               |   6 +
 10 files changed, 1119 insertions(+)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 628bc05..2ec7cd4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -352,6 +352,10 @@ Null PMD
 M: Tetsuya Mukawa <mukawa@igel.co.jp>
 F: drivers/net/null/
 
+Vhost PMD
+M: Tetsuya Mukawa <mukawa@igel.co.jp>
+F: drivers/net/vhost/
+
 Intel AES-NI Multi-Buffer
 M: Declan Doherty <declan.doherty@intel.com>
 F: drivers/crypto/aesni_mb/
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 26df137..4915709 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -503,6 +503,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
 CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 
 #
+# Compile vhost PMD
+# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
+#
+CONFIG_RTE_LIBRTE_PMD_VHOST=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 8618114..6bb79b0 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -48,6 +48,7 @@ Network Interface Controller Drivers
     nfp
     szedata2
     virtio
+    vhost
     vmxnet3
     pcap_ring
 
diff --git a/doc/guides/rel_notes/release_16_04.rst b/doc/guides/rel_notes/release_16_04.rst
index 9442018..feae36a 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -57,6 +57,10 @@ This section should contain new features added in this release. Sample format:
 
 * **Added vhost-user live migration support.**
 
+* **Added vhost PMD.**
+
+  Added a virtual PMD that wraps librte_vhost.
+
 
 Resolved Issues
 ---------------
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 0c3393f..ceb2010 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -51,5 +51,6 @@ DIRS-$(CONFIG_RTE_LIBRTE_PMD_SZEDATA2) += szedata2
 DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
 
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
new file mode 100644
index 0000000..f49a69b
--- /dev/null
+++ b/drivers/net/vhost/Makefile
@@ -0,0 +1,62 @@
+#   BSD LICENSE
+#
+#   Copyright (c) 2010-2016 Intel Corporation.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_vhost.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_vhost_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
+
+#
+# Export include files
+#
+SYMLINK-y-include += rte_eth_vhost.h
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_vhost
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
new file mode 100644
index 0000000..c4a1ac1
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -0,0 +1,916 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) 2016 IGEL Co., Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co.,Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <unistd.h>
+#include <pthread.h>
+#include <stdbool.h>
+#ifdef RTE_LIBRTE_VHOST_NUMA
+#include <numaif.h>
+#endif
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_dev.h>
+#include <rte_kvargs.h>
+#include <rte_virtio_net.h>
+#include <rte_spinlock.h>
+
+#include "rte_eth_vhost.h"
+
+#define ETH_VHOST_IFACE_ARG		"iface"
+#define ETH_VHOST_QUEUES_ARG		"queues"
+
+static const char *drivername = "VHOST PMD";
+
+static const char *valid_arguments[] = {
+	ETH_VHOST_IFACE_ARG,
+	ETH_VHOST_QUEUES_ARG,
+	NULL
+};
+
+static struct ether_addr base_eth_addr = {
+	.addr_bytes = {
+		0x56 /* V */,
+		0x48 /* H */,
+		0x4F /* O */,
+		0x53 /* S */,
+		0x54 /* T */,
+		0x00
+	}
+};
+
+struct vhost_queue {
+	rte_atomic32_t allow_queuing;
+	rte_atomic32_t while_queuing;
+	struct virtio_net *device;
+	struct pmd_internal *internal;
+	struct rte_mempool *mb_pool;
+	uint8_t port;
+	uint16_t virtqueue_id;
+	uint64_t rx_pkts;
+	uint64_t tx_pkts;
+	uint64_t missed_pkts;
+	uint64_t rx_bytes;
+	uint64_t tx_bytes;
+};
+
+struct pmd_internal {
+	char *dev_name;
+	char *iface_name;
+	volatile uint16_t once;
+};
+
+struct internal_list {
+	TAILQ_ENTRY(internal_list) next;
+	struct rte_eth_dev *eth_dev;
+};
+
+TAILQ_HEAD(internal_list_head, internal_list);
+static struct internal_list_head internal_list =
+	TAILQ_HEAD_INITIALIZER(internal_list);
+
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static rte_atomic16_t nb_started_ports;
+static pthread_t session_th;
+
+static struct rte_eth_link pmd_link = {
+		.link_speed = 10000,
+		.link_duplex = ETH_LINK_FULL_DUPLEX,
+		.link_status = 0
+};
+
+struct rte_vhost_vring_state {
+	rte_spinlock_t lock;
+
+	bool cur[RTE_MAX_QUEUES_PER_PORT * 2];
+	bool seen[RTE_MAX_QUEUES_PER_PORT * 2];
+	unsigned int index;
+	unsigned int max_vring;
+};
+
+static struct rte_vhost_vring_state *vring_states[RTE_MAX_ETHPORTS];
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t i, nb_rx = 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Dequeue packets from guest TX queue */
+	nb_rx = rte_vhost_dequeue_burst(r->device,
+			r->virtqueue_id, r->mb_pool, bufs, nb_bufs);
+
+	r->rx_pkts += nb_rx;
+
+	for (i = 0; likely(i < nb_rx); i++) {
+		bufs[i]->port = r->port;
+		r->rx_bytes += bufs[i]->pkt_len;
+	}
+
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t i, nb_tx = 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Enqueue packets to guest RX queue */
+	nb_tx = rte_vhost_enqueue_burst(r->device,
+			r->virtqueue_id, bufs, nb_bufs);
+
+	r->tx_pkts += nb_tx;
+	r->missed_pkts += nb_bufs - nb_tx;
+
+	for (i = 0; likely(i < nb_tx); i++)
+		r->tx_bytes += bufs[i]->pkt_len;
+
+	for (i = 0; likely(i < nb_tx); i++)
+		rte_pktmbuf_free(bufs[i]);
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
+{
+	return 0;
+}
+
+static inline struct internal_list *
+find_internal_resource(char *ifname)
+{
+	int found = 0;
+	struct internal_list *list;
+	struct pmd_internal *internal;
+
+	if (ifname == NULL)
+		return NULL;
+
+	pthread_mutex_lock(&internal_list_lock);
+
+	TAILQ_FOREACH(list, &internal_list, next) {
+		internal = list->eth_dev->data->dev_private;
+		if (!strcmp(internal->iface_name, ifname)) {
+			found = 1;
+			break;
+		}
+	}
+
+	pthread_mutex_unlock(&internal_list_lock);
+
+	if (!found)
+		return NULL;
+
+	return list;
+}
+
+static int
+new_device(struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct internal_list *list;
+	struct pmd_internal *internal;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid argument\n");
+		return -1;
+	}
+
+	list = find_internal_resource(dev->ifname);
+	if (list == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid device name\n");
+		return -1;
+	}
+
+	eth_dev = list->eth_dev;
+	internal = eth_dev->data->dev_private;
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = dev;
+		vq->internal = internal;
+		vq->port = eth_dev->data->port_id;
+		rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = dev;
+		vq->internal = internal;
+		vq->port = eth_dev->data->port_id;
+		rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
+	}
+
+	dev->flags |= VIRTIO_DEV_RUNNING;
+	dev->priv = eth_dev;
+	eth_dev->data->dev_link.link_status = 1;
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+
+	RTE_LOG(INFO, PMD, "New connection established\n");
+
+	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC);
+
+	return 0;
+}
+
+static void
+destroy_device(volatile struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid argument\n");
+		return;
+	}
+
+	eth_dev = (struct rte_eth_dev *)dev->priv;
+	if (eth_dev == NULL) {
+		RTE_LOG(INFO, PMD, "Failed to find an ethdev\n");
+		return;
+	}
+
+	/* Wait until rx/tx_pkt_burst stops accessing vhost device */
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+
+	eth_dev->data->dev_link.link_status = 0;
+
+	dev->priv = NULL;
+	dev->flags &= ~VIRTIO_DEV_RUNNING;
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = NULL;
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = NULL;
+	}
+
+	RTE_LOG(INFO, PMD, "Connection closed\n");
+
+	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC);
+}
+
+static int
+vring_state_changed(struct virtio_net *dev, uint16_t vring, int enable)
+{
+	struct rte_vhost_vring_state *state;
+	struct rte_eth_dev *eth_dev;
+	struct internal_list *list;
+#ifdef RTE_LIBRTE_VHOST_NUMA
+	int newnode, ret;
+#endif
+
+	if (dev == NULL) {
+		RTE_LOG(ERR, PMD, "Invalid argument\n");
+		return -1;
+	}
+
+	list = find_internal_resource(dev->ifname);
+	if (list == NULL) {
+		RTE_LOG(ERR, PMD, "Invalid interface name: %s\n", dev->ifname);
+		return -1;
+	}
+
+	eth_dev = list->eth_dev;
+	/* won't be NULL */
+	state = vring_states[eth_dev->data->port_id];
+
+#ifdef RTE_LIBRTE_VHOST_NUMA
+	ret  = get_mempolicy(&newnode, NULL, 0, dev,
+			MPOL_F_NODE | MPOL_F_ADDR);
+	if (ret < 0) {
+		RTE_LOG(ERR, PMD, "Unknown numa node\n");
+		return -1;
+	}
+
+	eth_dev->data->numa_node = newnode;
+#endif
+	rte_spinlock_lock(&state->lock);
+	state->cur[vring] = enable;
+	state->max_vring = RTE_MAX(vring, state->max_vring);
+	rte_spinlock_unlock(&state->lock);
+
+	RTE_LOG(INFO, PMD, "vring%u is %s\n",
+			vring, enable ? "enabled" : "disabled");
+
+	_rte_eth_dev_callback_process(eth_dev,
+			RTE_ETH_EVENT_QUEUE_STATE_CHANGE);
+
+	return 0;
+}
+
+int
+rte_eth_vhost_get_queue_event(uint8_t port_id,
+		struct rte_eth_vhost_queue_event *event)
+{
+	struct rte_vhost_vring_state *state;
+	unsigned int i;
+	int idx;
+
+	if (port_id >= RTE_MAX_ETHPORTS) {
+		RTE_LOG(ERR, PMD, "Invalid port id\n");
+		return -1;
+	}
+
+	state = vring_states[port_id];
+	if (!state) {
+		RTE_LOG(ERR, PMD, "Unused port\n");
+		return -1;
+	}
+
+	rte_spinlock_lock(&state->lock);
+	for (i = 0; i <= state->max_vring; i++) {
+		idx = state->index++ % (state->max_vring + 1);
+
+		if (state->cur[idx] != state->seen[idx]) {
+			state->seen[idx] = state->cur[idx];
+			event->queue_id = idx / 2;
+			event->rx = idx & 1;
+			event->enable = state->cur[idx];
+			rte_spinlock_unlock(&state->lock);
+			return 0;
+		}
+	}
+	rte_spinlock_unlock(&state->lock);
+
+	return -1;
+}
+
+static void *
+vhost_driver_session(void *param __rte_unused)
+{
+	static struct virtio_net_device_ops vhost_ops;
+
+	/* set vhost arguments */
+	vhost_ops.new_device = new_device;
+	vhost_ops.destroy_device = destroy_device;
+	vhost_ops.vring_state_changed = vring_state_changed;
+	if (rte_vhost_driver_callback_register(&vhost_ops) < 0)
+		RTE_LOG(ERR, PMD, "Can't register callbacks\n");
+
+	/* start event handling */
+	rte_vhost_driver_session_start();
+
+	return NULL;
+}
+
+static int
+vhost_driver_session_start(void)
+{
+	int ret;
+
+	ret = pthread_create(&session_th,
+			NULL, vhost_driver_session, NULL);
+	if (ret)
+		RTE_LOG(ERR, PMD, "Can't create a thread\n");
+
+	return ret;
+}
+
+static void
+vhost_driver_session_stop(void)
+{
+	int ret;
+
+	ret = pthread_cancel(session_th);
+	if (ret)
+		RTE_LOG(ERR, PMD, "Can't cancel the thread\n");
+
+	ret = pthread_join(session_th, NULL);
+	if (ret)
+		RTE_LOG(ERR, PMD, "Can't join the thread\n");
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+	int ret = 0;
+
+	if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
+		ret = rte_vhost_driver_register(internal->iface_name);
+		if (ret)
+			return ret;
+	}
+
+	/* We need only one message handling thread */
+	if (rte_atomic16_add_return(&nb_started_ports, 1) == 1)
+		ret = vhost_driver_session_start();
+
+	return ret;
+}
+
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	if (rte_atomic16_cmpset(&internal->once, 1, 0))
+		rte_vhost_driver_unregister(internal->iface_name);
+
+	if (rte_atomic16_sub_return(&nb_started_ports, 1) == 0)
+		vhost_driver_session_stop();
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct vhost_queue *vq;
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
+		return -ENOMEM;
+	}
+
+	vq->mb_pool = mb_pool;
+	vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
+	dev->data->rx_queues[rx_queue_id] = vq;
+
+	return 0;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct vhost_queue *vq;
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for tx queue\n");
+		return -ENOMEM;
+	}
+
+	vq->virtqueue_id = tx_queue_id * VIRTIO_QNUM + VIRTIO_RXQ;
+	dev->data->tx_queues[tx_queue_id] = vq;
+
+	return 0;
+}
+
+static void
+eth_dev_info(struct rte_eth_dev *dev,
+	     struct rte_eth_dev_info *dev_info)
+{
+	dev_info->driver_name = drivername;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = (uint32_t)-1;
+	dev_info->max_rx_queues = dev->data->nb_rx_queues;
+	dev_info->max_tx_queues = dev->data->nb_tx_queues;
+	dev_info->min_rx_bufsize = 0;
+}
+
+static void
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	unsigned i;
+	unsigned long rx_total = 0, tx_total = 0, tx_missed_total = 0;
+	unsigned long rx_total_bytes = 0, tx_total_bytes = 0;
+	struct vhost_queue *vq;
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+			i < dev->data->nb_rx_queues; i++) {
+		if (dev->data->rx_queues[i] == NULL)
+			continue;
+		vq = dev->data->rx_queues[i];
+		stats->q_ipackets[i] = vq->rx_pkts;
+		rx_total += stats->q_ipackets[i];
+
+		stats->q_ibytes[i] = vq->rx_bytes;
+		rx_total_bytes += stats->q_ibytes[i];
+	}
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+			i < dev->data->nb_tx_queues; i++) {
+		if (dev->data->tx_queues[i] == NULL)
+			continue;
+		vq = dev->data->tx_queues[i];
+		stats->q_opackets[i] = vq->tx_pkts;
+		tx_missed_total += vq->missed_pkts;
+		tx_total += stats->q_opackets[i];
+
+		stats->q_obytes[i] = vq->tx_bytes;
+		tx_total_bytes += stats->q_obytes[i];
+	}
+
+	stats->ipackets = rx_total;
+	stats->opackets = tx_total;
+	stats->imissed = tx_missed_total;
+	stats->ibytes = rx_total_bytes;
+	stats->obytes = tx_total_bytes;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	struct vhost_queue *vq;
+	unsigned i;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		if (dev->data->rx_queues[i] == NULL)
+			continue;
+		vq = dev->data->rx_queues[i];
+		vq->rx_pkts = 0;
+		vq->rx_bytes = 0;
+	}
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		if (dev->data->tx_queues[i] == NULL)
+			continue;
+		vq = dev->data->tx_queues[i];
+		vq->tx_pkts = 0;
+		vq->tx_bytes = 0;
+		vq->missed_pkts = 0;
+	}
+}
+
+static void
+eth_queue_release(void *q)
+{
+	rte_free(q);
+}
+
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused)
+{
+	return 0;
+}
+
+/**
+ * Disable features in feature_mask. Returns 0 on success.
+ */
+int
+rte_eth_vhost_feature_disable(uint64_t feature_mask)
+{
+	return rte_vhost_feature_disable(feature_mask);
+}
+
+/**
+ * Enable features in feature_mask. Returns 0 on success.
+ */
+int
+rte_eth_vhost_feature_enable(uint64_t feature_mask)
+{
+	return rte_vhost_feature_enable(feature_mask);
+}
+
+/* Returns currently supported vhost features */
+uint64_t
+rte_eth_vhost_feature_get(void)
+{
+	return rte_vhost_feature_get();
+}
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+static int
+eth_dev_vhost_create(const char *name, char *iface_name, int16_t queues,
+		     const unsigned numa_node)
+{
+	struct rte_eth_dev_data *data = NULL;
+	struct pmd_internal *internal = NULL;
+	struct rte_eth_dev *eth_dev = NULL;
+	struct ether_addr *eth_addr = NULL;
+	struct rte_vhost_vring_state *vring_state = NULL;
+	struct internal_list *list = NULL;
+
+	RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
+		numa_node);
+
+	/* now do all data allocation - for eth_dev structure
+	 * and internal (private) data
+	 */
+	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+	if (data == NULL)
+		goto error;
+
+	internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
+	if (internal == NULL)
+		goto error;
+
+	list = rte_zmalloc_socket(name, sizeof(*list), 0, numa_node);
+	if (list == NULL)
+		goto error;
+
+	/* reserve an ethdev entry */
+	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+	if (eth_dev == NULL)
+		goto error;
+
+	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
+	if (eth_addr == NULL)
+		goto error;
+	*eth_addr = base_eth_addr;
+	eth_addr->addr_bytes[5] = eth_dev->data->port_id;
+
+	vring_state = rte_zmalloc_socket(name,
+			sizeof(*vring_state), 0, numa_node);
+	if (vring_state == NULL)
+		goto error;
+
+	TAILQ_INIT(&eth_dev->link_intr_cbs);
+
+	/* now put it all together
+	 * - store queue data in internal,
+	 * - store numa_node info in ethdev data
+	 * - point eth_dev_data to internals
+	 * - and point eth_dev structure to new eth_dev_data structure
+	 */
+	internal->dev_name = strdup(name);
+	if (internal->dev_name == NULL)
+		goto error;
+	internal->iface_name = strdup(iface_name);
+	if (internal->iface_name == NULL)
+		goto error;
+
+	list->eth_dev = eth_dev;
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_INSERT_TAIL(&internal_list, list, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
+	rte_spinlock_init(&vring_state->lock);
+	vring_states[eth_dev->data->port_id] = vring_state;
+
+	data->dev_private = internal;
+	data->port_id = eth_dev->data->port_id;
+	memmove(data->name, eth_dev->data->name, sizeof(data->name));
+	data->nb_rx_queues = queues;
+	data->nb_tx_queues = queues;
+	data->dev_link = pmd_link;
+	data->mac_addrs = eth_addr;
+
+	/* We'll replace the 'data' originally allocated by eth_dev. So the
+	 * vhost PMD resources won't be shared between multiple processes.
+	 */
+	eth_dev->data = data;
+	eth_dev->dev_ops = &ops;
+	eth_dev->driver = NULL;
+	data->dev_flags =
+		RTE_ETH_DEV_DETACHABLE | RTE_ETH_DEV_INTR_LSC;
+	data->kdrv = RTE_KDRV_NONE;
+	data->drv_name = internal->dev_name;
+	data->numa_node = numa_node;
+
+	/* finally assign rx and tx ops */
+	eth_dev->rx_pkt_burst = eth_vhost_rx;
+	eth_dev->tx_pkt_burst = eth_vhost_tx;
+
+	return data->port_id;
+
+error:
+	if (internal)
+		free(internal->dev_name);
+	rte_free(vring_state);
+	rte_free(eth_addr);
+	if (eth_dev)
+		rte_eth_dev_release_port(eth_dev);
+	rte_free(internal);
+	rte_free(list);
+	rte_free(data);
+
+	return -1;
+}
+
+static inline int
+open_iface(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	const char **iface_name = extra_args;
+
+	if (value == NULL)
+		return -1;
+
+	*iface_name = value;
+
+	return 0;
+}
+
+static inline int
+open_queues(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	uint16_t *q = extra_args;
+
+	if (value == NULL || extra_args == NULL)
+		return -EINVAL;
+
+	*q = (uint16_t)strtoul(value, NULL, 0);
+	if (*q == USHRT_MAX && errno == ERANGE)
+		return -1;
+
+	if (*q > RTE_MAX_QUEUES_PER_PORT)
+		return -1;
+
+	return 0;
+}
+
+static int
+rte_pmd_vhost_devinit(const char *name, const char *params)
+{
+	struct rte_kvargs *kvlist = NULL;
+	int ret = 0;
+	char *iface_name;
+	uint16_t queues;
+
+	RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
+
+	kvlist = rte_kvargs_parse(params, valid_arguments);
+	if (kvlist == NULL)
+		return -1;
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
+					 &open_iface, &iface_name);
+		if (ret < 0)
+			goto out_free;
+	} else {
+		ret = -1;
+		goto out_free;
+	}
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
+					 &open_queues, &queues);
+		if (ret < 0)
+			goto out_free;
+
+	} else
+		queues = 1;
+
+	eth_dev_vhost_create(name, iface_name, queues, rte_socket_id());
+
+out_free:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static int
+rte_pmd_vhost_devuninit(const char *name)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internal *internal;
+	struct internal_list *list;
+	unsigned int i;
+
+	RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
+
+	/* find an ethdev entry */
+	eth_dev = rte_eth_dev_allocated(name);
+	if (eth_dev == NULL)
+		return -ENODEV;
+
+	internal = eth_dev->data->dev_private;
+	if (internal == NULL)
+		return -ENODEV;
+
+	list = find_internal_resource(internal->iface_name);
+	if (list == NULL)
+		return -ENODEV;
+
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_REMOVE(&internal_list, list, next);
+	pthread_mutex_unlock(&internal_list_lock);
+	rte_free(list);
+
+	eth_dev_stop(eth_dev);
+
+	rte_free(vring_states[eth_dev->data->port_id]);
+	vring_states[eth_dev->data->port_id] = NULL;
+
+	free(internal->dev_name);
+	free(internal->iface_name);
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++)
+		rte_free(eth_dev->data->rx_queues[i]);
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
+		rte_free(eth_dev->data->tx_queues[i]);
+
+	rte_free(eth_dev->data->mac_addrs);
+	rte_free(eth_dev->data);
+	rte_free(internal);
+
+	rte_eth_dev_release_port(eth_dev);
+
+	return 0;
+}
+
+static struct rte_driver pmd_vhost_drv = {
+	.name = "eth_vhost",
+	.type = PMD_VDEV,
+	.init = rte_pmd_vhost_devinit,
+	.uninit = rte_pmd_vhost_devuninit,
+};
+
+PMD_REGISTER_DRIVER(pmd_vhost_drv);
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
new file mode 100644
index 0000000..e78cb74
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -0,0 +1,109 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 IGEL Co., Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co., Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_VHOST_H_
+#define _RTE_ETH_VHOST_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+#include <stdbool.h>
+
+#include <rte_virtio_net.h>
+
+/**
+ * Disable features in feature_mask.
+ *
+ * @param feature_mask
+ *  Vhost features defined in "linux/virtio_net.h".
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_eth_vhost_feature_disable(uint64_t feature_mask);
+
+/**
+ * Enable features in feature_mask.
+ *
+ * @param feature_mask
+ *  Vhost features defined in "linux/virtio_net.h".
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_eth_vhost_feature_enable(uint64_t feature_mask);
+
+/**
+ * Returns currently supported vhost features.
+ *
+ * @return
+ *  Vhost features defined in "linux/virtio_net.h".
+ */
+uint64_t rte_eth_vhost_feature_get(void);
+
+/*
+ * Event description.
+ */
+struct rte_eth_vhost_queue_event {
+	uint16_t queue_id;
+	bool rx;
+	bool enable;
+};
+
+/**
+ * Get queue events from the specified port.
+ * If a callback for the below event is registered by
+ * rte_eth_dev_callback_register(), this function will describe what
+ * changed:
+ *  - RTE_ETH_EVENT_QUEUE_STATE_CHANGE
+ * Multiple events may be coalesced into a single callback invocation, so
+ * keep calling this function as long as it returns 0.
+ *
+ * @param port_id
+ *  Port id.
+ * @param event
+ *  Pointer to a rte_eth_vhost_queue_event structure.
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_eth_vhost_get_queue_event(uint8_t port_id,
+		struct rte_eth_vhost_queue_event *event);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
new file mode 100644
index 0000000..65bf3a8
--- /dev/null
+++ b/drivers/net/vhost/rte_pmd_vhost_version.map
@@ -0,0 +1,10 @@
+DPDK_16.04 {
+	global:
+
+	rte_eth_vhost_feature_disable;
+	rte_eth_vhost_feature_enable;
+	rte_eth_vhost_feature_get;
+	rte_eth_vhost_get_queue_event;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index daac09f..40df26f 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -150,6 +150,12 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_QAT)        += -lrte_pmd_qat
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)   += -lrte_pmd_aesni_mb
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)   += -L$(AESNI_MULTI_BUFFER_LIB_PATH) -lIPSec_MB
 
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
+
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
 endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
 
 _LDLIBS-y += $(EXECENV_LDLIBS)
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 200+ messages in thread

* Re: [PATCH v10 2/2] vhost: Add VHOST PMD
  2016-03-04  4:17                                 ` [PATCH v10 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
@ 2016-03-04  8:39                                   ` Yuanhan Liu
  2016-03-04  9:58                                     ` Tetsuya Mukawa
  2016-03-07  2:07                                   ` [PATCH v11 0/2] " Tetsuya Mukawa
                                                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 200+ messages in thread
From: Yuanhan Liu @ 2016-03-04  8:39 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying, yuanhan.liu

On Fri, Mar 04, 2016 at 01:17:42PM +0900, Tetsuya Mukawa wrote:
> The patch introduces a new PMD. This PMD is implemented as thin wrapper
> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
> The vhost messages will be handled only when a port is started. So start
> a port first, then invoke QEMU.
> 
> The PMD has 2 parameters.
>  - iface:  The parameter is used to specify a path to connect to a
>            virtio-net device.
>  - queues: The parameter is used to specify the number of the queues
>            virtio-net device has.
>            (Default: 1)
> 
> Here is an example.
> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
> 
> To connect above testpmd, here is qemu command example.
> 
> $ qemu-system-x86_64 \
>         <snip>
>         -chardev socket,id=chr0,path=/tmp/sock0 \
>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
>         -device virtio-net-pci,netdev=net0,mq=on
> 
> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

You should carry my Acked-by that I gave a few versions before, as well
as the Reviewed-by and Tested-by from Rich, if my memory serves me
right. It's also a way to show respect for the review/test efforts
from them.

> ---
>  MAINTAINERS                                 |   4 +

Mind adding me to the MAINTAINERS list as well if you send another
version? :) If so, don't get my email address wrong, which you
have done many times ;-)

	--yliu


* Re: [PATCH v10 2/2] vhost: Add VHOST PMD
  2016-03-04  8:39                                   ` Yuanhan Liu
@ 2016-03-04  9:58                                     ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-03-04  9:58 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, ann.zhuangyanying, yuanhan.liu

On 2016/03/04 17:39, Yuanhan Liu wrote:
> On Fri, Mar 04, 2016 at 01:17:42PM +0900, Tetsuya Mukawa wrote:
>> The patch introduces a new PMD. This PMD is implemented as thin wrapper
>> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
>> The vhost messages will be handled only when a port is started. So start
>> a port first, then invoke QEMU.
>>
>> The PMD has 2 parameters.
>>  - iface:  The parameter is used to specify a path to connect to a
>>            virtio-net device.
>>  - queues: The parameter is used to specify the number of the queues
>>            virtio-net device has.
>>            (Default: 1)
>>
>> Here is an example.
>> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
>>
>> To connect above testpmd, here is qemu command example.
>>
>> $ qemu-system-x86_64 \
>>         <snip>
>>         -chardev socket,id=chr0,path=/tmp/sock0 \
>>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
>>         -device virtio-net-pci,netdev=net0,mq=on
>>
>> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
>> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> You should carry my Acked-by that I gave a few versions before, as well
> as the Reviewed-by and Tested-by from Rich, if my memory serves me
> right. It's also a way to show respect for the review/test efforts
> from them.

Sure, I will submit one more patch, and will add them.

>> ---
>>  MAINTAINERS                                 |   4 +
> Mind adding me to the MAINTAINERS list as well if you send another
> version? :) If so, don't get my email address wrong, which you
> have done many times ;-)

Thank you so much for it. And sorry I got your email address wrong.
I will add you to MAINTAINERS in the next patch.

Thanks,
Tetsuya

> 	--yliu


* [PATCH v11 0/2] Add VHOST PMD
  2016-03-04  4:17                                 ` [PATCH v10 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  2016-03-04  8:39                                   ` Yuanhan Liu
@ 2016-03-07  2:07                                   ` Tetsuya Mukawa
  2016-03-07  2:07                                   ` [PATCH v11 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
  2016-03-07  2:07                                   ` [PATCH v11 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  3 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-03-07  2:07 UTC (permalink / raw)
  To: dev; +Cc: ann.zhuangyanying

The patch introduces a new PMD. This PMD is implemented as a thin wrapper
around librte_vhost.


PATCH v11 changes:
 - Rebase on latest master.
 - Fix MAINTAINERS file.
 - Fix Acked-by and Tested-by signatures of commit log.

PATCH v10 changes:
 - Rebase on latest master.
 - Fix DPDK version number (2.3 to 16.04).
 - Set port id to mbuf while receiving packets.

PATCH v9 changes:
 - Fix a null pointer access issue implemented in v8 patch.

PATCH v8 changes:
 - Manage ether devices list instead of internal structures list.
 - Remove needless NULL checking.
 - Replace "pthread_exit" to "return NULL".
 - Replace rte_panic to RTE_LOG, also add error handling.
 - Remove duplicated lines.
 - Remove needless casting.
 - Follow coding style.
 - Remove needless parenthesis.

PATCH v7 changes:
 - Remove needless parenthesis.
 - Add release note.
 - Remove needless line wraps.
 - Add null pointer check in vring_state_changed().
 - Free queue memory in eth_queue_release().
 - Fix wrong variable name.
 - Fix error handling code of eth_dev_vhost_create() and
   rte_pmd_vhost_devuninit().
 - Remove needless null checking from rte_pmd_vhost_devinit/devuninit().
 - Use port id to create mac address.
 - Add doxygen style comments in "rte_eth_vhost.h".
 - Fix wrong comment in "mk/rte.app.mk".

PATCH v6 changes:
 - Remove rte_vhost_driver_pmd_callback_register().
 - Support link status interrupt.
 - Support queue state changed interrupt.
 - Add rte_eth_vhost_get_queue_event().
 - Support numa node detection when new device is connected.

PATCH v5 changes:
 - Rebase on latest master.
 - Fix RX/TX routine to count RX/TX bytes.
 - Fix RX/TX routines not to count packets as errors when enqueue/dequeue
   cannot send all packets.
 - Fix if-condition checking for multiqueues.
 - Add "static" to pthread variable.
 - Fix format.
 - Change default behavior not to receive queueing event from driver.
 - Split the patch to separate rte_eth_vhost_portid2vdev().

PATCH v4 changes:
 - Rebase on latest DPDK tree.
 - Fix coding style.
 - Fix code not to invoke multiple message handling threads.
 - Fix code to handle vdev parameters correctly.
 - Remove needless cast.
 - Remove needless if-condition before rte_free().

PATCH v3 changes:
 - Rebase on latest master.
 - Specify correct queue_id in RX/TX function.

PATCH v2 changes:
 - Remove the below patch that fixes the vhost library.
   The patch was applied as a separate patch.
   - vhost: fix crash with multiqueue enabled
 - Fix typos.
   (Thanks to Thomas Monjalon)
 - Rebase on latest tree with Bernard's patches above.

PATCH v1 changes:
 - Support vhost multiple queues.
 - Rebase on "remove pci driver from vdevs".
 - Optimize RX/TX functions.
 - Fix resource leaks.
 - Fix compile issue.
 - Add patch to fix vhost library.

RFC PATCH v3 changes:
 - Optimize performance.
   In RX/TX functions, change code to access only per core data.
 - Add the below API to allow users to use vhost library APIs for a port
   managed by the vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
    - rte_eth_vhost_portid2vdev()
   To support this functionality, the vhost library is also changed.
   Anyway, if users don't use the vhost PMD, they can still fully use the
   vhost library APIs.
 - Add code to support vhost multiple queues.
   Actually, multiple queues functionality is not enabled so far.

RFC PATCH v2 changes:
 - Fix issues reported by checkpatch.pl
   (Thanks to Stephen Hemminger)


Tetsuya Mukawa (2):
  ethdev: Add a new event type to notify a queue state changed event
  vhost: Add VHOST PMD

 MAINTAINERS                                 |   5 +
 config/common_base                          |   6 +
 config/common_linuxapp                      |   1 +
 doc/guides/nics/index.rst                   |   1 +
 doc/guides/rel_notes/release_16_04.rst      |   4 +
 drivers/net/Makefile                        |   4 +
 drivers/net/vhost/Makefile                  |  62 ++
 drivers/net/vhost/rte_eth_vhost.c           | 916 ++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h           | 109 ++++
 drivers/net/vhost/rte_pmd_vhost_version.map |  10 +
 lib/librte_ether/rte_ethdev.h               |   2 +
 mk/rte.app.mk                               |   6 +
 12 files changed, 1126 insertions(+)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

-- 
2.1.4


* [PATCH v11 1/2] ethdev: Add a new event type to notify a queue state changed event
  2016-03-04  4:17                                 ` [PATCH v10 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  2016-03-04  8:39                                   ` Yuanhan Liu
  2016-03-07  2:07                                   ` [PATCH v11 0/2] " Tetsuya Mukawa
@ 2016-03-07  2:07                                   ` Tetsuya Mukawa
  2016-03-07  2:07                                   ` [PATCH v11 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  3 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-03-07  2:07 UTC (permalink / raw)
  To: dev; +Cc: ann.zhuangyanying

This patch adds a below event type.
 - RTE_ETH_EVENT_QUEUE_STATE_CHANGE
This event is used for notifying a queue state changed event.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
---
 lib/librte_ether/rte_ethdev.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index d53e362..7817d4a 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -2661,6 +2661,8 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 enum rte_eth_event_type {
 	RTE_ETH_EVENT_UNKNOWN,  /**< unknown event type */
 	RTE_ETH_EVENT_INTR_LSC, /**< lsc interrupt event */
+	RTE_ETH_EVENT_QUEUE_STATE_CHANGE,
+				/**< queue state changed interrupt */
 	RTE_ETH_EVENT_MAX       /**< max value of this enum */
 };
 
-- 
2.1.4


* [PATCH v11 2/2] vhost: Add VHOST PMD
  2016-03-04  4:17                                 ` [PATCH v10 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
                                                     ` (2 preceding siblings ...)
  2016-03-07  2:07                                   ` [PATCH v11 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
@ 2016-03-07  2:07                                   ` Tetsuya Mukawa
  2016-03-14 12:02                                     ` Bruce Richardson
                                                       ` (3 more replies)
  3 siblings, 4 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-03-07  2:07 UTC (permalink / raw)
  To: dev; +Cc: ann.zhuangyanying

The patch introduces a new PMD. This PMD is implemented as a thin wrapper
around librte_vhost, which means librte_vhost is also needed to compile the
PMD. Vhost messages will be handled only while a port is started, so start
the port first, then invoke QEMU.

The PMD has 2 parameters.
 - iface:  The parameter is used to specify a path to connect to a
           virtio-net device.
 - queues: The parameter is used to specify the number of the queues
           virtio-net device has.
           (Default: 1)

Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i

To connect to the above testpmd, here is an example qemu command.

$ qemu-system-x86_64 \
        <snip>
        -chardev socket,id=chr0,path=/tmp/sock0 \
        -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
        -device virtio-net-pci,netdev=net0,mq=on

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
---
 MAINTAINERS                                 |   5 +
 config/common_base                          |   6 +
 config/common_linuxapp                      |   1 +
 doc/guides/nics/index.rst                   |   1 +
 doc/guides/rel_notes/release_16_04.rst      |   4 +
 drivers/net/Makefile                        |   4 +
 drivers/net/vhost/Makefile                  |  62 ++
 drivers/net/vhost/rte_eth_vhost.c           | 916 ++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h           | 109 ++++
 drivers/net/vhost/rte_pmd_vhost_version.map |  10 +
 mk/rte.app.mk                               |   6 +
 11 files changed, 1124 insertions(+)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 628bc05..b5b11b0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -352,6 +352,11 @@ Null PMD
 M: Tetsuya Mukawa <mukawa@igel.co.jp>
 F: drivers/net/null/
 
+Vhost PMD
+M: Tetsuya Mukawa <mukawa@igel.co.jp>
+M: Yuanhan Liu <yuanhan.liu@linux.intel.com>
+F: drivers/net/vhost/
+
 Intel AES-NI Multi-Buffer
 M: Declan Doherty <declan.doherty@intel.com>
 F: drivers/crypto/aesni_mb/
diff --git a/config/common_base b/config/common_base
index 1af28c8..b054c00 100644
--- a/config/common_base
+++ b/config/common_base
@@ -495,6 +495,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
 CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 
 #
+# Compile vhost PMD
+# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
+#
+CONFIG_RTE_LIBRTE_PMD_VHOST=n
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/config/common_linuxapp b/config/common_linuxapp
index ffbe260..7e698e2 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -40,5 +40,6 @@ CONFIG_RTE_EAL_VFIO=y
 CONFIG_RTE_KNI_KMOD=y
 CONFIG_RTE_LIBRTE_KNI=y
 CONFIG_RTE_LIBRTE_VHOST=y
+CONFIG_RTE_LIBRTE_PMD_VHOST=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
 CONFIG_RTE_LIBRTE_POWER=y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 8618114..6bb79b0 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -48,6 +48,7 @@ Network Interface Controller Drivers
     nfp
     szedata2
     virtio
+    vhost
     vmxnet3
     pcap_ring
 
diff --git a/doc/guides/rel_notes/release_16_04.rst b/doc/guides/rel_notes/release_16_04.rst
index 24f15bf..f8c069f 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -57,6 +57,10 @@ This section should contain new features added in this release. Sample format:
 
 * **Added vhost-user live migration support.**
 
+* **Added vhost PMD.**
+
+  Added virtual PMD that wraps librte_vhost.
+
 
 Resolved Issues
 ---------------
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 0c3393f..8ba37fb 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -52,4 +52,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
 
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
new file mode 100644
index 0000000..f49a69b
--- /dev/null
+++ b/drivers/net/vhost/Makefile
@@ -0,0 +1,62 @@
+#   BSD LICENSE
+#
+#   Copyright (c) 2010-2016 Intel Corporation.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_vhost.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_vhost_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
+
+#
+# Export include files
+#
+SYMLINK-y-include += rte_eth_vhost.h
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_vhost
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
new file mode 100644
index 0000000..c4a1ac1
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -0,0 +1,916 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) 2016 IGEL Co., Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co.,Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <unistd.h>
+#include <errno.h>
+#include <limits.h>
+#include <stdlib.h>
+#include <string.h>
+#include <pthread.h>
+#include <stdbool.h>
+#ifdef RTE_LIBRTE_VHOST_NUMA
+#include <numaif.h>
+#endif
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_dev.h>
+#include <rte_kvargs.h>
+#include <rte_virtio_net.h>
+#include <rte_spinlock.h>
+
+#include "rte_eth_vhost.h"
+
+#define ETH_VHOST_IFACE_ARG		"iface"
+#define ETH_VHOST_QUEUES_ARG		"queues"
+
+static const char *drivername = "VHOST PMD";
+
+static const char *valid_arguments[] = {
+	ETH_VHOST_IFACE_ARG,
+	ETH_VHOST_QUEUES_ARG,
+	NULL
+};
+
+static struct ether_addr base_eth_addr = {
+	.addr_bytes = {
+		0x56 /* V */,
+		0x48 /* H */,
+		0x4F /* O */,
+		0x53 /* S */,
+		0x54 /* T */,
+		0x00
+	}
+};
+
+struct vhost_queue {
+	rte_atomic32_t allow_queuing;
+	rte_atomic32_t while_queuing;
+	struct virtio_net *device;
+	struct pmd_internal *internal;
+	struct rte_mempool *mb_pool;
+	uint8_t port;
+	uint16_t virtqueue_id;
+	uint64_t rx_pkts;
+	uint64_t tx_pkts;
+	uint64_t missed_pkts;
+	uint64_t rx_bytes;
+	uint64_t tx_bytes;
+};
+
+struct pmd_internal {
+	char *dev_name;
+	char *iface_name;
+	volatile uint16_t once;
+};
+
+struct internal_list {
+	TAILQ_ENTRY(internal_list) next;
+	struct rte_eth_dev *eth_dev;
+};
+
+TAILQ_HEAD(internal_list_head, internal_list);
+static struct internal_list_head internal_list =
+	TAILQ_HEAD_INITIALIZER(internal_list);
+
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static rte_atomic16_t nb_started_ports;
+static pthread_t session_th;
+
+static struct rte_eth_link pmd_link = {
+		.link_speed = 10000,
+		.link_duplex = ETH_LINK_FULL_DUPLEX,
+		.link_status = 0
+};
+
+struct rte_vhost_vring_state {
+	rte_spinlock_t lock;
+
+	bool cur[RTE_MAX_QUEUES_PER_PORT * 2];
+	bool seen[RTE_MAX_QUEUES_PER_PORT * 2];
+	unsigned int index;
+	unsigned int max_vring;
+};
+
+static struct rte_vhost_vring_state *vring_states[RTE_MAX_ETHPORTS];
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t i, nb_rx = 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Dequeue packets from guest TX queue */
+	nb_rx = rte_vhost_dequeue_burst(r->device,
+			r->virtqueue_id, r->mb_pool, bufs, nb_bufs);
+
+	r->rx_pkts += nb_rx;
+
+	for (i = 0; likely(i < nb_rx); i++) {
+		bufs[i]->port = r->port;
+		r->rx_bytes += bufs[i]->pkt_len;
+	}
+
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t i, nb_tx = 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Enqueue packets to guest RX queue */
+	nb_tx = rte_vhost_enqueue_burst(r->device,
+			r->virtqueue_id, bufs, nb_bufs);
+
+	r->tx_pkts += nb_tx;
+	r->missed_pkts += nb_bufs - nb_tx;
+
+	for (i = 0; likely(i < nb_tx); i++) {
+		r->tx_bytes += bufs[i]->pkt_len;
+		rte_pktmbuf_free(bufs[i]);
+	}
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
+{
+	return 0;
+}
+
+static inline struct internal_list *
+find_internal_resource(char *ifname)
+{
+	int found = 0;
+	struct internal_list *list;
+	struct pmd_internal *internal;
+
+	if (ifname == NULL)
+		return NULL;
+
+	pthread_mutex_lock(&internal_list_lock);
+
+	TAILQ_FOREACH(list, &internal_list, next) {
+		internal = list->eth_dev->data->dev_private;
+		if (!strcmp(internal->iface_name, ifname)) {
+			found = 1;
+			break;
+		}
+	}
+
+	pthread_mutex_unlock(&internal_list_lock);
+
+	if (!found)
+		return NULL;
+
+	return list;
+}
+
+static int
+new_device(struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct internal_list *list;
+	struct pmd_internal *internal;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid argument\n");
+		return -1;
+	}
+
+	list = find_internal_resource(dev->ifname);
+	if (list == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid device name\n");
+		return -1;
+	}
+
+	eth_dev = list->eth_dev;
+	internal = eth_dev->data->dev_private;
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = dev;
+		vq->internal = internal;
+		vq->port = eth_dev->data->port_id;
+		rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = dev;
+		vq->internal = internal;
+		vq->port = eth_dev->data->port_id;
+		rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
+	}
+
+	dev->flags |= VIRTIO_DEV_RUNNING;
+	dev->priv = eth_dev;
+	eth_dev->data->dev_link.link_status = 1;
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+
+	RTE_LOG(INFO, PMD, "New connection established\n");
+
+	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC);
+
+	return 0;
+}
+
+static void
+destroy_device(volatile struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid argument\n");
+		return;
+	}
+
+	eth_dev = (struct rte_eth_dev *)dev->priv;
+	if (eth_dev == NULL) {
+		RTE_LOG(INFO, PMD, "Failed to find an ethdev\n");
+		return;
+	}
+
+	/* Wait until rx/tx_pkt_burst stops accessing vhost device */
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+
+	eth_dev->data->dev_link.link_status = 0;
+
+	dev->priv = NULL;
+	dev->flags &= ~VIRTIO_DEV_RUNNING;
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = NULL;
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = NULL;
+	}
+
+	RTE_LOG(INFO, PMD, "Connection closed\n");
+
+	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC);
+}
+
+static int
+vring_state_changed(struct virtio_net *dev, uint16_t vring, int enable)
+{
+	struct rte_vhost_vring_state *state;
+	struct rte_eth_dev *eth_dev;
+	struct internal_list *list;
+#ifdef RTE_LIBRTE_VHOST_NUMA
+	int newnode, ret;
+#endif
+
+	if (dev == NULL) {
+		RTE_LOG(ERR, PMD, "Invalid argument\n");
+		return -1;
+	}
+
+	list = find_internal_resource(dev->ifname);
+	if (list == NULL) {
+		RTE_LOG(ERR, PMD, "Invalid interface name: %s\n", dev->ifname);
+		return -1;
+	}
+
+	eth_dev = list->eth_dev;
+	/* won't be NULL */
+	state = vring_states[eth_dev->data->port_id];
+
+#ifdef RTE_LIBRTE_VHOST_NUMA
+	ret  = get_mempolicy(&newnode, NULL, 0, dev,
+			MPOL_F_NODE | MPOL_F_ADDR);
+	if (ret < 0) {
+		RTE_LOG(ERR, PMD, "Unknown NUMA node\n");
+		return -1;
+	}
+
+	eth_dev->data->numa_node = newnode;
+#endif
+	rte_spinlock_lock(&state->lock);
+	state->cur[vring] = enable;
+	state->max_vring = RTE_MAX(vring, state->max_vring);
+	rte_spinlock_unlock(&state->lock);
+
+	RTE_LOG(INFO, PMD, "vring%u is %s\n",
+			vring, enable ? "enabled" : "disabled");
+
+	_rte_eth_dev_callback_process(eth_dev,
+			RTE_ETH_EVENT_QUEUE_STATE_CHANGE);
+
+	return 0;
+}
+
+int
+rte_eth_vhost_get_queue_event(uint8_t port_id,
+		struct rte_eth_vhost_queue_event *event)
+{
+	struct rte_vhost_vring_state *state;
+	unsigned int i;
+	int idx;
+
+	if (port_id >= RTE_MAX_ETHPORTS) {
+		RTE_LOG(ERR, PMD, "Invalid port id\n");
+		return -1;
+	}
+
+	state = vring_states[port_id];
+	if (!state) {
+		RTE_LOG(ERR, PMD, "Unused port\n");
+		return -1;
+	}
+
+	rte_spinlock_lock(&state->lock);
+	for (i = 0; i <= state->max_vring; i++) {
+		idx = state->index++ % (state->max_vring + 1);
+
+		if (state->cur[idx] != state->seen[idx]) {
+			state->seen[idx] = state->cur[idx];
+			event->queue_id = idx / 2;
+			event->rx = idx & 1;
+			event->enable = state->cur[idx];
+			rte_spinlock_unlock(&state->lock);
+			return 0;
+		}
+	}
+	rte_spinlock_unlock(&state->lock);
+
+	return -1;
+}
+
+static void *
+vhost_driver_session(void *param __rte_unused)
+{
+	static struct virtio_net_device_ops vhost_ops;
+
+	/* set vhost arguments */
+	vhost_ops.new_device = new_device;
+	vhost_ops.destroy_device = destroy_device;
+	vhost_ops.vring_state_changed = vring_state_changed;
+	if (rte_vhost_driver_callback_register(&vhost_ops) < 0)
+		RTE_LOG(ERR, PMD, "Can't register callbacks\n");
+
+	/* start event handling */
+	rte_vhost_driver_session_start();
+
+	return NULL;
+}
+
+static int
+vhost_driver_session_start(void)
+{
+	int ret;
+
+	ret = pthread_create(&session_th,
+			NULL, vhost_driver_session, NULL);
+	if (ret)
+		RTE_LOG(ERR, PMD, "Can't create a thread\n");
+
+	return ret;
+}
+
+static void
+vhost_driver_session_stop(void)
+{
+	int ret;
+
+	ret = pthread_cancel(session_th);
+	if (ret)
+		RTE_LOG(ERR, PMD, "Can't cancel the thread\n");
+
+	ret = pthread_join(session_th, NULL);
+	if (ret)
+		RTE_LOG(ERR, PMD, "Can't join the thread\n");
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+	int ret = 0;
+
+	if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
+		ret = rte_vhost_driver_register(internal->iface_name);
+		if (ret)
+			return ret;
+	}
+
+	/* We need only one message handling thread */
+	if (rte_atomic16_add_return(&nb_started_ports, 1) == 1)
+		ret = vhost_driver_session_start();
+
+	return ret;
+}
+
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	if (rte_atomic16_cmpset(&internal->once, 1, 0))
+		rte_vhost_driver_unregister(internal->iface_name);
+
+	if (rte_atomic16_sub_return(&nb_started_ports, 1) == 0)
+		vhost_driver_session_stop();
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct vhost_queue *vq;
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
+		return -ENOMEM;
+	}
+
+	vq->mb_pool = mb_pool;
+	vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
+	dev->data->rx_queues[rx_queue_id] = vq;
+
+	return 0;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct vhost_queue *vq;
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for tx queue\n");
+		return -ENOMEM;
+	}
+
+	vq->virtqueue_id = tx_queue_id * VIRTIO_QNUM + VIRTIO_RXQ;
+	dev->data->tx_queues[tx_queue_id] = vq;
+
+	return 0;
+}
+
+static void
+eth_dev_info(struct rte_eth_dev *dev,
+	     struct rte_eth_dev_info *dev_info)
+{
+	dev_info->driver_name = drivername;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = (uint32_t)-1;
+	dev_info->max_rx_queues = dev->data->nb_rx_queues;
+	dev_info->max_tx_queues = dev->data->nb_tx_queues;
+	dev_info->min_rx_bufsize = 0;
+}
+
+static void
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	unsigned i;
+	unsigned long rx_total = 0, tx_total = 0, tx_missed_total = 0;
+	unsigned long rx_total_bytes = 0, tx_total_bytes = 0;
+	struct vhost_queue *vq;
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+			i < dev->data->nb_rx_queues; i++) {
+		if (dev->data->rx_queues[i] == NULL)
+			continue;
+		vq = dev->data->rx_queues[i];
+		stats->q_ipackets[i] = vq->rx_pkts;
+		rx_total += stats->q_ipackets[i];
+
+		stats->q_ibytes[i] = vq->rx_bytes;
+		rx_total_bytes += stats->q_ibytes[i];
+	}
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+			i < dev->data->nb_tx_queues; i++) {
+		if (dev->data->tx_queues[i] == NULL)
+			continue;
+		vq = dev->data->tx_queues[i];
+		stats->q_opackets[i] = vq->tx_pkts;
+		tx_missed_total += vq->missed_pkts;
+		tx_total += stats->q_opackets[i];
+
+		stats->q_obytes[i] = vq->tx_bytes;
+		tx_total_bytes += stats->q_obytes[i];
+	}
+
+	stats->ipackets = rx_total;
+	stats->opackets = tx_total;
+	stats->imissed = tx_missed_total;
+	stats->ibytes = rx_total_bytes;
+	stats->obytes = tx_total_bytes;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	struct vhost_queue *vq;
+	unsigned i;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		if (dev->data->rx_queues[i] == NULL)
+			continue;
+		vq = dev->data->rx_queues[i];
+		vq->rx_pkts = 0;
+		vq->rx_bytes = 0;
+	}
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		if (dev->data->tx_queues[i] == NULL)
+			continue;
+		vq = dev->data->tx_queues[i];
+		vq->tx_pkts = 0;
+		vq->tx_bytes = 0;
+		vq->missed_pkts = 0;
+	}
+}
+
+static void
+eth_queue_release(void *q)
+{
+	rte_free(q);
+}
+
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused)
+{
+	return 0;
+}
+
+/**
+ * Disable features in feature_mask. Returns 0 on success.
+ */
+int
+rte_eth_vhost_feature_disable(uint64_t feature_mask)
+{
+	return rte_vhost_feature_disable(feature_mask);
+}
+
+/**
+ * Enable features in feature_mask. Returns 0 on success.
+ */
+int
+rte_eth_vhost_feature_enable(uint64_t feature_mask)
+{
+	return rte_vhost_feature_enable(feature_mask);
+}
+
+/* Returns currently supported vhost features */
+uint64_t
+rte_eth_vhost_feature_get(void)
+{
+	return rte_vhost_feature_get();
+}
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+static int
+eth_dev_vhost_create(const char *name, char *iface_name, int16_t queues,
+		     const unsigned numa_node)
+{
+	struct rte_eth_dev_data *data = NULL;
+	struct pmd_internal *internal = NULL;
+	struct rte_eth_dev *eth_dev = NULL;
+	struct ether_addr *eth_addr = NULL;
+	struct rte_vhost_vring_state *vring_state = NULL;
+	struct internal_list *list = NULL;
+
+	RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
+		numa_node);
+
+	/* now do all data allocation - for eth_dev structure
+	 * and internal (private) data
+	 */
+	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+	if (data == NULL)
+		goto error;
+
+	internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
+	if (internal == NULL)
+		goto error;
+
+	list = rte_zmalloc_socket(name, sizeof(*list), 0, numa_node);
+	if (list == NULL)
+		goto error;
+
+	/* reserve an ethdev entry */
+	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+	if (eth_dev == NULL)
+		goto error;
+
+	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
+	if (eth_addr == NULL)
+		goto error;
+	*eth_addr = base_eth_addr;
+	eth_addr->addr_bytes[5] = eth_dev->data->port_id;
+
+	vring_state = rte_zmalloc_socket(name,
+			sizeof(*vring_state), 0, numa_node);
+	if (vring_state == NULL)
+		goto error;
+
+	TAILQ_INIT(&eth_dev->link_intr_cbs);
+
+	/* now put it all together
+	 * - store queue data in internal,
+	 * - store numa_node info in ethdev data
+	 * - point eth_dev_data to internals
+	 * - and point eth_dev structure to new eth_dev_data structure
+	 */
+	internal->dev_name = strdup(name);
+	if (internal->dev_name == NULL)
+		goto error;
+	internal->iface_name = strdup(iface_name);
+	if (internal->iface_name == NULL)
+		goto error;
+
+	list->eth_dev = eth_dev;
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_INSERT_TAIL(&internal_list, list, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
+	rte_spinlock_init(&vring_state->lock);
+	vring_states[eth_dev->data->port_id] = vring_state;
+
+	data->dev_private = internal;
+	data->port_id = eth_dev->data->port_id;
+	memmove(data->name, eth_dev->data->name, sizeof(data->name));
+	data->nb_rx_queues = queues;
+	data->nb_tx_queues = queues;
+	data->dev_link = pmd_link;
+	data->mac_addrs = eth_addr;
+
+	/* We'll replace the 'data' originally allocated by eth_dev. So the
+	 * vhost PMD resources won't be shared between multi processes.
+	 */
+	eth_dev->data = data;
+	eth_dev->dev_ops = &ops;
+	eth_dev->driver = NULL;
+	data->dev_flags =
+		RTE_ETH_DEV_DETACHABLE | RTE_ETH_DEV_INTR_LSC;
+	data->kdrv = RTE_KDRV_NONE;
+	data->drv_name = internal->dev_name;
+	data->numa_node = numa_node;
+
+	/* finally assign rx and tx ops */
+	eth_dev->rx_pkt_burst = eth_vhost_rx;
+	eth_dev->tx_pkt_burst = eth_vhost_tx;
+
+	return data->port_id;
+
+error:
+	if (internal)
+		free(internal->dev_name);
+	rte_free(vring_state);
+	rte_free(eth_addr);
+	if (eth_dev)
+		rte_eth_dev_release_port(eth_dev);
+	rte_free(internal);
+	rte_free(list);
+	rte_free(data);
+
+	return -1;
+}
+
+static inline int
+open_iface(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	const char **iface_name = extra_args;
+
+	if (value == NULL)
+		return -1;
+
+	*iface_name = value;
+
+	return 0;
+}
+
+static inline int
+open_queues(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	uint16_t *q = extra_args;
+
+	if (value == NULL || extra_args == NULL)
+		return -EINVAL;
+
+	*q = (uint16_t)strtoul(value, NULL, 0);
+	if (*q == USHRT_MAX && errno == ERANGE)
+		return -1;
+
+	if (*q > RTE_MAX_QUEUES_PER_PORT)
+		return -1;
+
+	return 0;
+}
+
+static int
+rte_pmd_vhost_devinit(const char *name, const char *params)
+{
+	struct rte_kvargs *kvlist = NULL;
+	int ret = 0;
+	char *iface_name;
+	uint16_t queues;
+
+	RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
+
+	kvlist = rte_kvargs_parse(params, valid_arguments);
+	if (kvlist == NULL)
+		return -1;
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
+					 &open_iface, &iface_name);
+		if (ret < 0)
+			goto out_free;
+	} else {
+		ret = -1;
+		goto out_free;
+	}
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
+					 &open_queues, &queues);
+		if (ret < 0)
+			goto out_free;
+
+	} else
+		queues = 1;
+
+	if (eth_dev_vhost_create(name, iface_name, queues,
+				 rte_socket_id()) < 0)
+		ret = -1;
+
+out_free:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static int
+rte_pmd_vhost_devuninit(const char *name)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internal *internal;
+	struct internal_list *list;
+	unsigned int i;
+
+	RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
+
+	/* find an ethdev entry */
+	eth_dev = rte_eth_dev_allocated(name);
+	if (eth_dev == NULL)
+		return -ENODEV;
+
+	internal = eth_dev->data->dev_private;
+	if (internal == NULL)
+		return -ENODEV;
+
+	list = find_internal_resource(internal->iface_name);
+	if (list == NULL)
+		return -ENODEV;
+
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_REMOVE(&internal_list, list, next);
+	pthread_mutex_unlock(&internal_list_lock);
+	rte_free(list);
+
+	eth_dev_stop(eth_dev);
+
+	rte_free(vring_states[eth_dev->data->port_id]);
+	vring_states[eth_dev->data->port_id] = NULL;
+
+	free(internal->dev_name);
+	free(internal->iface_name);
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++)
+		rte_free(eth_dev->data->rx_queues[i]);
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
+		rte_free(eth_dev->data->tx_queues[i]);
+
+	rte_free(eth_dev->data->mac_addrs);
+	rte_free(eth_dev->data);
+	rte_free(internal);
+
+	rte_eth_dev_release_port(eth_dev);
+
+	return 0;
+}
+
+static struct rte_driver pmd_vhost_drv = {
+	.name = "eth_vhost",
+	.type = PMD_VDEV,
+	.init = rte_pmd_vhost_devinit,
+	.uninit = rte_pmd_vhost_devuninit,
+};
+
+PMD_REGISTER_DRIVER(pmd_vhost_drv);
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
new file mode 100644
index 0000000..e78cb74
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -0,0 +1,109 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 IGEL Co., Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co., Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_VHOST_H_
+#define _RTE_ETH_VHOST_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+#include <stdbool.h>
+
+#include <rte_virtio_net.h>
+
+/**
+ * Disable features in feature_mask.
+ *
+ * @param feature_mask
+ *  Vhost features defined in "linux/virtio_net.h".
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_eth_vhost_feature_disable(uint64_t feature_mask);
+
+/**
+ * Enable features in feature_mask.
+ *
+ * @param feature_mask
+ *  Vhost features defined in "linux/virtio_net.h".
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_eth_vhost_feature_enable(uint64_t feature_mask);
+
+/**
+ * Returns currently supported vhost features.
+ *
+ * @return
+ *  Vhost features defined in "linux/virtio_net.h".
+ */
+uint64_t rte_eth_vhost_feature_get(void);
+
+/*
+ * Event description.
+ */
+struct rte_eth_vhost_queue_event {
+	uint16_t queue_id;
+	bool rx;
+	bool enable;
+};
+
+/**
+ * Get queue events from a specified port.
+ * If a callback for the event below is registered by
+ * rte_eth_dev_callback_register(), this function describes what has
+ * changed:
+ *  - RTE_ETH_EVENT_QUEUE_STATE_CHANGE
+ * Multiple state changes may be coalesced into a single callback
+ * invocation, so keep calling this function until it stops returning 0.
+ *
+ * @param port_id
+ *  Port id.
+ * @param event
+ *  Pointer to a rte_eth_vhost_queue_event structure.
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_eth_vhost_get_queue_event(uint8_t port_id,
+		struct rte_eth_vhost_queue_event *event);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
new file mode 100644
index 0000000..65bf3a8
--- /dev/null
+++ b/drivers/net/vhost/rte_pmd_vhost_version.map
@@ -0,0 +1,10 @@
+DPDK_16.04 {
+	global:
+
+	rte_eth_vhost_feature_disable;
+	rte_eth_vhost_feature_enable;
+	rte_eth_vhost_feature_get;
+	rte_eth_vhost_get_queue_event;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index daac09f..40df26f 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -150,6 +150,12 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_QAT)        += -lrte_pmd_qat
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)   += -lrte_pmd_aesni_mb
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)   += -L$(AESNI_MULTI_BUFFER_LIB_PATH) -lIPSec_MB
 
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
+
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
 endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
 
 _LDLIBS-y += $(EXECENV_LDLIBS)
-- 
2.1.4


* Re: [PATCH v11 2/2] vhost: Add VHOST PMD
  2016-03-07  2:07                                   ` [PATCH v11 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
@ 2016-03-14 12:02                                     ` Bruce Richardson
  2016-03-15  5:35                                       ` Tetsuya Mukawa
  2016-03-15  8:31                                     ` [PATCH v12 0/2] " Tetsuya Mukawa
                                                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 200+ messages in thread
From: Bruce Richardson @ 2016-03-14 12:02 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Mon, Mar 07, 2016 at 11:07:14AM +0900, Tetsuya Mukawa wrote:
> The patch introduces a new PMD. This PMD is implemented as thin wrapper
> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
> The vhost messages will be handled only when a port is started. So start
> a port first, then invoke QEMU.
> 
> The PMD has 2 parameters.
>  - iface:  The parameter is used to specify a path to connect to a
>            virtio-net device.
>  - queues: The parameter is used to specify the number of the queues
>            virtio-net device has.
>            (Default: 1)
> 
> Here is an example.
> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
> 
> To connect above testpmd, here is qemu command example.
> 
> $ qemu-system-x86_64 \
>         <snip>
>         -chardev socket,id=chr0,path=/tmp/sock0 \
>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
>         -device virtio-net-pci,netdev=net0,mq=on
> 
> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> Acked-by: Rich Lane <rich.lane@bigswitch.com>
> Tested-by: Rich Lane <rich.lane@bigswitch.com>
> ---
>  MAINTAINERS                                 |   5 +
>  config/common_base                          |   6 +
>  config/common_linuxapp                      |   1 +
>  doc/guides/nics/index.rst                   |   1 +

This adds a new entry for vhost PMD into the index, but there is no vhost.rst
file present in this patchset. Did you forget to add it?

>  doc/guides/rel_notes/release_16_04.rst      |   4 +
>  drivers/net/Makefile                        |   4 +
>  drivers/net/vhost/Makefile                  |  62 ++
>  drivers/net/vhost/rte_eth_vhost.c           | 916 ++++++++++++++++++++++++++++
>  drivers/net/vhost/rte_eth_vhost.h           | 109 ++++
>  drivers/net/vhost/rte_pmd_vhost_version.map |  10 +
>  mk/rte.app.mk                               |   6 +
>  11 files changed, 1124 insertions(+)
>  create mode 100644 drivers/net/vhost/Makefile
>  create mode 100644 drivers/net/vhost/rte_eth_vhost.c
>  create mode 100644 drivers/net/vhost/rte_eth_vhost.h
>  create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
<snip>

/Bruce

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v11 2/2] vhost: Add VHOST PMD
  2016-03-14 12:02                                     ` Bruce Richardson
@ 2016-03-15  5:35                                       ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-03-15  5:35 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, ann.zhuangyanying

On 2016/03/14 21:02, Bruce Richardson wrote:
> On Mon, Mar 07, 2016 at 11:07:14AM +0900, Tetsuya Mukawa wrote:
>> The patch introduces a new PMD. This PMD is implemented as thin wrapper
>> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
>> The vhost messages will be handled only when a port is started. So start
>> a port first, then invoke QEMU.
>>
>> The PMD has 2 parameters.
>>  - iface:  The parameter is used to specify a path to connect to a
>>            virtio-net device.
>>  - queues: The parameter is used to specify the number of the queues
>>            virtio-net device has.
>>            (Default: 1)
>>
>> Here is an example.
>> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
>>
>> To connect above testpmd, here is qemu command example.
>>
>> $ qemu-system-x86_64 \
>>         <snip>
>>         -chardev socket,id=chr0,path=/tmp/sock0 \
>>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
>>         -device virtio-net-pci,netdev=net0,mq=on
>>
>> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
>> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
>> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
>> Acked-by: Rich Lane <rich.lane@bigswitch.com>
>> Tested-by: Rich Lane <rich.lane@bigswitch.com>
>> ---
>>  MAINTAINERS                                 |   5 +
>>  config/common_base                          |   6 +
>>  config/common_linuxapp                      |   1 +
>>  doc/guides/nics/index.rst                   |   1 +
> This adds a new entry for vhost PMD into the index, but there is no vhost.rst
> file present in this patchset. Did you forget to add it?

Yes, it seems so. The file exists only in my environment.
I will add it.

Thanks,
Tetsuya

>
>>  doc/guides/rel_notes/release_16_04.rst      |   4 +
>>  drivers/net/Makefile                        |   4 +
>>  drivers/net/vhost/Makefile                  |  62 ++
>>  drivers/net/vhost/rte_eth_vhost.c           | 916 ++++++++++++++++++++++++++++
>>  drivers/net/vhost/rte_eth_vhost.h           | 109 ++++
>>  drivers/net/vhost/rte_pmd_vhost_version.map |  10 +
>>  mk/rte.app.mk                               |   6 +
>>  11 files changed, 1124 insertions(+)
>>  create mode 100644 drivers/net/vhost/Makefile
>>  create mode 100644 drivers/net/vhost/rte_eth_vhost.c
>>  create mode 100644 drivers/net/vhost/rte_eth_vhost.h
>>  create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
> <snip>
>
> /Bruce
>

^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH v12 0/2] Add VHOST PMD
  2016-03-07  2:07                                   ` [PATCH v11 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  2016-03-14 12:02                                     ` Bruce Richardson
@ 2016-03-15  8:31                                     ` Tetsuya Mukawa
  2016-03-15  8:31                                     ` [PATCH v12 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
  2016-03-15  8:31                                     ` [PATCH v12 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  3 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-03-15  8:31 UTC (permalink / raw)
  To: dev, bruce.richardson; +Cc: ann.zhuangyanying, Tetsuya Mukawa

The patch introduces a new PMD. This PMD is implemented as a thin wrapper
of librte_vhost.

PATCH v12 changes:
 - Rebase on latest master.
 - Add missing documentation.

PATCH v11 changes:
 - Rebase on latest master.
 - Fix MAINTAINERS file.
 - Fix Acked-by and Tested-by signatures of commit log.

PATCH v10 changes:
 - Rebase on latest master.
 - Fix DPDK version number (2.3 to 16.04)
 - Set port id to mbuf while receiving packets.

PATCH v9 changes:
 - Fix a null pointer access issue implemented in v8 patch.

PATCH v8 changes:
 - Manage ether devices list instead of internal structures list.
 - Remove needless NULL checking.
 - Replace "pthread_exit" with "return NULL".
 - Replace rte_panic with RTE_LOG, also add error handling.
 - Remove duplicated lines.
 - Remove needless casting.
 - Follow coding style.
 - Remove needless parenthesis.

PATCH v7 changes:
 - Remove needless parenthesis.
 - Add release note.
 - Remove needless line wraps.
 - Add null pointer check in vring_state_changed().
 - Free queue memory in eth_queue_release().
 - Fix wrong variable name.
 - Fix error handling code of eth_dev_vhost_create() and
   rte_pmd_vhost_devuninit().
 - Remove needless null checking from rte_pmd_vhost_devinit/devuninit().
 - Use port id to create mac address.
 - Add doxygen style comments in "rte_eth_vhost.h".
 - Fix wrong comment in "mk/rte.app.mk".

PATCH v6 changes:
 - Remove rte_vhost_driver_pmd_callback_registe().
 - Support link status interrupt.
 - Support queue state changed interrupt.
 - Add rte_eth_vhost_get_queue_event().
 - Support numa node detection when a new device is connected.

PATCH v5 changes:
 - Rebase on latest master.
 - Fix RX/TX routine to count RX/TX bytes.
 - Fix RX/TX routines not to count packets as errors when enqueue/dequeue
   cannot send them all.
 - Fix if-condition checking for multiqueues.
 - Add "static" to pthread variable.
 - Fix format.
 - Change the default behavior not to receive queueing events from the driver.
 - Split the patch to separate rte_eth_vhost_portid2vdev().

PATCH v4 changes:
 - Rebase on latest DPDK tree.
 - Fix coding style.
 - Fix code not to invoke multiple messaging handling threads.
 - Fix code to handle vdev parameters correctly.
 - Remove needless cast.
 - Remove needless if-condition before rte_free().

PATCH v3 changes:
 - Rebase on latest master.
 - Specify correct queue_id in RX/TX function.

PATCH v2 changes:
 - Remove a below patch that fixes vhost library.
   The patch was applied as a separate patch.
   - vhost: fix crash with multiqueue enabled
 - Fix typos.
   (Thanks to Thomas Monjalon)
 - Rebase on latest tree with Bernard's patches above.

PATCH v1 changes:
 - Support vhost multiple queues.
 - Rebase on "remove pci driver from vdevs".
 - Optimize RX/TX functions.
 - Fix resource leaks.
 - Fix compile issue.
 - Add patch to fix vhost library.

RFC PATCH v3 changes:
 - Optimize performance.
   In RX/TX functions, change code to access only per-core data.
 - Add the below API to allow users to use vhost library APIs for a port
   managed by the vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
    - rte_eth_vhost_portid2vdev()
   To support this functionality, the vhost library is also changed.
   If users don't use the vhost PMD, they can still fully use the vhost library APIs.
 - Add code to support vhost multiple queues.
   The multiple queues functionality is not actually enabled yet, though.

RFC PATCH v2 changes:
 - Fix issues reported by checkpatch.pl
   (Thanks to Stephen Hemminger)


Tetsuya Mukawa (2):
  ethdev: Add a new event type to notify a queue state changed event
  vhost: Add VHOST PMD

 MAINTAINERS                                 |   5 +
 config/common_base                          |   6 +
 config/common_linuxapp                      |   1 +
 doc/guides/nics/index.rst                   |   1 +
 doc/guides/nics/vhost.rst                   | 110 ++++
 doc/guides/rel_notes/release_16_04.rst      |   5 +
 drivers/net/Makefile                        |   4 +
 drivers/net/vhost/Makefile                  |  62 ++
 drivers/net/vhost/rte_eth_vhost.c           | 916 ++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h           | 109 ++++
 drivers/net/vhost/rte_pmd_vhost_version.map |  10 +
 lib/librte_ether/rte_ethdev.h               |   2 +
 mk/rte.app.mk                               |   6 +
 13 files changed, 1237 insertions(+)
 create mode 100644 doc/guides/nics/vhost.rst
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

-- 
2.1.4

^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH v12 1/2] ethdev: Add a new event type to notify a queue state changed event
  2016-03-07  2:07                                   ` [PATCH v11 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  2016-03-14 12:02                                     ` Bruce Richardson
  2016-03-15  8:31                                     ` [PATCH v12 0/2] " Tetsuya Mukawa
@ 2016-03-15  8:31                                     ` Tetsuya Mukawa
  2016-03-18 13:54                                       ` Thomas Monjalon
  2016-03-15  8:31                                     ` [PATCH v12 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  3 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-03-15  8:31 UTC (permalink / raw)
  To: dev, bruce.richardson; +Cc: ann.zhuangyanying, Tetsuya Mukawa

This patch adds the below event type.
 - RTE_ETH_EVENT_QUEUE_STATE_CHANGE
This event is used to notify that a queue state has changed.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
---
 lib/librte_ether/rte_ethdev.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index d867976..0680a71 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -2915,6 +2915,8 @@ rte_eth_tx_buffer_count_callback(struct rte_mbuf **pkts, uint16_t unsent,
 enum rte_eth_event_type {
 	RTE_ETH_EVENT_UNKNOWN,  /**< unknown event type */
 	RTE_ETH_EVENT_INTR_LSC, /**< lsc interrupt event */
+	RTE_ETH_EVENT_QUEUE_STATE_CHANGE,
+				/**< queue state changed interrupt */
 	RTE_ETH_EVENT_MAX       /**< max value of this enum */
 };
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v12 2/2] vhost: Add VHOST PMD
  2016-03-07  2:07                                   ` [PATCH v11 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
                                                       ` (2 preceding siblings ...)
  2016-03-15  8:31                                     ` [PATCH v12 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
@ 2016-03-15  8:31                                     ` Tetsuya Mukawa
  2016-03-18 12:27                                       ` Bruce Richardson
                                                         ` (3 more replies)
  3 siblings, 4 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-03-15  8:31 UTC (permalink / raw)
  To: dev, bruce.richardson; +Cc: ann.zhuangyanying, Tetsuya Mukawa

The patch introduces a new PMD. This PMD is implemented as a thin wrapper
of librte_vhost, so librte_vhost is also needed to compile the PMD.
Vhost messages are handled only while a port is started, so start
the port first, then invoke QEMU.

The PMD has two parameters.
 - iface:  The parameter is used to specify a path to connect to a
           virtio-net device.
 - queues: The parameter is used to specify the number of queues the
           virtio-net device has.
           (Default: 1)

Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i

To connect to the above testpmd, here is an example QEMU command.

$ qemu-system-x86_64 \
        <snip>
        -chardev socket,id=chr0,path=/tmp/sock0 \
        -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
        -device virtio-net-pci,netdev=net0,mq=on

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
---
 MAINTAINERS                                 |   5 +
 config/common_base                          |   6 +
 config/common_linuxapp                      |   1 +
 doc/guides/nics/index.rst                   |   1 +
 doc/guides/nics/vhost.rst                   | 110 ++++
 doc/guides/rel_notes/release_16_04.rst      |   5 +
 drivers/net/Makefile                        |   4 +
 drivers/net/vhost/Makefile                  |  62 ++
 drivers/net/vhost/rte_eth_vhost.c           | 916 ++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h           | 109 ++++
 drivers/net/vhost/rte_pmd_vhost_version.map |  10 +
 mk/rte.app.mk                               |   6 +
 12 files changed, 1235 insertions(+)
 create mode 100644 doc/guides/nics/vhost.rst
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index f10b26a..8ec1972 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -351,6 +351,11 @@ Null PMD
 M: Tetsuya Mukawa <mukawa@igel.co.jp>
 F: drivers/net/null/
 
+Vhost PMD
+M: Tetsuya Mukawa <mukawa@igel.co.jp>
+M: Yuanhan Liu <yuanhan.liu@linux.intel.com>
+F: drivers/net/vhost/
+
 Intel AES-NI GCM PMD
 M: Declan Doherty <declan.doherty@intel.com>
 F: drivers/crypto/aesni_gcm/
diff --git a/config/common_base b/config/common_base
index 52bd34f..3d753e1 100644
--- a/config/common_base
+++ b/config/common_base
@@ -505,6 +505,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
 CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 
 #
+# Compile vhost PMD
+# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
+#
+CONFIG_RTE_LIBRTE_PMD_VHOST=n
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/config/common_linuxapp b/config/common_linuxapp
index ffbe260..7e698e2 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -40,5 +40,6 @@ CONFIG_RTE_EAL_VFIO=y
 CONFIG_RTE_KNI_KMOD=y
 CONFIG_RTE_LIBRTE_KNI=y
 CONFIG_RTE_LIBRTE_VHOST=y
+CONFIG_RTE_LIBRTE_PMD_VHOST=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
 CONFIG_RTE_LIBRTE_POWER=y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 0b353a8..d53b0c7 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -49,6 +49,7 @@ Network Interface Controller Drivers
     nfp
     szedata2
     virtio
+    vhost
     vmxnet3
     pcap_ring
 
diff --git a/doc/guides/nics/vhost.rst b/doc/guides/nics/vhost.rst
new file mode 100644
index 0000000..50e8a3a
--- /dev/null
+++ b/doc/guides/nics/vhost.rst
@@ -0,0 +1,110 @@
+..  BSD LICENSE
+    Copyright(c) 2016 IGEL Co., Ltd. All rights reserved.
+    All rights reserved.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+    * Neither the name of IGEL Co., Ltd. nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Poll Mode Driver that wraps vhost library
+=========================================
+
+This PMD is a thin wrapper of the DPDK vhost library.
+The user can handle virtqueues as normal DPDK ports.
+
+Vhost Implementation in DPDK
+----------------------------
+
+Please refer to the "Vhost Library" chapter of the *DPDK Programmer's Guide* for details of vhost.
+
+Features and Limitations of vhost PMD
+-------------------------------------
+
+Currently, the vhost PMD provides the basic functionality of packet reception, transmission and event handling.
+
+*   It supports multiple queues.
+
+*   It supports ``RTE_ETH_EVENT_INTR_LSC`` and ``RTE_ETH_EVENT_QUEUE_STATE_CHANGE`` events.
+
+*   It supports Port Hotplug functionality.
+
+*   There is no need to stop RX/TX when the user wants to stop a guest or a virtio-net driver on the guest.
+
+Vhost PMD arguments
+-------------------
+
+The user can specify the below arguments in the ``--vdev`` option.
+
+#.  ``iface``:
+
+    It is used to specify a path to connect to a QEMU virtio-net device.
+
+#.  ``queues``:
+
+    It is used to specify the number of queues the virtio-net device has.
+    (Default: 1)
+
+Vhost PMD event handling
+------------------------
+
+This section describes how to handle vhost PMD events.
+
+The user can register an event callback handler with ``rte_eth_dev_callback_register()``.
+The registered callback handler will be invoked with one of the below event types.
+
+#.  ``RTE_ETH_EVENT_INTR_LSC``:
+
+    It means the link status of the port has changed.
+
+#.  ``RTE_ETH_EVENT_QUEUE_STATE_CHANGE``:
+
+    It means some queue statuses have changed. Call ``rte_eth_vhost_get_queue_event()`` in the callback handler.
+    Because multiple status changes may raise only one event, call the function repeatedly until it returns a negative value.
+
+Vhost PMD with testpmd application
+----------------------------------
+
+This section demonstrates the vhost PMD with the testpmd DPDK sample application.
+
+#.  Launch testpmd with the vhost PMD:
+
+    .. code-block:: console
+
+        ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
+
+    Perform other basic DPDK preparations, like enabling hugepages, here.
+    Please refer to the *DPDK Getting Started Guide* for detailed instructions.
+
+#.  Launch the QEMU:
+
+    .. code-block:: console
+
+       qemu-system-x86_64 <snip>
+                   -chardev socket,id=chr0,path=/tmp/sock0 \
+                   -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
+                   -device virtio-net-pci,netdev=net0
+
+    This command attaches one virtio-net device to the QEMU guest.
+    After the initialization between QEMU and the DPDK vhost library is done, the port status will be linked up.
diff --git a/doc/guides/rel_notes/release_16_04.rst b/doc/guides/rel_notes/release_16_04.rst
index b729b67..a0b08cb 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -127,6 +127,11 @@ This section should contain new features added in this release. Sample format:
 
   Added new Crypto PMD to support null crypto operations in SW.
 
+* **Added vhost PMD.**
+
+  Added a virtual PMD that wraps librte_vhost.
+
+
 Resolved Issues
 ---------------
 
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 0c3393f..8ba37fb 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -52,4 +52,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
 
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
new file mode 100644
index 0000000..f49a69b
--- /dev/null
+++ b/drivers/net/vhost/Makefile
@@ -0,0 +1,62 @@
+#   BSD LICENSE
+#
+#   Copyright (c) 2010-2016 Intel Corporation.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_vhost.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_vhost_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
+
+#
+# Export include files
+#
+SYMLINK-y-include += rte_eth_vhost.h
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_vhost
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
new file mode 100644
index 0000000..c4a1ac1
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -0,0 +1,916 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) 2016 IGEL Co., Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co.,Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <unistd.h>
+#include <pthread.h>
+#include <stdbool.h>
+#ifdef RTE_LIBRTE_VHOST_NUMA
+#include <numaif.h>
+#endif
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_dev.h>
+#include <rte_kvargs.h>
+#include <rte_virtio_net.h>
+#include <rte_spinlock.h>
+
+#include "rte_eth_vhost.h"
+
+#define ETH_VHOST_IFACE_ARG		"iface"
+#define ETH_VHOST_QUEUES_ARG		"queues"
+
+static const char *drivername = "VHOST PMD";
+
+static const char *valid_arguments[] = {
+	ETH_VHOST_IFACE_ARG,
+	ETH_VHOST_QUEUES_ARG,
+	NULL
+};
+
+static struct ether_addr base_eth_addr = {
+	.addr_bytes = {
+		0x56 /* V */,
+		0x48 /* H */,
+		0x4F /* O */,
+		0x53 /* S */,
+		0x54 /* T */,
+		0x00
+	}
+};
+
+struct vhost_queue {
+	rte_atomic32_t allow_queuing;
+	rte_atomic32_t while_queuing;
+	struct virtio_net *device;
+	struct pmd_internal *internal;
+	struct rte_mempool *mb_pool;
+	uint8_t port;
+	uint16_t virtqueue_id;
+	uint64_t rx_pkts;
+	uint64_t tx_pkts;
+	uint64_t missed_pkts;
+	uint64_t rx_bytes;
+	uint64_t tx_bytes;
+};
+
+struct pmd_internal {
+	char *dev_name;
+	char *iface_name;
+	volatile uint16_t once;
+};
+
+struct internal_list {
+	TAILQ_ENTRY(internal_list) next;
+	struct rte_eth_dev *eth_dev;
+};
+
+TAILQ_HEAD(internal_list_head, internal_list);
+static struct internal_list_head internal_list =
+	TAILQ_HEAD_INITIALIZER(internal_list);
+
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static rte_atomic16_t nb_started_ports;
+static pthread_t session_th;
+
+static struct rte_eth_link pmd_link = {
+		.link_speed = 10000,
+		.link_duplex = ETH_LINK_FULL_DUPLEX,
+		.link_status = 0
+};
+
+struct rte_vhost_vring_state {
+	rte_spinlock_t lock;
+
+	bool cur[RTE_MAX_QUEUES_PER_PORT * 2];
+	bool seen[RTE_MAX_QUEUES_PER_PORT * 2];
+	unsigned int index;
+	unsigned int max_vring;
+};
+
+static struct rte_vhost_vring_state *vring_states[RTE_MAX_ETHPORTS];
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t i, nb_rx = 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Dequeue packets from guest TX queue */
+	nb_rx = rte_vhost_dequeue_burst(r->device,
+			r->virtqueue_id, r->mb_pool, bufs, nb_bufs);
+
+	r->rx_pkts += nb_rx;
+
+	for (i = 0; likely(i < nb_rx); i++) {
+		bufs[i]->port = r->port;
+		r->rx_bytes += bufs[i]->pkt_len;
+	}
+
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t i, nb_tx = 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Enqueue packets to guest RX queue */
+	nb_tx = rte_vhost_enqueue_burst(r->device,
+			r->virtqueue_id, bufs, nb_bufs);
+
+	r->tx_pkts += nb_tx;
+	r->missed_pkts += nb_bufs - nb_tx;
+
+	for (i = 0; likely(i < nb_tx); i++)
+		r->tx_bytes += bufs[i]->pkt_len;
+
+	for (i = 0; likely(i < nb_tx); i++)
+		rte_pktmbuf_free(bufs[i]);
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
+{
+	return 0;
+}
+
+static inline struct internal_list *
+find_internal_resource(char *ifname)
+{
+	int found = 0;
+	struct internal_list *list;
+	struct pmd_internal *internal;
+
+	if (ifname == NULL)
+		return NULL;
+
+	pthread_mutex_lock(&internal_list_lock);
+
+	TAILQ_FOREACH(list, &internal_list, next) {
+		internal = list->eth_dev->data->dev_private;
+		if (!strcmp(internal->iface_name, ifname)) {
+			found = 1;
+			break;
+		}
+	}
+
+	pthread_mutex_unlock(&internal_list_lock);
+
+	if (!found)
+		return NULL;
+
+	return list;
+}
+
+static int
+new_device(struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct internal_list *list;
+	struct pmd_internal *internal;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid argument\n");
+		return -1;
+	}
+
+	list = find_internal_resource(dev->ifname);
+	if (list == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid device name\n");
+		return -1;
+	}
+
+	eth_dev = list->eth_dev;
+	internal = eth_dev->data->dev_private;
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = dev;
+		vq->internal = internal;
+		vq->port = eth_dev->data->port_id;
+		rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = dev;
+		vq->internal = internal;
+		vq->port = eth_dev->data->port_id;
+		rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
+	}
+
+	dev->flags |= VIRTIO_DEV_RUNNING;
+	dev->priv = eth_dev;
+	eth_dev->data->dev_link.link_status = 1;
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+
+	RTE_LOG(INFO, PMD, "New connection established\n");
+
+	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC);
+
+	return 0;
+}
+
+static void
+destroy_device(volatile struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid argument\n");
+		return;
+	}
+
+	eth_dev = (struct rte_eth_dev *)dev->priv;
+	if (eth_dev == NULL) {
+		RTE_LOG(INFO, PMD, "Failed to find an ethdev\n");
+		return;
+	}
+
+	/* Wait until rx/tx_pkt_burst stops accessing vhost device */
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+
+	eth_dev->data->dev_link.link_status = 0;
+
+	dev->priv = NULL;
+	dev->flags &= ~VIRTIO_DEV_RUNNING;
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = NULL;
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = NULL;
+	}
+
+	RTE_LOG(INFO, PMD, "Connection closed\n");
+
+	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC);
+}
+
+static int
+vring_state_changed(struct virtio_net *dev, uint16_t vring, int enable)
+{
+	struct rte_vhost_vring_state *state;
+	struct rte_eth_dev *eth_dev;
+	struct internal_list *list;
+#ifdef RTE_LIBRTE_VHOST_NUMA
+	int newnode, ret;
+#endif
+
+	if (dev == NULL) {
+		RTE_LOG(ERR, PMD, "Invalid argument\n");
+		return -1;
+	}
+
+	list = find_internal_resource(dev->ifname);
+	if (list == NULL) {
+		RTE_LOG(ERR, PMD, "Invalid interface name: %s\n", dev->ifname);
+		return -1;
+	}
+
+	eth_dev = list->eth_dev;
+	/* won't be NULL */
+	state = vring_states[eth_dev->data->port_id];
+
+#ifdef RTE_LIBRTE_VHOST_NUMA
+	ret  = get_mempolicy(&newnode, NULL, 0, dev,
+			MPOL_F_NODE | MPOL_F_ADDR);
+	if (ret < 0) {
+		RTE_LOG(ERR, PMD, "Unknow numa node\n");
+		return -1;
+	}
+
+	eth_dev->data->numa_node = newnode;
+#endif
+	rte_spinlock_lock(&state->lock);
+	state->cur[vring] = enable;
+	state->max_vring = RTE_MAX(vring, state->max_vring);
+	rte_spinlock_unlock(&state->lock);
+
+	RTE_LOG(INFO, PMD, "vring%u is %s\n",
+			vring, enable ? "enabled" : "disabled");
+
+	_rte_eth_dev_callback_process(eth_dev,
+			RTE_ETH_EVENT_QUEUE_STATE_CHANGE);
+
+	return 0;
+}
+
+int
+rte_eth_vhost_get_queue_event(uint8_t port_id,
+		struct rte_eth_vhost_queue_event *event)
+{
+	struct rte_vhost_vring_state *state;
+	unsigned int i;
+	int idx;
+
+	if (port_id >= RTE_MAX_ETHPORTS) {
+		RTE_LOG(ERR, PMD, "Invalid port id\n");
+		return -1;
+	}
+
+	state = vring_states[port_id];
+	if (!state) {
+		RTE_LOG(ERR, PMD, "Unused port\n");
+		return -1;
+	}
+
+	rte_spinlock_lock(&state->lock);
+	for (i = 0; i <= state->max_vring; i++) {
+		idx = state->index++ % (state->max_vring + 1);
+
+		if (state->cur[idx] != state->seen[idx]) {
+			state->seen[idx] = state->cur[idx];
+			event->queue_id = idx / 2;
+			event->rx = idx & 1;
+			event->enable = state->cur[idx];
+			rte_spinlock_unlock(&state->lock);
+			return 0;
+		}
+	}
+	rte_spinlock_unlock(&state->lock);
+
+	return -1;
+}
+
+static void *
+vhost_driver_session(void *param __rte_unused)
+{
+	static struct virtio_net_device_ops vhost_ops;
+
+	/* set vhost arguments */
+	vhost_ops.new_device = new_device;
+	vhost_ops.destroy_device = destroy_device;
+	vhost_ops.vring_state_changed = vring_state_changed;
+	if (rte_vhost_driver_callback_register(&vhost_ops) < 0)
+		RTE_LOG(ERR, PMD, "Can't register callbacks\n");
+
+	/* start event handling */
+	rte_vhost_driver_session_start();
+
+	return NULL;
+}
+
+static int
+vhost_driver_session_start(void)
+{
+	int ret;
+
+	ret = pthread_create(&session_th,
+			NULL, vhost_driver_session, NULL);
+	if (ret)
+		RTE_LOG(ERR, PMD, "Can't create a thread\n");
+
+	return ret;
+}
+
+static void
+vhost_driver_session_stop(void)
+{
+	int ret;
+
+	ret = pthread_cancel(session_th);
+	if (ret)
+		RTE_LOG(ERR, PMD, "Can't cancel the thread\n");
+
+	ret = pthread_join(session_th, NULL);
+	if (ret)
+		RTE_LOG(ERR, PMD, "Can't join the thread\n");
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+	int ret = 0;
+
+	if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
+		ret = rte_vhost_driver_register(internal->iface_name);
+		if (ret)
+			return ret;
+	}
+
+	/* We need only one message handling thread */
+	if (rte_atomic16_add_return(&nb_started_ports, 1) == 1)
+		ret = vhost_driver_session_start();
+
+	return ret;
+}
+
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	if (rte_atomic16_cmpset(&internal->once, 1, 0))
+		rte_vhost_driver_unregister(internal->iface_name);
+
+	if (rte_atomic16_sub_return(&nb_started_ports, 1) == 0)
+		vhost_driver_session_stop();
+}
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct vhost_queue *vq;
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
+		return -ENOMEM;
+	}
+
+	vq->mb_pool = mb_pool;
+	vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
+	dev->data->rx_queues[rx_queue_id] = vq;
+
+	return 0;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct vhost_queue *vq;
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for tx queue\n");
+		return -ENOMEM;
+	}
+
+	vq->virtqueue_id = tx_queue_id * VIRTIO_QNUM + VIRTIO_RXQ;
+	dev->data->tx_queues[tx_queue_id] = vq;
+
+	return 0;
+}
+
+static void
+eth_dev_info(struct rte_eth_dev *dev,
+	     struct rte_eth_dev_info *dev_info)
+{
+	dev_info->driver_name = drivername;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = (uint32_t)-1;
+	dev_info->max_rx_queues = dev->data->nb_rx_queues;
+	dev_info->max_tx_queues = dev->data->nb_tx_queues;
+	dev_info->min_rx_bufsize = 0;
+}
+
+static void
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	unsigned i;
+	unsigned long rx_total = 0, tx_total = 0, tx_missed_total = 0;
+	unsigned long rx_total_bytes = 0, tx_total_bytes = 0;
+	struct vhost_queue *vq;
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+			i < dev->data->nb_rx_queues; i++) {
+		if (dev->data->rx_queues[i] == NULL)
+			continue;
+		vq = dev->data->rx_queues[i];
+		stats->q_ipackets[i] = vq->rx_pkts;
+		rx_total += stats->q_ipackets[i];
+
+		stats->q_ibytes[i] = vq->rx_bytes;
+		rx_total_bytes += stats->q_ibytes[i];
+	}
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+			i < dev->data->nb_tx_queues; i++) {
+		if (dev->data->tx_queues[i] == NULL)
+			continue;
+		vq = dev->data->tx_queues[i];
+		stats->q_opackets[i] = vq->tx_pkts;
+		tx_missed_total += vq->missed_pkts;
+		tx_total += stats->q_opackets[i];
+
+		stats->q_obytes[i] = vq->tx_bytes;
+		tx_total_bytes += stats->q_obytes[i];
+	}
+
+	stats->ipackets = rx_total;
+	stats->opackets = tx_total;
+	stats->imissed = tx_missed_total;
+	stats->ibytes = rx_total_bytes;
+	stats->obytes = tx_total_bytes;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	struct vhost_queue *vq;
+	unsigned i;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		if (dev->data->rx_queues[i] == NULL)
+			continue;
+		vq = dev->data->rx_queues[i];
+		vq->rx_pkts = 0;
+		vq->rx_bytes = 0;
+	}
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		if (dev->data->tx_queues[i] == NULL)
+			continue;
+		vq = dev->data->tx_queues[i];
+		vq->tx_pkts = 0;
+		vq->tx_bytes = 0;
+		vq->missed_pkts = 0;
+	}
+}
+
+static void
+eth_queue_release(void *q)
+{
+	rte_free(q);
+}
+
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused)
+{
+	return 0;
+}
+
+/**
+ * Disable features in feature_mask. Returns 0 on success.
+ */
+int
+rte_eth_vhost_feature_disable(uint64_t feature_mask)
+{
+	return rte_vhost_feature_disable(feature_mask);
+}
+
+/**
+ * Enable features in feature_mask. Returns 0 on success.
+ */
+int
+rte_eth_vhost_feature_enable(uint64_t feature_mask)
+{
+	return rte_vhost_feature_enable(feature_mask);
+}
+
+/* Returns currently supported vhost features */
+uint64_t
+rte_eth_vhost_feature_get(void)
+{
+	return rte_vhost_feature_get();
+}
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+static int
+eth_dev_vhost_create(const char *name, char *iface_name, int16_t queues,
+		     const unsigned numa_node)
+{
+	struct rte_eth_dev_data *data = NULL;
+	struct pmd_internal *internal = NULL;
+	struct rte_eth_dev *eth_dev = NULL;
+	struct ether_addr *eth_addr = NULL;
+	struct rte_vhost_vring_state *vring_state = NULL;
+	struct internal_list *list = NULL;
+
+	RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
+		numa_node);
+
+	/* now do all data allocation - for eth_dev structure, dummy pci driver
+	 * and internal (private) data
+	 */
+	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+	if (data == NULL)
+		goto error;
+
+	internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
+	if (internal == NULL)
+		goto error;
+
+	list = rte_zmalloc_socket(name, sizeof(*list), 0, numa_node);
+	if (list == NULL)
+		goto error;
+
+	/* reserve an ethdev entry */
+	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+	if (eth_dev == NULL)
+		goto error;
+
+	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
+	if (eth_addr == NULL)
+		goto error;
+	*eth_addr = base_eth_addr;
+	eth_addr->addr_bytes[5] = eth_dev->data->port_id;
+
+	vring_state = rte_zmalloc_socket(name,
+			sizeof(*vring_state), 0, numa_node);
+	if (vring_state == NULL)
+		goto error;
+
+	TAILQ_INIT(&eth_dev->link_intr_cbs);
+
+	/* now put it all together
+	 * - store queue data in internal,
+	 * - store numa_node info in ethdev data
+	 * - point eth_dev_data to internals
+	 * - and point eth_dev structure to new eth_dev_data structure
+	 */
+	internal->dev_name = strdup(name);
+	if (internal->dev_name == NULL)
+		goto error;
+	internal->iface_name = strdup(iface_name);
+	if (internal->iface_name == NULL)
+		goto error;
+
+	list->eth_dev = eth_dev;
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_INSERT_TAIL(&internal_list, list, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
+	rte_spinlock_init(&vring_state->lock);
+	vring_states[eth_dev->data->port_id] = vring_state;
+
+	data->dev_private = internal;
+	data->port_id = eth_dev->data->port_id;
+	memmove(data->name, eth_dev->data->name, sizeof(data->name));
+	data->nb_rx_queues = queues;
+	data->nb_tx_queues = queues;
+	data->dev_link = pmd_link;
+	data->mac_addrs = eth_addr;
+
+	/* We'll replace the 'data' originally allocated by eth_dev. So the
+	 * vhost PMD resources won't be shared between multiple processes.
+	 */
+	eth_dev->data = data;
+	eth_dev->dev_ops = &ops;
+	eth_dev->driver = NULL;
+	data->dev_flags =
+		RTE_ETH_DEV_DETACHABLE | RTE_ETH_DEV_INTR_LSC;
+	data->kdrv = RTE_KDRV_NONE;
+	data->drv_name = internal->dev_name;
+	data->numa_node = numa_node;
+
+	/* finally assign rx and tx ops */
+	eth_dev->rx_pkt_burst = eth_vhost_rx;
+	eth_dev->tx_pkt_burst = eth_vhost_tx;
+
+	return data->port_id;
+
+error:
+	if (internal)
+		free(internal->dev_name);
+	rte_free(vring_state);
+	rte_free(eth_addr);
+	if (eth_dev)
+		rte_eth_dev_release_port(eth_dev);
+	rte_free(internal);
+	rte_free(list);
+	rte_free(data);
+
+	return -1;
+}
+
+static inline int
+open_iface(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	const char **iface_name = extra_args;
+
+	if (value == NULL)
+		return -1;
+
+	*iface_name = value;
+
+	return 0;
+}
+
+static inline int
+open_queues(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	uint16_t *q = extra_args;
+
+	if (value == NULL || extra_args == NULL)
+		return -EINVAL;
+
+	*q = (uint16_t)strtoul(value, NULL, 0);
+	if (*q == USHRT_MAX && errno == ERANGE)
+		return -1;
+
+	if (*q > RTE_MAX_QUEUES_PER_PORT)
+		return -1;
+
+	return 0;
+}
+
+static int
+rte_pmd_vhost_devinit(const char *name, const char *params)
+{
+	struct rte_kvargs *kvlist = NULL;
+	int ret = 0;
+	char *iface_name;
+	uint16_t queues;
+
+	RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
+
+	kvlist = rte_kvargs_parse(params, valid_arguments);
+	if (kvlist == NULL)
+		return -1;
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
+					 &open_iface, &iface_name);
+		if (ret < 0)
+			goto out_free;
+	} else {
+		ret = -1;
+		goto out_free;
+	}
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
+					 &open_queues, &queues);
+		if (ret < 0)
+			goto out_free;
+
+	} else
+		queues = 1;
+
+	if (eth_dev_vhost_create(name, iface_name, queues,
+			rte_socket_id()) < 0)
+		ret = -1;
+
+out_free:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static int
+rte_pmd_vhost_devuninit(const char *name)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internal *internal;
+	struct internal_list *list;
+	unsigned int i;
+
+	RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
+
+	/* find an ethdev entry */
+	eth_dev = rte_eth_dev_allocated(name);
+	if (eth_dev == NULL)
+		return -ENODEV;
+
+	internal = eth_dev->data->dev_private;
+	if (internal == NULL)
+		return -ENODEV;
+
+	list = find_internal_resource(internal->iface_name);
+	if (list == NULL)
+		return -ENODEV;
+
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_REMOVE(&internal_list, list, next);
+	pthread_mutex_unlock(&internal_list_lock);
+	rte_free(list);
+
+	eth_dev_stop(eth_dev);
+
+	rte_free(vring_states[eth_dev->data->port_id]);
+	vring_states[eth_dev->data->port_id] = NULL;
+
+	free(internal->dev_name);
+	free(internal->iface_name);
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++)
+		rte_free(eth_dev->data->rx_queues[i]);
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
+		rte_free(eth_dev->data->tx_queues[i]);
+
+	rte_free(eth_dev->data->mac_addrs);
+	rte_free(eth_dev->data);
+	rte_free(internal);
+
+	rte_eth_dev_release_port(eth_dev);
+
+	return 0;
+}
+
+static struct rte_driver pmd_vhost_drv = {
+	.name = "eth_vhost",
+	.type = PMD_VDEV,
+	.init = rte_pmd_vhost_devinit,
+	.uninit = rte_pmd_vhost_devuninit,
+};
+
+PMD_REGISTER_DRIVER(pmd_vhost_drv);
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
new file mode 100644
index 0000000..e78cb74
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -0,0 +1,109 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 IGEL Co., Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co., Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_VHOST_H_
+#define _RTE_ETH_VHOST_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+#include <stdbool.h>
+
+#include <rte_virtio_net.h>
+
+/**
+ * Disable features in feature_mask.
+ *
+ * @param feature_mask
+ *  Vhost features defined in "linux/virtio_net.h".
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_eth_vhost_feature_disable(uint64_t feature_mask);
+
+/**
+ * Enable features in feature_mask.
+ *
+ * @param feature_mask
+ *  Vhost features defined in "linux/virtio_net.h".
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_eth_vhost_feature_enable(uint64_t feature_mask);
+
+/**
+ * Returns currently supported vhost features.
+ *
+ * @return
+ *  Vhost features defined in "linux/virtio_net.h".
+ */
+uint64_t rte_eth_vhost_feature_get(void);
+
+/*
+ * Event description.
+ */
+struct rte_eth_vhost_queue_event {
+	uint16_t queue_id;
+	bool rx;
+	bool enable;
+};
+
+/**
+ * Get queue events from a specified port.
+ * If an application registers a callback for the event below with
+ * rte_eth_dev_callback_register(), this function describes what changed
+ * when that callback fires.
+ *  - RTE_ETH_EVENT_QUEUE_STATE_CHANGE
+ * Multiple events may be coalesced into a single callback invocation, so
+ * keep calling this function until it returns a negative value.
+ *
+ * @param port_id
+ *  Port id.
+ * @param event
+ *  Pointer to a rte_eth_vhost_queue_event structure.
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_eth_vhost_get_queue_event(uint8_t port_id,
+		struct rte_eth_vhost_queue_event *event);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
new file mode 100644
index 0000000..65bf3a8
--- /dev/null
+++ b/drivers/net/vhost/rte_pmd_vhost_version.map
@@ -0,0 +1,10 @@
+DPDK_16.04 {
+	global:
+
+	rte_eth_vhost_feature_disable;
+	rte_eth_vhost_feature_enable;
+	rte_eth_vhost_feature_get;
+	rte_eth_vhost_get_queue_event;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index a1cd9a3..bd973e8 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -166,6 +166,12 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_SNOW3G)     += -lrte_pmd_snow3g
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_SNOW3G)     += -L$(LIBSSO_PATH)/build -lsso
 endif # CONFIG_RTE_LIBRTE_CRYPTODEV
 
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
+
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
 endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
 
 _LDLIBS-y += $(EXECENV_LDLIBS)
-- 
2.1.4


* Re: [PATCH v12 2/2] vhost: Add VHOST PMD
  2016-03-15  8:31                                     ` [PATCH v12 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
@ 2016-03-18 12:27                                       ` Bruce Richardson
  2016-03-18 13:41                                         ` Tetsuya Mukawa
  2016-03-21  5:41                                         ` Tetsuya Mukawa
  2016-03-21  5:45                                       ` [PATCH v13 0/2] " Tetsuya Mukawa
                                                         ` (2 subsequent siblings)
  3 siblings, 2 replies; 200+ messages in thread
From: Bruce Richardson @ 2016-03-18 12:27 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying

On Tue, Mar 15, 2016 at 05:31:41PM +0900, Tetsuya Mukawa wrote:
> The patch introduces a new PMD. This PMD is implemented as thin wrapper
> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
> The vhost messages will be handled only when a port is started. So start
> a port first, then invoke QEMU.
> 
> The PMD has 2 parameters.
>  - iface:  The parameter is used to specify a path to connect to a
>            virtio-net device.
>  - queues: The parameter is used to specify the number of the queues
>            virtio-net device has.
>            (Default: 1)
> 
> Here is an example.
> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
> 
> To connect above testpmd, here is qemu command example.
> 
> $ qemu-system-x86_64 \
>         <snip>
>         -chardev socket,id=chr0,path=/tmp/sock0 \
>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
>         -device virtio-net-pci,netdev=net0,mq=on
> 
> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> Acked-by: Rich Lane <rich.lane@bigswitch.com>
> Tested-by: Rich Lane <rich.lane@bigswitch.com>

Hi Tetsuya,

I hope to get this set merged for RC2 very soon. Can you provide an update for
the nic overview.rst doc listing out the features of this new PMD. If you want,
you can provide it as a separate patch, that I will merge into this one for you
on apply to next-net.

If you do decide to respin this patchset with the extra doc, please take into
account the following patchwork issues also - otherwise I'll also fix them on
apply:

WARNING:STATIC_CONST_CHAR_ARRAY: static const char * array should probably be static const char * const
#364: FILE: drivers/net/vhost/rte_eth_vhost.c:56:
+static const char *valid_arguments[] = {

WARNING:LINE_SPACING: Missing a blank line after declarations
#399: FILE: drivers/net/vhost/rte_eth_vhost.c:91:
+       char *iface_name;
+       volatile uint16_t once;

WARNING:TYPO_SPELLING: 'Unknow' may be misspelled - perhaps 'Unknown'?
#684: FILE: drivers/net/vhost/rte_eth_vhost.c:376:
+               RTE_LOG(ERR, PMD, "Unknow numa node\n");

Regards,
/Bruce


* Re: [PATCH v12 2/2] vhost: Add VHOST PMD
  2016-03-18 12:27                                       ` Bruce Richardson
@ 2016-03-18 13:41                                         ` Tetsuya Mukawa
  2016-03-18 13:52                                           ` Thomas Monjalon
  2016-03-21  5:41                                         ` Tetsuya Mukawa
  1 sibling, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-03-18 13:41 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, Zhuangyanying

On 2016/03/18 9:27 PM, "Bruce Richardson" <bruce.richardson@intel.com> wrote:
>
> On Tue, Mar 15, 2016 at 05:31:41PM +0900, Tetsuya Mukawa wrote:
> > The patch introduces a new PMD. This PMD is implemented as thin wrapper
> > of librte_vhost. It means librte_vhost is also needed to compile the PMD.
> > The vhost messages will be handled only when a port is started. So start
> > a port first, then invoke QEMU.
> >
> > The PMD has 2 parameters.
> >  - iface:  The parameter is used to specify a path to connect to a
> >            virtio-net device.
> >  - queues: The parameter is used to specify the number of the queues
> >            virtio-net device has.
> >            (Default: 1)
> >
> > Here is an example.
> > $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
> >
> > To connect above testpmd, here is qemu command example.
> >
> > $ qemu-system-x86_64 \
> >         <snip>
> >         -chardev socket,id=chr0,path=/tmp/sock0 \
> >         -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
> >         -device virtio-net-pci,netdev=net0,mq=on
> >
> > Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> > Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> > Acked-by: Rich Lane <rich.lane@bigswitch.com>
> > Tested-by: Rich Lane <rich.lane@bigswitch.com>
>
> Hi Tetsuya,
>
> I hope to get this set merged for RC2 very soon. Can you provide an update for
> the nic overview.rst doc listing out the features of this new PMD. If you want,
> you can provide it as a separate patch, that I will merge into this one for you
> on apply to next-net.
>
> If you do decide to respin this patchset with the extra doc, please take into
> account the following patchwork issues also - otherwise I'll also fix them on
> apply:
>
> WARNING:STATIC_CONST_CHAR_ARRAY: static const char * array should probably be static const char * const
> #364: FILE: drivers/net/vhost/rte_eth_vhost.c:56:
> +static const char *valid_arguments[] = {
>
> WARNING:LINE_SPACING: Missing a blank line after declarations
> #399: FILE: drivers/net/vhost/rte_eth_vhost.c:91:
> +       char *iface_name;
> +       volatile uint16_t once;
>
> WARNING:TYPO_SPELLING: 'Unknow' may be misspelled - perhaps 'Unknown'?
> #684: FILE: drivers/net/vhost/rte_eth_vhost.c:376:
> +               RTE_LOG(ERR, PMD, "Unknow numa node\n");
>
> Regards,
> /Bruce
>

Hi Bruce,

I've sent the v12 patch with vhost.rst.
Could you please check below?

http://dpdk.org/dev/patchwork/project/dpdk/list/?submitter=64

Is this the documentation I need to add?

Anyway, it contains above nits. So could you please fix it before merging,
if it's the documentation?

Regards,
Tetsuya


* Re: [PATCH v12 2/2] vhost: Add VHOST PMD
  2016-03-18 13:41                                         ` Tetsuya Mukawa
@ 2016-03-18 13:52                                           ` Thomas Monjalon
  2016-03-18 14:03                                             ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Thomas Monjalon @ 2016-03-18 13:52 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, Bruce Richardson, Zhuangyanying

2016-03-18 22:41, Tetsuya Mukawa:
> On 2016/03/18 9:27 PM, "Bruce Richardson" <bruce.richardson@intel.com> wrote:
> > I hope to get this set merged for RC2 very soon. Can you provide an
> > update for the nic overview.rst doc listing out the features of
> > this new PMD.
[...]
> I've sent the v12 patch with vhost.rst.
> Could you please check below?

Bruce is talking about the table of features in overview.rst.


* Re: [PATCH v12 1/2] ethdev: Add a new event type to notify a queue state changed event
  2016-03-15  8:31                                     ` [PATCH v12 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
@ 2016-03-18 13:54                                       ` Thomas Monjalon
  0 siblings, 0 replies; 200+ messages in thread
From: Thomas Monjalon @ 2016-03-18 13:54 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, bruce.richardson, ann.zhuangyanying

2016-03-15 17:31, Tetsuya Mukawa:
> This patch adds a below event type.
>  - RTE_ETH_EVENT_QUEUE_STATE_CHANGE
> This event is used for notifying a queue state changed event.
[...]
>  enum rte_eth_event_type {
>  	RTE_ETH_EVENT_UNKNOWN,  /**< unknown event type */
>  	RTE_ETH_EVENT_INTR_LSC, /**< lsc interrupt event */
> +	RTE_ETH_EVENT_QUEUE_STATE_CHANGE,
> +				/**< queue state changed interrupt */

This comment is not really helpful.
Please could you describe what is a queue state?
Is it only applicable to vhost?
Can we say there is a real interrupt or simply an event?


* Re: [PATCH v12 2/2] vhost: Add VHOST PMD
  2016-03-18 13:52                                           ` Thomas Monjalon
@ 2016-03-18 14:03                                             ` Tetsuya Mukawa
  2016-03-18 14:13                                               ` Bruce Richardson
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-03-18 14:03 UTC (permalink / raw)
  To: Thomas Monjalon, Bruce Richardson; +Cc: dev, Zhuangyanying

On 2016/03/18 22:52, Thomas Monjalon wrote:
> 2016-03-18 22:41, Tetsuya Mukawa:
>> On 2016/03/18 9:27 PM, "Bruce Richardson" <bruce.richardson@intel.com> wrote:
>>> I hope to get this set merged for RC2 very soon. Can you provide an
>>> update for the nic overview.rst doc listing out the features of
>>> this new PMD.
> [...]
>> I've sent the v12 patch with vhost.rst.
>> Could you please check below?
> Bruce is talking about the table of features in overview.rst.

Hi Bruce and Thomas,

Thanks, I've got it.
Now I am out of office, so I will send the patch separately by hopefully
tomorrow.
Could you please apply it separately?

Regards,
Tetsuya


* Re: [PATCH v12 2/2] vhost: Add VHOST PMD
  2016-03-18 14:03                                             ` Tetsuya Mukawa
@ 2016-03-18 14:13                                               ` Bruce Richardson
  2016-03-18 14:21                                                 ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Bruce Richardson @ 2016-03-18 14:13 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: Thomas Monjalon, dev, Zhuangyanying

On Fri, Mar 18, 2016 at 11:03:56PM +0900, Tetsuya Mukawa wrote:
> On 2016/03/18 22:52, Thomas Monjalon wrote:
> > 2016-03-18 22:41, Tetsuya Mukawa:
> >> On 2016/03/18 9:27 PM, "Bruce Richardson" <bruce.richardson@intel.com> wrote:
> >>> I hope to get this set merged for RC2 very soon. Can you provide an
> >>> update for the nic overview.rst doc listing out the features of
> >>> this new PMD.
> > [...]
> >> I've sent the v12 patch with vhost.rst.
> >> Could you please check below?
> > Bruce is talking about the table of features in overview.rst.
> 
> Hi Bruce and Thomas,
> 
> Thanks, I've got it.
> Now I am out of office, so I will send the patch separately by hopefully
> tomorrow.
> Could you please apply it separately?
> 
I'll hold applying the patchset until I have all relevant bits ready to go together.
If it's in by Monday, it should be ok. Please also supply a more detailed comment
for the new flag addition that Thomas has called out on patch 1. That needs to be
resolved too before apply.

/Bruce


* Re: [PATCH v12 2/2] vhost: Add VHOST PMD
  2016-03-18 14:13                                               ` Bruce Richardson
@ 2016-03-18 14:21                                                 ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-03-18 14:21 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: Thomas Monjalon, dev, Zhuangyanying

On 2016/03/18 23:13, Bruce Richardson wrote:
> On Fri, Mar 18, 2016 at 11:03:56PM +0900, Tetsuya Mukawa wrote:
>> On 2016/03/18 22:52, Thomas Monjalon wrote:
>>> 2016-03-18 22:41, Tetsuya Mukawa:
>>>> On 2016/03/18 9:27 PM, "Bruce Richardson" <bruce.richardson@intel.com> wrote:
>>>>> I hope to get this set merged for RC2 very soon. Can you provide an
>>>>> update for the nic overview.rst doc listing out the features of
>>>>> this new PMD.
>>> [...]
>>>> I've sent the v12 patch with vhost.rst.
>>>> Could you please check below?
>>> Bruce is talking about the table of features in overview.rst.
>> Hi Bruce and Thomas,
>>
>> Thanks, I've got it.
>> Now I am out of office, so I will send the patch separately by hopefully
>> tomorrow.
>> Could you please apply it separately?
>>
> I'll hold applying the patchset until I have all relevant bits ready to go together.
> If it's in by Monday, it should be ok. 

I appreciate it.

> Please also supply a more detailed comment
> for the new flag addition that Thomas has called out on patch 1. That needs to be
> resolved too before apply.

Sure, I will add more description.

Regards,
Tetsuya


* Re: [PATCH v12 2/2] vhost: Add VHOST PMD
  2016-03-18 12:27                                       ` Bruce Richardson
  2016-03-18 13:41                                         ` Tetsuya Mukawa
@ 2016-03-21  5:41                                         ` Tetsuya Mukawa
  1 sibling, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-03-21  5:41 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, ann.zhuangyanying

On 2016/03/18 21:27, Bruce Richardson wrote:
> On Tue, Mar 15, 2016 at 05:31:41PM +0900, Tetsuya Mukawa wrote:
>> The patch introduces a new PMD. This PMD is implemented as thin wrapper
>> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
>> The vhost messages will be handled only when a port is started. So start
>> a port first, then invoke QEMU.
>>
>> The PMD has 2 parameters.
>>  - iface:  The parameter is used to specify a path to connect to a
>>            virtio-net device.
>>  - queues: The parameter is used to specify the number of the queues
>>            virtio-net device has.
>>            (Default: 1)
>>
>> Here is an example.
>> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
>>
>> To connect above testpmd, here is qemu command example.
>>
>> $ qemu-system-x86_64 \
>>         <snip>
>>         -chardev socket,id=chr0,path=/tmp/sock0 \
>>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
>>         -device virtio-net-pci,netdev=net0,mq=on
>>
>> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
>> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
>> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
>> Acked-by: Rich Lane <rich.lane@bigswitch.com>
>> Tested-by: Rich Lane <rich.lane@bigswitch.com>
> Hi Tetsuya,
>
> I hope to get this set merged for RC2 very soon. Can you provide an update for
> the nic overview.rst doc listing out the features of this new PMD. If you want,
> you can provide it as a separate patch, that I will merge into this one for you
> on apply to next-net.
>
> If you do decide to respin this patchset with the extra doc, please take into
> account the following patchwork issues also - otherwise I'll also fix them on
> apply:
>
> WARNING:STATIC_CONST_CHAR_ARRAY: static const char * array should probably be static const char * const
> #364: FILE: drivers/net/vhost/rte_eth_vhost.c:56:
> +static const char *valid_arguments[] = {

It seems this is a false positive, so I will leave it.

Regards,
Tetsuya


* [PATCH v13 0/2] Add VHOST PMD
  2016-03-15  8:31                                     ` [PATCH v12 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  2016-03-18 12:27                                       ` Bruce Richardson
@ 2016-03-21  5:45                                       ` Tetsuya Mukawa
  2016-03-21 12:42                                         ` Bruce Richardson
  2016-03-21  5:45                                       ` [PATCH v13 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
  2016-03-21  5:45                                       ` [PATCH v13 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  3 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-03-21  5:45 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, ann.zhuangyanying, thomas.monjalon, Tetsuya Mukawa

The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost.

PATCH v13 changes:
 - Rebase on latest master.
 - Fix commit log of below patch.
   - ethdev: Add a new event type to notify a queue state changed event
 - Fix warnings from checkpatch.pl.
 - Add description to doc/guides/nics/overview.rst.

PATCH v12 changes:
 - Rebase on latest master.
 - Add a missing documentation.

PATCH v11 changes:
 - Rebase on latest master.
 - Fix MAINTAINERS file.
 - Fix Acked-by and Tested-by signatures of commit log.

PATCH v10 changes:
 - Rebase on latest master.
 - Fix DPDK version number (2.3 to 16.04).
 - Set port id to mbuf while receiving packets.

PATCH v9 changes:
 - Fix a null pointer access issue implemented in v8 patch.

PATCH v8 changes:
 - Manage ether devices list instead of internal structures list.
 - Remove needless NULL checking.
 - Replace "pthread_exit" with "return NULL".
 - Replace rte_panic with RTE_LOG, also add error handling.
 - Remove duplicated lines.
 - Remove needless casting.
 - Follow coding style.
 - Remove needless parenthesis.

PATCH v7 changes:
 - Remove needless parenthesis.
 - Add release note.
 - Remove needless line wraps.
 - Add null pointer check in vring_state_changed().
 - Free queue memory in eth_queue_release().
 - Fix wrong variable name.
 - Fix error handling code of eth_dev_vhost_create() and
   rte_pmd_vhost_devuninit().
 - Remove needless null checking from rte_pmd_vhost_devinit/devuninit().
 - Use port id to create mac address.
 - Add doxygen style comments in "rte_eth_vhost.h".
 - Fix wrong comment in "mk/rte.app.mk".

PATCH v6 changes:
 - Remove rte_vhost_driver_pmd_callback_register().
 - Support link status interrupt.
 - Support queue state changed interrupt.
 - Add rte_eth_vhost_get_queue_event().
 - Support numa node detection when new device is connected.

PATCH v5 changes:
 - Rebase on latest master.
 - Fix RX/TX routine to count RX/TX bytes.
 - Fix RX/TX routine not to count as error packets if enqueue/dequeue
   cannot send all packets.
 - Fix if-condition checking for multiqueues.
 - Add "static" to pthread variable.
 - Fix format.
 - Change default behavior not to receive queueing event from driver.
 - Split the patch to separate rte_eth_vhost_portid2vdev().

PATCH v4 changes:
 - Rebase on latest DPDK tree.
 - Fix coding style.
 - Fix code not to invoke multiple message handling threads.
 - Fix code to handle vdev parameters correctly.
 - Remove needless cast.
 - Remove needless if-condition before rte_free().

PATCH v3 changes:
 - Rebase on latest master.
 - Specify correct queue_id in RX/TX function.

PATCH v2 changes:
 - Remove a below patch that fixes vhost library.
   The patch was applied as a separate patch.
   - vhost: fix crash with multiqueue enabled
 - Fix typos.
   (Thanks to Thomas Monjalon)
 - Rebase on latest tree with Bernard's patches above.

PATCH v1 changes:
 - Support vhost multiple queues.
 - Rebase on "remove pci driver from vdevs".
 - Optimize RX/TX functions.
 - Fix resource leaks.
 - Fix compile issue.
 - Add patch to fix vhost library.

RFC PATCH v3 changes:
 - Optimize performance.
   In RX/TX functions, change code to access only per core data.
 - Add the API below to allow users to use vhost library APIs for a port
   managed by the vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
    - rte_eth_vhost_portid2vdev()
   To support this functionality, the vhost library is also changed.
   Anyway, if users don't use the vhost PMD, they can fully use the vhost
   library APIs.
 - Add code to support vhost multiple queues.
   Actually, the multiple queues functionality is not enabled so far.

RFC PATCH v2 changes:
 - Fix issues reported by checkpatch.pl
   (Thanks to Stephen Hemminger)


Tetsuya Mukawa (2):
  ethdev: Add a new event type to notify a queue state changed event
  vhost: Add VHOST PMD

 MAINTAINERS                                 |   5 +
 config/common_base                          |   6 +
 config/common_linuxapp                      |   1 +
 doc/guides/nics/index.rst                   |   1 +
 doc/guides/nics/overview.rst                |  37 +-
 doc/guides/nics/vhost.rst                   | 110 ++++
 doc/guides/rel_notes/release_16_04.rst      |   4 +
 drivers/net/Makefile                        |   4 +
 drivers/net/vhost/Makefile                  |  62 ++
 drivers/net/vhost/rte_eth_vhost.c           | 917 ++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h           | 109 ++++
 drivers/net/vhost/rte_pmd_vhost_version.map |  10 +
 lib/librte_ether/rte_ethdev.h               |   2 +
 mk/rte.app.mk                               |   6 +
 14 files changed, 1256 insertions(+), 18 deletions(-)
 create mode 100644 doc/guides/nics/vhost.rst
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

-- 
2.1.4


* [PATCH v13 1/2] ethdev: Add a new event type to notify a queue state changed event
  2016-03-15  8:31                                     ` [PATCH v12 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  2016-03-18 12:27                                       ` Bruce Richardson
  2016-03-21  5:45                                       ` [PATCH v13 0/2] " Tetsuya Mukawa
@ 2016-03-21  5:45                                       ` Tetsuya Mukawa
  2016-03-21  8:37                                         ` Thomas Monjalon
  2016-03-21  5:45                                       ` [PATCH v13 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
  3 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-03-21  5:45 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, ann.zhuangyanying, thomas.monjalon, Tetsuya Mukawa

This patch adds the event type below.
 - RTE_ETH_EVENT_QUEUE_STATE_CHANGE

This event occurs when some queues are enabled or disabled.
So far, only the vhost PMD supports the event, and it indicates that some
queues have been enabled or disabled by the virtio-net device. Such an event
is needed because the virtio-net device may not enable all the queues the
vhost PMD prepares.

Because only the vhost PMD uses the event so far, it isn't an actual hardware
interrupt but a simple software event.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
---
 lib/librte_ether/rte_ethdev.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index b5704e1..470d7a5 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -2917,6 +2917,8 @@ rte_eth_tx_buffer_count_callback(struct rte_mbuf **pkts, uint16_t unsent,
 enum rte_eth_event_type {
 	RTE_ETH_EVENT_UNKNOWN,  /**< unknown event type */
 	RTE_ETH_EVENT_INTR_LSC, /**< lsc interrupt event */
+	RTE_ETH_EVENT_QUEUE_STATE_CHANGE,
+				/**< queue state changed interrupt */
 	RTE_ETH_EVENT_MAX       /**< max value of this enum */
 };
 
-- 
2.1.4


* [PATCH v13 2/2] vhost: Add VHOST PMD
  2016-03-15  8:31                                     ` [PATCH v12 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
                                                         ` (2 preceding siblings ...)
  2016-03-21  5:45                                       ` [PATCH v13 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
@ 2016-03-21  5:45                                       ` Tetsuya Mukawa
  2016-03-21 15:40                                         ` Loftus, Ciara
  3 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-03-21  5:45 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, ann.zhuangyanying, thomas.monjalon, Tetsuya Mukawa

The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost. It means librte_vhost is also needed to compile the PMD.
The vhost messages will be handled only when a port is started. So start
a port first, then invoke QEMU.

The PMD has 2 parameters.
 - iface:  The parameter is used to specify a path to connect to a
           virtio-net device.
 - queues: The parameter is used to specify the number of queues the
           virtio-net device has.
           (Default: 1)

Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i

To connect to the above testpmd, here is a QEMU command example.

$ qemu-system-x86_64 \
        <snip>
        -chardev socket,id=chr0,path=/tmp/sock0 \
        -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
        -device virtio-net-pci,netdev=net0,mq=on

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
---
 MAINTAINERS                                 |   5 +
 config/common_base                          |   6 +
 config/common_linuxapp                      |   1 +
 doc/guides/nics/index.rst                   |   1 +
 doc/guides/nics/overview.rst                |  37 +-
 doc/guides/nics/vhost.rst                   | 110 ++++
 doc/guides/rel_notes/release_16_04.rst      |   4 +
 drivers/net/Makefile                        |   4 +
 drivers/net/vhost/Makefile                  |  62 ++
 drivers/net/vhost/rte_eth_vhost.c           | 917 ++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.h           | 109 ++++
 drivers/net/vhost/rte_pmd_vhost_version.map |  10 +
 mk/rte.app.mk                               |   6 +
 13 files changed, 1254 insertions(+), 18 deletions(-)
 create mode 100644 doc/guides/nics/vhost.rst
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 8b21979..7a47fc0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -352,6 +352,11 @@ Null PMD
 M: Tetsuya Mukawa <mukawa@igel.co.jp>
 F: drivers/net/null/
 
+Vhost PMD
+M: Tetsuya Mukawa <mukawa@igel.co.jp>
+M: Yuanhan Liu <yuanhan.liu@linux.intel.com>
+F: drivers/net/vhost/
+
 Intel AES-NI GCM PMD
 M: Declan Doherty <declan.doherty@intel.com>
 F: drivers/crypto/aesni_gcm/
diff --git a/config/common_base b/config/common_base
index dbd405b..5efee07 100644
--- a/config/common_base
+++ b/config/common_base
@@ -514,6 +514,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
 CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 
 #
+# Compile vhost PMD
+# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
+#
+CONFIG_RTE_LIBRTE_PMD_VHOST=n
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/config/common_linuxapp b/config/common_linuxapp
index ffbe260..7e698e2 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -40,5 +40,6 @@ CONFIG_RTE_EAL_VFIO=y
 CONFIG_RTE_KNI_KMOD=y
 CONFIG_RTE_LIBRTE_KNI=y
 CONFIG_RTE_LIBRTE_VHOST=y
+CONFIG_RTE_LIBRTE_PMD_VHOST=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
 CONFIG_RTE_LIBRTE_POWER=y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 0b353a8..d53b0c7 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -49,6 +49,7 @@ Network Interface Controller Drivers
     nfp
     szedata2
     virtio
+    vhost
     vmxnet3
     pcap_ring
 
diff --git a/doc/guides/nics/overview.rst b/doc/guides/nics/overview.rst
index 2d4f014..40ca5ec 100644
--- a/doc/guides/nics/overview.rst
+++ b/doc/guides/nics/overview.rst
@@ -74,20 +74,21 @@ Most of these differences are summarized below.
 
 .. table:: Features availability in networking drivers
 
-   ==================== = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
-   Feature              a b b b c e e i i i i i i i i i i f f m m m n n p r s v v v x
-                        f n n o x 1 n 4 4 4 4 g g x x x x m m l l p f u c i z i i m e
-                        p x x n g 0 i 0 0 0 0 b b g g g g 1 1 x x i p l a n e r r x n
-                        a 2 2 d b 0 c e e e e   v b b b b 0 0 4 5 p   l p g d t t n v
-                        c x x i e 0     . v v   f e e e e k k     e         a i i e i
-                        k   v n         . f f       . v v   .               t o o t r
-                        e   f g         .   .       . f f   .               a   . 3 t
-                        t               v   v       v   v   v               2   v
-                                        e   e       e   e   e                   e
-                                        c   c       c   c   c                   c
-   ==================== = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
-   link status                  X     X X                                   X
-   link status event                  X X
+   ==================== = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
+   Feature              a b b b c e e i i i i i i i i i i f f m m m n n p r s v v v v x
+                        f n n o x 1 n 4 4 4 4 g g x x x x m m l l p f u c i z h i i m e
+                        p x x n g 0 i 0 0 0 0 b b g g g g 1 1 x x i p l a n e o r r x n
+                        a 2 2 d b 0 c e e e e   v b b b b 0 0 4 5 p   l p g d s t t n v
+                        c x x i e 0     . v v   f e e e e k k     e         a t i i e i
+                        k   v n         . f f       . v v   .               t   o o t r
+                        e   f g         .   .       . f f   .               a     . 3 t
+                        t               v   v       v   v   v               2     v
+                                        e   e       e   e   e                     e
+                                        c   c       c   c   c                     c
+   ==================== = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
+   link status                  X     X X                                   X X
+   link status event                  X X                                     X
+   queue status event                                                         X
    Rx interrupt                       X X X X
    queue start/stop             X     X X X X                               X
    MTU update                   X
@@ -125,7 +126,7 @@ Most of these differences are summarized below.
    inner L4 checksum                  X   X
    packet type parsing          X     X   X
    timesync                           X X
-   basic stats                  X     X X X X                               X
+   basic stats                  X     X X X X                               X X
    extended stats                     X X X X
    stats per queue              X                                           X
    EEPROM dump
@@ -139,9 +140,9 @@ Most of these differences are summarized below.
    ARMv8
    Power8
    TILE-Gx
-   x86-32                       X     X X X X
-   x86-64                       X     X X X X                               X
+   x86-32                       X     X X X X                                 X
+   x86-64                       X     X X X X                               X X
    usage doc                    X                                           X
    design doc
    perf doc
-   ==================== = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
+   ==================== = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
diff --git a/doc/guides/nics/vhost.rst b/doc/guides/nics/vhost.rst
new file mode 100644
index 0000000..50e8a3a
--- /dev/null
+++ b/doc/guides/nics/vhost.rst
@@ -0,0 +1,110 @@
+..  BSD LICENSE
+    Copyright(c) 2016 IGEL Co., Ltd. All rights reserved.
+    All rights reserved.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+    * Neither the name of IGEL Co., Ltd. nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Poll Mode Driver that wraps vhost library
+=========================================
+
+This PMD is a thin wrapper of the DPDK vhost library.
+The user can handle virtqueues as a normal DPDK port.
+
+Vhost Implementation in DPDK
+----------------------------
+
+Please refer to the "Vhost Library" chapter of the *DPDK Programmer's Guide* for details of vhost.
+
+Features and Limitations of vhost PMD
+-------------------------------------
+
+Currently, the vhost PMD provides the basic functionality of packet reception, transmission and event handling.
+
+*   It has multiple queue support.
+
+*   It supports ``RTE_ETH_EVENT_INTR_LSC`` and ``RTE_ETH_EVENT_QUEUE_STATE_CHANGE`` events.
+
+*   It supports Port Hotplug functionality.
+
+*   There is no need to stop RX/TX when the user wants to stop a guest or a virtio-net driver on the guest.
+
+Vhost PMD arguments
+-------------------
+
+The user can specify the arguments below in the `--vdev` option.
+
+#.  ``iface``:
+
+    It is used to specify a path to connect to a QEMU virtio-net device.
+
+#.  ``queues``:
+
+    It is used to specify the number of queues the virtio-net device has.
+    (Default: 1)
+
+Vhost PMD event handling
+------------------------
+
+This section describes how to handle vhost PMD events.
+
+The user can register an event callback handler with ``rte_eth_dev_callback_register()``.
+The registered callback handler will be invoked with one of the event types below.
+
+#.  ``RTE_ETH_EVENT_INTR_LSC``:
+
+    It means the link status of the port has changed.
+
+#.  ``RTE_ETH_EVENT_QUEUE_STATE_CHANGE``:
+
+    It means some queue statuses have changed. Call ``rte_eth_vhost_get_queue_event()`` in the callback handler.
+    Because multiple status changes may raise only one event, call the function repeatedly as long as it doesn't return a negative value.
+
+Vhost PMD with testpmd application
+----------------------------------
+
+This section demonstrates the vhost PMD with the testpmd DPDK sample application.
+
+#.  Launch the testpmd with vhost PMD:
+
+    .. code-block:: console
+
+        ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
+
+    Other basic DPDK preparations like hugepage enabling are needed here.
+    Please refer to the *DPDK Getting Started Guide* for detailed instructions.
+
+#.  Launch the QEMU:
+
+    .. code-block:: console
+
+       qemu-system-x86_64 <snip>
+                   -chardev socket,id=chr0,path=/tmp/sock0 \
+                   -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
+                   -device virtio-net-pci,netdev=net0
+
+    This command attaches one virtio-net device to the QEMU guest.
+    After the initialization between QEMU and the DPDK vhost library is done, the status of the port will be link up.
diff --git a/doc/guides/rel_notes/release_16_04.rst b/doc/guides/rel_notes/release_16_04.rst
index 2785b29..2e4bbb3 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -248,6 +248,10 @@ This section should contain new features added in this release. Sample format:
 
   New application implementing an IPsec Security Gateway.
 
+* **Added vhost PMD.**
+
+  Added virtual PMD that wraps librte_vhost.
+
 
 Resolved Issues
 ---------------
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 0c3393f..8ba37fb 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -52,4 +52,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
 
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
new file mode 100644
index 0000000..f49a69b
--- /dev/null
+++ b/drivers/net/vhost/Makefile
@@ -0,0 +1,62 @@
+#   BSD LICENSE
+#
+#   Copyright (c) 2010-2016 Intel Corporation.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_vhost.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_vhost_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
+
+#
+# Export include files
+#
+SYMLINK-y-include += rte_eth_vhost.h
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_vhost
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
new file mode 100644
index 0000000..6b9d287
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -0,0 +1,917 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) 2016 IGEL Co., Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co.,Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <unistd.h>
+#include <pthread.h>
+#include <stdbool.h>
+#ifdef RTE_LIBRTE_VHOST_NUMA
+#include <numaif.h>
+#endif
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_dev.h>
+#include <rte_kvargs.h>
+#include <rte_virtio_net.h>
+#include <rte_spinlock.h>
+
+#include "rte_eth_vhost.h"
+
+#define ETH_VHOST_IFACE_ARG		"iface"
+#define ETH_VHOST_QUEUES_ARG		"queues"
+
+static const char *drivername = "VHOST PMD";
+
+static const char *valid_arguments[] = {
+	ETH_VHOST_IFACE_ARG,
+	ETH_VHOST_QUEUES_ARG,
+	NULL
+};
+
+static struct ether_addr base_eth_addr = {
+	.addr_bytes = {
+		0x56 /* V */,
+		0x48 /* H */,
+		0x4F /* O */,
+		0x53 /* S */,
+		0x54 /* T */,
+		0x00
+	}
+};
+
+struct vhost_queue {
+	rte_atomic32_t allow_queuing;
+	rte_atomic32_t while_queuing;
+	struct virtio_net *device;
+	struct pmd_internal *internal;
+	struct rte_mempool *mb_pool;
+	uint8_t port;
+	uint16_t virtqueue_id;
+	uint64_t rx_pkts;
+	uint64_t tx_pkts;
+	uint64_t missed_pkts;
+	uint64_t rx_bytes;
+	uint64_t tx_bytes;
+};
+
+struct pmd_internal {
+	char *dev_name;
+	char *iface_name;
+
+	volatile uint16_t once;
+};
+
+struct internal_list {
+	TAILQ_ENTRY(internal_list) next;
+	struct rte_eth_dev *eth_dev;
+};
+
+TAILQ_HEAD(internal_list_head, internal_list);
+static struct internal_list_head internal_list =
+	TAILQ_HEAD_INITIALIZER(internal_list);
+
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static rte_atomic16_t nb_started_ports;
+static pthread_t session_th;
+
+static struct rte_eth_link pmd_link = {
+		.link_speed = 10000,
+		.link_duplex = ETH_LINK_FULL_DUPLEX,
+		.link_status = 0
+};
+
+struct rte_vhost_vring_state {
+	rte_spinlock_t lock;
+
+	bool cur[RTE_MAX_QUEUES_PER_PORT * 2];
+	bool seen[RTE_MAX_QUEUES_PER_PORT * 2];
+	unsigned int index;
+	unsigned int max_vring;
+};
+
+static struct rte_vhost_vring_state *vring_states[RTE_MAX_ETHPORTS];
+
+static uint16_t
+eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t i, nb_rx = 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Dequeue packets from guest TX queue */
+	nb_rx = rte_vhost_dequeue_burst(r->device,
+			r->virtqueue_id, r->mb_pool, bufs, nb_bufs);
+
+	r->rx_pkts += nb_rx;
+
+	for (i = 0; likely(i < nb_rx); i++) {
+		bufs[i]->port = r->port;
+		r->rx_bytes += bufs[i]->pkt_len;
+	}
+
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_rx;
+}
+
+static uint16_t
+eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhost_queue *r = q;
+	uint16_t i, nb_tx = 0;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Enqueue packets to guest RX queue */
+	nb_tx = rte_vhost_enqueue_burst(r->device,
+			r->virtqueue_id, bufs, nb_bufs);
+
+	r->tx_pkts += nb_tx;
+	r->missed_pkts += nb_bufs - nb_tx;
+
+	for (i = 0; likely(i < nb_tx); i++)
+		r->tx_bytes += bufs[i]->pkt_len;
+
+	for (i = 0; likely(i < nb_tx); i++)
+		rte_pktmbuf_free(bufs[i]);
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_tx;
+}
+
+static int
+eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
+{
+	return 0;
+}
+
+static inline struct internal_list *
+find_internal_resource(char *ifname)
+{
+	int found = 0;
+	struct internal_list *list;
+	struct pmd_internal *internal;
+
+	if (ifname == NULL)
+		return NULL;
+
+	pthread_mutex_lock(&internal_list_lock);
+
+	TAILQ_FOREACH(list, &internal_list, next) {
+		internal = list->eth_dev->data->dev_private;
+		if (!strcmp(internal->iface_name, ifname)) {
+			found = 1;
+			break;
+		}
+	}
+
+	pthread_mutex_unlock(&internal_list_lock);
+
+	if (!found)
+		return NULL;
+
+	return list;
+}
+
+static int
+new_device(struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct internal_list *list;
+	struct pmd_internal *internal;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid argument\n");
+		return -1;
+	}
+
+	list = find_internal_resource(dev->ifname);
+	if (list == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid device name\n");
+		return -1;
+	}
+
+	eth_dev = list->eth_dev;
+	internal = eth_dev->data->dev_private;
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = dev;
+		vq->internal = internal;
+		vq->port = eth_dev->data->port_id;
+		rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = dev;
+		vq->internal = internal;
+		vq->port = eth_dev->data->port_id;
+		rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
+	}
+
+	dev->flags |= VIRTIO_DEV_RUNNING;
+	dev->priv = eth_dev;
+	eth_dev->data->dev_link.link_status = 1;
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 1);
+	}
+
+	RTE_LOG(INFO, PMD, "New connection established\n");
+
+	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC);
+
+	return 0;
+}
+
+static void
+destroy_device(volatile struct virtio_net *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	struct vhost_queue *vq;
+	unsigned i;
+
+	if (dev == NULL) {
+		RTE_LOG(INFO, PMD, "Invalid argument\n");
+		return;
+	}
+
+	eth_dev = (struct rte_eth_dev *)dev->priv;
+	if (eth_dev == NULL) {
+		RTE_LOG(INFO, PMD, "Failed to find an ethdev\n");
+		return;
+	}
+
+	/* Wait until rx/tx_pkt_burst stops accessing vhost device */
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, 0);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+
+	eth_dev->data->dev_link.link_status = 0;
+
+	dev->priv = NULL;
+	dev->flags &= ~VIRTIO_DEV_RUNNING;
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		vq = eth_dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = NULL;
+	}
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		vq = eth_dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		vq->device = NULL;
+	}
+
+	RTE_LOG(INFO, PMD, "Connection closed\n");
+
+	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC);
+}
+
+static int
+vring_state_changed(struct virtio_net *dev, uint16_t vring, int enable)
+{
+	struct rte_vhost_vring_state *state;
+	struct rte_eth_dev *eth_dev;
+	struct internal_list *list;
+#ifdef RTE_LIBRTE_VHOST_NUMA
+	int newnode, ret;
+#endif
+
+	if (dev == NULL) {
+		RTE_LOG(ERR, PMD, "Invalid argument\n");
+		return -1;
+	}
+
+	list = find_internal_resource(dev->ifname);
+	if (list == NULL) {
+		RTE_LOG(ERR, PMD, "Invalid interface name: %s\n", dev->ifname);
+		return -1;
+	}
+
+	eth_dev = list->eth_dev;
+	/* won't be NULL */
+	state = vring_states[eth_dev->data->port_id];
+
+#ifdef RTE_LIBRTE_VHOST_NUMA
+	ret  = get_mempolicy(&newnode, NULL, 0, dev,
+			MPOL_F_NODE | MPOL_F_ADDR);
+	if (ret < 0) {
+		RTE_LOG(ERR, PMD, "Unknown numa node\n");
+		return -1;
+	}
+
+	eth_dev->data->numa_node = newnode;
+#endif
+	rte_spinlock_lock(&state->lock);
+	state->cur[vring] = enable;
+	state->max_vring = RTE_MAX(vring, state->max_vring);
+	rte_spinlock_unlock(&state->lock);
+
+	RTE_LOG(INFO, PMD, "vring%u is %s\n",
+			vring, enable ? "enabled" : "disabled");
+
+	_rte_eth_dev_callback_process(eth_dev,
+			RTE_ETH_EVENT_QUEUE_STATE_CHANGE);
+
+	return 0;
+}
+
+int
+rte_eth_vhost_get_queue_event(uint8_t port_id,
+		struct rte_eth_vhost_queue_event *event)
+{
+	struct rte_vhost_vring_state *state;
+	unsigned int i;
+	int idx;
+
+	if (port_id >= RTE_MAX_ETHPORTS) {
+		RTE_LOG(ERR, PMD, "Invalid port id\n");
+		return -1;
+	}
+
+	state = vring_states[port_id];
+	if (!state) {
+		RTE_LOG(ERR, PMD, "Unused port\n");
+		return -1;
+	}
+
+	rte_spinlock_lock(&state->lock);
+	for (i = 0; i <= state->max_vring; i++) {
+		idx = state->index++ % (state->max_vring + 1);
+
+		if (state->cur[idx] != state->seen[idx]) {
+			state->seen[idx] = state->cur[idx];
+			event->queue_id = idx / 2;
+			event->rx = idx & 1;
+			event->enable = state->cur[idx];
+			rte_spinlock_unlock(&state->lock);
+			return 0;
+		}
+	}
+	rte_spinlock_unlock(&state->lock);
+
+	return -1;
+}
+
+static void *
+vhost_driver_session(void *param __rte_unused)
+{
+	static struct virtio_net_device_ops vhost_ops;
+
+	/* set vhost arguments */
+	vhost_ops.new_device = new_device;
+	vhost_ops.destroy_device = destroy_device;
+	vhost_ops.vring_state_changed = vring_state_changed;
+	if (rte_vhost_driver_callback_register(&vhost_ops) < 0)
+		RTE_LOG(ERR, PMD, "Can't register callbacks\n");
+
+	/* start event handling */
+	rte_vhost_driver_session_start();
+
+	return NULL;
+}
+
+static int
+vhost_driver_session_start(void)
+{
+	int ret;
+
+	ret = pthread_create(&session_th,
+			NULL, vhost_driver_session, NULL);
+	if (ret)
+		RTE_LOG(ERR, PMD, "Can't create a thread\n");
+
+	return ret;
+}
+
+static void
+vhost_driver_session_stop(void)
+{
+	int ret;
+
+	ret = pthread_cancel(session_th);
+	if (ret)
+		RTE_LOG(ERR, PMD, "Can't cancel the thread\n");
+
+	ret = pthread_join(session_th, NULL);
+	if (ret)
+		RTE_LOG(ERR, PMD, "Can't join the thread\n");
+}
+
+static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+	int ret = 0;
+
+	if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
+		ret = rte_vhost_driver_register(internal->iface_name);
+		if (ret)
+			return ret;
+	}
+
+	/* We need only one message handling thread */
+	if (rte_atomic16_add_return(&nb_started_ports, 1) == 1)
+		ret = vhost_driver_session_start();
+
+	return ret;
+}
+
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+	struct pmd_internal *internal = dev->data->dev_private;
+
+	if (rte_atomic16_cmpset(&internal->once, 1, 0))
+		rte_vhost_driver_unregister(internal->iface_name);
+
+	if (rte_atomic16_sub_return(&nb_started_ports, 1) == 0)
+		vhost_driver_session_stop();
+}
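eth_dev_start() and eth_dev_stop() share one vhost session thread across all ports: the nb_started_ports counter makes the first port to start spawn the thread and the last port to stop tear it down. A minimal sketch of that first-in/last-out refcounting, with plain variables standing in for the rte_atomic16 counter and the pthread work:

```c
#include <assert.h>
#include <stdbool.h>

static int nb_started_ports;  /* stands in for the rte_atomic16_t */
static bool session_running;  /* stands in for the session thread */

/* First port to start brings up the shared message-handling thread,
 * like vhost_driver_session_start() in the PMD. */
static void
port_start(void)
{
	if (++nb_started_ports == 1)
		session_running = true;
}

/* Last port to stop tears it down again,
 * like vhost_driver_session_stop(). */
static void
port_stop(void)
{
	if (--nb_started_ports == 0)
		session_running = false;
}
```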
+
+static int
+eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool)
+{
+	struct vhost_queue *vq;
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
+		return -ENOMEM;
+	}
+
+	vq->mb_pool = mb_pool;
+	vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
+	dev->data->rx_queues[rx_queue_id] = vq;
+
+	return 0;
+}
+
+static int
+eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct vhost_queue *vq;
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for tx queue\n");
+		return -ENOMEM;
+	}
+
+	vq->virtqueue_id = tx_queue_id * VIRTIO_QNUM + VIRTIO_RXQ;
+	dev->data->tx_queues[tx_queue_id] = vq;
+
+	return 0;
+}
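The virtqueue_id arithmetic in the two setup functions swaps roles on purpose: an ethdev RX queue dequeues what the guest transmits, so it binds to the guest TX vring of queue pair q, while an ethdev TX queue feeds the guest RX vring. A quick sketch of that index mapping — the constants mirror the values VIRTIO_RXQ=0, VIRTIO_TXQ=1, VIRTIO_QNUM=2 from rte_virtio_net.h:

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_RXQ  0  /* guest RX vring within a queue pair */
#define VIRTIO_TXQ  1  /* guest TX vring within a queue pair */
#define VIRTIO_QNUM 2  /* vrings per queue pair */

/* Same expressions as eth_rx_queue_setup()/eth_tx_queue_setup(). */
static uint16_t
rx_vring(uint16_t q)
{
	return q * VIRTIO_QNUM + VIRTIO_TXQ;  /* odd vring indices */
}

static uint16_t
tx_vring(uint16_t q)
{
	return q * VIRTIO_QNUM + VIRTIO_RXQ;  /* even vring indices */
}
```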
+
+static void
+eth_dev_info(struct rte_eth_dev *dev,
+	     struct rte_eth_dev_info *dev_info)
+{
+	dev_info->driver_name = drivername;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = (uint32_t)-1;
+	dev_info->max_rx_queues = dev->data->nb_rx_queues;
+	dev_info->max_tx_queues = dev->data->nb_tx_queues;
+	dev_info->min_rx_bufsize = 0;
+}
+
+static void
+eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	unsigned i;
+	unsigned long rx_total = 0, tx_total = 0, tx_missed_total = 0;
+	unsigned long rx_total_bytes = 0, tx_total_bytes = 0;
+	struct vhost_queue *vq;
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+			i < dev->data->nb_rx_queues; i++) {
+		if (dev->data->rx_queues[i] == NULL)
+			continue;
+		vq = dev->data->rx_queues[i];
+		stats->q_ipackets[i] = vq->rx_pkts;
+		rx_total += stats->q_ipackets[i];
+
+		stats->q_ibytes[i] = vq->rx_bytes;
+		rx_total_bytes += stats->q_ibytes[i];
+	}
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+			i < dev->data->nb_tx_queues; i++) {
+		if (dev->data->tx_queues[i] == NULL)
+			continue;
+		vq = dev->data->tx_queues[i];
+		stats->q_opackets[i] = vq->tx_pkts;
+		tx_missed_total += vq->missed_pkts;
+		tx_total += stats->q_opackets[i];
+
+		stats->q_obytes[i] = vq->tx_bytes;
+		tx_total_bytes += stats->q_obytes[i];
+	}
+
+	stats->ipackets = rx_total;
+	stats->opackets = tx_total;
+	stats->imissed = tx_missed_total;
+	stats->ibytes = rx_total_bytes;
+	stats->obytes = tx_total_bytes;
+}
+
+static void
+eth_stats_reset(struct rte_eth_dev *dev)
+{
+	struct vhost_queue *vq;
+	unsigned i;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		if (dev->data->rx_queues[i] == NULL)
+			continue;
+		vq = dev->data->rx_queues[i];
+		vq->rx_pkts = 0;
+		vq->rx_bytes = 0;
+	}
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		if (dev->data->tx_queues[i] == NULL)
+			continue;
+		vq = dev->data->tx_queues[i];
+		vq->tx_pkts = 0;
+		vq->tx_bytes = 0;
+		vq->missed_pkts = 0;
+	}
+}
+
+static void
+eth_queue_release(void *q)
+{
+	rte_free(q);
+}
+
+static int
+eth_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused)
+{
+	return 0;
+}
+
+/**
+ * Disable features in feature_mask. Returns 0 on success.
+ */
+int
+rte_eth_vhost_feature_disable(uint64_t feature_mask)
+{
+	return rte_vhost_feature_disable(feature_mask);
+}
+
+/**
+ * Enable features in feature_mask. Returns 0 on success.
+ */
+int
+rte_eth_vhost_feature_enable(uint64_t feature_mask)
+{
+	return rte_vhost_feature_enable(feature_mask);
+}
+
+/* Returns currently supported vhost features */
+uint64_t
+rte_eth_vhost_feature_get(void)
+{
+	return rte_vhost_feature_get();
+}
+
+static const struct eth_dev_ops ops = {
+	.dev_start = eth_dev_start,
+	.dev_stop = eth_dev_stop,
+	.dev_configure = eth_dev_configure,
+	.dev_infos_get = eth_dev_info,
+	.rx_queue_setup = eth_rx_queue_setup,
+	.tx_queue_setup = eth_tx_queue_setup,
+	.rx_queue_release = eth_queue_release,
+	.tx_queue_release = eth_queue_release,
+	.link_update = eth_link_update,
+	.stats_get = eth_stats_get,
+	.stats_reset = eth_stats_reset,
+};
+
+static int
+eth_dev_vhost_create(const char *name, char *iface_name, int16_t queues,
+		     const unsigned numa_node)
+{
+	struct rte_eth_dev_data *data = NULL;
+	struct pmd_internal *internal = NULL;
+	struct rte_eth_dev *eth_dev = NULL;
+	struct ether_addr *eth_addr = NULL;
+	struct rte_vhost_vring_state *vring_state = NULL;
+	struct internal_list *list = NULL;
+
+	RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
+		numa_node);
+
+	/* now do all data allocation - for eth_dev structure, dummy pci driver
+	 * and internal (private) data
+	 */
+	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+	if (data == NULL)
+		goto error;
+
+	internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
+	if (internal == NULL)
+		goto error;
+
+	list = rte_zmalloc_socket(name, sizeof(*list), 0, numa_node);
+	if (list == NULL)
+		goto error;
+
+	/* reserve an ethdev entry */
+	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+	if (eth_dev == NULL)
+		goto error;
+
+	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
+	if (eth_addr == NULL)
+		goto error;
+	*eth_addr = base_eth_addr;
+	eth_addr->addr_bytes[5] = eth_dev->data->port_id;
+
+	vring_state = rte_zmalloc_socket(name,
+			sizeof(*vring_state), 0, numa_node);
+	if (vring_state == NULL)
+		goto error;
+
+	TAILQ_INIT(&eth_dev->link_intr_cbs);
+
+	/* now put it all together
+	 * - store queue data in internal,
+	 * - store numa_node info in ethdev data
+	 * - point eth_dev_data to internals
+	 * - and point eth_dev structure to new eth_dev_data structure
+	 */
+	internal->dev_name = strdup(name);
+	if (internal->dev_name == NULL)
+		goto error;
+	internal->iface_name = strdup(iface_name);
+	if (internal->iface_name == NULL)
+		goto error;
+
+	list->eth_dev = eth_dev;
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_INSERT_TAIL(&internal_list, list, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
+	rte_spinlock_init(&vring_state->lock);
+	vring_states[eth_dev->data->port_id] = vring_state;
+
+	data->dev_private = internal;
+	data->port_id = eth_dev->data->port_id;
+	memmove(data->name, eth_dev->data->name, sizeof(data->name));
+	data->nb_rx_queues = queues;
+	data->nb_tx_queues = queues;
+	data->dev_link = pmd_link;
+	data->mac_addrs = eth_addr;
+
+	/* We'll replace the 'data' originally allocated by eth_dev, so the
+	 * vhost PMD resources won't be shared between multiple processes.
+	 */
+	eth_dev->data = data;
+	eth_dev->dev_ops = &ops;
+	eth_dev->driver = NULL;
+	data->dev_flags =
+		RTE_ETH_DEV_DETACHABLE | RTE_ETH_DEV_INTR_LSC;
+	data->kdrv = RTE_KDRV_NONE;
+	data->drv_name = internal->dev_name;
+	data->numa_node = numa_node;
+
+	/* finally assign rx and tx ops */
+	eth_dev->rx_pkt_burst = eth_vhost_rx;
+	eth_dev->tx_pkt_burst = eth_vhost_tx;
+
+	return data->port_id;
+
+error:
+	if (internal)
+		free(internal->dev_name);
+	rte_free(vring_state);
+	rte_free(eth_addr);
+	if (eth_dev)
+		rte_eth_dev_release_port(eth_dev);
+	rte_free(internal);
+	rte_free(list);
+	rte_free(data);
+
+	return -1;
+}
+
+static inline int
+open_iface(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	const char **iface_name = extra_args;
+
+	if (value == NULL)
+		return -1;
+
+	*iface_name = value;
+
+	return 0;
+}
+
+static inline int
+open_queues(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	uint16_t *q = extra_args;
+
+	if (value == NULL || extra_args == NULL)
+		return -EINVAL;
+
+	*q = (uint16_t)strtoul(value, NULL, 0);
+	if (*q == USHRT_MAX && errno == ERANGE)
+		return -1;
+
+	if (*q > RTE_MAX_QUEUES_PER_PORT)
+		return -1;
+
+	return 0;
+}
+
+static int
+rte_pmd_vhost_devinit(const char *name, const char *params)
+{
+	struct rte_kvargs *kvlist = NULL;
+	int ret = 0;
+	char *iface_name;
+	uint16_t queues;
+
+	RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
+
+	kvlist = rte_kvargs_parse(params, valid_arguments);
+	if (kvlist == NULL)
+		return -1;
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
+					 &open_iface, &iface_name);
+		if (ret < 0)
+			goto out_free;
+	} else {
+		ret = -1;
+		goto out_free;
+	}
+
+	if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
+					 &open_queues, &queues);
+		if (ret < 0)
+			goto out_free;
+
+	} else
+		queues = 1;
+
+	eth_dev_vhost_create(name, iface_name, queues, rte_socket_id());
+
+out_free:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+static int
+rte_pmd_vhost_devuninit(const char *name)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internal *internal;
+	struct internal_list *list;
+	unsigned int i;
+
+	RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
+
+	/* find an ethdev entry */
+	eth_dev = rte_eth_dev_allocated(name);
+	if (eth_dev == NULL)
+		return -ENODEV;
+
+	internal = eth_dev->data->dev_private;
+	if (internal == NULL)
+		return -ENODEV;
+
+	list = find_internal_resource(internal->iface_name);
+	if (list == NULL)
+		return -ENODEV;
+
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_REMOVE(&internal_list, list, next);
+	pthread_mutex_unlock(&internal_list_lock);
+	rte_free(list);
+
+	eth_dev_stop(eth_dev);
+
+	rte_free(vring_states[eth_dev->data->port_id]);
+	vring_states[eth_dev->data->port_id] = NULL;
+
+	free(internal->dev_name);
+	free(internal->iface_name);
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++)
+		rte_free(eth_dev->data->rx_queues[i]);
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
+		rte_free(eth_dev->data->tx_queues[i]);
+
+	rte_free(eth_dev->data->mac_addrs);
+	rte_free(eth_dev->data);
+	rte_free(internal);
+
+	rte_eth_dev_release_port(eth_dev);
+
+	return 0;
+}
+
+static struct rte_driver pmd_vhost_drv = {
+	.name = "eth_vhost",
+	.type = PMD_VDEV,
+	.init = rte_pmd_vhost_devinit,
+	.uninit = rte_pmd_vhost_devuninit,
+};
+
+PMD_REGISTER_DRIVER(pmd_vhost_drv);
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
new file mode 100644
index 0000000..e78cb74
--- /dev/null
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -0,0 +1,109 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 IGEL Co., Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL Co., Ltd. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_VHOST_H_
+#define _RTE_ETH_VHOST_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+#include <stdbool.h>
+
+#include <rte_virtio_net.h>
+
+/**
+ * Disable features in feature_mask.
+ *
+ * @param feature_mask
+ *  Vhost features defined in "linux/virtio_net.h".
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_eth_vhost_feature_disable(uint64_t feature_mask);
+
+/**
+ * Enable features in feature_mask.
+ *
+ * @param feature_mask
+ *  Vhost features defined in "linux/virtio_net.h".
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_eth_vhost_feature_enable(uint64_t feature_mask);
+
+/**
+ * Returns currently supported vhost features.
+ *
+ * @return
+ *  Vhost features defined in "linux/virtio_net.h".
+ */
+uint64_t rte_eth_vhost_feature_get(void);
+
+/*
+ * Event description.
+ */
+struct rte_eth_vhost_queue_event {
+	uint16_t queue_id;
+	bool rx;
+	bool enable;
+};
+
+/**
+ * Get queue events from the specified port.
+ * If a callback for the event below is registered via
+ * rte_eth_dev_callback_register(), this function describes what has
+ * changed:
+ *  - RTE_ETH_EVENT_QUEUE_STATE_CHANGE
+ * Multiple state changes may trigger only a single callback invocation, so
+ * keep calling this function while it returns 0 to drain all pending events.
+ *
+ * @param port_id
+ *  Port id.
+ * @param event
+ *  Pointer to a rte_eth_vhost_queue_event structure.
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_eth_vhost_get_queue_event(uint8_t port_id,
+		struct rte_eth_vhost_queue_event *event);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
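The doc comment for rte_eth_vhost_get_queue_event() notes that several state changes may collapse into a single RTE_ETH_EVENT_QUEUE_STATE_CHANGE callback, so an application is expected to keep calling the function until it returns non-zero. The self-contained sketch below reproduces the cur/seen round-robin scan behind that contract; the struct and function names are invented for illustration and are not DPDK API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_VRINGS 8

/* Simplified stand-ins for the PMD's per-port vring state and the
 * rte_eth_vhost_queue_event structure. */
struct vring_state {
	bool cur[MAX_VRINGS];    /* state set by vring_state_changed() */
	bool seen[MAX_VRINGS];   /* state already reported to the app */
	unsigned int index;      /* rotating scan position */
	unsigned int max_vring;  /* highest vring index seen so far */
};

struct queue_event {
	uint16_t queue_id;
	bool rx;
	bool enable;
};

/* Return 0 and fill *ev for the first vring whose current state differs
 * from the last reported one, scanning round-robin from a rotating
 * index; return -1 once every pending change has been drained. */
static int
get_queue_event(struct vring_state *s, struct queue_event *ev)
{
	unsigned int i;
	unsigned int idx;

	for (i = 0; i <= s->max_vring; i++) {
		idx = s->index++ % (s->max_vring + 1);
		if (s->cur[idx] != s->seen[idx]) {
			s->seen[idx] = s->cur[idx];
			ev->queue_id = idx / 2;  /* vrings come in pairs */
			ev->rx = idx & 1;        /* odd vrings are RX side */
			ev->enable = s->cur[idx];
			return 0;
		}
	}
	return -1;
}
```

An application loop would call this until it returns -1, exactly as the header asks for the real API.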
diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
new file mode 100644
index 0000000..65bf3a8
--- /dev/null
+++ b/drivers/net/vhost/rte_pmd_vhost_version.map
@@ -0,0 +1,10 @@
+DPDK_16.04 {
+	global:
+
+	rte_eth_vhost_feature_disable;
+	rte_eth_vhost_feature_enable;
+	rte_eth_vhost_feature_get;
+	rte_eth_vhost_get_queue_event;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index a1cd9a3..bd973e8 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -166,6 +166,12 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_SNOW3G)     += -lrte_pmd_snow3g
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_SNOW3G)     += -L$(LIBSSO_PATH)/build -lsso
 endif # CONFIG_RTE_LIBRTE_CRYPTODEV
 
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
+
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
 endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
 
 _LDLIBS-y += $(EXECENV_LDLIBS)
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 200+ messages in thread

* Re: [PATCH v13 1/2] ethdev: Add a new event type to notify a queue state changed event
  2016-03-21  5:45                                       ` [PATCH v13 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
@ 2016-03-21  8:37                                         ` Thomas Monjalon
  2016-03-21  9:24                                           ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Thomas Monjalon @ 2016-03-21  8:37 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, bruce.richardson, ann.zhuangyanying

2016-03-21 14:45, Tetsuya Mukawa:
> This event occurs when some queues are enabled or disabled.
> So far, only the vhost PMD supports the event, and it indicates that some
> queues have been enabled or disabled by the virtio-net device. Such an event
> is needed because the virtio-net device may not enable all the queues the
> vhost PMD prepares.
> 
> Because only the vhost PMD uses the event so far, it isn't an actual
> hardware interrupt but a simple software event.
[...]
> 
> +	RTE_ETH_EVENT_QUEUE_STATE_CHANGE,
> +				/**< queue state changed interrupt */

Is the shorter RTE_ETH_EVENT_QUEUE_STATE descriptive enough?

What about this comment?
/**< queue state event (enabled/disabled) */


* Re: [PATCH v13 1/2] ethdev: Add a new event type to notify a queue state changed event
  2016-03-21  8:37                                         ` Thomas Monjalon
@ 2016-03-21  9:24                                           ` Tetsuya Mukawa
  2016-03-21 11:05                                             ` Bruce Richardson
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-03-21  9:24 UTC (permalink / raw)
  To: Thomas Monjalon, Richardson, Bruce; +Cc: dev, Zhuangyanying

2016-03-21 17:37 GMT+09:00 Thomas Monjalon <thomas.monjalon@6wind.com>:
> 2016-03-21 14:45, Tetsuya Mukawa:
>> This event occurs when some queues are enabled or disabled.
>> So far, only the vhost PMD supports the event, and it indicates that some
>> queues have been enabled or disabled by the virtio-net device. Such an
>> event is needed because the virtio-net device may not enable all the
>> queues the vhost PMD prepares.
>>
>> Because only the vhost PMD uses the event so far, it isn't an actual
>> hardware interrupt but a simple software event.
> [...]
>>
>> +     RTE_ETH_EVENT_QUEUE_STATE_CHANGE,
>> +                             /**< queue state changed interrupt */
>
> Is the shorter RTE_ETH_EVENT_QUEUE_STATE descriptive enough?
>
> What about this comment?
> /**< queue state event (enabled/disabled) */

Hi Thomas,

Yes, it's enough, and above comment is nice.
Thanks for suggestion.


Hi Bruce,

If today is the deadline, could you kindly apply the above
changes while merging?
I would need half a day to re-submit the patch myself, so sorry for asking.
I will ask my company to let me have VPN access. ;)

Regards,
Tetsuya


* Re: [PATCH v13 1/2] ethdev: Add a new event type to notify a queue state changed event
  2016-03-21  9:24                                           ` Tetsuya Mukawa
@ 2016-03-21 11:05                                             ` Bruce Richardson
  2016-03-21 13:51                                               ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Bruce Richardson @ 2016-03-21 11:05 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: Thomas Monjalon, dev, Zhuangyanying

On Mon, Mar 21, 2016 at 06:24:36PM +0900, Tetsuya Mukawa wrote:
> 2016-03-21 17:37 GMT+09:00 Thomas Monjalon <thomas.monjalon@6wind.com>:
> > 2016-03-21 14:45, Tetsuya Mukawa:
> >> This event occurs when some queues are enabled or disabled.
> >> So far, only the vhost PMD supports the event, and it indicates that some
> >> queues have been enabled or disabled by the virtio-net device. Such an
> >> event is needed because the virtio-net device may not enable all the
> >> queues the vhost PMD prepares.
> >>
> >> Because only the vhost PMD uses the event so far, it isn't an actual
> >> hardware interrupt but a simple software event.
> > [...]
> >>
> >> +     RTE_ETH_EVENT_QUEUE_STATE_CHANGE,
> >> +                             /**< queue state changed interrupt */
> >
> > Is the shorter RTE_ETH_EVENT_QUEUE_STATE descriptive enough?
> >
> > What about this comment?
> > /**< queue state event (enabled/disabled) */
> 
> Hi Thomas,
> 
> Yes, it's enough, and above comment is nice.
> Thanks for suggestion.
> 
> 
> Hi Bruce,
> 
> If today is the deadline, could you kindly apply the above
> changes while merging?
> I would need half a day to re-submit the patch myself, so sorry for asking.
> I will ask my company to let me have VPN access. ;)
> 
> Regards,
> Tetsuya

Yes, I can fix on apply to dpdk-next-net.

/Bruce


* Re: [PATCH v13 0/2] Add VHOST PMD
  2016-03-21  5:45                                       ` [PATCH v13 0/2] " Tetsuya Mukawa
@ 2016-03-21 12:42                                         ` Bruce Richardson
  0 siblings, 0 replies; 200+ messages in thread
From: Bruce Richardson @ 2016-03-21 12:42 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, ann.zhuangyanying, thomas.monjalon

On Mon, Mar 21, 2016 at 02:45:06PM +0900, Tetsuya Mukawa wrote:
> The patch introduces a new PMD. This PMD is implemented as thin wrapper
> of librte_vhost.
> 
<snip> 
> 
> Tetsuya Mukawa (2):
>   ethdev: Add a new event type to notify a queue state changed event
>   vhost: Add VHOST PMD
> 
Applied to dpdk-next-net/rel_16_04

Thanks,
/Bruce


* Re: [PATCH v13 1/2] ethdev: Add a new event type to notify a queue state changed event
  2016-03-21 11:05                                             ` Bruce Richardson
@ 2016-03-21 13:51                                               ` Tetsuya Mukawa
  0 siblings, 0 replies; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-03-21 13:51 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: Thomas Monjalon, dev, Zhuangyanying

2016-03-21 20:05 GMT+09:00 Bruce Richardson <bruce.richardson@intel.com>:
> On Mon, Mar 21, 2016 at 06:24:36PM +0900, Tetsuya Mukawa wrote:
>> 2016-03-21 17:37 GMT+09:00 Thomas Monjalon <thomas.monjalon@6wind.com>:
>> > 2016-03-21 14:45, Tetsuya Mukawa:
>> >> This event occurs when some queues are enabled or disabled.
>> >> So far, only the vhost PMD supports the event, and it indicates that
>> >> some queues have been enabled or disabled by the virtio-net device.
>> >> Such an event is needed because the virtio-net device may not enable
>> >> all the queues the vhost PMD prepares.
>> >>
>> >> Because only the vhost PMD uses the event so far, it isn't an actual
>> >> hardware interrupt but a simple software event.
>> > [...]
>> >>
>> >> +     RTE_ETH_EVENT_QUEUE_STATE_CHANGE,
>> >> +                             /**< queue state changed interrupt */
>> >
>> > Is the shorter RTE_ETH_EVENT_QUEUE_STATE descriptive enough?
>> >
>> > What about this comment?
>> > /**< queue state event (enabled/disabled) */
>>
>> Hi Thomas,
>>
>> Yes, it's enough, and above comment is nice.
>> Thanks for suggestion.
>>
>>
>> Hi Bruce,
>>
>> If today is the deadline, could you kindly apply the above
>> changes while merging?
>> I would need half a day to re-submit the patch myself, so sorry for asking.
>> I will ask my company to let me have VPN access. ;)
>>
>> Regards,
>> Tetsuya
>
> Yes, I can fix on apply to dpdk-next-net.
>
> /Bruce

I appreciate your help.

Regards,
Tetsuya


* Re: [PATCH v13 2/2] vhost: Add VHOST PMD
  2016-03-21  5:45                                       ` [PATCH v13 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
@ 2016-03-21 15:40                                         ` Loftus, Ciara
  2016-03-22  1:55                                           ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Loftus, Ciara @ 2016-03-21 15:40 UTC (permalink / raw)
  To: Tetsuya Mukawa, dev; +Cc: Richardson, Bruce, ann.zhuangyanying, thomas.monjalon

Hi Tetsuya,

Thanks for the patches. Just one query below re max queue numbers.

Thanks,
Ciara

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Tetsuya Mukawa
> Sent: Monday, March 21, 2016 5:45 AM
> To: dev@dpdk.org
> Cc: Richardson, Bruce <bruce.richardson@intel.com>;
> ann.zhuangyanying@huawei.com; thomas.monjalon@6wind.com; Tetsuya
> Mukawa <mukawa@igel.co.jp>
> Subject: [dpdk-dev] [PATCH v13 2/2] vhost: Add VHOST PMD
> 
> The patch introduces a new PMD. This PMD is implemented as thin wrapper
> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
> The vhost messages will be handled only when a port is started. So start
> a port first, then invoke QEMU.
> 
> The PMD has 2 parameters.
>  - iface:  The parameter is used to specify a path to connect to a
>            virtio-net device.
>  - queues: The parameter is used to specify the number of the queues
>            virtio-net device has.
>            (Default: 1)
> 
> Here is an example.
> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
> 
> To connect above testpmd, here is qemu command example.
> 
> $ qemu-system-x86_64 \
>         <snip>
>         -chardev socket,id=chr0,path=/tmp/sock0 \
>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
>         -device virtio-net-pci,netdev=net0,mq=on
> 
> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> Acked-by: Rich Lane <rich.lane@bigswitch.com>
> Tested-by: Rich Lane <rich.lane@bigswitch.com>
> ---
>  MAINTAINERS                                 |   5 +
>  config/common_base                          |   6 +
>  config/common_linuxapp                      |   1 +
>  doc/guides/nics/index.rst                   |   1 +
>  doc/guides/nics/overview.rst                |  37 +-
>  doc/guides/nics/vhost.rst                   | 110 ++++
>  doc/guides/rel_notes/release_16_04.rst      |   4 +
>  drivers/net/Makefile                        |   4 +
>  drivers/net/vhost/Makefile                  |  62 ++
>  drivers/net/vhost/rte_eth_vhost.c           | 917
> ++++++++++++++++++++++++++++
>  drivers/net/vhost/rte_eth_vhost.h           | 109 ++++
>  drivers/net/vhost/rte_pmd_vhost_version.map |  10 +
>  mk/rte.app.mk                               |   6 +
>  13 files changed, 1254 insertions(+), 18 deletions(-)
>  create mode 100644 doc/guides/nics/vhost.rst
>  create mode 100644 drivers/net/vhost/Makefile
>  create mode 100644 drivers/net/vhost/rte_eth_vhost.c
>  create mode 100644 drivers/net/vhost/rte_eth_vhost.h
>  create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 8b21979..7a47fc0 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -352,6 +352,11 @@ Null PMD
>  M: Tetsuya Mukawa <mukawa@igel.co.jp>
>  F: drivers/net/null/
> 
> +Vhost PMD
> +M: Tetsuya Mukawa <mukawa@igel.co.jp>
> +M: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> +F: drivers/net/vhost/
> +
>  Intel AES-NI GCM PMD
>  M: Declan Doherty <declan.doherty@intel.com>
>  F: drivers/crypto/aesni_gcm/
> diff --git a/config/common_base b/config/common_base
> index dbd405b..5efee07 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -514,6 +514,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
>  CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
> 
>  #
> +# Compile vhost PMD
> +# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
> +#
> +CONFIG_RTE_LIBRTE_PMD_VHOST=n
> +
> +#
>  #Compile Xen domain0 support
>  #
>  CONFIG_RTE_LIBRTE_XEN_DOM0=n
> diff --git a/config/common_linuxapp b/config/common_linuxapp
> index ffbe260..7e698e2 100644
> --- a/config/common_linuxapp
> +++ b/config/common_linuxapp
> @@ -40,5 +40,6 @@ CONFIG_RTE_EAL_VFIO=y
>  CONFIG_RTE_KNI_KMOD=y
>  CONFIG_RTE_LIBRTE_KNI=y
>  CONFIG_RTE_LIBRTE_VHOST=y
> +CONFIG_RTE_LIBRTE_PMD_VHOST=y
>  CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
>  CONFIG_RTE_LIBRTE_POWER=y
> diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
> index 0b353a8..d53b0c7 100644
> --- a/doc/guides/nics/index.rst
> +++ b/doc/guides/nics/index.rst
> @@ -49,6 +49,7 @@ Network Interface Controller Drivers
>      nfp
>      szedata2
>      virtio
> +    vhost
>      vmxnet3
>      pcap_ring
> 
> diff --git a/doc/guides/nics/overview.rst b/doc/guides/nics/overview.rst
> index 2d4f014..40ca5ec 100644
> --- a/doc/guides/nics/overview.rst
> +++ b/doc/guides/nics/overview.rst
> @@ -74,20 +74,21 @@ Most of these differences are summarized below.
> 
>  .. table:: Features availability in networking drivers
> 
> -   ==================== = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
> -   Feature              a b b b c e e i i i i i i i i i i f f m m m n n p r s v v v x
> -                        f n n o x 1 n 4 4 4 4 g g x x x x m m l l p f u c i z i i m e
> -                        p x x n g 0 i 0 0 0 0 b b g g g g 1 1 x x i p l a n e r r x n
> -                        a 2 2 d b 0 c e e e e   v b b b b 0 0 4 5 p   l p g d t t n v
> -                        c x x i e 0     . v v   f e e e e k k     e         a i i e i
> -                        k   v n         . f f       . v v   .               t o o t r
> -                        e   f g         .   .       . f f   .               a   . 3 t
> -                        t               v   v       v   v   v               2   v
> -                                        e   e       e   e   e                   e
> -                                        c   c       c   c   c                   c
> -   ==================== = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
> -   link status                  X     X X                                   X
> -   link status event                  X X
> +   ==================== = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
> +   Feature              a b b b c e e i i i i i i i i i i f f m m m n n p r s v v v v x
> +                        f n n o x 1 n 4 4 4 4 g g x x x x m m l l p f u c i z h i i m e
> +                        p x x n g 0 i 0 0 0 0 b b g g g g 1 1 x x i p l a n e o r r x n
> +                        a 2 2 d b 0 c e e e e   v b b b b 0 0 4 5 p   l p g d s t t n v
> +                        c x x i e 0     . v v   f e e e e k k     e         a t i i e i
> +                        k   v n         . f f       . v v   .               t   o o t r
> +                        e   f g         .   .       . f f   .               a     . 3 t
> +                        t               v   v       v   v   v               2     v
> +                                        e   e       e   e   e                     e
> +                                        c   c       c   c   c                     c
> +   ==================== = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
> +   link status                  X     X X                                   X X
> +   link status event                  X X                                     X
> +   queue status event                                                         X
>     Rx interrupt                       X X X X
>     queue start/stop             X     X X X X                               X
>     MTU update                   X
> @@ -125,7 +126,7 @@ Most of these differences are summarized below.
>     inner L4 checksum                  X   X
>     packet type parsing          X     X   X
>     timesync                           X X
> -   basic stats                  X     X X X X                               X
> +   basic stats                  X     X X X X                               X X
>     extended stats                     X X X X
>     stats per queue              X                                           X
>     EEPROM dump
> @@ -139,9 +140,9 @@ Most of these differences are summarized below.
>     ARMv8
>     Power8
>     TILE-Gx
> -   x86-32                       X     X X X X
> -   x86-64                       X     X X X X                               X
> +   x86-32                       X     X X X X                                 X
> +   x86-64                       X     X X X X                               X X
>     usage doc                    X                                           X
>     design doc
>     perf doc
> -   ==================== = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
> +   ==================== = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
> diff --git a/doc/guides/nics/vhost.rst b/doc/guides/nics/vhost.rst
> new file mode 100644
> index 0000000..50e8a3a
> --- /dev/null
> +++ b/doc/guides/nics/vhost.rst
> @@ -0,0 +1,110 @@
> +..  BSD LICENSE
> +    Copyright(c) 2016 IGEL Co., Ltd.. All rights reserved.
> +    All rights reserved.
> +
> +    Redistribution and use in source and binary forms, with or without
> +    modification, are permitted provided that the following conditions
> +    are met:
> +
> +    * Redistributions of source code must retain the above copyright
> +    notice, this list of conditions and the following disclaimer.
> +    * Redistributions in binary form must reproduce the above copyright
> +    notice, this list of conditions and the following disclaimer in
> +    the documentation and/or other materials provided with the
> +    distribution.
> +    * Neither the name of IGEL Co., Ltd. nor the names of its
> +    contributors may be used to endorse or promote products derived
> +    from this software without specific prior written permission.
> +
> +    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
> CONTRIBUTORS
> +    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT
> NOT
> +    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
> FITNESS FOR
> +    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> COPYRIGHT
> +    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
> INCIDENTAL,
> +    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
> NOT
> +    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
> OF USE,
> +    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
> AND ON ANY
> +    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
> THE USE
> +    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
> DAMAGE.
> +
> +Poll Mode Driver that wraps vhost library
> +=========================================
> +
> +This PMD is a thin wrapper around the DPDK vhost library.
> +It allows the user to handle virtqueues as a normal DPDK port.
> +
> +Vhost Implementation in DPDK
> +----------------------------
> +
> +Please refer to the "Vhost Library" chapter of the *DPDK Programmer's Guide*
> +for details of vhost.
> +
> +Features and Limitations of vhost PMD
> +-------------------------------------
> +
> +Currently, the vhost PMD provides the basic functionality of packet
> +reception, transmission and event handling.
> +
> +*   It supports multiple queues.
> +
> +*   It supports ``RTE_ETH_EVENT_INTR_LSC`` and
> +    ``RTE_ETH_EVENT_QUEUE_STATE_CHANGE`` events.
> +
> +*   It supports Port Hotplug functionality.
> +
> +*   RX/TX need not be stopped when the user wants to stop a guest or a
> +    virtio-net driver on the guest.
> +
> +Vhost PMD arguments
> +-------------------
> +
> +The user can specify the below arguments in the `--vdev` option.
> +
> +#.  ``iface``:
> +
> +    It is used to specify a path to connect to a QEMU virtio-net device.
> +
> +#.  ``queues``:
> +
> +    It is used to specify the number of queues the virtio-net device has.
> +    (Default: 1)
> +
> +Vhost PMD event handling
> +------------------------
> +
> +This section describes how to handle vhost PMD events.
> +
> +The user can register an event callback handler with
> +``rte_eth_dev_callback_register()``.
> +The registered callback handler will be invoked with one of the below
> +event types.
> +
> +#.  ``RTE_ETH_EVENT_INTR_LSC``:
> +
> +    It means the link status of the port has changed.
> +
> +#.  ``RTE_ETH_EVENT_QUEUE_STATE_CHANGE``:
> +
> +    It means some queue statuses have changed. Call
> +    ``rte_eth_vhost_get_queue_event()`` in the callback handler.
> +    Because multiple status changes may be coalesced into a single event,
> +    call the function repeatedly as long as it doesn't return a negative value.
> +
> +Vhost PMD with testpmd application
> +----------------------------------
> +
> +This section demonstrates the vhost PMD with the testpmd DPDK sample
> +application.
> +
> +#.  Launch the testpmd with vhost PMD:
> +
> +    .. code-block:: console
> +
> +        ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
> +
> +    Other basic DPDK preparations, like enabling hugepages, apply here.
> +    Please refer to the *DPDK Getting Started Guide* for detailed
> +    instructions.
> +
> +#.  Launch the QEMU:
> +
> +    .. code-block:: console
> +
> +       qemu-system-x86_64 <snip>
> +                   -chardev socket,id=chr0,path=/tmp/sock0 \
> +                   -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
> +                   -device virtio-net-pci,netdev=net0
> +
> +    This command attaches one virtio-net device to the QEMU guest.
> +    After the initialization between QEMU and the DPDK vhost library
> +    completes, the port's link status will be up.
> diff --git a/doc/guides/rel_notes/release_16_04.rst b/doc/guides/rel_notes/release_16_04.rst
> index 2785b29..2e4bbb3 100644
> --- a/doc/guides/rel_notes/release_16_04.rst
> +++ b/doc/guides/rel_notes/release_16_04.rst
> @@ -248,6 +248,10 @@ This section should contain new features added in this release. Sample format:
> 
>    New application implementing an IPsec Security Gateway.
> 
> +* **Added vhost PMD.**
> +
> +  Added virtual PMD that wraps librte_vhost.
> +
> 
>  Resolved Issues
>  ---------------
> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> index 0c3393f..8ba37fb 100644
> --- a/drivers/net/Makefile
> +++ b/drivers/net/Makefile
> @@ -52,4 +52,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
>  DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
> 
> +ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
> +DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
> +endif # $(CONFIG_RTE_LIBRTE_VHOST)
> +
>  include $(RTE_SDK)/mk/rte.subdir.mk
> diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
> new file mode 100644
> index 0000000..f49a69b
> --- /dev/null
> +++ b/drivers/net/vhost/Makefile
> @@ -0,0 +1,62 @@
> +#   BSD LICENSE
> +#
> +#   Copyright (c) 2010-2016 Intel Corporation.
> +#   All rights reserved.
> +#
> +#   Redistribution and use in source and binary forms, with or without
> +#   modification, are permitted provided that the following conditions
> +#   are met:
> +#
> +#     * Redistributions of source code must retain the above copyright
> +#       notice, this list of conditions and the following disclaimer.
> +#     * Redistributions in binary form must reproduce the above copyright
> +#       notice, this list of conditions and the following disclaimer in
> +#       the documentation and/or other materials provided with the
> +#       distribution.
> +#     * Neither the name of Intel corporation nor the names of its
> +#       contributors may be used to endorse or promote products derived
> +#       from this software without specific prior written permission.
> +#
> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
> CONTRIBUTORS
> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT
> NOT
> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
> FITNESS FOR
> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> COPYRIGHT
> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
> INCIDENTAL,
> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
> NOT
> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
> OF USE,
> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
> AND ON ANY
> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
> TORT
> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
> THE USE
> +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
> DAMAGE.
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +#
> +# library name
> +#
> +LIB = librte_pmd_vhost.a
> +
> +CFLAGS += -O3
> +CFLAGS += $(WERROR_FLAGS)
> +
> +EXPORT_MAP := rte_pmd_vhost_version.map
> +
> +LIBABIVER := 1
> +
> +#
> +# all source are stored in SRCS-y
> +#
> +SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
> +
> +#
> +# Export include files
> +#
> +SYMLINK-y-include += rte_eth_vhost.h
> +
> +# this lib depends upon:
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_vhost
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
> new file mode 100644
> index 0000000..6b9d287
> --- /dev/null
> +++ b/drivers/net/vhost/rte_eth_vhost.c
> @@ -0,0 +1,917 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright (c) 2016 IGEL Co., Ltd.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of IGEL Co.,Ltd. nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
> CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT
> NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
> FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
> INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
> NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
> OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
> AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
> TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
> THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
> DAMAGE.
> + */
> +#include <unistd.h>
> +#include <pthread.h>
> +#include <stdbool.h>
> +#ifdef RTE_LIBRTE_VHOST_NUMA
> +#include <numaif.h>
> +#endif
> +
> +#include <rte_mbuf.h>
> +#include <rte_ethdev.h>
> +#include <rte_malloc.h>
> +#include <rte_memcpy.h>
> +#include <rte_dev.h>
> +#include <rte_kvargs.h>
> +#include <rte_virtio_net.h>
> +#include <rte_spinlock.h>
> +
> +#include "rte_eth_vhost.h"
> +
> +#define ETH_VHOST_IFACE_ARG		"iface"
> +#define ETH_VHOST_QUEUES_ARG		"queues"
> +
> +static const char *drivername = "VHOST PMD";
> +
> +static const char *valid_arguments[] = {
> +	ETH_VHOST_IFACE_ARG,
> +	ETH_VHOST_QUEUES_ARG,
> +	NULL
> +};
> +
> +static struct ether_addr base_eth_addr = {
> +	.addr_bytes = {
> +		0x56 /* V */,
> +		0x48 /* H */,
> +		0x4F /* O */,
> +		0x53 /* S */,
> +		0x54 /* T */,
> +		0x00
> +	}
> +};
> +
> +struct vhost_queue {
> +	rte_atomic32_t allow_queuing;
> +	rte_atomic32_t while_queuing;
> +	struct virtio_net *device;
> +	struct pmd_internal *internal;
> +	struct rte_mempool *mb_pool;
> +	uint8_t port;
> +	uint16_t virtqueue_id;
> +	uint64_t rx_pkts;
> +	uint64_t tx_pkts;
> +	uint64_t missed_pkts;
> +	uint64_t rx_bytes;
> +	uint64_t tx_bytes;
> +};
> +
> +struct pmd_internal {
> +	char *dev_name;
> +	char *iface_name;
> +
> +	volatile uint16_t once;
> +};
> +
> +struct internal_list {
> +	TAILQ_ENTRY(internal_list) next;
> +	struct rte_eth_dev *eth_dev;
> +};
> +
> +TAILQ_HEAD(internal_list_head, internal_list);
> +static struct internal_list_head internal_list =
> +	TAILQ_HEAD_INITIALIZER(internal_list);
> +
> +static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
> +
> +static rte_atomic16_t nb_started_ports;
> +static pthread_t session_th;
> +
> +static struct rte_eth_link pmd_link = {
> +		.link_speed = 10000,
> +		.link_duplex = ETH_LINK_FULL_DUPLEX,
> +		.link_status = 0
> +};
> +
> +struct rte_vhost_vring_state {
> +	rte_spinlock_t lock;
> +
> +	bool cur[RTE_MAX_QUEUES_PER_PORT * 2];
> +	bool seen[RTE_MAX_QUEUES_PER_PORT * 2];
> +	unsigned int index;
> +	unsigned int max_vring;
> +};
> +
> +static struct rte_vhost_vring_state *vring_states[RTE_MAX_ETHPORTS];
> +
> +static uint16_t
> +eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> +{
> +	struct vhost_queue *r = q;
> +	uint16_t i, nb_rx = 0;
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		return 0;
> +
> +	rte_atomic32_set(&r->while_queuing, 1);
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		goto out;
> +
> +	/* Dequeue packets from guest TX queue */
> +	nb_rx = rte_vhost_dequeue_burst(r->device,
> +			r->virtqueue_id, r->mb_pool, bufs, nb_bufs);
> +
> +	r->rx_pkts += nb_rx;
> +
> +	for (i = 0; likely(i < nb_rx); i++) {
> +		bufs[i]->port = r->port;
> +		r->rx_bytes += bufs[i]->pkt_len;
> +	}
> +
> +out:
> +	rte_atomic32_set(&r->while_queuing, 0);
> +
> +	return nb_rx;
> +}
> +
> +static uint16_t
> +eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> +{
> +	struct vhost_queue *r = q;
> +	uint16_t i, nb_tx = 0;
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		return 0;
> +
> +	rte_atomic32_set(&r->while_queuing, 1);
> +
> +	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +		goto out;
> +
> +	/* Enqueue packets to guest RX queue */
> +	nb_tx = rte_vhost_enqueue_burst(r->device,
> +			r->virtqueue_id, bufs, nb_bufs);
> +
> +	r->tx_pkts += nb_tx;
> +	r->missed_pkts += nb_bufs - nb_tx;
> +
> +	for (i = 0; likely(i < nb_tx); i++)
> +		r->tx_bytes += bufs[i]->pkt_len;
> +
> +	for (i = 0; likely(i < nb_tx); i++)
> +		rte_pktmbuf_free(bufs[i]);
> +out:
> +	rte_atomic32_set(&r->while_queuing, 0);
> +
> +	return nb_tx;
> +}
> +
> +static int
> +eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
> +{
> +	return 0;
> +}
> +
> +static inline struct internal_list *
> +find_internal_resource(char *ifname)
> +{
> +	int found = 0;
> +	struct internal_list *list;
> +	struct pmd_internal *internal;
> +
> +	if (ifname == NULL)
> +		return NULL;
> +
> +	pthread_mutex_lock(&internal_list_lock);
> +
> +	TAILQ_FOREACH(list, &internal_list, next) {
> +		internal = list->eth_dev->data->dev_private;
> +		if (!strcmp(internal->iface_name, ifname)) {
> +			found = 1;
> +			break;
> +		}
> +	}
> +
> +	pthread_mutex_unlock(&internal_list_lock);
> +
> +	if (!found)
> +		return NULL;
> +
> +	return list;
> +}
> +
> +static int
> +new_device(struct virtio_net *dev)
> +{
> +	struct rte_eth_dev *eth_dev;
> +	struct internal_list *list;
> +	struct pmd_internal *internal;
> +	struct vhost_queue *vq;
> +	unsigned i;
> +
> +	if (dev == NULL) {
> +		RTE_LOG(INFO, PMD, "Invalid argument\n");
> +		return -1;
> +	}
> +
> +	list = find_internal_resource(dev->ifname);
> +	if (list == NULL) {
> +		RTE_LOG(INFO, PMD, "Invalid device name\n");
> +		return -1;
> +	}
> +
> +	eth_dev = list->eth_dev;
> +	internal = eth_dev->data->dev_private;
> +
> +	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
> +		vq = eth_dev->data->rx_queues[i];
> +		if (vq == NULL)
> +			continue;
> +		vq->device = dev;
> +		vq->internal = internal;
> +		vq->port = eth_dev->data->port_id;
> +		rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
> +	}
> +	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
> +		vq = eth_dev->data->tx_queues[i];
> +		if (vq == NULL)
> +			continue;
> +		vq->device = dev;
> +		vq->internal = internal;
> +		vq->port = eth_dev->data->port_id;
> +		rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
> +	}
> +
> +	dev->flags |= VIRTIO_DEV_RUNNING;
> +	dev->priv = eth_dev;
> +	eth_dev->data->dev_link.link_status = 1;
> +
> +	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
> +		vq = eth_dev->data->rx_queues[i];
> +		if (vq == NULL)
> +			continue;
> +		rte_atomic32_set(&vq->allow_queuing, 1);
> +	}
> +	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
> +		vq = eth_dev->data->tx_queues[i];
> +		if (vq == NULL)
> +			continue;
> +		rte_atomic32_set(&vq->allow_queuing, 1);
> +	}
> +
> +	RTE_LOG(INFO, PMD, "New connection established\n");
> +
> +	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC);
> +
> +	return 0;
> +}
> +
> +static void
> +destroy_device(volatile struct virtio_net *dev)
> +{
> +	struct rte_eth_dev *eth_dev;
> +	struct vhost_queue *vq;
> +	unsigned i;
> +
> +	if (dev == NULL) {
> +		RTE_LOG(INFO, PMD, "Invalid argument\n");
> +		return;
> +	}
> +
> +	eth_dev = (struct rte_eth_dev *)dev->priv;
> +	if (eth_dev == NULL) {
> +		RTE_LOG(INFO, PMD, "Failed to find a ethdev\n");
> +		return;
> +	}
> +
> +	/* Wait until rx/tx_pkt_burst stops accessing vhost device */
> +	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
> +		vq = eth_dev->data->rx_queues[i];
> +		if (vq == NULL)
> +			continue;
> +		rte_atomic32_set(&vq->allow_queuing, 0);
> +		while (rte_atomic32_read(&vq->while_queuing))
> +			rte_pause();
> +	}
> +	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
> +		vq = eth_dev->data->tx_queues[i];
> +		if (vq == NULL)
> +			continue;
> +		rte_atomic32_set(&vq->allow_queuing, 0);
> +		while (rte_atomic32_read(&vq->while_queuing))
> +			rte_pause();
> +	}
> +
> +	eth_dev->data->dev_link.link_status = 0;
> +
> +	dev->priv = NULL;
> +	dev->flags &= ~VIRTIO_DEV_RUNNING;
> +
> +	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
> +		vq = eth_dev->data->rx_queues[i];
> +		if (vq == NULL)
> +			continue;
> +		vq->device = NULL;
> +	}
> +	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
> +		vq = eth_dev->data->tx_queues[i];
> +		if (vq == NULL)
> +			continue;
> +		vq->device = NULL;
> +	}
> +
> +	RTE_LOG(INFO, PMD, "Connection closed\n");
> +
> +	_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC);
> +}
> +
> +static int
> +vring_state_changed(struct virtio_net *dev, uint16_t vring, int enable)
> +{
> +	struct rte_vhost_vring_state *state;
> +	struct rte_eth_dev *eth_dev;
> +	struct internal_list *list;
> +#ifdef RTE_LIBRTE_VHOST_NUMA
> +	int newnode, ret;
> +#endif
> +
> +	if (dev == NULL) {
> +		RTE_LOG(ERR, PMD, "Invalid argument\n");
> +		return -1;
> +	}
> +
> +	list = find_internal_resource(dev->ifname);
> +	if (list == NULL) {
> +		RTE_LOG(ERR, PMD, "Invalid interface name: %s\n", dev->ifname);
> +		return -1;
> +	}
> +
> +	eth_dev = list->eth_dev;
> +	/* won't be NULL */
> +	state = vring_states[eth_dev->data->port_id];
> +
> +#ifdef RTE_LIBRTE_VHOST_NUMA
> +	ret  = get_mempolicy(&newnode, NULL, 0, dev,
> +			MPOL_F_NODE | MPOL_F_ADDR);
> +	if (ret < 0) {
> +		RTE_LOG(ERR, PMD, "Unknown numa node\n");
> +		return -1;
> +	}
> +
> +	eth_dev->data->numa_node = newnode;
> +#endif
> +	rte_spinlock_lock(&state->lock);
> +	state->cur[vring] = enable;
> +	state->max_vring = RTE_MAX(vring, state->max_vring);
> +	rte_spinlock_unlock(&state->lock);
> +
> +	RTE_LOG(INFO, PMD, "vring%u is %s\n",
> +			vring, enable ? "enabled" : "disabled");
> +
> +	_rte_eth_dev_callback_process(eth_dev,
> +			RTE_ETH_EVENT_QUEUE_STATE_CHANGE);
> +
> +	return 0;
> +}
> +
> +int
> +rte_eth_vhost_get_queue_event(uint8_t port_id,
> +		struct rte_eth_vhost_queue_event *event)
> +{
> +	struct rte_vhost_vring_state *state;
> +	unsigned int i;
> +	int idx;
> +
> +	if (port_id >= RTE_MAX_ETHPORTS) {
> +		RTE_LOG(ERR, PMD, "Invalid port id\n");
> +		return -1;
> +	}
> +
> +	state = vring_states[port_id];
> +	if (!state) {
> +		RTE_LOG(ERR, PMD, "Unused port\n");
> +		return -1;
> +	}
> +
> +	rte_spinlock_lock(&state->lock);
> +	for (i = 0; i <= state->max_vring; i++) {
> +		idx = state->index++ % (state->max_vring + 1);
> +
> +		if (state->cur[idx] != state->seen[idx]) {
> +			state->seen[idx] = state->cur[idx];
> +			event->queue_id = idx / 2;
> +			event->rx = idx & 1;
> +			event->enable = state->cur[idx];
> +			rte_spinlock_unlock(&state->lock);
> +			return 0;
> +		}
> +	}
> +	rte_spinlock_unlock(&state->lock);
> +
> +	return -1;
> +}
> +
> +static void *
> +vhost_driver_session(void *param __rte_unused)
> +{
> +	static struct virtio_net_device_ops vhost_ops;
> +
> +	/* set vhost arguments */
> +	vhost_ops.new_device = new_device;
> +	vhost_ops.destroy_device = destroy_device;
> +	vhost_ops.vring_state_changed = vring_state_changed;
> +	if (rte_vhost_driver_callback_register(&vhost_ops) < 0)
> +		RTE_LOG(ERR, PMD, "Can't register callbacks\n");
> +
> +	/* start event handling */
> +	rte_vhost_driver_session_start();
> +
> +	return NULL;
> +}
> +
> +static int
> +vhost_driver_session_start(void)
> +{
> +	int ret;
> +
> +	ret = pthread_create(&session_th,
> +			NULL, vhost_driver_session, NULL);
> +	if (ret)
> +		RTE_LOG(ERR, PMD, "Can't create a thread\n");
> +
> +	return ret;
> +}
> +
> +static void
> +vhost_driver_session_stop(void)
> +{
> +	int ret;
> +
> +	ret = pthread_cancel(session_th);
> +	if (ret)
> +		RTE_LOG(ERR, PMD, "Can't cancel the thread\n");
> +
> +	ret = pthread_join(session_th, NULL);
> +	if (ret)
> +		RTE_LOG(ERR, PMD, "Can't join the thread\n");
> +}
> +
> +static int
> +eth_dev_start(struct rte_eth_dev *dev)
> +{
> +	struct pmd_internal *internal = dev->data->dev_private;
> +	int ret = 0;
> +
> +	if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
> +		ret = rte_vhost_driver_register(internal->iface_name);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	/* We need only one message handling thread */
> +	if (rte_atomic16_add_return(&nb_started_ports, 1) == 1)
> +		ret = vhost_driver_session_start();
> +
> +	return ret;
> +}
> +
> +static void
> +eth_dev_stop(struct rte_eth_dev *dev)
> +{
> +	struct pmd_internal *internal = dev->data->dev_private;
> +
> +	if (rte_atomic16_cmpset(&internal->once, 1, 0))
> +		rte_vhost_driver_unregister(internal->iface_name);
> +
> +	if (rte_atomic16_sub_return(&nb_started_ports, 1) == 0)
> +		vhost_driver_session_stop();
> +}
> +
> +static int
> +eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
> +		   uint16_t nb_rx_desc __rte_unused,
> +		   unsigned int socket_id,
> +		   const struct rte_eth_rxconf *rx_conf __rte_unused,
> +		   struct rte_mempool *mb_pool)
> +{
> +	struct vhost_queue *vq;
> +
> +	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
> +			RTE_CACHE_LINE_SIZE, socket_id);
> +	if (vq == NULL) {
> +		RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
> +		return -ENOMEM;
> +	}
> +
> +	vq->mb_pool = mb_pool;
> +	vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
> +	dev->data->rx_queues[rx_queue_id] = vq;
> +
> +	return 0;
> +}
> +
> +static int
> +eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
> +		   uint16_t nb_tx_desc __rte_unused,
> +		   unsigned int socket_id,
> +		   const struct rte_eth_txconf *tx_conf __rte_unused)
> +{
> +	struct vhost_queue *vq;
> +
> +	vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
> +			RTE_CACHE_LINE_SIZE, socket_id);
> +	if (vq == NULL) {
> +		RTE_LOG(ERR, PMD, "Failed to allocate memory for tx queue\n");
> +		return -ENOMEM;
> +	}
> +
> +	vq->virtqueue_id = tx_queue_id * VIRTIO_QNUM + VIRTIO_RXQ;
> +	dev->data->tx_queues[tx_queue_id] = vq;
> +
> +	return 0;
> +}
> +
> +static void
> +eth_dev_info(struct rte_eth_dev *dev,
> +	     struct rte_eth_dev_info *dev_info)
> +{
> +	dev_info->driver_name = drivername;
> +	dev_info->max_mac_addrs = 1;
> +	dev_info->max_rx_pktlen = (uint32_t)-1;
> +	dev_info->max_rx_queues = dev->data->nb_rx_queues;
> +	dev_info->max_tx_queues = dev->data->nb_tx_queues;

I'm not entirely familiar with eth driver code so please correct me if I am wrong.

I'm wondering if assigning the max queue values from dev->data->nb_*x_queues is correct.
A user can change nb_*x_queues with a call to rte_eth_dev_configure(n_queues), which in turn calls rte_eth_dev_*x_queue_config(n_queues) and sets dev->data->nb_*x_queues to an arbitrary, user-chosen value. If so, dev->data->nb_*x_queues no longer reflects the max, but rather the value the user chose in the last call to rte_eth_dev_configure, and the reported max could change across multiple configure calls. Is this intended behaviour?

> +	dev_info->min_rx_bufsize = 0;
> +}
> +
> +static void
> +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
> +{
> +	unsigned i;
> +	unsigned long rx_total = 0, tx_total = 0, tx_missed_total = 0;
> +	unsigned long rx_total_bytes = 0, tx_total_bytes = 0;
> +	struct vhost_queue *vq;
> +
> +	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
> +			i < dev->data->nb_rx_queues; i++) {
> +		if (dev->data->rx_queues[i] == NULL)
> +			continue;
> +		vq = dev->data->rx_queues[i];
> +		stats->q_ipackets[i] = vq->rx_pkts;
> +		rx_total += stats->q_ipackets[i];
> +
> +		stats->q_ibytes[i] = vq->rx_bytes;
> +		rx_total_bytes += stats->q_ibytes[i];
> +	}
> +
> +	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
> +			i < dev->data->nb_tx_queues; i++) {
> +		if (dev->data->tx_queues[i] == NULL)
> +			continue;
> +		vq = dev->data->tx_queues[i];
> +		stats->q_opackets[i] = vq->tx_pkts;
> +		tx_missed_total += vq->missed_pkts;
> +		tx_total += stats->q_opackets[i];
> +
> +		stats->q_obytes[i] = vq->tx_bytes;
> +		tx_total_bytes += stats->q_obytes[i];
> +	}
> +
> +	stats->ipackets = rx_total;
> +	stats->opackets = tx_total;
> +	stats->imissed = tx_missed_total;
> +	stats->ibytes = rx_total_bytes;
> +	stats->obytes = tx_total_bytes;
> +}
> +
> +static void
> +eth_stats_reset(struct rte_eth_dev *dev)
> +{
> +	struct vhost_queue *vq;
> +	unsigned i;
> +
> +	for (i = 0; i < dev->data->nb_rx_queues; i++) {
> +		if (dev->data->rx_queues[i] == NULL)
> +			continue;
> +		vq = dev->data->rx_queues[i];
> +		vq->rx_pkts = 0;
> +		vq->rx_bytes = 0;
> +	}
> +	for (i = 0; i < dev->data->nb_tx_queues; i++) {
> +		if (dev->data->tx_queues[i] == NULL)
> +			continue;
> +		vq = dev->data->tx_queues[i];
> +		vq->tx_pkts = 0;
> +		vq->tx_bytes = 0;
> +		vq->missed_pkts = 0;
> +	}
> +}
> +
> +static void
> +eth_queue_release(void *q)
> +{
> +	rte_free(q);
> +}
> +
> +static int
> +eth_link_update(struct rte_eth_dev *dev __rte_unused,
> +		int wait_to_complete __rte_unused)
> +{
> +	return 0;
> +}
> +
> +/**
> + * Disable features in feature_mask. Returns 0 on success.
> + */
> +int
> +rte_eth_vhost_feature_disable(uint64_t feature_mask)
> +{
> +	return rte_vhost_feature_disable(feature_mask);
> +}
> +
> +/**
> + * Enable features in feature_mask. Returns 0 on success.
> + */
> +int
> +rte_eth_vhost_feature_enable(uint64_t feature_mask)
> +{
> +	return rte_vhost_feature_enable(feature_mask);
> +}
> +
> +/* Returns currently supported vhost features */
> +uint64_t
> +rte_eth_vhost_feature_get(void)
> +{
> +	return rte_vhost_feature_get();
> +}
> +
> +static const struct eth_dev_ops ops = {
> +	.dev_start = eth_dev_start,
> +	.dev_stop = eth_dev_stop,
> +	.dev_configure = eth_dev_configure,
> +	.dev_infos_get = eth_dev_info,
> +	.rx_queue_setup = eth_rx_queue_setup,
> +	.tx_queue_setup = eth_tx_queue_setup,
> +	.rx_queue_release = eth_queue_release,
> +	.tx_queue_release = eth_queue_release,
> +	.link_update = eth_link_update,
> +	.stats_get = eth_stats_get,
> +	.stats_reset = eth_stats_reset,
> +};
> +
> +static int
> +eth_dev_vhost_create(const char *name, char *iface_name, int16_t queues,
> +		     const unsigned numa_node)
> +{
> +	struct rte_eth_dev_data *data = NULL;
> +	struct pmd_internal *internal = NULL;
> +	struct rte_eth_dev *eth_dev = NULL;
> +	struct ether_addr *eth_addr = NULL;
> +	struct rte_vhost_vring_state *vring_state = NULL;
> +	struct internal_list *list = NULL;
> +
> +	RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
> +		numa_node);
> +
> +	/* now do all data allocation - for eth_dev structure, dummy pci driver
> +	 * and internal (private) data
> +	 */
> +	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
> +	if (data == NULL)
> +		goto error;
> +
> +	internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
> +	if (internal == NULL)
> +		goto error;
> +
> +	list = rte_zmalloc_socket(name, sizeof(*list), 0, numa_node);
> +	if (list == NULL)
> +		goto error;
> +
> +	/* reserve an ethdev entry */
> +	eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
> +	if (eth_dev == NULL)
> +		goto error;
> +
> +	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
> +	if (eth_addr == NULL)
> +		goto error;
> +	*eth_addr = base_eth_addr;
> +	eth_addr->addr_bytes[5] = eth_dev->data->port_id;
> +
> +	vring_state = rte_zmalloc_socket(name,
> +			sizeof(*vring_state), 0, numa_node);
> +	if (vring_state == NULL)
> +		goto error;
> +
> +	TAILQ_INIT(&eth_dev->link_intr_cbs);
> +
> +	/* now put it all together
> +	 * - store queue data in internal,
> +	 * - store numa_node info in ethdev data
> +	 * - point eth_dev_data to internals
> +	 * - and point eth_dev structure to new eth_dev_data structure
> +	 */
> +	internal->dev_name = strdup(name);
> +	if (internal->dev_name == NULL)
> +		goto error;
> +	internal->iface_name = strdup(iface_name);
> +	if (internal->iface_name == NULL)
> +		goto error;
> +
> +	list->eth_dev = eth_dev;
> +	pthread_mutex_lock(&internal_list_lock);
> +	TAILQ_INSERT_TAIL(&internal_list, list, next);
> +	pthread_mutex_unlock(&internal_list_lock);
> +
> +	rte_spinlock_init(&vring_state->lock);
> +	vring_states[eth_dev->data->port_id] = vring_state;
> +
> +	data->dev_private = internal;
> +	data->port_id = eth_dev->data->port_id;
> +	memmove(data->name, eth_dev->data->name, sizeof(data->name));
> +	data->nb_rx_queues = queues;
> +	data->nb_tx_queues = queues;
> +	data->dev_link = pmd_link;
> +	data->mac_addrs = eth_addr;
> +
> +	/* We'll replace the 'data' originally allocated by eth_dev. So the
> +	 * vhost PMD resources won't be shared between multiple processes.
> +	 */
> +	eth_dev->data = data;
> +	eth_dev->dev_ops = &ops;
> +	eth_dev->driver = NULL;
> +	data->dev_flags =
> +		RTE_ETH_DEV_DETACHABLE | RTE_ETH_DEV_INTR_LSC;
> +	data->kdrv = RTE_KDRV_NONE;
> +	data->drv_name = internal->dev_name;
> +	data->numa_node = numa_node;
> +
> +	/* finally assign rx and tx ops */
> +	eth_dev->rx_pkt_burst = eth_vhost_rx;
> +	eth_dev->tx_pkt_burst = eth_vhost_tx;
> +
> +	return data->port_id;
> +
> +error:
> +	if (internal)
> +		free(internal->dev_name);
> +	rte_free(vring_state);
> +	rte_free(eth_addr);
> +	if (eth_dev)
> +		rte_eth_dev_release_port(eth_dev);
> +	rte_free(internal);
> +	rte_free(list);
> +	rte_free(data);
> +
> +	return -1;
> +}
> +
> +static inline int
> +open_iface(const char *key __rte_unused, const char *value, void *extra_args)
> +{
> +	const char **iface_name = extra_args;
> +
> +	if (value == NULL)
> +		return -1;
> +
> +	*iface_name = value;
> +
> +	return 0;
> +}
> +
> +static inline int
> +open_queues(const char *key __rte_unused, const char *value, void *extra_args)
> +{
> +	uint16_t *q = extra_args;
> +
> +	if (value == NULL || extra_args == NULL)
> +		return -EINVAL;
> +
> +	*q = (uint16_t)strtoul(value, NULL, 0);
> +	if (*q == USHRT_MAX && errno == ERANGE)
> +		return -1;
> +
> +	if (*q > RTE_MAX_QUEUES_PER_PORT)
> +		return -1;
> +
> +	return 0;
> +}
> +
> +static int
> +rte_pmd_vhost_devinit(const char *name, const char *params)
> +{
> +	struct rte_kvargs *kvlist = NULL;
> +	int ret = 0;
> +	char *iface_name;
> +	uint16_t queues;
> +
> +	RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
> +
> +	kvlist = rte_kvargs_parse(params, valid_arguments);
> +	if (kvlist == NULL)
> +		return -1;
> +
> +	if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
> +		ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
> +					 &open_iface, &iface_name);
> +		if (ret < 0)
> +			goto out_free;
> +	} else {
> +		ret = -1;
> +		goto out_free;
> +	}
> +
> +	if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
> +		ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
> +					 &open_queues, &queues);
> +		if (ret < 0)
> +			goto out_free;
> +
> +	} else
> +		queues = 1;
> +
> +	eth_dev_vhost_create(name, iface_name, queues, rte_socket_id());
> +
> +out_free:
> +	rte_kvargs_free(kvlist);
> +	return ret;
> +}
> +
> +static int
> +rte_pmd_vhost_devuninit(const char *name)
> +{
> +	struct rte_eth_dev *eth_dev = NULL;
> +	struct pmd_internal *internal;
> +	struct internal_list *list;
> +	unsigned int i;
> +
> +	RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
> +
> +	/* find an ethdev entry */
> +	eth_dev = rte_eth_dev_allocated(name);
> +	if (eth_dev == NULL)
> +		return -ENODEV;
> +
> +	internal = eth_dev->data->dev_private;
> +	if (internal == NULL)
> +		return -ENODEV;
> +
> +	list = find_internal_resource(internal->iface_name);
> +	if (list == NULL)
> +		return -ENODEV;
> +
> +	pthread_mutex_lock(&internal_list_lock);
> +	TAILQ_REMOVE(&internal_list, list, next);
> +	pthread_mutex_unlock(&internal_list_lock);
> +	rte_free(list);
> +
> +	eth_dev_stop(eth_dev);
> +
> +	rte_free(vring_states[eth_dev->data->port_id]);
> +	vring_states[eth_dev->data->port_id] = NULL;
> +
> +	free(internal->dev_name);
> +	free(internal->iface_name);
> +
> +	for (i = 0; i < eth_dev->data->nb_rx_queues; i++)
> +		rte_free(eth_dev->data->rx_queues[i]);
> +	for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
> +		rte_free(eth_dev->data->tx_queues[i]);
> +
> +	rte_free(eth_dev->data->mac_addrs);
> +	rte_free(eth_dev->data);
> +	rte_free(internal);
> +
> +	rte_eth_dev_release_port(eth_dev);
> +
> +	return 0;
> +}
> +
> +static struct rte_driver pmd_vhost_drv = {
> +	.name = "eth_vhost",
> +	.type = PMD_VDEV,
> +	.init = rte_pmd_vhost_devinit,
> +	.uninit = rte_pmd_vhost_devuninit,
> +};
> +
> +PMD_REGISTER_DRIVER(pmd_vhost_drv);
> diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
> new file mode 100644
> index 0000000..e78cb74
> --- /dev/null
> +++ b/drivers/net/vhost/rte_eth_vhost.h
> @@ -0,0 +1,109 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2016 IGEL Co., Ltd.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of IGEL Co., Ltd. nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _RTE_ETH_VHOST_H_
> +#define _RTE_ETH_VHOST_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <stdint.h>
> +#include <stdbool.h>
> +
> +#include <rte_virtio_net.h>
> +
> +/**
> + * Disable features in feature_mask.
> + *
> + * @param feature_mask
> + *  Vhost features defined in "linux/virtio_net.h".
> + * @return
> + *  - On success, zero.
> + *  - On failure, a negative value.
> + */
> +int rte_eth_vhost_feature_disable(uint64_t feature_mask);
> +
> +/**
> + * Enable features in feature_mask.
> + *
> + * @param feature_mask
> + *  Vhost features defined in "linux/virtio_net.h".
> + * @return
> + *  - On success, zero.
> + *  - On failure, a negative value.
> + */
> +int rte_eth_vhost_feature_enable(uint64_t feature_mask);
> +
> +/**
> + * Returns currently supported vhost features.
> + *
> + * @return
> + *  Vhost features defined in "linux/virtio_net.h".
> + */
> +uint64_t rte_eth_vhost_feature_get(void);
> +
> +/*
> + * Event description.
> + */
> +struct rte_eth_vhost_queue_event {
> +	uint16_t queue_id;
> +	bool rx;
> +	bool enable;
> +};
> +
> +/**
> + * Get queue events from specified port.
> + * If a callback for the event below is registered via
> + * rte_eth_dev_callback_register(), this function describes what
> + * changed:
> + *  - RTE_ETH_EVENT_QUEUE_STATE_CHANGE
> + * Multiple events may trigger only a single callback invocation, so keep
> + * calling this function while it returns 0.
> + *
> + * @param port_id
> + *  Port id.
> + * @param event
> + *  Pointer to a rte_eth_vhost_queue_event structure.
> + * @return
> + *  - On success, zero.
> + *  - On failure, a negative value.
> + */
> +int rte_eth_vhost_get_queue_event(uint8_t port_id,
> +		struct rte_eth_vhost_queue_event *event);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif
> diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
> new file mode 100644
> index 0000000..65bf3a8
> --- /dev/null
> +++ b/drivers/net/vhost/rte_pmd_vhost_version.map
> @@ -0,0 +1,10 @@
> +DPDK_16.04 {
> +	global:
> +
> +	rte_eth_vhost_feature_disable;
> +	rte_eth_vhost_feature_enable;
> +	rte_eth_vhost_feature_get;
> +	rte_eth_vhost_get_queue_event;
> +
> +	local: *;
> +};
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index a1cd9a3..bd973e8 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -166,6 +166,12 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_SNOW3G) += -lrte_pmd_snow3g
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_SNOW3G)     += -L$(LIBSSO_PATH)/build -lsso
>  endif # CONFIG_RTE_LIBRTE_CRYPTODEV
> 
> +ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
> +
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
> +
> +endif # $(CONFIG_RTE_LIBRTE_VHOST)
> +
>  endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
> 
>  _LDLIBS-y += $(EXECENV_LDLIBS)
> --
> 2.1.4

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v13 2/2] vhost: Add VHOST PMD
  2016-03-21 15:40                                         ` Loftus, Ciara
@ 2016-03-22  1:55                                           ` Tetsuya Mukawa
  2016-03-22  2:50                                             ` Tetsuya Mukawa
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-03-22  1:55 UTC (permalink / raw)
  To: Loftus, Ciara, dev; +Cc: Richardson, Bruce, ann.zhuangyanying, thomas.monjalon

On 2016/03/22 0:40, Loftus, Ciara wrote:
>> +
>> +static void
>> +eth_dev_info(struct rte_eth_dev *dev,
>> +	     struct rte_eth_dev_info *dev_info)
>> +{
>> +	dev_info->driver_name = drivername;
>> +	dev_info->max_mac_addrs = 1;
>> +	dev_info->max_rx_pktlen = (uint32_t)-1;
>> +	dev_info->max_rx_queues = dev->data->nb_rx_queues;
>> +	dev_info->max_tx_queues = dev->data->nb_tx_queues;
> I'm not entirely familiar with eth driver code so please correct me if I am wrong.
>
> I'm wondering if assigning the max queue values to dev->data->nb_*x_queues is correct.
> A user could change the value of nb_*x_queues with a call to rte_eth_dev_configure(n_queues) which in turn calls rte_eth_dev_*x_queue_config(n_queues) which will set dev->data->nb_*x_queues to the value of n_queues which can be arbitrary and decided by the user. If this is the case, dev->data->nb_*x_queues will no longer reflect the max, rather the value the user chose in the call to rte_eth_dev_configure. And the max could potentially change with multiple calls to configure. Is this intended behaviour?

Hi Ciara,

Thanks for reviewing it. Here is a part of rte_eth_dev_configure().

int
rte_eth_dev_configure(uint8_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
                      const struct rte_eth_conf *dev_conf)
{
        <snip>
        /*
         * Check that the numbers of RX and TX queues are not greater
         * than the maximum number of RX and TX queues supported by the
         * configured device.
         */
        (*dev->dev_ops->dev_infos_get)(dev, &dev_info);

        if (nb_rx_q == 0 && nb_tx_q == 0) {
               <snip>
                return -EINVAL;
        }

        if (nb_rx_q > dev_info.max_rx_queues) {
               <snip>
                return -EINVAL;
        }

        if (nb_tx_q > dev_info.max_tx_queues) {
               <snip>
                return -EINVAL;
        }

        <snip>

        /*
         * Setup new number of RX/TX queues and reconfigure device.
         */
        diag = rte_eth_dev_rx_queue_config(dev, nb_rx_q);
        <snip>
        diag = rte_eth_dev_tx_queue_config(dev, nb_tx_q);
        <snip>
}

Anyway, rte_eth_dev_tx/rx_queue_config() will be called only after
checking the current maximum number of queues.
So the user cannot set the number of queues greater than the current
maximum.

Regards,
Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v13 2/2] vhost: Add VHOST PMD
  2016-03-22  1:55                                           ` Tetsuya Mukawa
@ 2016-03-22  2:50                                             ` Tetsuya Mukawa
  2016-03-22 10:33                                               ` Loftus, Ciara
  0 siblings, 1 reply; 200+ messages in thread
From: Tetsuya Mukawa @ 2016-03-22  2:50 UTC (permalink / raw)
  To: Loftus, Ciara, dev; +Cc: Richardson, Bruce, ann.zhuangyanying, thomas.monjalon

On 2016/03/22 10:55, Tetsuya Mukawa wrote:
> On 2016/03/22 0:40, Loftus, Ciara wrote:
>>> +
>>> +static void
>>> +eth_dev_info(struct rte_eth_dev *dev,
>>> +	     struct rte_eth_dev_info *dev_info)
>>> +{
>>> +	dev_info->driver_name = drivername;
>>> +	dev_info->max_mac_addrs = 1;
>>> +	dev_info->max_rx_pktlen = (uint32_t)-1;
>>> +	dev_info->max_rx_queues = dev->data->nb_rx_queues;
>>> +	dev_info->max_tx_queues = dev->data->nb_tx_queues;
>> I'm not entirely familiar with eth driver code so please correct me if I am wrong.
>>
>> I'm wondering if assigning the max queue values to dev->data->nb_*x_queues is correct.
>> A user could change the value of nb_*x_queues with a call to rte_eth_dev_configure(n_queues) which in turn calls rte_eth_dev_*x_queue_config(n_queues) which will set dev->data->nb_*x_queues to the value of n_queues which can be arbitrary and decided by the user. If this is the case, dev->data->nb_*x_queues will no longer reflect the max, rather the value the user chose in the call to rte_eth_dev_configure. And the max could potentially change with multiple calls to configure. Is this intended behaviour?
> Hi Ciara,
>
> Thanks for reviewing it. Here is a part of rte_eth_dev_configure().
>
> int
> rte_eth_dev_configure(uint8_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
>                       const struct rte_eth_conf *dev_conf)
> {
>         <snip>
>         /*
>          * Check that the numbers of RX and TX queues are not greater
>          * than the maximum number of RX and TX queues supported by the
>          * configured device.
>          */
>         (*dev->dev_ops->dev_infos_get)(dev, &dev_info);
>
>         if (nb_rx_q == 0 && nb_tx_q == 0) {
>                <snip>
>                 return -EINVAL;
>         }
>
>         if (nb_rx_q > dev_info.max_rx_queues) {
>                <snip>
>                 return -EINVAL;
>         }
>
>         if (nb_tx_q > dev_info.max_tx_queues) {
>                <snip>
>                 return -EINVAL;
>         }
>
>         <snip>
>
>         /*
>          * Setup new number of RX/TX queues and reconfigure device.
>          */
>         diag = rte_eth_dev_rx_queue_config(dev, nb_rx_q);
>         <snip>
>         diag = rte_eth_dev_tx_queue_config(dev, nb_tx_q);
>         <snip>
> }
>
> Anyway, rte_eth_dev_tx/rx_queue_config() will be called only after
> checking the current maximum number of queues.
> So the user cannot set the number of queues greater than current maximum
> number.
>
> Regards,
> Tetsuya

Hi Ciara,

Now I understand what you mean.
You are probably pointing out the case where the user specifies a value
smaller than the current maximum.

For example, if we have 4 queues, the code below will fail at its last line.
rte_eth_dev_configure(portid, 4, 4, ...);
rte_eth_dev_configure(portid, 2, 2, ...);
rte_eth_dev_configure(portid, 4, 4, ...);

I will submit a patch to fix it. Could you please review and ack it?

Regards,
Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v13 2/2] vhost: Add VHOST PMD
  2016-03-22  2:50                                             ` Tetsuya Mukawa
@ 2016-03-22 10:33                                               ` Loftus, Ciara
  0 siblings, 0 replies; 200+ messages in thread
From: Loftus, Ciara @ 2016-03-22 10:33 UTC (permalink / raw)
  To: Tetsuya Mukawa, dev; +Cc: Richardson, Bruce, ann.zhuangyanying, thomas.monjalon

> 
> On 2016/03/22 10:55, Tetsuya Mukawa wrote:
> > On 2016/03/22 0:40, Loftus, Ciara wrote:
> >>> +
> >>> +static void
> >>> +eth_dev_info(struct rte_eth_dev *dev,
> >>> +	     struct rte_eth_dev_info *dev_info)
> >>> +{
> >>> +	dev_info->driver_name = drivername;
> >>> +	dev_info->max_mac_addrs = 1;
> >>> +	dev_info->max_rx_pktlen = (uint32_t)-1;
> >>> +	dev_info->max_rx_queues = dev->data->nb_rx_queues;
> >>> +	dev_info->max_tx_queues = dev->data->nb_tx_queues;
> >> I'm not entirely familiar with eth driver code so please correct me if I am wrong.
> >>
> >> I'm wondering if assigning the max queue values to dev->data->nb_*x_queues is correct.
> >> A user could change the value of nb_*x_queues with a call to rte_eth_dev_configure(n_queues) which in turn calls rte_eth_dev_*x_queue_config(n_queues) which will set dev->data->nb_*x_queues to the value of n_queues which can be arbitrary and decided by the user. If this is the case, dev->data->nb_*x_queues will no longer reflect the max, rather the value the user chose in the call to rte_eth_dev_configure. And the max could potentially change with multiple calls to configure. Is this intended behaviour?
> > Hi Ciara,
> >
> > Thanks for reviewing it. Here is a part of rte_eth_dev_configure().
> >
> > int
> > rte_eth_dev_configure(uint8_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
> >                       const struct rte_eth_conf *dev_conf)
> > {
> >         <snip>
> >         /*
> >          * Check that the numbers of RX and TX queues are not greater
> >          * than the maximum number of RX and TX queues supported by the
> >          * configured device.
> >          */
> >         (*dev->dev_ops->dev_infos_get)(dev, &dev_info);
> >
> >         if (nb_rx_q == 0 && nb_tx_q == 0) {
> >                <snip>
> >                 return -EINVAL;
> >         }
> >
> >         if (nb_rx_q > dev_info.max_rx_queues) {
> >                <snip>
> >                 return -EINVAL;
> >         }
> >
> >         if (nb_tx_q > dev_info.max_tx_queues) {
> >                <snip>
> >                 return -EINVAL;
> >         }
> >
> >         <snip>
> >
> >         /*
> >          * Setup new number of RX/TX queues and reconfigure device.
> >          */
> >         diag = rte_eth_dev_rx_queue_config(dev, nb_rx_q);
> >         <snip>
> >         diag = rte_eth_dev_tx_queue_config(dev, nb_tx_q);
> >         <snip>
> > }
> >
> > Anyway, rte_eth_dev_tx/rx_queue_config() will be called only after
> > checking the current maximum number of queues.
> > So the user cannot set the number of queues greater than current maximum number.
> >
> > Regards,
> > Tetsuya
> 
> Hi Ciara,
> 
> Now, I understand what you say.
> Probably you pointed out the case that the user specified a value
> smaller than current maximum value.
> 
> For example, if we have 4 queues. Below code will be failed at last line.
> rte_eth_dev_configure(portid, 4, 4, ...);
> rte_eth_dev_configure(portid, 2, 2, ...);
> rte_eth_dev_configure(portid, 4, 4, ...);
> 
> I will submit a patch to fix it. Could you please review and ack it?

Hi Tetsuya,

Correct, sorry for the initial confusion. Thanks for the patch so quickly.
I've reviewed the code - looks good. I just want to run some tests and will give my Ack later today all going well.

Thanks,
Ciara

> 
> Regards,
> Tetsuya

^ permalink raw reply	[flat|nested] 200+ messages in thread

end of thread, other threads:[~2016-03-22 10:33 UTC | newest]

Thread overview: 200+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-31  3:55 [RFC PATCH v2] Add VHOST PMD Tetsuya Mukawa
2015-08-31  3:55 ` [RFC PATCH v2] vhost: " Tetsuya Mukawa
2015-09-23 17:47   ` Loftus, Ciara
2015-10-16  8:40     ` Tetsuya Mukawa
2015-10-20 14:13       ` Loftus, Ciara
2015-10-21  4:30         ` Tetsuya Mukawa
2015-10-21 10:09           ` Bruce Richardson
2015-10-16 12:52   ` Bruce Richardson
2015-10-19  1:51     ` Tetsuya Mukawa
2015-10-19  9:32       ` Loftus, Ciara
2015-10-19  9:45         ` Bruce Richardson
2015-10-19 10:50           ` Tetsuya Mukawa
2015-10-19 13:26             ` Panu Matilainen
2015-10-19 13:27               ` Richardson, Bruce
2015-10-21  4:35                 ` Tetsuya Mukawa
2015-10-21  6:25                   ` Panu Matilainen
2015-10-21 10:22                     ` Bruce Richardson
2015-10-22  9:50                       ` Tetsuya Mukawa
2015-10-27 13:44                         ` Traynor, Kevin
2015-10-28  2:24                           ` Tetsuya Mukawa
2015-10-22  9:45   ` [RFC PATCH v3 0/2] " Tetsuya Mukawa
2015-10-22  9:45     ` [RFC PATCH v3 1/2] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
2015-10-27  6:12       ` [PATCH 0/3] Add VHOST PMD Tetsuya Mukawa
2015-10-27  6:12         ` [PATCH 1/3] vhost: Fix wrong handling of virtqueue array index Tetsuya Mukawa
2015-10-27  6:29           ` Yuanhan Liu
2015-10-27  6:33             ` Yuanhan Liu
2015-10-27  6:47           ` Yuanhan Liu
2015-10-27  7:28             ` Tetsuya Mukawa
2015-10-27  7:34               ` Yuanhan Liu
2015-10-27  6:12         ` [PATCH 2/3] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
2015-10-30 17:49           ` Loftus, Ciara
2015-11-02  3:15             ` Tetsuya Mukawa
2015-10-27  6:12         ` [PATCH 3/3] vhost: Add VHOST PMD Tetsuya Mukawa
2015-11-02  3:58           ` [PATCH v2 0/2] " Tetsuya Mukawa
2015-11-02  3:58             ` [PATCH v2 1/2] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
2015-11-09  5:16               ` [PATCH v3 0/2] Add VHOST PMD Tetsuya Mukawa
2015-11-09  5:17                 ` [PATCH v3 1/2] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
2015-11-09 18:16                   ` Aaron Conole
2015-11-10  3:13                     ` Tetsuya Mukawa
2015-11-10  7:16                       ` Panu Matilainen
2015-11-10  9:48                         ` Tetsuya Mukawa
2015-11-10 10:05                           ` Panu Matilainen
2015-11-10 10:15                             ` Tetsuya Mukawa
2015-11-09  5:17                 ` [PATCH v3 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
2015-11-09  6:21                   ` Yuanhan Liu
2015-11-09  6:27                     ` Tetsuya Mukawa
2015-11-09 22:22                   ` Stephen Hemminger
2015-11-10  3:14                     ` Tetsuya Mukawa
2015-11-12 12:52                   ` Wang, Zhihong
2015-11-13  3:09                     ` Tetsuya Mukawa
2015-11-13  3:50                       ` Wang, Zhihong
2015-11-13  4:03                   ` Rich Lane
2015-11-13  4:29                     ` Tetsuya Mukawa
2015-11-13  5:20                   ` [PATCH v4 0/2] " Tetsuya Mukawa
2015-11-13  5:20                     ` [PATCH v4 1/2] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
2015-11-17 13:29                       ` Yuanhan Liu
2015-11-19  2:03                         ` Tetsuya Mukawa
2015-11-19  2:18                           ` Yuanhan Liu
2015-11-19  3:13                             ` Tetsuya Mukawa
2015-11-19  3:33                               ` Yuanhan Liu
2015-11-19  5:14                                 ` Tetsuya Mukawa
2015-11-19  5:45                                   ` Yuanhan Liu
2015-11-19  5:58                                     ` Tetsuya Mukawa
2015-11-19  6:31                                       ` Yuanhan Liu
2015-11-19  6:37                                         ` Tetsuya Mukawa
2015-11-13  5:20                     ` [PATCH v4 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
2015-11-16  1:57                       ` Wang, Zhihong
2015-11-20 11:43                       ` Yuanhan Liu
2015-11-24  2:48                         ` Tetsuya Mukawa
2015-11-24  3:40                           ` Yuanhan Liu
2015-11-24  3:44                             ` Tetsuya Mukawa
2015-11-21  0:15                       ` Rich Lane
2015-11-24  4:41                         ` Tetsuya Mukawa
2015-11-24  9:00                       ` [PATCH v5 0/3] " Tetsuya Mukawa
2015-11-24  9:00                         ` [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD Tetsuya Mukawa
2015-12-17 11:42                           ` Yuanhan Liu
2015-12-18  3:15                             ` Tetsuya Mukawa
2015-12-18  3:36                               ` Tetsuya Mukawa
2015-12-18  4:15                               ` Yuanhan Liu
2015-12-18  4:28                                 ` Tetsuya Mukawa
2015-12-18 18:01                                   ` Rich Lane
2015-12-21  2:10                                     ` Tetsuya Mukawa
2015-12-22  4:36                                       ` Yuanhan Liu
2015-12-22  3:41                                     ` Yuanhan Liu
2015-12-22  4:47                                       ` Rich Lane
2015-12-22  5:47                                         ` Yuanhan Liu
2015-12-22  9:38                                           ` Rich Lane
2015-12-23  2:44                                             ` Yuanhan Liu
2015-12-23 22:00                                               ` Thomas Monjalon
2015-12-24  3:51                                                 ` Yuanhan Liu
2015-12-24  4:07                                                   ` Tetsuya Mukawa
2015-12-24  3:09                                         ` Tetsuya Mukawa
2015-12-24  3:54                                           ` Tetsuya Mukawa
2015-12-24  4:00                                           ` Yuanhan Liu
2015-12-24  4:23                                             ` Tetsuya Mukawa
2015-12-24  5:37                                           ` Rich Lane
2015-12-24  7:58                                             ` Tetsuya Mukawa
2015-12-28 21:59                                               ` Rich Lane
2016-01-06  3:56                                                 ` Tetsuya Mukawa
2016-01-06  7:38                                                   ` Yuanhan Liu
2015-12-18 10:03                                 ` Xie, Huawei
2015-12-21  2:10                                   ` Tetsuya Mukawa
2016-02-02 11:18                           ` [PATCH v6 0/2] Add VHOST PMD Tetsuya Mukawa
2016-02-02 19:52                             ` Rich Lane
2016-02-02 11:18                           ` [PATCH v6 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
2016-02-02 11:18                           ` [PATCH v6 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
2016-02-02 23:43                             ` Ferruh Yigit
2016-02-03  2:13                               ` Tetsuya Mukawa
2016-02-03  7:48                               ` Tetsuya Mukawa
2016-02-03  9:24                                 ` Ferruh Yigit
2016-02-03  9:35                                   ` Tetsuya Mukawa
2016-02-04  7:26                           ` [PATCH v7 0/2] " Tetsuya Mukawa
2016-02-04  7:26                           ` [PATCH v7 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
2016-02-04  7:26                           ` [PATCH v7 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
2016-02-04 11:17                             ` Ferruh Yigit
2016-02-05  6:28                               ` Tetsuya Mukawa
2016-02-05  6:35                                 ` Yuanhan Liu
2016-02-05  7:10                                   ` Tetsuya Mukawa
2016-02-08  9:42                                 ` Ferruh Yigit
2016-02-09  1:54                                   ` Tetsuya Mukawa
2016-02-05 11:28                             ` [PATCH v8 0/2] " Tetsuya Mukawa
2016-02-05 11:28                             ` [PATCH v8 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
2016-02-06  4:57                               ` Yuanhan Liu
2016-02-05 11:28                             ` [PATCH v8 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
2016-02-06  5:12                               ` Yuanhan Liu
2016-02-09  9:38                               ` [PATCH v9 0/2] " Tetsuya Mukawa
2016-02-24  2:45                                 ` Qiu, Michael
2016-02-24  5:09                                   ` Tetsuya Mukawa
2016-02-25  7:51                                     ` Qiu, Michael
2016-02-26  4:29                                       ` Tetsuya Mukawa
2016-02-26  8:35                                         ` Tetsuya Mukawa
2016-03-01  2:00                                           ` Qiu, Michael
2016-03-01  2:19                                             ` Tetsuya Mukawa
2016-03-02  2:24                                               ` Qiu, Michael
2016-03-04  1:12                                                 ` Tetsuya Mukawa
2016-02-09  9:38                               ` [PATCH v9 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
2016-03-04  4:17                                 ` [PATCH v10 0/2] Add VHOST PMD Tetsuya Mukawa
2016-03-04  4:17                                 ` [PATCH v10 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
2016-03-04  4:17                                 ` [PATCH v10 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
2016-03-04  8:39                                   ` Yuanhan Liu
2016-03-04  9:58                                     ` Tetsuya Mukawa
2016-03-07  2:07                                   ` [PATCH v11 0/2] " Tetsuya Mukawa
2016-03-07  2:07                                   ` [PATCH v11 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
2016-03-07  2:07                                   ` [PATCH v11 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
2016-03-14 12:02                                     ` Bruce Richardson
2016-03-15  5:35                                       ` Tetsuya Mukawa
2016-03-15  8:31                                     ` [PATCH v12 0/2] " Tetsuya Mukawa
2016-03-15  8:31                                     ` [PATCH v12 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
2016-03-18 13:54                                       ` Thomas Monjalon
2016-03-15  8:31                                     ` [PATCH v12 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
2016-03-18 12:27                                       ` Bruce Richardson
2016-03-18 13:41                                         ` Tetsuya Mukawa
2016-03-18 13:52                                           ` Thomas Monjalon
2016-03-18 14:03                                             ` Tetsuya Mukawa
2016-03-18 14:13                                               ` Bruce Richardson
2016-03-18 14:21                                                 ` Tetsuya Mukawa
2016-03-21  5:41                                         ` Tetsuya Mukawa
2016-03-21  5:45                                       ` [PATCH v13 0/2] " Tetsuya Mukawa
2016-03-21 12:42                                         ` Bruce Richardson
2016-03-21  5:45                                       ` [PATCH v13 1/2] ethdev: Add a new event type to notify a queue state changed event Tetsuya Mukawa
2016-03-21  8:37                                         ` Thomas Monjalon
2016-03-21  9:24                                           ` Tetsuya Mukawa
2016-03-21 11:05                                             ` Bruce Richardson
2016-03-21 13:51                                               ` Tetsuya Mukawa
2016-03-21  5:45                                       ` [PATCH v13 2/2] vhost: Add VHOST PMD Tetsuya Mukawa
2016-03-21 15:40                                         ` Loftus, Ciara
2016-03-22  1:55                                           ` Tetsuya Mukawa
2016-03-22  2:50                                             ` Tetsuya Mukawa
2016-03-22 10:33                                               ` Loftus, Ciara
2016-02-09  9:38                               ` [PATCH v9 " Tetsuya Mukawa
2015-11-24  9:00                         ` [PATCH v5 2/3] " Tetsuya Mukawa
2015-12-18  7:45                           ` Yuanhan Liu
2015-12-18  9:25                             ` Tetsuya Mukawa
2015-11-24  9:00                         ` [PATCH v5 3/3] vhost: Add helper function to convert port id to virtio device pointer Tetsuya Mukawa
2015-12-17 11:47                           ` Yuanhan Liu
2015-12-18  3:15                             ` Tetsuya Mukawa
2015-12-18  4:19                               ` Yuanhan Liu
2015-12-08  1:12                         ` [PATCH v5 0/3] Add VHOST PMD Tetsuya Mukawa
2015-12-08  2:03                           ` Yuanhan Liu
2015-12-08  2:10                             ` Tetsuya Mukawa
2015-11-13  5:32                     ` [PATCH v4 0/2] " Yuanhan Liu
2015-11-13  5:37                       ` Tetsuya Mukawa
2015-11-13  6:50                       ` Tetsuya Mukawa
2015-11-17 13:26                         ` Yuanhan Liu
2015-11-19  1:20                           ` Tetsuya Mukawa
2015-11-09  5:42                 ` [PATCH v3 " Yuanhan Liu
2015-11-02  3:58             ` [PATCH v2 2/2] vhost: " Tetsuya Mukawa
2015-11-06  2:22               ` Yuanhan Liu
2015-11-06  3:54                 ` Tetsuya Mukawa
2015-11-05  2:17             ` [PATCH v2 0/2] " Tetsuya Mukawa
2015-11-09 22:25           ` [PATCH 3/3] vhost: " Stephen Hemminger
2015-11-10  3:27             ` Tetsuya Mukawa
2015-10-27  7:54         ` [PATCH 0/3] " Tetsuya Mukawa
2015-10-30 18:30           ` Thomas Monjalon
2015-11-02  3:15             ` Tetsuya Mukawa
2015-10-22  9:45     ` [RFC PATCH v3 2/2] vhost: " Tetsuya Mukawa
2015-10-22 12:49       ` Bruce Richardson
2015-10-23  3:48         ` Tetsuya Mukawa
2015-10-29 14:25       ` Xie, Huawei
2015-10-30  1:18         ` Tetsuya Mukawa