All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH V2 0/3] example/vhost: Introduce Vswitch Framework
@ 2016-09-04 10:23 Pankaj Chauhan
  2016-09-04 10:23 ` [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework Pankaj Chauhan
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Pankaj Chauhan @ 2016-09-04 10:23 UTC (permalink / raw)
  To: dev
  Cc: hemant.agrawal, jianfeng.tan, yuanhan.liu, maxime.coquelin,
	Pankaj Chauhan

Introduce generic vswitch framework in vhost-switch application. Following
are the goals/aim of the framework:

1. Make vhost-switch application generic so that it can support devices 
which don't support VMDQ.

2. Provide a framework so that any switching logic (generic in software or
vendor specefic like VMDQ) can work with vhost-switch. Thus making vhost-switch
applicable for multiple platforms of different vendors.

3. Make vhost-switch and switching logic scalable in terms of ports or policies
of doing rx/tx across the ports added to switch.

The patchset includes three patches:
1. "Add vswitch(generic switch) framework": This adds the generic framework, it provides
the APIs/accessor functions which the vhos-switch application uses without knowing
anything about underlying switching logic. The framework introduces the concept of
vswitch_device, vswitch_port, and vswitch_ops. The idea is that vhost-switch will
sit over the framework and different switching logics will plug into the framework
underneath it. Please refer the patch description for more details of devices, ports
and ops.

2. "Add vswitch command line options": Adds two new command line options for vswitch.
3. "Add VMDQ vswitch device": This patch basically delinks existing vhost/main.[c,h]
from VMDQ and adds VMDQ as a vswitch device implmentation which plugs into the vswitch
framework. Any other vendor logic for switching can also be plugged in same way.

Thanks to Tan Jianfeng, Yuanhan Liu, Maxime coquelin for early discussions and inputs
on this concept.

TODO list:
1. Addd constructor based logic for registration of Vswitch implmentations like VMDQ.
we will use similar mechanism as registration of PMD drivers (base on constructor function)
to register all the switch implmentations in vhost-switch and then select the required
implementation using command line option 'switch'.

2. Test VM2VM hardware mode: I tried following command it didn't work:

vhost-switch -c f -n 4 --socket-mem 1024 --huge-dir /mnt/huge  -- -p 0x1  --vm2vm 2 
  	--vlan-strip 1 --dev-basename usvhost
I tried same command on upstream master also but it didn't work, that means i am doing something
wrong, please help me with correct method to test it. Or now the patches are working, so
if possible please give a try to hardware VM2VM mode.

3. Changes for Maxime's comment on freeing of buffers in vs_looup_n_fwd(). I wasn't clear about
what is the right way to handle, we'll fix it after getting feedback from Maxime.

Change Log
==========

v1->v2
------
1. Tested following:
	- Ping and iperf from VM to external machine connected to Phys port.
	- Ping and iperf betweeen 2 VMs in software vm2vm mode

	- My testsetup is:
		-  vhost & VM machine: xeonD1521 with Network card: X552/X557-AT 10GBASE-T
		- Peer connected to VM machine: OptiPlex-790 with 82574L Gigabit card
2. Incorporated changes for all the comments from Maxime Coquelin. One comment is not yet
   taken care off, and it is listed in TODO list.

Pankaj Chauhan (3):
  examples/vhost: Add vswitch (generic switch) framework
  examples/vhost: Add vswitch command line options
  examples/vhost: Add VMDQ vswitch device

 examples/vhost/Makefile         |   2 +-
 examples/vhost/main.c           | 651 +++++++++++++-------------------------
 examples/vhost/main.h           |  10 +
 examples/vhost/vmdq.c           | 669 ++++++++++++++++++++++++++++++++++++++++
 examples/vhost/vmdq.h           |  57 ++++
 examples/vhost/vswitch_common.c | 499 ++++++++++++++++++++++++++++++
 examples/vhost/vswitch_common.h | 186 +++++++++++
 examples/vhost/vswitch_txrx.c   |  97 ++++++
 examples/vhost/vswitch_txrx.h   |  71 +++++
 9 files changed, 1800 insertions(+), 442 deletions(-)
 create mode 100644 examples/vhost/vmdq.c
 create mode 100644 examples/vhost/vmdq.h
 create mode 100644 examples/vhost/vswitch_common.c
 create mode 100644 examples/vhost/vswitch_common.h
 create mode 100644 examples/vhost/vswitch_txrx.c
 create mode 100644 examples/vhost/vswitch_txrx.h

-- 
1.9.1

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework
  2016-09-04 10:23 [RFC][PATCH V2 0/3] example/vhost: Introduce Vswitch Framework Pankaj Chauhan
@ 2016-09-04 10:23 ` Pankaj Chauhan
  2016-09-09  8:56   ` Tan, Jianfeng
                     ` (2 more replies)
  2016-09-04 10:23 ` [RFC][PATCH V2 2/3] examples/vhost: Add vswitch command line options Pankaj Chauhan
  2016-09-04 10:24 ` [RFC][PATCH V2 3/3] examples/vhost: Add VMDQ vswitch device Pankaj Chauhan
  2 siblings, 3 replies; 22+ messages in thread
From: Pankaj Chauhan @ 2016-09-04 10:23 UTC (permalink / raw)
  To: dev
  Cc: hemant.agrawal, jianfeng.tan, yuanhan.liu, maxime.coquelin,
	Pankaj Chauhan

Introduce support for a generic framework for handling of switching between
physical and vhost devices. The vswitch framework introduces the following
concept:

1. vswitch_dev: Vswitch device is a logical switch which can have physical and
virtio devices. The devices are operated/used using standard rte_eth API for
physical devices and lib_vhost API for vhost devices, PMD API is not used
for vhost devices.

2. vswitch_port: Any physical or virtio device that is added to vswitch. The
port can have its own tx/rx functions for doing data transfer, which are exposed
to the framework using generic function pointers (vs_port->do_tx/do_rx). This way
the generic code can do tx/rx without understanding the type of device (Physical or
virtio). Similarly each port has its own functions to select tx/rx queues. The framework
provides default tx/rx queue selection functions to the port when port is added
(for both physical and vhost devices). But the framework allows the switch implementation
to override the queue selection functions (vs_port->get_txq/rxq) if required.

3. vswitch_ops: The ops is set of function pointers which are used to do operations
like learning, unlearning, add/delete port, lookup_and_forward. The user of vswitch
framework (vhost/main.[c,h])uses these function pointers to perform above mentioned
operations, thus it remains agnostic of the underlying implementation.

Different switching logics can implement their vswitch_device and vswitch_ops, and
register with the framework. This framework makes vhost-switch application scalable
in terms of:

1. Different switching logics (one of them is vmdq, vhost/vmdq.[c,h]
2. Number of ports.
3. Policies of selecting ports for rx and tx.

Signed-off-by: Pankaj Chauhan <pankaj.chauhan@nxp.com>
---
 examples/vhost/Makefile         |   2 +-
 examples/vhost/main.c           | 128 +++++++++--
 examples/vhost/vswitch_common.c | 499 ++++++++++++++++++++++++++++++++++++++++
 examples/vhost/vswitch_common.h | 186 +++++++++++++++
 examples/vhost/vswitch_txrx.c   |  97 ++++++++
 examples/vhost/vswitch_txrx.h   |  71 ++++++
 6 files changed, 966 insertions(+), 17 deletions(-)
 create mode 100644 examples/vhost/vswitch_common.c
 create mode 100644 examples/vhost/vswitch_common.h
 create mode 100644 examples/vhost/vswitch_txrx.c
 create mode 100644 examples/vhost/vswitch_txrx.h

diff --git a/examples/vhost/Makefile b/examples/vhost/Makefile
index e95c68a..1b58308 100644
--- a/examples/vhost/Makefile
+++ b/examples/vhost/Makefile
@@ -48,7 +48,7 @@ else
 APP = vhost-switch
 
 # all source are stored in SRCS-y
-SRCS-y := main.c
+SRCS-y := main.c vswitch_common.c vswitch_txrx.c vmdq.c
 
 CFLAGS += -O2 -D_FILE_OFFSET_BITS=64
 CFLAGS += $(WERROR_FLAGS)
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 92a9823..c949df4 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -811,9 +811,16 @@ static inline void __attribute__((always_inline))
 virtio_xmit(struct vhost_dev *dst_vdev, struct vhost_dev *src_vdev,
 	    struct rte_mbuf *m)
 {
-	uint16_t ret;
+	uint16_t ret, txq;
+	struct vswitch_port *vs_port = dst_vdev->vs_port;
 
-	ret = rte_vhost_enqueue_burst(dst_vdev->vid, VIRTIO_RXQ, &m, 1);
+	/* The switch implmentation decides the txq for each port itself.
+	 * this gives switch implmentations to choose thier own queue management
+	 * for each port
+	 */
+
+	txq = vs_port->get_txq(vs_port, rte_lcore_id());
+	ret = rte_vhost_enqueue_burst(dst_vdev->vid, txq, &m, 1);
 	if (enable_stats) {
 		rte_atomic64_inc(&dst_vdev->stats.rx_total_atomic);
 		rte_atomic64_add(&dst_vdev->stats.rx_atomic, ret);
@@ -832,7 +839,7 @@ virtio_tx_local(struct vhost_dev *vdev, struct rte_mbuf *m)
 	struct ether_hdr *pkt_hdr;
 	struct vhost_dev *dst_vdev;
 
-	pkt_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
+		pkt_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
 
 	dst_vdev = find_vhost_dev(&pkt_hdr->d_addr);
 	if (!dst_vdev)
@@ -934,11 +941,26 @@ static inline void __attribute__((always_inline))
 do_drain_mbuf_table(struct mbuf_table *tx_q)
 {
 	uint16_t count;
+	struct vswitch_port *tx_port;
+
+	/* Let switch implmentation decide which physical port to do tx to.
+	 * Every switch implmentation may have it's own strategy, for example
+	 * VMDQ does tx to only one Physical port. Having a scheduler function
+	 * which is switch specefic give flexibility to have another strategy
+	 * for a switch
+	 */
+	tx_port = vs_sched_tx_port(vswitch_dev_g, VSWITCH_PTYPE_PHYS,
+				     rte_lcore_id());
+	if (unlikely(!tx_port))
+	    goto out;
 
-	count = rte_eth_tx_burst(ports[0], tx_q->txq_id,
-				 tx_q->m_table, tx_q->len);
-	if (unlikely(count < tx_q->len))
-		free_pkts(&tx_q->m_table[count], tx_q->len - count);
+	if (tx_q->len) {
+		count = tx_port->do_tx(tx_port, tx_q->txq_id, NULL, tx_q->m_table,
+					  tx_q->len);
+
+		if (unlikely(count < tx_q->len))
+			free_pkts(&tx_q->m_table[count], tx_q->len - count);
+	}
 
 	tx_q->len = 0;
 }
@@ -1060,6 +1082,28 @@ drain_eth_rx(struct vhost_dev *vdev)
 {
 	uint16_t rx_count, enqueue_count;
 	struct rte_mbuf *pkts[MAX_PKT_BURST];
+	uint16_t rxq, core_id;
+	struct vswitch_port *rx_port;
+
+
+	core_id = rte_lcore_id();
+	/* Let switch implmentation decide which physical port to do rx from.
+	 * Every switch implmentation may have it's own strategy, for example
+	 * VMDQ does rx from only one Physical port. Having a scheduler function
+	 * which is switch specefic give flexibility to have another strategy
+	 * for a switch
+	 */
+	rx_port = vs_sched_rx_port(vswitch_dev_g, VSWITCH_PTYPE_PHYS,
+				    core_id);
+	if (unlikely(!rx_port))
+	    goto out;
+
+	/* The switch implmentation decides the rxq for each port itself.
+	 * this gives switch implmentations to choose thier own queue management
+	 * for each port
+	 */
+	rxq = rx_port->get_rxq(rx_port, vdev, core_id);
+	rx_count = rx_port->do_rx(rx_port, rxq, NULL, pkts, MAX_PKT_BURST);
 
 	rx_count = rte_eth_rx_burst(ports[0], vdev->vmdq_rx_q,
 				    pkts, MAX_PKT_BURST);
@@ -1098,20 +1142,31 @@ static inline void __attribute__((always_inline))
 drain_virtio_tx(struct vhost_dev *vdev)
 {
 	struct rte_mbuf *pkts[MAX_PKT_BURST];
-	uint16_t count;
-	uint16_t i;
+	struct vswitch_port *vs_port = vdev->vs_port;
+	uint16_t count, rxq, core_id;
+
+	/* The switch implmentation decides the rxq for each port itself.
+	 * this gives switch implmentations to choose thier own queue management
+	 * for each port
+	 */
 
-	count = rte_vhost_dequeue_burst(vdev->vid, VIRTIO_TXQ, mbuf_pool,
+	core_id = rte_lcore_id();
+	rxq = vs_port->get_rxq(vs_port, NULL, core_id);
+	count = rte_vhost_dequeue_burst(vdev->vid, rxq, mbuf_pool,
 					pkts, MAX_PKT_BURST);
 
-	/* setup VMDq for the first packet */
-	if (unlikely(vdev->ready == DEVICE_MAC_LEARNING) && count) {
-		if (vdev->remove || link_vmdq(vdev, pkts[0]) == -1)
+	if (unlikely(!count))
+		goto out;
+
+	if (unlikely(vdev->ready == DEVICE_MAC_LEARNING)) {
+		if (vdev->remove || vs_learn_port(vs_port, pkts, count))
 			free_pkts(pkts, count);
 	}
 
-	for (i = 0; i < count; ++i)
-		virtio_tx_route(vdev, pkts[i], vlan_tags[vdev->vid]);
+	vs_lookup_n_fwd(vs_port, pkts, count, VIRTIO_RXQ);
+
+out:
+	return;
 }
 
 /*
@@ -1241,7 +1296,7 @@ static int
 new_device(int vid)
 {
 	int lcore, core_add = 0;
-	uint32_t device_num_min = num_devices;
+	uint32_t device_num_min;
 	struct vhost_dev *vdev;
 
 	vdev = rte_zmalloc("vhost device", sizeof(*vdev), RTE_CACHE_LINE_SIZE);
@@ -1252,6 +1307,16 @@ new_device(int vid)
 		return -1;
 	}
 	vdev->vid = vid;
+	device_num_min = vs_get_max_vdevs(vswitch_dev_g);
+	RTE_LOG(INFO, VHOST_PORT, "max virtio devices %d\n", device_num_min);
+
+	vs_port = vs_add_port(vswitch_dev_g, vid, VSWITCH_PTYPE_VIRTIO, vdev);
+
+	if (!vs_port) {
+		rte_exit(EXIT_FAILURE, "Failed to add port [%d] to vsdev %s\n",
+			 vs_port->port_id, vswitch_dev_g->name);
+	}
+	vdev->vs_port = vs_port;
 
 	TAILQ_INSERT_TAIL(&vhost_dev_list, vdev, global_vdev_entry);
 	vdev->vmdq_rx_q = vid * queues_per_pool + vmdq_queue_base;
@@ -1277,6 +1342,10 @@ new_device(int vid)
 	rte_vhost_enable_guest_notification(vid, VIRTIO_RXQ, 0);
 	rte_vhost_enable_guest_notification(vid, VIRTIO_TXQ, 0);
 
+	RTE_LOG(INFO, VHOST_PORT, "New virtio port %d\n", vid);
+
+	vs_port_start(vs_port);
+
 	RTE_LOG(INFO, VHOST_DATA,
 		"(%d) device has been added to data core %d\n",
 		vid, vdev->coreid);
@@ -1402,6 +1471,31 @@ create_mbuf_pool(uint16_t nr_port, uint32_t nr_switch_core, uint32_t mbuf_size,
 		rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n");
 }
 
+static uint64_t get_vswitch_conf_flags(void)
+{
+	uint64_t vs_conf_flags = 0;
+
+	if (vm2vm_mode == VM2VM_HARDWARE)
+		vs_conf_flags |= VS_CNF_FLG_VM2VM_HARDWARE;
+
+	if (vm2vm_mode == VM2VM_SOFTWARE)
+		vs_conf_flags |= VS_CNF_FLG_VM2VM_SOFTWARE;
+
+	if (promiscuous)
+		vs_conf_flags |= VS_CNF_FLG_PROMISCOUS_EN;
+
+	if (jumbo_frame_en)
+		vs_conf_flags |= VS_CNF_FLG_JUMBO_EN;
+
+	if (enable_stats)
+		vs_conf_flags |= VS_CNF_FLG_STATS_EN;
+
+	if (vlan_strip)
+		vs_conf_flags |= VS_CNF_FLG_VLAN_STRIP_EN;
+
+	return vs_conf_flags;
+}
+
 /*
  * Main function, does initialisation and calls the per-lcore functions. The CUSE
  * device is also registered here to handle the IOCTLs.
@@ -1484,6 +1578,7 @@ main(int argc, char *argv[])
 				"Cannot initialize network ports\n");
 	}
 
+	RTE_LOG(INFO, VHOST_CONFIG, "PORT init done \n");
 	/* Enable stats if the user option is set. */
 	if (enable_stats) {
 		ret = pthread_create(&tid, NULL, (void *)print_stats, NULL);
@@ -1499,6 +1594,7 @@ main(int argc, char *argv[])
 				"Cannot set print-stats name\n");
 	}
 
+	RTE_LOG(INFO, VHOST_CONFIG, "Launching switch_worker threads \n");
 	/* Launch all data cores. */
 	RTE_LCORE_FOREACH_SLAVE(lcore_id)
 		rte_eal_remote_launch(switch_worker, NULL, lcore_id);
diff --git a/examples/vhost/vswitch_common.c b/examples/vhost/vswitch_common.c
new file mode 100644
index 0000000..9f9ab87
--- /dev/null
+++ b/examples/vhost/vswitch_common.c
@@ -0,0 +1,499 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Freescale Semiconductor. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Freescale Semiconductor nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+#include <rte_atomic.h>
+#include <rte_cycles.h>
+#include <rte_ethdev.h>
+#include <rte_log.h>
+#include <rte_string_fns.h>
+#include <rte_malloc.h>
+#include <rte_virtio_net.h>
+#include <rte_ip.h>
+#include <rte_tcp.h>
+
+#include <sys/queue.h>
+#include <strings.h>
+#include <errno.h>
+
+#include "vswitch_common.h"
+#include "vswitch_txrx.h"
+
+/* Meta data for vswitch. Since this is going to be used in this file only it
+ * is defined here instead of in a header file.
+ */
+struct vs_mdata {
+	rte_spinlock_t lock;
+	int dev_count;
+	LIST_HEAD(, vswitch_dev) head;
+};
+
+static struct vs_mdata *vs_mdata_g;
+
+struct vswitch_dev *vs_get_vswitch_dev(const char *name)
+{
+	struct vswitch_dev *temp, *vs_dev = NULL;
+
+	rte_spinlock_lock(&vs_mdata_g->lock);
+	LIST_FOREACH(temp, &vs_mdata_g->head, list) {
+		if (!strncmp(temp->name, name, VSWITCH_NAME_SIZE))
+			vs_dev = temp;
+	}
+	rte_spinlock_unlock(&vs_mdata_g->lock);
+
+	return vs_dev;
+}
+
+static struct vswitch_port *vs_get_free_port(struct vswitch_dev *vs_dev)
+{
+	int i, found = 0;
+	struct vswitch_port *vs_port = NULL;
+
+	rte_spinlock_lock(&vs_dev->lock);
+
+	for (i = 0; i < vs_dev->port_count; i++) {
+		vs_port = &vs_dev->ports[i];
+		if (vs_port->state == VSWITCH_PSTATE_NOT_INUSE) {
+			found = 1;
+			break;
+		}
+	}
+
+	if (found)
+		vs_port->state = VSWITCH_PSTATE_BEING_ALLOCATED;
+	else
+		vs_port = NULL;
+
+	rte_spinlock_unlock(&vs_dev->lock);
+	return vs_port;
+}
+
+static void vs_free_port(struct vswitch_port *vs_port)
+{
+	struct vswitch_dev *vs_dev = vs_port->vs_dev;
+
+	rte_spinlock_lock(&vs_dev->lock);
+	vs_port->state = VSWITCH_PSTATE_NOT_INUSE;
+	rte_spinlock_unlock(&vs_dev->lock);
+}
+
+static __attribute__((unused))struct vswitch_port *vs_get_port(
+			struct vswitch_dev *vs_dev, unsigned int port_id,
+			enum vswitch_port_type type)
+{
+	int i;
+	struct vswitch_port *vs_port = NULL;
+
+	for (i = 0; i < vs_dev->port_count; i++) {
+		vs_port = &vs_dev->ports[i];
+		if ((vs_port->port_id == port_id) && (vs_port->type == type))
+			return vs_port;
+	}
+
+	return NULL;
+}
+
+int vs_port_start(struct vswitch_port *vs_port)
+{
+	struct vswitch_ops *vs_ops = vs_port->vs_dev->ops;
+	int rc = 0;
+
+	if (vs_ops->port_start)
+		rc = vs_ops->port_start(vs_port);
+
+	if (!rc)
+		vs_port->state = VSWITCH_PSTATE_LEARNING;
+
+	return rc;
+}
+
+int vs_port_stop(struct vswitch_port *vs_port)
+{
+	struct vswitch_ops *vs_ops = vs_port->vs_dev->ops;
+	int rc = 0;
+
+	if (vs_ops->port_stop)
+		rc = vs_ops->port_stop(vs_port);
+
+	if (!rc)
+		vs_port->state = VSWITCH_PSTATE_ADDED;
+
+	return rc;
+}
+
+struct vswitch_port *vs_add_port(struct vswitch_dev *vs_dev, int port_id,
+		enum vswitch_port_type type, void *priv)
+{
+	int rc = 0;
+	struct vswitch_port *vs_port = NULL;
+	struct vswitch_ops *vs_ops = vs_dev->ops;
+
+	vs_port = vs_get_free_port(vs_dev);
+	if (!vs_port) {
+		RTE_LOG(DEBUG, VHOST_CONFIG, "Failed get free port in \
+			vswitch %s\n", vs_dev->name);
+		rc = -EBUSY;
+		goto out;
+	}
+
+	vs_port->port_id = port_id;
+	vs_port->type = type;
+	vs_port->priv = priv;
+
+	/* Initialize default port operations. It should be noted that
+	 * The switch ops->add_port can replace them with switch specefic
+	 * operations if required. This gives us more flexibility in switch
+	 * implementations.
+	 */
+
+	switch (type) {
+	case VSWITCH_PTYPE_PHYS:
+	       vs_port->do_tx = vs_do_tx_phys_port;
+	       vs_port->do_rx = vs_do_rx_phys_port;
+	       vs_port->get_txq = vs_get_txq_phys_port;
+	       vs_port->get_rxq = vs_get_rxq_phys_port;
+	       break;
+	case VSWITCH_PTYPE_VIRTIO:
+	       vs_port->do_tx = vs_do_tx_virtio_port;
+	       vs_port->do_rx = vs_do_rx_virtio_port;
+	       vs_port->get_txq = vs_get_txq_virtio_port;
+	       vs_port->get_rxq = vs_get_rxq_virtio_port;
+	       break;
+	default:
+		RTE_LOG(DEBUG, VHOST_CONFIG, "Invalid port [id %d, type %d]",
+				port_id, type);
+	       rc = -EINVAL;
+	       goto out;
+	}
+
+	if (vs_ops->add_port)
+		rc = vs_ops->add_port(vs_port);
+
+	if (rc)
+		goto out;
+
+	vs_port->state = VSWITCH_PSTATE_ADDED;
+
+	rte_eth_macaddr_get(vs_port->port_id, &vs_port->mac_addr);
+	RTE_LOG(INFO, VHOST_PORT, "Port %u MAC: %02"PRIx8" %02"PRIx8" %02"PRIx8
+			" %02"PRIx8" %02"PRIx8" %02"PRIx8"\n",
+			(unsigned)port_id,
+			vs_port->mac_addr.addr_bytes[0],
+			vs_port->mac_addr.addr_bytes[1],
+			vs_port->mac_addr.addr_bytes[2],
+			vs_port->mac_addr.addr_bytes[3],
+			vs_port->mac_addr.addr_bytes[4],
+			vs_port->mac_addr.addr_bytes[5]);
+
+	RTE_LOG(DEBUG, VHOST_CONFIG, "Added port [%d, type %d] to \
+			vswitch %s\n", vs_port->port_id, type, vs_dev->name);
+out:
+	if (rc){
+		RTE_LOG(INFO, VHOST_CONFIG, "Failed to Add port [%d, type %d] to \
+			vswitch %s\n", port_id, type, vs_dev->name);
+		if (vs_port)
+			vs_free_port(vs_port);
+		vs_port = NULL;
+	}
+
+	return vs_port;
+}
+
+int vs_del_port(struct vswitch_port *vs_port)
+{
+	int rc = 0;
+	struct vswitch_ops *vs_ops = vs_port->vs_dev->ops;
+
+	if (vs_ops->del_port)
+		rc = vs_ops->del_port(vs_port);
+
+	if (!rc)
+		vs_port->state = VSWITCH_PSTATE_NOT_INUSE;
+
+
+	/*TBD:XXX: may be put a dummy function which frees all packets without
+	 * any tx/rx, NULL looks ugly!
+	 */
+	vs_port->do_tx = vs_port->do_rx = NULL;
+
+	RTE_LOG(DEBUG, VHOST_CONFIG, "Removed port [%d, type %d] from \
+			vswitch %s\n", vs_port->port_id, vs_port->type,
+			vs_port->vs_dev->name);
+
+	return rc;
+}
+
+int vs_learn_port(struct vswitch_port *vs_port, struct rte_mbuf
+				**pkts, uint16_t count)
+{
+	struct vswitch_ops *vs_ops = vs_port->vs_dev->ops;
+	int rc;
+
+	rc = vs_ops->learn_port(vs_port, pkts, count);
+	if (!rc)
+		vs_port->state = VSWITCH_PSTATE_FORWARDING;
+
+	return rc;
+}
+
+int vs_unlearn_port(struct vswitch_port *vs_port)
+{
+	struct vswitch_ops *vs_ops = vs_port->vs_dev->ops;
+
+	vs_ops->unlearn_port(vs_port);
+	vs_port->state = VSWITCH_PSTATE_LEARNING;
+
+	return 0;
+}
+
+
+int vs_lookup_n_fwd(struct vswitch_port *vs_port, struct rte_mbuf **pkts,
+		    uint16_t count, uint16_t in_rxq)
+{
+	int rc, i;
+	struct vswitch_ops *vs_ops = vs_port->vs_dev->ops;
+
+	rc = vs_ops->lookup_n_fwd(vs_port, pkts, count, in_rxq);
+
+	if (rc) {
+		for (i = 0; i < count; i++)
+			rte_pktmbuf_free(pkts[i]);
+	}
+
+	return rc;
+}
+
+int vs_do_broadcast_fwd(struct vswitch_dev *vs_dev,
+			struct vswitch_port *in_port, uint32_t txport_type_mask,
+			struct rte_mbuf *mbuf)
+{
+	int i = 0, tx_q;
+	struct vswitch_port *dest_port;
+
+	for (i = 0; i < vs_dev->port_count; i++) {
+		dest_port = &vs_dev->ports[i];
+		/*Do not broadcast on incomming port */
+		if ((in_port->type == dest_port->type) &&
+			(in_port->port_id == dest_port->port_id))
+			continue;
+		if (!(VS_PTYPE_MASK(dest_port->type) & txport_type_mask))
+			continue;
+
+		tx_q = dest_port->get_txq(dest_port, rte_lcore_id());
+		dest_port->do_tx(dest_port, tx_q, NULL, &mbuf, 1);
+	}
+
+	return 0;
+}
+
+struct vswitch_port *vs_sched_rx_port(struct vswitch_dev *vs_dev, enum
+				      vswitch_port_type ptype, uint16_t core_id)
+{
+	struct vswitch_port *vs_port = NULL;
+	struct vswitch_ops *vs_ops = vs_dev->ops;
+
+	if (vs_ops->sched_rx_port)
+		vs_port = vs_ops->sched_rx_port(vs_dev, ptype, core_id);
+
+	return vs_port;
+}
+
+struct vswitch_port *vs_sched_tx_port(struct vswitch_dev *vs_dev, enum
+				      vswitch_port_type ptype, uint16_t core_id)
+{
+	struct vswitch_port *vs_port = NULL;
+	struct vswitch_ops *vs_ops = vs_dev->ops;
+
+	if (vs_ops->sched_tx_port)
+		vs_port = vs_ops->sched_tx_port(vs_dev, ptype, core_id);
+
+	return vs_port;
+}
+
+static int vs_check_mandatory_ops(struct vswitch_ops *ops)
+{
+	int rc = 0;
+
+	if (!ops->lookup_n_fwd){
+		RTE_LOG(ERR, VHOST_CONFIG, "lookup_n_fwd is NULL");
+		rc = -EINVAL;
+	}
+	if (!ops->learn_port){
+		RTE_LOG(ERR, VHOST_CONFIG, "learn_port is NULL");
+		rc = -EINVAL;
+	}
+
+	if (!ops->unlearn_port){
+		RTE_LOG(ERR, VHOST_CONFIG, "unlearn_port is NULL");
+		rc = -EINVAL;
+	}
+
+
+	return rc;
+}
+
+struct vswitch_dev *vs_register_switch(const char *name, int priv_size,
+				int max_ports, struct vswitch_ops *ops)
+{
+	struct vswitch_dev *vs_dev;
+	struct vswitch_port *vs_port;
+	int size, i;
+
+	if (vs_check_mandatory_ops(ops)) {
+		RTE_LOG(ERR, VHOST_CONFIG, "%s: mandatory ops missing\n",
+			name);
+		return NULL;
+	}
+
+	size = priv_size + sizeof(struct vswitch_dev);
+	vs_dev = rte_malloc(NULL, size, 64);
+	if (!vs_dev) {
+		RTE_LOG(DEBUG, VHOST_CONFIG, "Failed to vswitch device\n");
+		goto out;
+	}
+
+	strncpy(vs_dev->name, name, VSWITCH_NAME_SIZE);
+	vs_dev->priv = (void *) (vs_dev + 1);
+
+	size = max_ports * sizeof(struct vswitch_port);
+	vs_dev->ports = rte_malloc(NULL, size, 64);
+	if (!vs_dev->ports) {
+		RTE_LOG(DEBUG, VHOST_CONFIG,
+			"Failed allocate %d ports for vswitch %s\n",
+			max_ports, vs_dev->name);
+		goto out;
+	}
+
+	memset(vs_dev->ports, 0, size);
+	for (i = 0; i < max_ports; i++)
+	{
+		vs_port = &vs_dev->ports[i];
+		vs_port->state = VSWITCH_PSTATE_NOT_INUSE;
+		vs_port->vs_dev = vs_dev;
+	}
+
+	vs_dev->port_count = max_ports;
+	vs_dev->ops = ops;
+	rte_spinlock_init(&vs_dev->lock);
+
+	rte_spinlock_lock(&vs_mdata_g->lock);
+	LIST_INSERT_HEAD(&vs_mdata_g->head, vs_dev, list);
+	vs_mdata_g->dev_count++;
+	rte_spinlock_unlock(&vs_mdata_g->lock);
+
+	RTE_LOG(INFO, VHOST_CONFIG, "Added vswitch %s [0x%p], max_ports %d\n",
+		vs_dev->name, vs_dev, vs_dev->port_count);
+
+	return vs_dev;
+
+out:
+	if (vs_dev && vs_dev->ports)
+		rte_free(vs_dev->ports);
+	if (vs_dev)
+		rte_free(vs_dev);
+
+	return NULL;
+}
+
+int  vs_unregister_switch(struct vswitch_dev *vs_dev)
+{
+	struct vswitch_dev *temp;
+	int removed = 0, rc;
+
+	rte_spinlock_lock(&vs_mdata_g->lock);
+	LIST_FOREACH(temp, &vs_mdata_g->head, list) {
+		if (!strncmp(temp->name, vs_dev->name, VSWITCH_NAME_SIZE)){
+			LIST_REMOVE(temp, list);
+			removed = 1;
+		}
+	}
+	rte_spinlock_unlock(&vs_mdata_g->lock);
+
+	if (!removed) {
+		RTE_LOG(DEBUG, VHOST_CONFIG, "vswitch [%s] not found\n",
+			vs_dev->name);
+		rc = -ENODEV;
+	}
+
+	RTE_LOG(DEBUG, VHOST_CONFIG, "Unregistering and freeing vswitch [%s]\n",
+		vs_dev->name);
+
+	if (vs_dev->ports)
+		rte_free(vs_dev->ports);
+
+	rte_free(vs_dev);
+
+	return rc;
+}
+
+int  vs_get_max_vdevs(struct vswitch_dev *vs_dev)
+{
+	struct vswitch_ops *ops = vs_dev->ops;
+
+	return ops->get_max_vdevs(vs_dev);
+}
+
+int vs_switch_dev_init(struct vswitch_dev *vs_dev, uint16_t conf_flags)
+{
+	struct vswitch_ops *ops = vs_dev->ops;
+	int rc = 0;
+
+	if (ops->switch_init)
+		rc = ops->switch_init(vs_dev, conf_flags);
+
+	if (!rc)
+		vs_dev->conf_flags = conf_flags;
+
+	return rc;
+}
+
+int vs_vswitch_init(void)
+{
+	int rc = 0;
+
+	vs_mdata_g = rte_malloc(NULL, sizeof(struct vs_mdata), 64);
+	if (!vs_mdata_g)
+	{
+		RTE_LOG(ERR, VHOST_CONFIG,
+				"vhost vswitch mdata allocation failed\n");
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	memset(vs_mdata_g, 0, sizeof(struct vs_mdata));
+	LIST_INIT(&vs_mdata_g->head);
+	rte_spinlock_init(&vs_mdata_g->lock);
+
+out:
+	return rc;
+}
diff --git a/examples/vhost/vswitch_common.h b/examples/vhost/vswitch_common.h
new file mode 100644
index 0000000..a5800df
--- /dev/null
+++ b/examples/vhost/vswitch_common.h
@@ -0,0 +1,186 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Freescale Semiconductor. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Freescale Semiconductor nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __VHOST_SWITCH_COMMON_H__
+#define __VHOST_SWITCH_COMMON_H__
+
+#include <sys/queue.h>
+#include "main.h"
+
+#define VSWITCH_NAME_SIZE	32
+
+enum vswitch_port_type {
+	VSWITCH_PTYPE_PHYS = 1,
+	VSWITCH_PTYPE_VIRTIO,
+	VSWITCH_PTYPE_END,
+};
+
+#define VS_PTYPE_MASK(type) (1 << type)
+
+enum vswitch_port_state {
+	VSWITCH_PSTATE_ADDED = 1,
+	VSWITCH_PSTATE_LEARNING,
+	VSWITCH_PSTATE_FORWARDING,
+	VSWITCH_PSTATE_NOT_INUSE,
+	VSWITCH_PSTATE_BEING_ALLOCATED,
+	VSWITCH_PSTATE_END,
+};
+
+struct vswitch_port {
+	enum vswitch_port_type type;
+	enum vswitch_port_state state;
+	unsigned int port_id;
+	struct rte_eth_conf port_conf;
+	struct vswitch_dev *vs_dev;	/*switch to which this port belongs */
+	void *priv;			/*Private for port specefic data*/
+	struct ether_addr mac_addr;
+	int phys_port_rxq;
+	uint16_t (*do_tx) (struct vswitch_port *port, uint16_t tx_q,
+			   __attribute__((unused))struct rte_mempool *mbuf_pool,
+			   struct rte_mbuf **tx_pkts,	uint16_t pkt_count);
+	uint16_t (*do_rx) (struct vswitch_port *port, uint16_t rx_q,
+			   __attribute__((unused))struct rte_mempool *mbuf_pool,
+			   struct rte_mbuf **rx_pkts,	uint16_t pkt_count);
+	uint16_t (*get_rxq) (struct vswitch_port *port, struct vhost_dev *vdev,
+				uint16_t core_id);
+	uint16_t (*get_txq) (struct vswitch_port *port, uint16_t core_id);
+};
+
+struct vswitch_dev {
+	char name[VSWITCH_NAME_SIZE];
+	void *priv;				/* Private for switch specefic
+						 * data */
+	uint64_t conf_flags;
+	rte_spinlock_t lock;
+	LIST_ENTRY (vswitch_dev) list;
+	struct vswitch_port *ports;
+	int port_count;
+	struct vswitch_ops *ops;
+};
+
+/* Function typedefs */
+typedef int (*vs_switch_init_t) (struct vswitch_dev *, uint64_t conf_flags);
+typedef int (*vs_add_port_t) (struct vswitch_port *vs_port);
+typedef int (*vs_del_port_t) (struct vswitch_port *port);
+typedef int (*vs_learn_port_t) (struct vswitch_port *port, struct rte_mbuf
+				**pkts, uint16_t count);
+typedef int (*vs_unlearn_port_t) (struct vswitch_port *port);
+typedef int (*vs_lookup_n_fwd_t)(struct vswitch_port *in_port, struct rte_mbuf
+			      **pkts, uint16_t pkt_count, uint16_t rxq);
+#if 0
+typedef uint16_t (*vs_do_tx_t) (struct vswitch_port *port, uint16_t tx_q, struct
+			   rte_mbuf **tx_pkts,	uint16_t pkt_count);
+
+typedef uint16_t (*vs_do_rx_t) (struct vswitch_port *port, uint16_t rx_q, struct
+			   rte_mbuf **rx_pkts,	uint16_t pkt_count);
+#endif
+typedef struct vswitch_port* (*vs_sched_tx_port_t)(struct vswitch_dev *vs_dev,
+				enum vswitch_port_type ptype, uint16_t core_id);
+
+typedef struct vswitch_port* (*vs_sched_rx_port_t)(struct vswitch_dev *vs_dev,
+				enum vswitch_port_type ptype, uint16_t core_id);
+
+typedef int (*vs_port_start_t) (struct vswitch_port *port);
+typedef int (*vs_get_max_vdevs_t) (struct vswitch_dev *);
+
+struct vswitch_ops {
+	vs_add_port_t add_port;
+	vs_del_port_t del_port;
+	vs_lookup_n_fwd_t lookup_n_fwd;
+	vs_learn_port_t learn_port;
+	vs_unlearn_port_t unlearn_port;
+	vs_port_start_t port_start;
+	vs_port_start_t port_stop;
+	vs_switch_init_t switch_init;
+	vs_sched_tx_port_t sched_tx_port;
+	vs_sched_rx_port_t sched_rx_port;
+	vs_get_max_vdevs_t get_max_vdevs;
+};
+
+/*VSWITCH conf flags */
+#define VS_CNF_FLG_VM2VM_HARDWARE	(1 << 0)
+#define VS_CNF_FLG_VM2VM_SOFTWARE	(1 << 1)
+#define VS_CNF_FLG_PROMISCOUS_EN	(1 << 2)
+#define VS_CNF_FLG_JUMBO_EN		(1 << 3)
+#define VS_CNF_FLG_VLAN_STRIP_EN	(1 << 4)
+#define VS_CNF_FLG_STATS_EN		(1 << 5)
+
+/*API for vswitch implementations*/
+struct vswitch_dev *vs_register_switch(const char *name, int priv_size,
+				int max_ports, struct vswitch_ops *ops);
+int  vs_unregister_switch(struct vswitch_dev *dev);
+
+int vs_vswitch_init(void);
+
+
+/*API for user of vswitch like vhost-switch application */
+
+struct vswitch_dev *vs_get_vswitch_dev(const char *name);
+struct vswitch_port *vs_add_port(struct vswitch_dev *vs_dev, int port_id,
+		enum vswitch_port_type type, void *priv);
+
+int vs_del_port(struct vswitch_port *port);
+
+int vs_switch_dev_init(struct vswitch_dev *vs_dev, uint16_t conf_flags);
+
+int vs_port_start(struct vswitch_port *vs_port);
+int vs_port_stop(struct vswitch_port *vs_port);
+int vs_learn_port(struct vswitch_port *port, struct rte_mbuf
+				**pkts, uint16_t count);
+int vs_unlearn_port(struct vswitch_port *port);
+int vs_lookup_n_fwd(struct vswitch_port *in_port, struct rte_mbuf
+			      **pkts, uint16_t count, uint16_t in_rxq);
+
+int vs_do_broadcast_fwd(struct vswitch_dev *vs_dev,
+		struct vswitch_port *in_port, uint32_t txport_type_mask,
+		struct rte_mbuf *mbuf);
+
+struct vswitch_port *vs_sched_tx_port(struct vswitch_dev *vs_dev, enum
+				vswitch_port_type ptype, uint16_t core_id);
+
+struct vswitch_port *vs_sched_rx_port(struct vswitch_dev *vs_dev, enum
+				vswitch_port_type ptype, uint16_t core_id);
+int vs_get_max_vdevs(struct vswitch_dev *vs_dev);
+
+/*Extern APIs from vhost/main.c */
+
+extern struct mbuf_table *vhost_switch_get_txq(uint16_t core_id);
+extern int virtio_tx_local(struct vhost_dev *vdev, struct rte_mbuf *m);
+extern struct vhost_dev *find_vhost_dev(struct ether_addr *mac);
+void do_drain_mbuf_table(struct mbuf_table *tx_q);
+
+/* TBD:XXX: This needs to be removed here, when constructor mechanism
+ * for registering swittches is in place
+ */
+extern void vmdq_switch_impl_init(void);
+#endif
diff --git a/examples/vhost/vswitch_txrx.c b/examples/vhost/vswitch_txrx.c
new file mode 100644
index 0000000..23b38b9
--- /dev/null
+++ b/examples/vhost/vswitch_txrx.c
@@ -0,0 +1,97 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Freescale Semiconductor. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Freescale Semiconductor nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+#include <rte_ethdev.h>
+#include <rte_log.h>
+#include <rte_virtio_net.h>
+
+#include "vswitch_common.h"
+#include "vswitch_txrx.h"
+
+uint16_t vs_do_tx_phys_port(struct vswitch_port *port, uint16_t tx_q,
+			__attribute__((unused))struct rte_mempool *mbuf_pool,
+				   struct rte_mbuf **pkts, uint16_t pkt_count)
+{
+	return rte_eth_tx_burst(port->port_id, tx_q, pkts, pkt_count);
+}
+
+
+uint16_t vs_do_tx_virtio_port(struct vswitch_port *port, uint16_t tx_q,
+			__attribute__((unused)) struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t pkt_count)
+{
+	return rte_vhost_enqueue_burst(port->port_id, tx_q, pkts, pkt_count);
+}
+
+uint16_t vs_do_rx_phys_port (struct vswitch_port *port, uint16_t rx_q,
+			__attribute__((unused))struct rte_mempool *mbuf_pool,
+				 struct rte_mbuf **rx_pkts, uint16_t pkt_count)
+{
+	return rte_eth_rx_burst(port->port_id, rx_q, rx_pkts, pkt_count);
+}
+
+uint16_t vs_do_rx_virtio_port (struct vswitch_port *port, uint16_t rx_q,
+				struct rte_mempool *mbuf_pool,
+				struct rte_mbuf **rx_pkts, uint16_t pkt_count)
+{
+	return rte_vhost_dequeue_burst(port->port_id, rx_q, mbuf_pool,
+				       rx_pkts, pkt_count);
+}
+
+uint16_t vs_get_rxq_phys_port(__attribute__((unused)) struct vswitch_port *port,
+	struct vhost_dev *vdev,	__attribute__((unused))uint16_t core_id)
+{
+	struct vswitch_port *vdev_vs_port = vdev->vs_port;
+
+	return vdev_vs_port->phys_port_rxq;
+}
+
+uint16_t vs_get_txq_phys_port(__attribute__((unused)) struct vswitch_port *port,
+	uint16_t core_id)
+{
+	return core_id;
+}
+
+uint16_t vs_get_rxq_virtio_port(__attribute__((unused)) struct vswitch_port *port,
+	__attribute__((unused))struct vhost_dev *vdev,
+	__attribute__((unused)) uint16_t core_id)
+{
+	return VIRTIO_TXQ;
+}
+
+uint16_t vs_get_txq_virtio_port(__attribute__((unused))struct vswitch_port *port,
+	__attribute__((unused)) uint16_t core_id)
+{
+	return VIRTIO_RXQ;
+}
diff --git a/examples/vhost/vswitch_txrx.h b/examples/vhost/vswitch_txrx.h
new file mode 100644
index 0000000..39d2dab
--- /dev/null
+++ b/examples/vhost/vswitch_txrx.h
@@ -0,0 +1,71 @@
+
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Freescale Semiconductor. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Freescale Semiconductor nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#ifndef __VSWTICH_TXRX_H__
+#define __VSWTICH_TXRX_H__
+
+#include <unistd.h>
+#include <rte_ethdev.h>
+
+#include "vswitch_common.h"
+#include "vswitch_txrx.h"
+
+uint16_t vs_do_tx_phys_port(struct vswitch_port *port, uint16_t tx_q,
+			__attribute__((unused))struct rte_mempool *mbuf_pool,
+				   struct rte_mbuf **pkts, uint16_t pkt_count);
+
+
+uint16_t vs_do_tx_virtio_port(struct vswitch_port *port, uint16_t tx_q,
+			__attribute__((unused)) struct rte_mempool *mbuf_pool,
+				struct rte_mbuf **pkts, uint16_t pkt_count);
+
+uint16_t vs_do_rx_phys_port (struct vswitch_port *port, uint16_t rx_q,
+			__attribute__((unused))struct rte_mempool *mbuf_pool,
+				 struct rte_mbuf **rx_pkts, uint16_t pkt_count);
+
+uint16_t vs_do_rx_virtio_port (struct vswitch_port *port, uint16_t rx_q,
+				struct rte_mempool *mbuf_pool,
+				struct rte_mbuf **rx_pkts, uint16_t pkt_count);
+
+uint16_t vs_get_rxq_phys_port(struct vswitch_port *port,
+	struct vhost_dev *vdev,	uint16_t core_id);
+
+uint16_t vs_get_txq_phys_port(struct vswitch_port *port,
+	uint16_t core_id);
+
+uint16_t vs_get_rxq_virtio_port(struct vswitch_port *port,
+	struct vhost_dev *vdev,	uint16_t core_id);
+
+uint16_t vs_get_txq_virtio_port(struct vswitch_port *port,
+	uint16_t core_id);
+#endif
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC][PATCH V2 2/3] examples/vhost: Add vswitch command line options
  2016-09-04 10:23 [RFC][PATCH V2 0/3] example/vhost: Introduce Vswitch Framework Pankaj Chauhan
  2016-09-04 10:23 ` [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework Pankaj Chauhan
@ 2016-09-04 10:23 ` Pankaj Chauhan
  2016-09-13 12:14   ` Yuanhan Liu
  2016-09-04 10:24 ` [RFC][PATCH V2 3/3] examples/vhost: Add VMDQ vswitch device Pankaj Chauhan
  2 siblings, 1 reply; 22+ messages in thread
From: Pankaj Chauhan @ 2016-09-04 10:23 UTC (permalink / raw)
  To: dev
  Cc: hemant.agrawal, jianfeng.tan, yuanhan.liu, maxime.coquelin,
	Pankaj Chauhan

Add command line options for selecting switch implementation
and maximum ports for the vswitch.following are two new command
line options:

--switch <switch_name> [char string, Selects the switch imlementation]
--max-ports<port_count> [int, selects maximum number of ports to support]

For example:

$ ./vhost-switch -c 3 -n 2 --socket-mem 1024 --huge-dir /hugetlbfs -- -p
0x1 --dev-basename sock1 --switch "vmdq" --max-ports 3

Signed-off-by: Pankaj Chauhan <pankaj.chauhan@nxp.com>
---
 examples/vhost/main.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index c949df4..a4e51ae 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -142,6 +142,10 @@ static uint32_t burst_rx_retry_num = BURST_RX_RETRIES;
 /* Character device basename. Can be set by user. */
 static char dev_basename[MAX_BASENAME_SZ] = "vhost-net";
 
+/* vswitch device name and maximum number of ports */
+static char switch_dev[MAX_BASENAME_SZ] = "vmdq";
+static uint32_t switch_max_ports = MAX_DEVICES;
+
 /* empty vmdq configuration structure. Filled in programatically */
 static struct rte_eth_conf vmdq_conf_default = {
 	.rxmode = {
@@ -408,6 +412,22 @@ us_vhost_parse_basename(const char *q_arg)
 }
 
 /*
+ * Set switch device name.
+ */
+static int
+us_vhost_parse_switch_name(const char *q_arg)
+{
+	/* parse number string */
+
+	if (strnlen(q_arg, MAX_BASENAME_SZ) > MAX_BASENAME_SZ)
+		return -1;
+
+	snprintf(&switch_dev[0], MAX_BASENAME_SZ, "%s", q_arg);
+
+	return 0;
+}
+
+/*
  * Parse the portmask provided at run time.
  */
 static int
@@ -501,6 +521,8 @@ us_vhost_parse_args(int argc, char **argv)
 		{"tx-csum", required_argument, NULL, 0},
 		{"tso", required_argument, NULL, 0},
 		{"client", no_argument, &client_mode, 1},
+		{"switch", required_argument, NULL, 0},
+		{"max-ports", required_argument, NULL, 0},
 		{NULL, 0, 0, 0},
 	};
 
@@ -655,6 +677,27 @@ us_vhost_parse_args(int argc, char **argv)
 				}
 			}
 
+			/* Set vswitch_driver name */
+			if (!strncmp(long_option[option_index].name, "switch", MAX_LONG_OPT_SZ)) {
+				if (us_vhost_parse_switch_name(optarg) == -1) {
+					RTE_LOG(INFO, VHOST_CONFIG,
+						"Invalid argument for switch (Max len %d )\n", MAX_BASENAME_SZ);
+					us_vhost_usage(prgname);
+					return -EINVAL;
+				}
+			}
+
+			/* Specify Max ports in vswitch. */
+			if (!strncmp(long_option[option_index].name, "max-ports", MAX_LONG_OPT_SZ)) {
+				ret = parse_num_opt(optarg, INT32_MAX);
+				if (ret == -1) {
+					RTE_LOG(INFO, VHOST_CONFIG, "Invalid argument for switch max ports [0-N]\n");
+					us_vhost_usage(prgname);
+					return -1;
+				}
+				switch_max_ports = ret;
+			}
+
 			break;
 
 			/* Invalid option - print options. */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC][PATCH V2 3/3] examples/vhost: Add VMDQ vswitch device
  2016-09-04 10:23 [RFC][PATCH V2 0/3] example/vhost: Introduce Vswitch Framework Pankaj Chauhan
  2016-09-04 10:23 ` [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework Pankaj Chauhan
  2016-09-04 10:23 ` [RFC][PATCH V2 2/3] examples/vhost: Add vswitch command line options Pankaj Chauhan
@ 2016-09-04 10:24 ` Pankaj Chauhan
  2 siblings, 0 replies; 22+ messages in thread
From: Pankaj Chauhan @ 2016-09-04 10:24 UTC (permalink / raw)
  To: dev
  Cc: hemant.agrawal, jianfeng.tan, yuanhan.liu, maxime.coquelin,
	Pankaj Chauhan

Add support for VMDQ vswitch device. This patch takes
out all VMDQ specefic code from vhost/main.[c,h] and
move it to vmdq.[c,h]. Moreover vmdq.[c,h] files plug
the VMDQ vswitch device implmentation to the vhost-switch
using vswitch framework.

The main vhost/main.[c,h] code is now generic and can support
any switch implementation that conforms with vswitch framework.

Please note that the core VMDQ logic remains as it is, as it
was in vhost/main.c, this patch just moves it to different
file and fits into ops provided by framework.

Signed-off-by: Pankaj Chauhan <pankaj.chauhan@nxp.com>
---
 examples/vhost/main.c | 486 +++++-------------------------------
 examples/vhost/main.h |  10 +
 examples/vhost/vmdq.c | 669 ++++++++++++++++++++++++++++++++++++++++++++++++++
 examples/vhost/vmdq.h |  57 +++++
 4 files changed, 794 insertions(+), 428 deletions(-)
 create mode 100644 examples/vhost/vmdq.c
 create mode 100644 examples/vhost/vmdq.h

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index a4e51ae..096339b 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -54,6 +54,7 @@
 #include <rte_tcp.h>
 
 #include "main.h"
+#include "vswitch_common.h"
 
 #ifndef MAX_QUEUES
 #define MAX_QUEUES 128
@@ -65,7 +66,6 @@
 #define MBUF_CACHE_SIZE	128
 #define MBUF_DATA_SIZE	RTE_MBUF_DEFAULT_BUF_SIZE
 
-#define MAX_PKT_BURST 32		/* Max burst size for RX/TX */
 #define BURST_TX_DRAIN_US 100	/* TX drain every ~100us */
 
 #define BURST_RX_WAIT_US 15	/* Defines how long we wait between retries on RX */
@@ -103,7 +103,6 @@ static uint32_t enabled_port_mask = 0;
 static uint32_t promiscuous;
 
 /* number of devices/queues to support*/
-static uint32_t num_queues = 0;
 static uint32_t num_devices;
 
 static struct rte_mempool *mbuf_pool;
@@ -112,6 +111,8 @@ static int mergeable;
 /* Do vlan strip on host, enabled on default */
 static uint32_t vlan_strip = 1;
 
+static uint32_t jumbo_frame_en = 0;
+
 /* Enable VM2VM communications. If this is disabled then the MAC address compare is skipped. */
 typedef enum {
 	VM2VM_DISABLED = 0,
@@ -146,74 +147,16 @@ static char dev_basename[MAX_BASENAME_SZ] = "vhost-net";
 static char switch_dev[MAX_BASENAME_SZ] = "vmdq";
 static uint32_t switch_max_ports = MAX_DEVICES;
 
-/* empty vmdq configuration structure. Filled in programatically */
-static struct rte_eth_conf vmdq_conf_default = {
-	.rxmode = {
-		.mq_mode        = ETH_MQ_RX_VMDQ_ONLY,
-		.split_hdr_size = 0,
-		.header_split   = 0, /**< Header Split disabled */
-		.hw_ip_checksum = 0, /**< IP checksum offload disabled */
-		.hw_vlan_filter = 0, /**< VLAN filtering disabled */
-		/*
-		 * It is necessary for 1G NIC such as I350,
-		 * this fixes bug of ipv4 forwarding in guest can't
-		 * forward pakets from one virtio dev to another virtio dev.
-		 */
-		.hw_vlan_strip  = 1, /**< VLAN strip enabled. */
-		.jumbo_frame    = 0, /**< Jumbo Frame Support disabled */
-		.hw_strip_crc   = 0, /**< CRC stripped by hardware */
-	},
-
-	.txmode = {
-		.mq_mode = ETH_MQ_TX_NONE,
-	},
-	.rx_adv_conf = {
-		/*
-		 * should be overridden separately in code with
-		 * appropriate values
-		 */
-		.vmdq_rx_conf = {
-			.nb_queue_pools = ETH_8_POOLS,
-			.enable_default_pool = 0,
-			.default_pool = 0,
-			.nb_pool_maps = 0,
-			.pool_map = {{0, 0},},
-		},
-	},
-};
-
+struct vswitch_dev *vswitch_dev_g;
 static unsigned lcore_ids[RTE_MAX_LCORE];
 static uint8_t ports[RTE_MAX_ETHPORTS];
 static unsigned num_ports = 0; /**< The number of ports specified in command line */
-static uint16_t num_pf_queues, num_vmdq_queues;
-static uint16_t vmdq_pool_base, vmdq_queue_base;
-static uint16_t queues_per_pool;
-
-const uint16_t vlan_tags[] = {
-	1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007,
-	1008, 1009, 1010, 1011,	1012, 1013, 1014, 1015,
-	1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023,
-	1024, 1025, 1026, 1027, 1028, 1029, 1030, 1031,
-	1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039,
-	1040, 1041, 1042, 1043, 1044, 1045, 1046, 1047,
-	1048, 1049, 1050, 1051, 1052, 1053, 1054, 1055,
-	1056, 1057, 1058, 1059, 1060, 1061, 1062, 1063,
-};
-
-/* ethernet addresses of ports */
-static struct ether_addr vmdq_ports_eth_addr[RTE_MAX_ETHPORTS];
 
 static struct vhost_dev_tailq_list vhost_dev_list =
 	TAILQ_HEAD_INITIALIZER(vhost_dev_list);
 
 static struct lcore_info lcore_info[RTE_MAX_LCORE];
 
-/* Used for queueing bursts of TX packets. */
-struct mbuf_table {
-	unsigned len;
-	unsigned txq_id;
-	struct rte_mbuf *m_table[MAX_PKT_BURST];
-};
 
 /* TX queue for each data core. */
 struct mbuf_table lcore_tx_queue[RTE_MAX_LCORE];
@@ -223,35 +166,6 @@ struct mbuf_table lcore_tx_queue[RTE_MAX_LCORE];
 #define VLAN_HLEN       4
 
 /*
- * Builds up the correct configuration for VMDQ VLAN pool map
- * according to the pool & queue limits.
- */
-static inline int
-get_eth_conf(struct rte_eth_conf *eth_conf, uint32_t num_devices)
-{
-	struct rte_eth_vmdq_rx_conf conf;
-	struct rte_eth_vmdq_rx_conf *def_conf =
-		&vmdq_conf_default.rx_adv_conf.vmdq_rx_conf;
-	unsigned i;
-
-	memset(&conf, 0, sizeof(conf));
-	conf.nb_queue_pools = (enum rte_eth_nb_pools)num_devices;
-	conf.nb_pool_maps = num_devices;
-	conf.enable_loop_back = def_conf->enable_loop_back;
-	conf.rx_mode = def_conf->rx_mode;
-
-	for (i = 0; i < conf.nb_pool_maps; i++) {
-		conf.pool_map[i].vlan_id = vlan_tags[ i ];
-		conf.pool_map[i].pools = (1UL << i);
-	}
-
-	(void)(rte_memcpy(eth_conf, &vmdq_conf_default, sizeof(*eth_conf)));
-	(void)(rte_memcpy(&eth_conf->rx_adv_conf.vmdq_rx_conf, &conf,
-		   sizeof(eth_conf->rx_adv_conf.vmdq_rx_conf)));
-	return 0;
-}
-
-/*
  * Validate the device number according to the max pool number gotten form
  * dev_info. If the device number is invalid, give the error message and
  * return -1. Each device must have its own pool.
@@ -274,16 +188,25 @@ static inline int
 port_init(uint8_t port)
 {
 	struct rte_eth_dev_info dev_info;
-	struct rte_eth_conf port_conf;
 	struct rte_eth_rxconf *rxconf;
 	struct rte_eth_txconf *txconf;
 	int16_t rx_rings, tx_rings;
 	uint16_t rx_ring_size, tx_ring_size;
+	struct vswitch_port *vs_port;
 	int retval;
 	uint16_t q;
 
+	if (port >= rte_eth_dev_count()) return -1;
+
+	vs_port = vs_add_port(vswitch_dev_g, port, VSWITCH_PTYPE_PHYS, NULL);
+
+	if (!vs_port) {
+		rte_exit(EXIT_FAILURE, "Failed to add port [%d] to vsdev %s\n",
+			 port, vswitch_dev_g->name);
+	}
+
 	/* The max pool number from dev_info will be used to validate the pool number specified in cmd line */
-	rte_eth_dev_info_get (port, &dev_info);
+	rte_eth_dev_info_get (vs_port->port_id, &dev_info);
 
 	if (dev_info.max_rx_queues > MAX_QUEUES) {
 		rte_exit(EXIT_FAILURE,
@@ -298,33 +221,10 @@ port_init(uint8_t port)
 	/* Enable vlan offload */
 	txconf->txq_flags &= ~ETH_TXQ_FLAGS_NOVLANOFFL;
 
-	/*configure the number of supported virtio devices based on VMDQ limits */
-	num_devices = dev_info.max_vmdq_pools;
-
 	rx_ring_size = RTE_TEST_RX_DESC_DEFAULT;
 	tx_ring_size = RTE_TEST_TX_DESC_DEFAULT;
 	tx_rings = (uint16_t)rte_lcore_count();
 
-	retval = validate_num_devices(MAX_DEVICES);
-	if (retval < 0)
-		return retval;
-
-	/* Get port configuration. */
-	retval = get_eth_conf(&port_conf, num_devices);
-	if (retval < 0)
-		return retval;
-	/* NIC queues are divided into pf queues and vmdq queues.  */
-	num_pf_queues = dev_info.max_rx_queues - dev_info.vmdq_queue_num;
-	queues_per_pool = dev_info.vmdq_queue_num / dev_info.max_vmdq_pools;
-	num_vmdq_queues = num_devices * queues_per_pool;
-	num_queues = num_pf_queues + num_vmdq_queues;
-	vmdq_queue_base = dev_info.vmdq_queue_base;
-	vmdq_pool_base  = dev_info.vmdq_pool_base;
-	printf("pf queue num: %u, configured vmdq pool num: %u, each vmdq pool has %u queues\n",
-		num_pf_queues, num_devices, queues_per_pool);
-
-	if (port >= rte_eth_dev_count()) return -1;
-
 	if (enable_tx_csum == 0)
 		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_CSUM);
 
@@ -337,10 +237,11 @@ port_init(uint8_t port)
 
 	rx_rings = (uint16_t)dev_info.max_rx_queues;
 	/* Configure ethernet device. */
-	retval = rte_eth_dev_configure(port, rx_rings, tx_rings, &port_conf);
+	retval = rte_eth_dev_configure(vs_port->port_id, rx_rings, tx_rings,
+				       &vs_port->port_conf);
 	if (retval != 0) {
 		RTE_LOG(ERR, VHOST_PORT, "Failed to configure port %u: %s.\n",
-			port, strerror(-retval));
+			vs_port->port_id, strerror(-retval));
 		return retval;
 	}
 
@@ -353,7 +254,7 @@ port_init(uint8_t port)
 		if (retval < 0) {
 			RTE_LOG(ERR, VHOST_PORT,
 				"Failed to setup rx queue %u of port %u: %s.\n",
-				q, port, strerror(-retval));
+				q, vs_port->port_id, strerror(-retval));
 			return retval;
 		}
 	}
@@ -369,28 +270,10 @@ port_init(uint8_t port)
 		}
 	}
 
-	/* Start the device. */
-	retval  = rte_eth_dev_start(port);
-	if (retval < 0) {
-		RTE_LOG(ERR, VHOST_PORT, "Failed to start port %u: %s\n",
-			port, strerror(-retval));
-		return retval;
-	}
-
 	if (promiscuous)
 		rte_eth_promiscuous_enable(port);
 
-	rte_eth_macaddr_get(port, &vmdq_ports_eth_addr[port]);
-	RTE_LOG(INFO, VHOST_PORT, "Max virtio devices supported: %u\n", num_devices);
-	RTE_LOG(INFO, VHOST_PORT, "Port %u MAC: %02"PRIx8" %02"PRIx8" %02"PRIx8
-			" %02"PRIx8" %02"PRIx8" %02"PRIx8"\n",
-			(unsigned)port,
-			vmdq_ports_eth_addr[port].addr_bytes[0],
-			vmdq_ports_eth_addr[port].addr_bytes[1],
-			vmdq_ports_eth_addr[port].addr_bytes[2],
-			vmdq_ports_eth_addr[port].addr_bytes[3],
-			vmdq_ports_eth_addr[port].addr_bytes[4],
-			vmdq_ports_eth_addr[port].addr_bytes[5]);
+	vs_port_start(vs_port);
 
 	return 0;
 }
@@ -542,9 +425,6 @@ us_vhost_parse_args(int argc, char **argv)
 
 		case 'P':
 			promiscuous = 1;
-			vmdq_conf_default.rx_adv_conf.vmdq_rx_conf.rx_mode =
-				ETH_VMDQ_ACCEPT_BROADCAST |
-				ETH_VMDQ_ACCEPT_MULTICAST;
 			rte_vhost_feature_enable(1ULL << VIRTIO_NET_F_CTRL_RX);
 
 			break;
@@ -632,11 +512,8 @@ us_vhost_parse_args(int argc, char **argv)
 					return -1;
 				} else {
 					mergeable = !!ret;
-					if (ret) {
-						vmdq_conf_default.rxmode.jumbo_frame = 1;
-						vmdq_conf_default.rxmode.max_rx_pkt_len
-							= JUMBO_FRAME_MAX_SIZE;
-					}
+					if (ret)
+						jumbo_frame_en = 1;
 				}
 			}
 
@@ -651,8 +528,6 @@ us_vhost_parse_args(int argc, char **argv)
 					return -1;
 				} else {
 					vlan_strip = !!ret;
-					vmdq_conf_default.rxmode.hw_vlan_strip =
-						vlan_strip;
 				}
 			}
 
@@ -747,8 +622,7 @@ static unsigned check_ports_num(unsigned nb_ports)
 	return valid_num_ports;
 }
 
-static inline struct vhost_dev *__attribute__((always_inline))
-find_vhost_dev(struct ether_addr *mac)
+struct vhost_dev *find_vhost_dev(struct ether_addr *mac)
 {
 	struct vhost_dev *vdev;
 
@@ -761,95 +635,6 @@ find_vhost_dev(struct ether_addr *mac)
 	return NULL;
 }
 
-/*
- * This function learns the MAC address of the device and registers this along with a
- * vlan tag to a VMDQ.
- */
-static int
-link_vmdq(struct vhost_dev *vdev, struct rte_mbuf *m)
-{
-	struct ether_hdr *pkt_hdr;
-	int i, ret;
-
-	/* Learn MAC address of guest device from packet */
-	pkt_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
-
-	if (find_vhost_dev(&pkt_hdr->s_addr)) {
-		RTE_LOG(ERR, VHOST_DATA,
-			"(%d) device is using a registered MAC!\n",
-			vdev->vid);
-		return -1;
-	}
-
-	for (i = 0; i < ETHER_ADDR_LEN; i++)
-		vdev->mac_address.addr_bytes[i] = pkt_hdr->s_addr.addr_bytes[i];
-
-	/* vlan_tag currently uses the device_id. */
-	vdev->vlan_tag = vlan_tags[vdev->vid];
-
-	/* Print out VMDQ registration info. */
-	RTE_LOG(INFO, VHOST_DATA,
-		"(%d) mac %02x:%02x:%02x:%02x:%02x:%02x and vlan %d registered\n",
-		vdev->vid,
-		vdev->mac_address.addr_bytes[0], vdev->mac_address.addr_bytes[1],
-		vdev->mac_address.addr_bytes[2], vdev->mac_address.addr_bytes[3],
-		vdev->mac_address.addr_bytes[4], vdev->mac_address.addr_bytes[5],
-		vdev->vlan_tag);
-
-	/* Register the MAC address. */
-	ret = rte_eth_dev_mac_addr_add(ports[0], &vdev->mac_address,
-				(uint32_t)vdev->vid + vmdq_pool_base);
-	if (ret)
-		RTE_LOG(ERR, VHOST_DATA,
-			"(%d) failed to add device MAC address to VMDQ\n",
-			vdev->vid);
-
-	/* Enable stripping of the vlan tag as we handle routing. */
-	if (vlan_strip)
-		rte_eth_dev_set_vlan_strip_on_queue(ports[0],
-			(uint16_t)vdev->vmdq_rx_q, 1);
-
-	/* Set device as ready for RX. */
-	vdev->ready = DEVICE_RX;
-
-	return 0;
-}
-
-/*
- * Removes MAC address and vlan tag from VMDQ. Ensures that nothing is adding buffers to the RX
- * queue before disabling RX on the device.
- */
-static inline void
-unlink_vmdq(struct vhost_dev *vdev)
-{
-	unsigned i = 0;
-	unsigned rx_count;
-	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
-
-	if (vdev->ready == DEVICE_RX) {
-		/*clear MAC and VLAN settings*/
-		rte_eth_dev_mac_addr_remove(ports[0], &vdev->mac_address);
-		for (i = 0; i < 6; i++)
-			vdev->mac_address.addr_bytes[i] = 0;
-
-		vdev->vlan_tag = 0;
-
-		/*Clear out the receive buffers*/
-		rx_count = rte_eth_rx_burst(ports[0],
-					(uint16_t)vdev->vmdq_rx_q, pkts_burst, MAX_PKT_BURST);
-
-		while (rx_count) {
-			for (i = 0; i < rx_count; i++)
-				rte_pktmbuf_free(pkts_burst[i]);
-
-			rx_count = rte_eth_rx_burst(ports[0],
-					(uint16_t)vdev->vmdq_rx_q, pkts_burst, MAX_PKT_BURST);
-		}
-
-		vdev->ready = DEVICE_MAC_LEARNING;
-	}
-}
-
 static inline void __attribute__((always_inline))
 virtio_xmit(struct vhost_dev *dst_vdev, struct vhost_dev *src_vdev,
 	    struct rte_mbuf *m)
@@ -876,8 +661,7 @@ virtio_xmit(struct vhost_dev *dst_vdev, struct vhost_dev *src_vdev,
  * Check if the packet destination MAC address is for a local device. If so then put
  * the packet on that devices RX queue. If not then return.
  */
-static inline int __attribute__((always_inline))
-virtio_tx_local(struct vhost_dev *vdev, struct rte_mbuf *m)
+int virtio_tx_local(struct vhost_dev *vdev, struct rte_mbuf *m)
 {
 	struct ether_hdr *pkt_hdr;
 	struct vhost_dev *dst_vdev;
@@ -908,69 +692,9 @@ virtio_tx_local(struct vhost_dev *vdev, struct rte_mbuf *m)
 	return 0;
 }
 
-/*
- * Check if the destination MAC of a packet is one local VM,
- * and get its vlan tag, and offset if it is.
- */
-static inline int __attribute__((always_inline))
-find_local_dest(struct vhost_dev *vdev, struct rte_mbuf *m,
-	uint32_t *offset, uint16_t *vlan_tag)
+struct mbuf_table *vhost_switch_get_txq(uint16_t core_id)
 {
-	struct vhost_dev *dst_vdev;
-	struct ether_hdr *pkt_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
-
-	dst_vdev = find_vhost_dev(&pkt_hdr->d_addr);
-	if (!dst_vdev)
-		return 0;
-
-	if (vdev->vid == dst_vdev->vid) {
-		RTE_LOG(DEBUG, VHOST_DATA,
-			"(%d) TX: src and dst MAC is same. Dropping packet.\n",
-			vdev->vid);
-		return -1;
-	}
-
-	/*
-	 * HW vlan strip will reduce the packet length
-	 * by minus length of vlan tag, so need restore
-	 * the packet length by plus it.
-	 */
-	*offset  = VLAN_HLEN;
-	*vlan_tag = vlan_tags[vdev->vid];
-
-	RTE_LOG(DEBUG, VHOST_DATA,
-		"(%d) TX: pkt to local VM device id: (%d), vlan tag: %u.\n",
-		vdev->vid, dst_vdev->vid, *vlan_tag);
-
-	return 0;
-}
-
-static uint16_t
-get_psd_sum(void *l3_hdr, uint64_t ol_flags)
-{
-	if (ol_flags & PKT_TX_IPV4)
-		return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
-	else /* assume ethertype == ETHER_TYPE_IPv6 */
-		return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
-}
-
-static void virtio_tx_offload(struct rte_mbuf *m)
-{
-	void *l3_hdr;
-	struct ipv4_hdr *ipv4_hdr = NULL;
-	struct tcp_hdr *tcp_hdr = NULL;
-	struct ether_hdr *eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
-
-	l3_hdr = (char *)eth_hdr + m->l2_len;
-
-	if (m->ol_flags & PKT_TX_IPV4) {
-		ipv4_hdr = l3_hdr;
-		ipv4_hdr->hdr_checksum = 0;
-		m->ol_flags |= PKT_TX_IP_CKSUM;
-	}
-
-	tcp_hdr = (struct tcp_hdr *)((char *)l3_hdr + m->l3_len);
-	tcp_hdr->cksum = get_psd_sum(l3_hdr, m->ol_flags);
+	return &lcore_tx_queue[core_id];
 }
 
 static inline void
@@ -980,8 +704,7 @@ free_pkts(struct rte_mbuf **pkts, uint16_t n)
 		rte_pktmbuf_free(pkts[n]);
 }
 
-static inline void __attribute__((always_inline))
-do_drain_mbuf_table(struct mbuf_table *tx_q)
+void do_drain_mbuf_table(struct mbuf_table *tx_q)
 {
 	uint16_t count;
 	struct vswitch_port *tx_port;
@@ -1006,100 +729,10 @@ do_drain_mbuf_table(struct mbuf_table *tx_q)
 	}
 
 	tx_q->len = 0;
+out:
+	return;
 }
 
-/*
- * This function routes the TX packet to the correct interface. This
- * may be a local device or the physical port.
- */
-static inline void __attribute__((always_inline))
-virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf *m, uint16_t vlan_tag)
-{
-	struct mbuf_table *tx_q;
-	unsigned offset = 0;
-	const uint16_t lcore_id = rte_lcore_id();
-	struct ether_hdr *nh;
-
-
-	nh = rte_pktmbuf_mtod(m, struct ether_hdr *);
-	if (unlikely(is_broadcast_ether_addr(&nh->d_addr))) {
-		struct vhost_dev *vdev2;
-
-		TAILQ_FOREACH(vdev2, &vhost_dev_list, global_vdev_entry) {
-			virtio_xmit(vdev2, vdev, m);
-		}
-		goto queue2nic;
-	}
-
-	/*check if destination is local VM*/
-	if ((vm2vm_mode == VM2VM_SOFTWARE) && (virtio_tx_local(vdev, m) == 0)) {
-		rte_pktmbuf_free(m);
-		return;
-	}
-
-	if (unlikely(vm2vm_mode == VM2VM_HARDWARE)) {
-		if (unlikely(find_local_dest(vdev, m, &offset,
-					     &vlan_tag) != 0)) {
-			rte_pktmbuf_free(m);
-			return;
-		}
-	}
-
-	RTE_LOG(DEBUG, VHOST_DATA,
-		"(%d) TX: MAC address is external\n", vdev->vid);
-
-queue2nic:
-
-	/*Add packet to the port tx queue*/
-	tx_q = &lcore_tx_queue[lcore_id];
-
-	nh = rte_pktmbuf_mtod(m, struct ether_hdr *);
-	if (unlikely(nh->ether_type == rte_cpu_to_be_16(ETHER_TYPE_VLAN))) {
-		/* Guest has inserted the vlan tag. */
-		struct vlan_hdr *vh = (struct vlan_hdr *) (nh + 1);
-		uint16_t vlan_tag_be = rte_cpu_to_be_16(vlan_tag);
-		if ((vm2vm_mode == VM2VM_HARDWARE) &&
-			(vh->vlan_tci != vlan_tag_be))
-			vh->vlan_tci = vlan_tag_be;
-	} else {
-		m->ol_flags |= PKT_TX_VLAN_PKT;
-
-		/*
-		 * Find the right seg to adjust the data len when offset is
-		 * bigger than tail room size.
-		 */
-		if (unlikely(vm2vm_mode == VM2VM_HARDWARE)) {
-			if (likely(offset <= rte_pktmbuf_tailroom(m)))
-				m->data_len += offset;
-			else {
-				struct rte_mbuf *seg = m;
-
-				while ((seg->next != NULL) &&
-					(offset > rte_pktmbuf_tailroom(seg)))
-					seg = seg->next;
-
-				seg->data_len += offset;
-			}
-			m->pkt_len += offset;
-		}
-
-		m->vlan_tci = vlan_tag;
-	}
-
-	if (m->ol_flags & PKT_TX_TCP_SEG)
-		virtio_tx_offload(m);
-
-	tx_q->m_table[tx_q->len++] = m;
-	if (enable_stats) {
-		vdev->stats.tx_total++;
-		vdev->stats.tx++;
-	}
-
-	if (unlikely(tx_q->len == MAX_PKT_BURST))
-		do_drain_mbuf_table(tx_q);
-}
-
-
 static inline void __attribute__((always_inline))
 drain_mbuf_table(struct mbuf_table *tx_q)
 {
@@ -1123,7 +756,7 @@ drain_mbuf_table(struct mbuf_table *tx_q)
 static inline void __attribute__((always_inline))
 drain_eth_rx(struct vhost_dev *vdev)
 {
-	uint16_t rx_count, enqueue_count;
+	uint16_t rx_count;
 	struct rte_mbuf *pkts[MAX_PKT_BURST];
 	uint16_t rxq, core_id;
 	struct vswitch_port *rx_port;
@@ -1148,37 +781,18 @@ drain_eth_rx(struct vhost_dev *vdev)
 	rxq = rx_port->get_rxq(rx_port, vdev, core_id);
 	rx_count = rx_port->do_rx(rx_port, rxq, NULL, pkts, MAX_PKT_BURST);
 
-	rx_count = rte_eth_rx_burst(ports[0], vdev->vmdq_rx_q,
-				    pkts, MAX_PKT_BURST);
 	if (!rx_count)
 		return;
 
-	/*
-	 * When "enable_retry" is set, here we wait and retry when there
-	 * is no enough free slots in the queue to hold @rx_count packets,
-	 * to diminish packet loss.
-	 */
-	if (enable_retry &&
-	    unlikely(rx_count > rte_vhost_avail_entries(vdev->vid,
-			VIRTIO_RXQ))) {
-		uint32_t retry;
-
-		for (retry = 0; retry < burst_rx_retry_num; retry++) {
-			rte_delay_us(burst_rx_delay_time);
-			if (rx_count <= rte_vhost_avail_entries(vdev->vid,
-					VIRTIO_RXQ))
-				break;
-		}
-	}
+	vs_lookup_n_fwd(rx_port, pkts, rx_count, rxq);
 
-	enqueue_count = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
-						pkts, rx_count);
 	if (enable_stats) {
 		rte_atomic64_add(&vdev->stats.rx_total_atomic, rx_count);
-		rte_atomic64_add(&vdev->stats.rx_atomic, enqueue_count);
 	}
 
 	free_pkts(pkts, rx_count);
+out:
+	return;
 }
 
 static inline void __attribute__((always_inline))
@@ -1263,7 +877,7 @@ switch_worker(void *arg __rte_unused)
 		TAILQ_FOREACH(vdev, &lcore_info[lcore_id].vdev_list,
 			      lcore_vdev_entry) {
 			if (unlikely(vdev->remove)) {
-				unlink_vmdq(vdev);
+				vs_unlearn_port(vdev->vs_port);
 				vdev->ready = DEVICE_SAFE_REMOVE;
 				continue;
 			}
@@ -1289,6 +903,7 @@ static void
 destroy_device(int vid)
 {
 	struct vhost_dev *vdev = NULL;
+	struct vswitch_port *vs_port;
 	int lcore;
 
 	TAILQ_FOREACH(vdev, &vhost_dev_list, global_vdev_entry) {
@@ -1324,6 +939,10 @@ destroy_device(int vid)
 
 	lcore_info[vdev->coreid].device_num--;
 
+	vs_port = vdev->vs_port;
+	vs_port_stop(vs_port);
+	vs_del_port(vs_port);
+
 	RTE_LOG(INFO, VHOST_DATA,
 		"(%d) device has been removed from data core\n",
 		vdev->vid);
@@ -1341,6 +960,7 @@ new_device(int vid)
 	int lcore, core_add = 0;
 	uint32_t device_num_min;
 	struct vhost_dev *vdev;
+	struct vswitch_port *vs_port;
 
 	vdev = rte_zmalloc("vhost device", sizeof(*vdev), RTE_CACHE_LINE_SIZE);
 	if (vdev == NULL) {
@@ -1362,7 +982,6 @@ new_device(int vid)
 	vdev->vs_port = vs_port;
 
 	TAILQ_INSERT_TAIL(&vhost_dev_list, vdev, global_vdev_entry);
-	vdev->vmdq_rx_q = vid * queues_per_pool + vmdq_queue_base;
 
 	/*reset ready flag*/
 	vdev->ready = DEVICE_MAC_LEARNING;
@@ -1552,7 +1171,7 @@ main(int argc, char *argv[])
 	uint8_t portid;
 	static pthread_t tid;
 	char thread_name[RTE_MAX_THREAD_NAME_LEN];
-	uint64_t flags = 0;
+	uint64_t flags = 0, vswitch_conf_flags;
 
 	signal(SIGINT, sigint_handler);
 
@@ -1563,11 +1182,19 @@ main(int argc, char *argv[])
 	argc -= ret;
 	argv += ret;
 
+	vs_vswitch_init();
+
+	/* TBD:XXX: This needs to be removed here, when constructor mechanism
+	 * for registering swittches is in place
+	 */
+	vmdq_switch_impl_init();
+
 	/* parse app arguments */
 	ret = us_vhost_parse_args(argc, argv);
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Invalid argument\n");
 
+	/*TBD:XXX: vdev list or the vdev ports */
 	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id ++)
 		TAILQ_INIT(&lcore_info[lcore_id].vdev_list);
 
@@ -1592,6 +1219,13 @@ main(int argc, char *argv[])
 		return -1;
 	}
 
+	vswitch_dev_g = vs_get_vswitch_dev(switch_dev);
+	if (!vswitch_dev_g) {
+		RTE_LOG(INFO, VHOST_CONFIG, "switch dev %s not supported\n",
+			switch_dev);
+		return -1;
+	}
+
 	/*
 	 * FIXME: here we are trying to allocate mbufs big enough for
 	 * @MAX_QUEUES, but the truth is we're never going to use that
@@ -1601,12 +1235,8 @@ main(int argc, char *argv[])
 	create_mbuf_pool(valid_num_ports, rte_lcore_count() - 1, MBUF_DATA_SIZE,
 			 MAX_QUEUES, RTE_TEST_RX_DESC_DEFAULT, MBUF_CACHE_SIZE);
 
-	if (vm2vm_mode == VM2VM_HARDWARE) {
-		/* Enable VT loop back to let L2 switch to do it. */
-		vmdq_conf_default.rx_adv_conf.vmdq_rx_conf.enable_loop_back = 1;
-		RTE_LOG(DEBUG, VHOST_CONFIG,
-			"Enable loop back for L2 switch in vmdq.\n");
-	}
+	vswitch_conf_flags = get_vswitch_conf_flags();
+	vs_switch_dev_init(vswitch_dev_g, vswitch_conf_flags);
 
 	/* initialize all ports */
 	for (portid = 0; portid < nb_ports; portid++) {
diff --git a/examples/vhost/main.h b/examples/vhost/main.h
index 6bb42e8..d6b23f6 100644
--- a/examples/vhost/main.h
+++ b/examples/vhost/main.h
@@ -68,6 +68,7 @@ struct vhost_dev {
 	struct device_statistics stats;
 	TAILQ_ENTRY(vhost_dev) global_vdev_entry;
 	TAILQ_ENTRY(vhost_dev) lcore_vdev_entry;
+	struct vswitch_port *vs_port;
 } __rte_cache_aligned;
 
 TAILQ_HEAD(vhost_dev_tailq_list, vhost_dev);
@@ -88,4 +89,13 @@ struct lcore_info {
 	struct vhost_dev_tailq_list vdev_list;
 };
 
+#define MAX_PKT_BURST 32		/* Max burst size for RX/TX */
+
+/* Used for queueing bursts of TX packets. */
+struct mbuf_table {
+	unsigned len;
+	unsigned txq_id;
+	struct rte_mbuf *m_table[MAX_PKT_BURST];
+};
+
 #endif /* _MAIN_H_ */
diff --git a/examples/vhost/vmdq.c b/examples/vhost/vmdq.c
new file mode 100644
index 0000000..ca26195
--- /dev/null
+++ b/examples/vhost/vmdq.c
@@ -0,0 +1,669 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <rte_atomic.h>
+#include <rte_cycles.h>
+#include <rte_ethdev.h>
+#include <rte_log.h>
+#include <rte_string_fns.h>
+#include <rte_malloc.h>
+#include <rte_virtio_net.h>
+#include <rte_ip.h>
+#include <rte_tcp.h>
+
+#include "vswitch_common.h"
+#include "vmdq.h"
+
+#define JUMBO_FRAME_MAX_SIZE    0x2600
+/* State of virtio device. */
+#define DEVICE_MAC_LEARNING 0
+#define DEVICE_RX			1
+#define DEVICE_SAFE_REMOVE	2
+
+#define VLAN_HLEN       4
+
+static struct vswitch_dev *vmdq_switch_dev_g;
+
+const uint16_t vlan_tags[] = {
+	1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007,
+	1008, 1009, 1010, 1011,	1012, 1013, 1014, 1015,
+	1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023,
+	1024, 1025, 1026, 1027, 1028, 1029, 1030, 1031,
+	1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039,
+	1040, 1041, 1042, 1043, 1044, 1045, 1046, 1047,
+	1048, 1049, 1050, 1051, 1052, 1053, 1054, 1055,
+	1056, 1057, 1058, 1059, 1060, 1061, 1062, 1063,
+};
+
+/* empty vmdq configuration structure. Filled in programatically */
+static struct rte_eth_conf vmdq_conf_default = {
+	.rxmode = {
+		.mq_mode        = ETH_MQ_RX_VMDQ_ONLY,
+		.split_hdr_size = 0,
+		.header_split   = 0, /**< Header Split disabled */
+		.hw_ip_checksum = 0, /**< IP checksum offload disabled */
+		.hw_vlan_filter = 0, /**< VLAN filtering disabled */
+		/*
+		 * It is necessary for 1G NIC such as I350,
+		 * this fixes bug of ipv4 forwarding in guest can't
+		 * forward pakets from one virtio dev to another virtio dev.
+		 */
+		.hw_vlan_strip  = 1, /**< VLAN strip enabled. */
+		.jumbo_frame    = 0, /**< Jumbo Frame Support disabled */
+		.hw_strip_crc   = 0, /**< CRC stripped by hardware */
+	},
+
+	.txmode = {
+		.mq_mode = ETH_MQ_TX_NONE,
+	},
+	.rx_adv_conf = {
+		/*
+		 * should be overridden separately in code with
+		 * appropriate values
+		 */
+		.vmdq_rx_conf = {
+			.nb_queue_pools = ETH_8_POOLS,
+			.enable_default_pool = 0,
+			.default_pool = 0,
+			.nb_pool_maps = 0,
+			.pool_map = {{0, 0},},
+		},
+	},
+};
+
+
+static int vmdq_switch_init(__attribute__((unused))struct vswitch_dev *vs_dev,
+			    uint64_t conf_flags)
+{
+	uint32_t enable;
+
+	if (conf_flags & VS_CNF_FLG_VM2VM_HARDWARE) {
+		/* Enable VT loop back to let L2 switch to do it. */
+		vmdq_conf_default.rx_adv_conf.vmdq_rx_conf.enable_loop_back = 1;
+		RTE_LOG(DEBUG, VHOST_CONFIG,
+			"Enable loop back for L2 switch in vmdq.\n");
+	}
+
+	if (conf_flags & VS_CNF_FLG_PROMISCOUS_EN) {
+		vmdq_conf_default.rx_adv_conf.vmdq_rx_conf.rx_mode =
+				ETH_VMDQ_ACCEPT_BROADCAST |
+				ETH_VMDQ_ACCEPT_MULTICAST;
+	}
+
+	if (conf_flags & VS_CNF_FLG_JUMBO_EN) {
+		vmdq_conf_default.rxmode.jumbo_frame = 1;
+		vmdq_conf_default.rxmode.max_rx_pkt_len = JUMBO_FRAME_MAX_SIZE;
+	}
+
+	enable = !!(conf_flags & VS_CNF_FLG_VLAN_STRIP_EN);
+	vmdq_conf_default.rxmode.hw_vlan_strip = enable;
+
+	return 0;
+}
+
+
+static int vmdq_get_max_vdevs(struct vswitch_dev *vs_dev)
+{
+	struct vmdq_switch_priv *priv = vs_dev->priv;
+
+	return priv->num_devices;
+}
+
+/*
+ * Builds up the correct configuration for VMDQ VLAN pool map
+ * according to the pool & queue limits.
+ */
+static inline int
+vmdq_get_eth_conf(struct vswitch_port *vs_port, struct rte_eth_conf *eth_conf)
+{
+	struct rte_eth_vmdq_rx_conf conf;
+	struct rte_eth_vmdq_rx_conf *def_conf =
+		&vmdq_conf_default.rx_adv_conf.vmdq_rx_conf;
+	struct vmdq_switch_priv *priv = vs_port->vs_dev->priv;
+	unsigned i;
+
+	memset(&conf, 0, sizeof(conf));
+	conf.nb_queue_pools = (enum rte_eth_nb_pools)priv->num_devices;
+	conf.nb_pool_maps = priv->num_devices;
+	conf.enable_loop_back = def_conf->enable_loop_back;
+	conf.rx_mode = def_conf->rx_mode;
+
+	for (i = 0; i < conf.nb_pool_maps; i++) {
+		conf.pool_map[i].vlan_id = vlan_tags[ i ];
+		conf.pool_map[i].pools = (1UL << i);
+	}
+
+	(void)(rte_memcpy(eth_conf, &vmdq_conf_default, sizeof(*eth_conf)));
+	(void)(rte_memcpy(&eth_conf->rx_adv_conf.vmdq_rx_conf, &conf,
+		   sizeof(eth_conf->rx_adv_conf.vmdq_rx_conf)));
+	return 0;
+}
+
+static int vmdq_add_port_phys(struct vswitch_port *vs_port)
+{
+	struct rte_eth_dev_info dev_info;
+	struct vmdq_switch_priv *priv = vs_port->vs_dev->priv;
+	uint16_t queues_per_pool;
+	int rc = 0;
+
+	if (priv->phys_port_count >= VMDQ_MAX_PHYS_PORTS) {
+		RTE_LOG(INFO, VHOST_CONFIG,
+			"Physical ports greater than max devices(%d)\n",
+			VMDQ_MAX_PHYS_PORTS);
+		rc = -EBUSY;
+		goto out;
+	}
+
+	rte_eth_dev_info_get (vs_port->port_id, &dev_info);
+	if (dev_info.max_vmdq_pools > VMDQ_MAX_VIRTIO_PORTS) {
+		RTE_LOG(INFO, VHOST_CONFIG,
+			"Num devices (%d) greater than Max (%d)\n",
+			dev_info.max_vmdq_pools, VMDQ_MAX_VIRTIO_PORTS);
+		rc = -EINVAL;
+		goto out;
+
+	}
+
+	priv->num_devices = dev_info.max_vmdq_pools;
+	priv->phys_port_count++;
+
+	priv->num_pf_queues = dev_info.max_rx_queues - dev_info.vmdq_queue_num;
+	queues_per_pool =  dev_info.vmdq_queue_num / dev_info.max_vmdq_pools;
+	priv->queues_per_pool = queues_per_pool;
+	priv->num_vmdq_queues = priv->phys_port_count * queues_per_pool;
+	priv->num_queues = priv->num_pf_queues + priv->num_vmdq_queues;
+	priv->vmdq_queue_base = dev_info.vmdq_queue_base;
+	priv->vmdq_pool_base  = dev_info.vmdq_pool_base;
+
+	rc = vmdq_get_eth_conf(vs_port, &vs_port->port_conf);
+	if (rc < 0) {
+		goto out;
+	}
+
+	/*In VMDQ vhost_switch only one physical port is required, keep it
+	 * global so that it can be accessed while doing tx/rx to physical port
+	 */
+	priv->phys_port = vs_port;
+
+
+	printf("pf queue num: %u, configured vmdq pool num: %u, each vmdq pool has %u queues\n",
+		priv->num_pf_queues, priv->num_devices, priv->queues_per_pool);
+
+	RTE_LOG(INFO, VHOST_PORT, "Max virtio devices supported: %u\n",
+		priv->num_devices);
+out:
+	if (rc) {
+		priv->phys_port_count--;
+	}
+	return rc;
+}
+
+static int vmdq_add_port_virtio(struct vswitch_port *vs_port)
+{
+	struct vmdq_switch_priv *priv = vs_port->vs_dev->priv;
+	uint16_t rxq;
+
+	rxq = vs_port->port_id * priv->queues_per_pool + priv->vmdq_queue_base;
+	vs_port->phys_port_rxq = rxq;
+
+	priv->virtio_port_map[rxq] = vs_port;
+
+	RTE_LOG(INFO, VHOST_PORT, "Added virtio port %d, vmdq rxq %d\n",
+	vs_port->port_id, rxq);
+
+	return 0;
+}
+
+static int vmdq_add_port(struct vswitch_port *port)
+{
+	int rc = 0;
+
+	switch(port->type) {
+	case VSWITCH_PTYPE_PHYS:
+		rc = vmdq_add_port_phys(port);
+		break;
+	case VSWITCH_PTYPE_VIRTIO:
+		rc = vmdq_add_port_virtio(port);
+		break;
+	default:
+		RTE_LOG(INFO, VHOST_CONFIG, "Unkown port[id %d] type %d\n",
+			port->port_id, port->type);
+		rc = -EINVAL;
+	}
+
+	return rc;
+}
+
+static int vmdq_port_start(struct vswitch_port *port)
+{
+	int rc = 0;
+
+	switch(port->type) {
+	case VSWITCH_PTYPE_PHYS:
+		rc  = rte_eth_dev_start(port->port_id);
+		RTE_LOG(INFO, VHOST_PORT, "Started PHYS port %d, rc %d\n", port->port_id, rc);
+		break;
+	case VSWITCH_PTYPE_VIRTIO:
+		/*No specefic function to start virtio dev (?? check) */
+		rc = 0;
+		break;
+	default:
+		RTE_LOG(INFO, VHOST_CONFIG, "Unkown port[id %d] type %d\n",
+			port->port_id, port->type);
+		rc = -EINVAL;
+	}
+
+
+	/* Start the device. */
+	if (rc) {
+		RTE_LOG(ERR, VHOST_PORT, "Failed to init port[id %d, typ %d]\n",
+			port->port_id, port->type);
+	}
+
+	return rc;
+}
+
+/*
+ * This function learns the MAC address of the device and registers this along with a
+ * vlan tag to a VMDQ.
+ */
+static int
+link_vmdq(struct vswitch_port *vs_port, struct rte_mbuf *m)
+{
+	struct vhost_dev *vdev = vs_port->priv;
+	struct vmdq_switch_priv *priv = vs_port->vs_dev->priv;
+	struct vswitch_dev *vs_dev = vs_port->vs_dev;
+	struct vswitch_port *phys_port = priv->phys_port;
+	struct ether_hdr *pkt_hdr;
+	int i, ret;
+
+	/* Learn MAC address of guest device from packet */
+	pkt_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
+
+	if (find_vhost_dev(&pkt_hdr->s_addr)) {
+		RTE_LOG(ERR, VHOST_DATA,
+			"(%d) device is using a registered MAC!\n",
+			vdev->vid);
+		return -1;
+	}
+
+	for (i = 0; i < ETHER_ADDR_LEN; i++) {
+		vdev->mac_address.addr_bytes[i] = pkt_hdr->s_addr.addr_bytes[i];
+		vs_port->mac_addr.addr_bytes[i] = pkt_hdr->s_addr.addr_bytes[i];
+	}
+
+	/* vlan_tag currently uses the device_id. */
+	vdev->vlan_tag = vlan_tags[vdev->vid];
+
+	/* Print out VMDQ registration info. */
+	RTE_LOG(INFO, VHOST_DATA,
+		"(%d) mac %02x:%02x:%02x:%02x:%02x:%02x and vlan %d registered\n",
+		vdev->vid,
+		vdev->mac_address.addr_bytes[0], vdev->mac_address.addr_bytes[1],
+		vdev->mac_address.addr_bytes[2], vdev->mac_address.addr_bytes[3],
+		vdev->mac_address.addr_bytes[4], vdev->mac_address.addr_bytes[5],
+		vdev->vlan_tag);
+
+	/* Register the MAC address. */
+	ret = rte_eth_dev_mac_addr_add(phys_port->port_id, &vdev->mac_address,
+				(uint32_t)vdev->vid + priv->vmdq_pool_base);
+	if (ret)
+		RTE_LOG(ERR, VHOST_DATA,
+			"(%d) failed to add device MAC address to VMDQ\n",
+			vdev->vid);
+
+	/* Enable stripping of the vlan tag as we handle routing. */
+	if (vs_dev->conf_flags & VS_CNF_FLG_VLAN_STRIP_EN)
+		rte_eth_dev_set_vlan_strip_on_queue(phys_port->port_id,
+			(uint16_t)vs_port->phys_port_rxq, 1);
+
+	/* Set device as ready for RX. */
+	vdev->ready = DEVICE_RX;
+
+	return 0;
+}
+
+static int vmdq_learn_port (struct vswitch_port *vs_port,
+			    struct rte_mbuf **pkts,
+		     __attribute__((unused))uint16_t count)
+{
+	int rc  = 0;
+
+	if (vs_port->type == VSWITCH_PTYPE_VIRTIO)
+		rc = link_vmdq(vs_port, pkts[0]);
+
+	return rc;
+}
+
+/*
+ * Removes MAC address and vlan tag from VMDQ. Ensures that nothing is adding buffers to the RX
+ * queue before disabling RX on the device.
+ */
+static void unlink_vmdq(struct vswitch_port *vs_port)
+{
+	struct vhost_dev *vdev = vs_port->priv;
+	struct vmdq_switch_priv *priv = vs_port->vs_dev->priv;
+	struct vswitch_port *phys_port = priv->phys_port;
+	unsigned i = 0;
+	unsigned rx_count;
+	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
+
+	if (vdev->ready == DEVICE_RX) {
+		/*clear MAC and VLAN settings*/
+		rte_eth_dev_mac_addr_remove(phys_port->port_id,
+					    &vdev->mac_address);
+		for (i = 0; i < 6; i++) {
+			vdev->mac_address.addr_bytes[i] = 0;
+			vs_port->mac_addr.addr_bytes[i] = 0;
+		}
+
+		vdev->vlan_tag = 0;
+
+		/*Clear out the receive buffers*/
+		rx_count = rte_eth_rx_burst(phys_port->port_id,
+					(uint16_t)vs_port->phys_port_rxq,
+					pkts_burst, MAX_PKT_BURST);
+
+		while (rx_count) {
+			for (i = 0; i < rx_count; i++)
+				rte_pktmbuf_free(pkts_burst[i]);
+
+			rx_count = rte_eth_rx_burst(phys_port->port_id,
+					(uint16_t)vdev->vmdq_rx_q, pkts_burst,
+					MAX_PKT_BURST);
+		}
+
+		vdev->ready = DEVICE_MAC_LEARNING;
+	}
+}
+
+static int vmdq_unlearn_port (struct vswitch_port *vs_port)
+{
+	int rc = 0;
+
+	if (vs_port->type == VSWITCH_PTYPE_VIRTIO)
+		unlink_vmdq(vs_port);
+
+	return rc;
+}
+
+/*
+ * Check if the destination MAC of a packet is one local VM,
+ * and get its vlan tag, and offset if it is.
+ */
+static inline int __attribute__((always_inline))
+find_local_dest(struct vhost_dev *vdev, struct rte_mbuf *m,
+	uint32_t *offset, uint16_t *vlan_tag)
+{
+	struct vhost_dev *dst_vdev;
+	struct ether_hdr *pkt_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
+
+	dst_vdev = find_vhost_dev(&pkt_hdr->d_addr);
+	if (!dst_vdev)
+		return 0;
+
+	if (vdev->vid == dst_vdev->vid) {
+		RTE_LOG(DEBUG, VHOST_DATA,
+			"(%d) TX: src and dst MAC is same. Dropping packet.\n",
+			vdev->vid);
+		return -1;
+	}
+
+	/*
+	 * HW vlan strip will reduce the packet length
+	 * by minus length of vlan tag, so need restore
+	 * the packet length by plus it.
+	 */
+	*offset  = VLAN_HLEN;
+	*vlan_tag = vlan_tags[vdev->vid];
+
+	RTE_LOG(DEBUG, VHOST_DATA,
+		"(%d) TX: pkt to local VM device id: (%d), vlan tag: %u.\n",
+		vdev->vid, dst_vdev->vid, *vlan_tag);
+
+	return 0;
+}
+
+static uint16_t
+get_psd_sum(void *l3_hdr, uint64_t ol_flags)
+{
+	if (ol_flags & PKT_TX_IPV4)
+		return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
+	else /* assume ethertype == ETHER_TYPE_IPv6 */
+		return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
+}
+
+static void virtio_tx_offload(struct rte_mbuf *m)
+{
+	void *l3_hdr;
+	struct ipv4_hdr *ipv4_hdr = NULL;
+	struct tcp_hdr *tcp_hdr = NULL;
+	struct ether_hdr *eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
+
+	l3_hdr = (char *)eth_hdr + m->l2_len;
+
+	if (m->ol_flags & PKT_TX_IPV4) {
+		ipv4_hdr = l3_hdr;
+		ipv4_hdr->hdr_checksum = 0;
+		m->ol_flags |= PKT_TX_IP_CKSUM;
+	}
+
+	tcp_hdr = (struct tcp_hdr *)((char *)l3_hdr + m->l3_len);
+	tcp_hdr->cksum = get_psd_sum(l3_hdr, m->ol_flags);
+}
+
+/*
+ * This function routes the TX packet to the correct interface. This
+ * may be a local device or the physical port.
+ */
+static inline void __attribute__((always_inline))
+virtio_tx_route(struct vswitch_port *vs_port,
+		struct rte_mbuf *m, uint16_t vlan_tag)
+{
+	struct vhost_dev *vdev = vs_port->priv;
+	struct vswitch_dev *vs_dev = vs_port->vs_dev;
+	struct mbuf_table *tx_q;
+	unsigned offset = 0;
+	const uint16_t lcore_id = rte_lcore_id();
+	struct ether_hdr *nh;
+	uint32_t tx_ptype_mask;
+
+	nh = rte_pktmbuf_mtod(m, struct ether_hdr *);
+	if (unlikely(is_broadcast_ether_addr(&nh->d_addr))) {
+
+		/*broad cast to virtio ports only, for physical port
+		 * qeueu2nic will do it after adding vlan tag of device
+		 */
+		tx_ptype_mask = VS_PTYPE_MASK(VSWITCH_PTYPE_VIRTIO);
+		vs_do_broadcast_fwd(vs_port->vs_dev, vs_port,
+			tx_ptype_mask, m);
+
+		goto queue2nic;
+	}
+
+	/*check if destination is local VM*/
+	if ((vs_dev->conf_flags & VS_CNF_FLG_VM2VM_SOFTWARE)) {
+		if (!virtio_tx_local(vdev, m)) {
+			rte_pktmbuf_free(m);
+			return;
+		}
+	}
+
+	if (unlikely(vs_dev->conf_flags & VS_CNF_FLG_VM2VM_HARDWARE)) {
+		if (unlikely(find_local_dest(vdev, m, &offset,
+					     &vlan_tag) != 0)) {
+			rte_pktmbuf_free(m);
+			return;
+		}
+	}
+
+	RTE_LOG(DEBUG, VHOST_DATA,
+		"(%d) TX: MAC address is external\n", vdev->vid);
+
+queue2nic:
+
+	/*Add packet to the port tx queue*/
+	tx_q = vhost_switch_get_txq(lcore_id);
+
+	nh = rte_pktmbuf_mtod(m, struct ether_hdr *);
+	if (unlikely(nh->ether_type == rte_cpu_to_be_16(ETHER_TYPE_VLAN))) {
+		/* Guest has inserted the vlan tag. */
+		struct vlan_hdr *vh = (struct vlan_hdr *) (nh + 1);
+		uint16_t vlan_tag_be = rte_cpu_to_be_16(vlan_tag);
+		if ((vs_dev->conf_flags & VS_CNF_FLG_VM2VM_HARDWARE) &&
+			(vh->vlan_tci != vlan_tag_be))
+			vh->vlan_tci = vlan_tag_be;
+	} else {
+		m->ol_flags |= PKT_TX_VLAN_PKT;
+
+		/*
+		 * Find the right seg to adjust the data len when offset is
+		 * bigger than tail room size.
+		 */
+		if (unlikely((vs_dev->conf_flags & VS_CNF_FLG_VM2VM_HARDWARE))){
+			if (likely(offset <= rte_pktmbuf_tailroom(m)))
+				m->data_len += offset;
+			else {
+				struct rte_mbuf *seg = m;
+
+				while ((seg->next != NULL) &&
+					(offset > rte_pktmbuf_tailroom(seg)))
+					seg = seg->next;
+
+				seg->data_len += offset;
+			}
+			m->pkt_len += offset;
+		}
+
+		m->vlan_tci = vlan_tag;
+	}
+
+	if (m->ol_flags & PKT_TX_TCP_SEG)
+		virtio_tx_offload(m);
+
+	tx_q->m_table[tx_q->len++] = m;
+	if (vs_dev->conf_flags & VS_CNF_FLG_STATS_EN) {
+		vdev->stats.tx_total++;
+		vdev->stats.tx++;
+	}
+
+	if (unlikely(tx_q->len == MAX_PKT_BURST))
+		do_drain_mbuf_table(tx_q);
+}
+
+static int vmdq_lookup_n_fwd_virtio(struct vswitch_port *vs_port,
+			struct rte_mbuf **pkts, uint16_t count,
+			__attribute__((unused)) uint16_t in_rxq)
+{
+	int i;
+	struct vhost_dev *vdev = vs_port->priv;
+
+	for (i = 0; i < count; ++i)
+		virtio_tx_route(vs_port, pkts[i], vlan_tags[vdev->vid]);
+
+	return 0;
+}
+
+static int vmdq_lookup_n_fwd_phys(struct vswitch_port *vs_port,
+			struct rte_mbuf **pkts, uint16_t count, uint16_t in_rxq)
+{
+	struct vswitch_port *dest_port;
+	struct vhost_dev *dest_vdev;
+	struct vmdq_switch_priv *priv = vs_port->vs_dev->priv;
+	uint16_t enqueue_count;
+
+	dest_port = priv->virtio_port_map[in_rxq];
+	dest_vdev = (struct vhost_dev *)dest_port->priv;
+	enqueue_count = dest_port->do_tx(dest_port, VIRTIO_RXQ,
+					 NULL, pkts, count);
+
+	rte_atomic64_add(&dest_vdev->stats.rx_atomic, enqueue_count);
+
+	return 0;
+}
+
+static int vmdq_lookup_n_fwd(struct vswitch_port *vs_port,
+		struct rte_mbuf **pkts, uint16_t count, uint16_t in_rxq)
+{
+	int rc;
+
+	switch(vs_port->type) {
+	case VSWITCH_PTYPE_VIRTIO:
+		rc = vmdq_lookup_n_fwd_virtio(vs_port, pkts, count, in_rxq);
+		break;
+	case VSWITCH_PTYPE_PHYS:
+		rc = vmdq_lookup_n_fwd_phys(vs_port, pkts, count, in_rxq);
+		break;
+	default:
+		rc = -EINVAL;
+		break;
+	}
+
+	return rc;
+}
+
+static struct vswitch_port *vmdq_sched_phys_port(struct vswitch_dev *vs_dev,
+			__attribute__((unused))enum vswitch_port_type ptype,
+			__attribute__((unused))uint16_t core_id)
+{
+	struct vmdq_switch_priv *priv = vs_dev->priv;
+
+	/*With VMDQ do rx/tx with the only one physical port (non virtio)*/
+
+	return priv->phys_port;
+}
+
+struct vswitch_ops vmdq_switch_ops = {
+	.add_port = vmdq_add_port,
+	.lookup_n_fwd = vmdq_lookup_n_fwd,
+	.port_start = vmdq_port_start,
+	.switch_init = vmdq_switch_init,
+	.learn_port = vmdq_learn_port,
+	.unlearn_port = vmdq_unlearn_port,
+	.sched_rx_port = vmdq_sched_phys_port,
+	.sched_tx_port = vmdq_sched_phys_port,
+	.get_max_vdevs = vmdq_get_max_vdevs,
+};
+
+void vmdq_switch_impl_init(void)
+{
+	vmdq_switch_dev_g = vs_register_switch("vmdq",
+			sizeof(struct vmdq_switch_priv), VMDQ_MAX_VIRTIO_PORTS,
+			&vmdq_switch_ops);
+	if (!vmdq_switch_dev_g) {
+		RTE_LOG(DEBUG, VHOST_CONFIG, "VMDQ switch registration failure\n");
+		goto out;
+	}
+
+out:
+	return;
+}
diff --git a/examples/vhost/vmdq.h b/examples/vhost/vmdq.h
new file mode 100644
index 0000000..bff8535
--- /dev/null
+++ b/examples/vhost/vmdq.h
@@ -0,0 +1,57 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Freescale Semiconductor. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Freescale Semiconductor nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __VHOST_VMDQ_H__
+#define __VHOST_VMDQ_H__
+
+#include "vswitch_common.h"
+
+#define VMDQ_MAX_PHYS_PORTS	1
+#define VMDQ_MAX_VIRTIO_PORTS	64
+
+struct vmdq_switch_priv {
+	struct vswitch_dev *vs_dev;
+	struct vswitch_port *phys_port;
+	struct vswitch_port *virtio_port_map[VMDQ_MAX_VIRTIO_PORTS];
+	int phys_port_count;
+	int num_devices;
+	uint16_t num_pf_queues;
+	uint16_t num_queues;
+	uint16_t num_vmdq_queues;
+	uint16_t vmdq_pool_base;
+	uint16_t vmdq_queue_base;
+	uint16_t queues_per_pool;
+	uint16_t conf_flags;
+};
+
+#endif
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework
  2016-09-04 10:23 ` [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework Pankaj Chauhan
@ 2016-09-09  8:56   ` Tan, Jianfeng
  2016-09-12 10:55     ` Pankaj Chauhan
  2016-09-11 12:21   ` Yuanhan Liu
  2016-09-26  4:12   ` Yuanhan Liu
  2 siblings, 1 reply; 22+ messages in thread
From: Tan, Jianfeng @ 2016-09-09  8:56 UTC (permalink / raw)
  To: Pankaj Chauhan, dev; +Cc: hemant.agrawal, yuanhan.liu, maxime.coquelin

Hi Pankaj,

Thanks for the massive and great work.

On 9/5/2016 6:54 PM, Pankaj Chauhan wrote:
> Introduce support for a generic framework for handling of switching between
> physical and vhost devices. The vswitch framework introduces the following
> concept:
>
> 1. vswitch_dev: Vswitch device is a logical switch which can have physical and
> virtio devices. The devices are operated/used using standard rte_eth API for
> physical devices and lib_vhost API for vhost devices, PMD API is not used
> for vhost devices.
>
> 2. vswitch_port: Any physical or virtio device that is added to vswitch. The
> port can have its own tx/rx functions for doing data transfer, which are exposed
> to the framework using generic function pointers (vs_port->do_tx/do_rx). This way
> the generic code can do tx/rx without understanding the type of device (Physical or
> virtio). Similarly each port has its own functions to select tx/rx queues. The framework
> provides default tx/rx queue selection functions to the port when port is added
> (for both physical and vhost devices). But the framework allows the switch implementation
> to override the queue selection functions (vs_port->get_txq/rxq) if required.
>
> 3. vswitch_ops: The ops is set of function pointers which are used to do operations
> like learning, unlearning, add/delete port, lookup_and_forward. The user of vswitch
> framework (vhost/main.[c,h])uses these function pointers to perform above mentioned
> operations, thus it remains agnostic of the underlying implementation.
>
> Different switching logics can implement their vswitch_device and vswitch_ops, and
> register with the framework. This framework makes vhost-switch application scalable
> in terms of:
>
> 1. Different switching logics (one of them is vmdq, vhost/vmdq.[c,h]
> 2. Number of ports.
> 3. Policies of selecting ports for rx and tx.
>
> Signed-off-by: Pankaj Chauhan <pankaj.chauhan@nxp.com>

After this patch set, how's mapping relationship between cores and 
vswitch_dev? Old vhost example does not have the concept of switch, so 
each core is binded with some VDEVs. Now, we still keep original logic?

(a) If yes, provided that phys device could has no direct relationship 
with vdevs in other switching logics, we may need to bind those physical 
devices to cores too? In that case, switch_worker() will receiving pkts 
from all devices (phys or virtual) and dispatch, which looks like:

switch_worker() {
     FOR each port binding to this core {
          rx(port, pkts[], count);
          vs_lookup_n_fwd( information_needed );
     }
}

(b) If no, we may bind each core with n switches? If so, switch_worker() 
also need to be reworked to keep receiving pkts from all ports of the 
switch, and dispatch.

switch_worker() {
     FOR each switch binding to this core {
          FOR each port of switch {
              rx(port, pkts[], count);
              vs_lookup_n_fwd( information_needed );
         }
     }
}

In all, (1) we'd better not use vdev to find phys dev in switch_worker 
any more; (2) we'd better not differentiate phys device and virtual 
device in generic framework (it's just an attribute of vswitch_port.

What do you think?

Thanks,
Jianfeng

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework
  2016-09-04 10:23 ` [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework Pankaj Chauhan
  2016-09-09  8:56   ` Tan, Jianfeng
@ 2016-09-11 12:21   ` Yuanhan Liu
  2016-09-12 10:59     ` Pankaj Chauhan
  2016-09-26  4:12   ` Yuanhan Liu
  2 siblings, 1 reply; 22+ messages in thread
From: Yuanhan Liu @ 2016-09-11 12:21 UTC (permalink / raw)
  To: Pankaj Chauhan
  Cc: dev, hemant.agrawal, jianfeng.tan, maxime.coquelin, Thomas Monjalon

On Mon, Sep 05, 2016 at 04:24:29PM +0530, Pankaj Chauhan wrote:
> Introduce support for a generic framework for handling of switching between
> physical and vhost devices. The vswitch framework introduces the following
> concept:
> 
> 1. vswitch_dev: Vswitch device is a logical switch which can have physical and
> virtio devices. The devices are operated/used using standard rte_eth API for
> physical devices and lib_vhost API for vhost devices, PMD API is not used
> for vhost devices.
> 
> 2. vswitch_port: Any physical or virtio device that is added to vswitch. The
> port can have its own tx/rx functions for doing data transfer, which are exposed
> to the framework using generic function pointers (vs_port->do_tx/do_rx). This way
> the generic code can do tx/rx without understanding the type of device (Physical or
> virtio). Similarly each port has its own functions to select tx/rx queues. The framework
> provides default tx/rx queue selection functions to the port when port is added
> (for both physical and vhost devices). But the framework allows the switch implementation
> to override the queue selection functions (vs_port->get_txq/rxq) if required.
> 
> 3. vswitch_ops: The ops is set of function pointers which are used to do operations
> like learning, unlearning, add/delete port, lookup_and_forward. The user of vswitch
> framework (vhost/main.[c,h])uses these function pointers to perform above mentioned
> operations, thus it remains agnostic of the underlying implementation.
> 
> Different switching logics can implement their vswitch_device and vswitch_ops, and
> register with the framework. This framework makes vhost-switch application scalable
> in terms of:
> 
> 1. Different switching logics (one of them is vmdq, vhost/vmdq.[c,h]
> 2. Number of ports.
> 3. Policies of selecting ports for rx and tx.
> 
> Signed-off-by: Pankaj Chauhan <pankaj.chauhan@nxp.com>

Hi,

FYI, my testrobot caught some errors when this patch is applied.
(And sorry for the late review; hopefully I can make it next week).

        --yliu

---
examples/vhost/main.c: In function 'virtio_xmit':
examples/vhost/main.c:815:41: error: 'struct vhost_dev' has no member named 'vs_port'
  struct vswitch_port *vs_port = dst_vdev->vs_port;
                                         ^
examples/vhost/main.c:822:15: error: dereferencing pointer to incomplete type 'struct vswitch_port'
  txq = vs_port->get_txq(vs_port, rte_lcore_id());
               ^
examples/vhost/main.c: In function 'do_drain_mbuf_table':
examples/vhost/main.c:952:12: error: implicit declaration of function 'vs_sched_tx_port' [-Werror=implicit-function-declaration]
  tx_port = vs_sched_tx_port(vswitch_dev_g, VSWITCH_PTYPE_PHYS,
            ^
examples/vhost/main.c:952:2: error: nested extern declaration of 'vs_sched_tx_port' [-Werror=nested-externs]
  tx_port = vs_sched_tx_port(vswitch_dev_g, VSWITCH_PTYPE_PHYS,
  ^
examples/vhost/main.c:952:29: error: 'vswitch_dev_g' undeclared (first use in this function)
  tx_port = vs_sched_tx_port(vswitch_dev_g, VSWITCH_PTYPE_PHYS,
                             ^
examples/vhost/main.c:952:29: note: each undeclared identifier is reported only once for each function it appears in
examples/vhost/main.c:952:44: error: 'VSWITCH_PTYPE_PHYS' undeclared (first use in this function)
  tx_port = vs_sched_tx_port(vswitch_dev_g, VSWITCH_PTYPE_PHYS,
                                            ^
examples/vhost/main.c:958:18: error: dereferencing pointer to incomplete type 'struct vswitch_port'
   count = tx_port->do_tx(tx_port, tx_q->txq_id, NULL, tx_q->m_table,
                  ^
examples/vhost/main.c:955:6: error: label 'out' used but not defined
      goto out;
      ^
examples/vhost/main.c: In function 'drain_eth_rx':
examples/vhost/main.c:1096:12: error: implicit declaration of function 'vs_sched_rx_port' [-Werror=implicit-function-declaration]
  rx_port = vs_sched_rx_port(vswitch_dev_g, VSWITCH_PTYPE_PHYS,
            ^
examples/vhost/main.c:1096:2: error: nested extern declaration of 'vs_sched_rx_port' [-Werror=nested-externs]
  rx_port = vs_sched_rx_port(vswitch_dev_g, VSWITCH_PTYPE_PHYS,
  ^
examples/vhost/main.c:1096:29: error: 'vswitch_dev_g' undeclared (first use in this function)
  rx_port = vs_sched_rx_port(vswitch_dev_g, VSWITCH_PTYPE_PHYS,
                             ^
examples/vhost/main.c:1096:44: error: 'VSWITCH_PTYPE_PHYS' undeclared (first use in this function)
  rx_port = vs_sched_rx_port(vswitch_dev_g, VSWITCH_PTYPE_PHYS,
                                            ^
examples/vhost/main.c:1105:15: error: dereferencing pointer to incomplete type 'struct vswitch_port'
  rxq = rx_port->get_rxq(rx_port, vdev, core_id);
               ^
examples/vhost/main.c:1099:6: error: label 'out' used but not defined
      goto out;
      ^
examples/vhost/main.c: In function 'drain_virtio_tx':
examples/vhost/main.c:1145:37: error: 'struct vhost_dev' has no member named 'vs_port'
  struct vswitch_port *vs_port = vdev->vs_port;
                                     ^
examples/vhost/main.c:1154:15: error: dereferencing pointer to incomplete type 'struct vswitch_port'
  rxq = vs_port->get_rxq(vs_port, NULL, core_id);
               ^
examples/vhost/main.c:1162:23: error: implicit declaration of function 'vs_learn_port' [-Werror=implicit-function-declaration]
   if (vdev->remove || vs_learn_port(vs_port, pkts, count))
                       ^
examples/vhost/main.c:1162:3: error: nested extern declaration of 'vs_learn_port' [-Werror=nested-externs]
   if (vdev->remove || vs_learn_port(vs_port, pkts, count))
   ^
examples/vhost/main.c:1166:2: error: implicit declaration of function 'vs_lookup_n_fwd' [-Werror=implicit-function-declaration]
  vs_lookup_n_fwd(vs_port, pkts, count, VIRTIO_RXQ);
  ^
examples/vhost/main.c:1166:2: error: nested extern declaration of 'vs_lookup_n_fwd' [-Werror=nested-externs]
examples/vhost/main.c: In function 'new_device':
examples/vhost/main.c:1310:19: error: implicit declaration of function 'vs_get_max_vdevs' [-Werror=implicit-function-declaration]
  device_num_min = vs_get_max_vdevs(vswitch_dev_g);
                   ^
examples/vhost/main.c:1310:2: error: nested extern declaration of 'vs_get_max_vdevs' [-Werror=nested-externs]
  device_num_min = vs_get_max_vdevs(vswitch_dev_g);
  ^
examples/vhost/main.c:1310:36: error: 'vswitch_dev_g' undeclared (first use in this function)
  device_num_min = vs_get_max_vdevs(vswitch_dev_g);
                                    ^
examples/vhost/main.c:1313:2: error: 'vs_port' undeclared (first use in this function)
  vs_port = vs_add_port(vswitch_dev_g, vid, VSWITCH_PTYPE_VIRTIO, vdev);
  ^
examples/vhost/main.c:1313:12: error: implicit declaration of function 'vs_add_port' [-Werror=implicit-function-declaration]
  vs_port = vs_add_port(vswitch_dev_g, vid, VSWITCH_PTYPE_VIRTIO, vdev);
            ^
examples/vhost/main.c:1313:2: error: nested extern declaration of 'vs_add_port' [-Werror=nested-externs]
  vs_port = vs_add_port(vswitch_dev_g, vid, VSWITCH_PTYPE_VIRTIO, vdev);
  ^
examples/vhost/main.c:1313:44: error: 'VSWITCH_PTYPE_VIRTIO' undeclared (first use in this function)
  vs_port = vs_add_port(vswitch_dev_g, vid, VSWITCH_PTYPE_VIRTIO, vdev);
                                            ^
examples/vhost/main.c:1319:6: error: 'struct vhost_dev' has no member named 'vs_port'
  vdev->vs_port = vs_port;
      ^
examples/vhost/main.c:1347:2: error: implicit declaration of function 'vs_port_start' [-Werror=implicit-function-declaration]
  vs_port_start(vs_port);
  ^
examples/vhost/main.c:1347:2: error: nested extern declaration of 'vs_port_start' [-Werror=nested-externs]
examples/vhost/main.c: In function 'get_vswitch_conf_flags':
examples/vhost/main.c:1479:20: error: 'VS_CNF_FLG_VM2VM_HARDWARE' undeclared (first use in this function)
   vs_conf_flags |= VS_CNF_FLG_VM2VM_HARDWARE;
                    ^
examples/vhost/main.c:1482:20: error: 'VS_CNF_FLG_VM2VM_SOFTWARE' undeclared (first use in this function)
   vs_conf_flags |= VS_CNF_FLG_VM2VM_SOFTWARE;
                    ^
examples/vhost/main.c:1485:20: error: 'VS_CNF_FLG_PROMISCOUS_EN' undeclared (first use in this function)
   vs_conf_flags |= VS_CNF_FLG_PROMISCOUS_EN;
                    ^
examples/vhost/main.c:1487:6: error: 'jumbo_frame_en' undeclared (first use in this function)
  if (jumbo_frame_en)
      ^
examples/vhost/main.c:1488:20: error: 'VS_CNF_FLG_JUMBO_EN' undeclared (first use in this function)
   vs_conf_flags |= VS_CNF_FLG_JUMBO_EN;
                    ^
examples/vhost/main.c:1491:20: error: 'VS_CNF_FLG_STATS_EN' undeclared (first use in this function)
   vs_conf_flags |= VS_CNF_FLG_STATS_EN;
                    ^
examples/vhost/main.c:1494:20: error: 'VS_CNF_FLG_VLAN_STRIP_EN' undeclared (first use in this function)
   vs_conf_flags |= VS_CNF_FLG_VLAN_STRIP_EN;
                    ^
examples/vhost/main.c: At top level:
examples/vhost/main.c:726:1: error: 'link_vmdq' defined but not used [-Werror=unused-function]
 link_vmdq(struct vhost_dev *vdev, struct rte_mbuf *m)
 ^
examples/vhost/main.c:1474:17: error: 'get_vswitch_conf_flags' defined but not used [-Werror=unused-function]
 static uint64_t get_vswitch_conf_flags(void)
                 ^
cc1: all warnings being treated as errors
make[1]: *** [main.o] Error 1
make: *** [all] Error 2
error: build examples/vhost failed
error: build failed

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework
  2016-09-09  8:56   ` Tan, Jianfeng
@ 2016-09-12 10:55     ` Pankaj Chauhan
  2016-09-13  6:51       ` Tan, Jianfeng
  0 siblings, 1 reply; 22+ messages in thread
From: Pankaj Chauhan @ 2016-09-12 10:55 UTC (permalink / raw)
  To: Tan, Jianfeng, dev; +Cc: hemant.agrawal, yuanhan.liu, maxime.coquelin

On 9/9/2016 2:26 PM, Tan, Jianfeng wrote:
> Hi Pankaj,
>
> Thanks for the massive and great work.

Hi Jianfeng,

Thanks for the review.
>
> On 9/5/2016 6:54 PM, Pankaj Chauhan wrote:
>> Introduce support for a generic framework for handling of switching
>> between
>> physical and vhost devices. The vswitch framework introduces the
>> following
>> concept:
>>
>> 1. vswitch_dev: Vswitch device is a logical switch which can have
>> physical and
>> virtio devices. The devices are operated/used using standard rte_eth
>> API for
>> physical devices and lib_vhost API for vhost devices, PMD API is not used
>> for vhost devices.
>>
>> 2. vswitch_port: Any physical or virtio device that is added to
>> vswitch. The
>> port can have its own tx/rx functions for doing data transfer, which
>> are exposed
>> to the framework using generic function pointers
>> (vs_port->do_tx/do_rx). This way
>> the generic code can do tx/rx without understanding the type of device
>> (Physical or
>> virtio). Similarly each port has its own functions to select tx/rx
>> queues. The framework
>> provides default tx/rx queue selection functions to the port when port
>> is added
>> (for both physical and vhost devices). But the framework allows the
>> switch implementation
>> to override the queue selection functions (vs_port->get_txq/rxq) if
>> required.
>>
>> 3. vswitch_ops: The ops is set of function pointers which are used to
>> do operations
>> like learning, unlearning, add/delete port, lookup_and_forward. The
>> user of vswitch
>> framework (vhost/main.[c,h])uses these function pointers to perform
>> above mentioned
>> operations, thus it remains agnostic of the underlying implementation.
>>
>> Different switching logics can implement their vswitch_device and
>> vswitch_ops, and
>> register with the framework. This framework makes vhost-switch
>> application scalable
>> in terms of:
>>
>> 1. Different switching logics (one of them is vmdq, vhost/vmdq.[c,h]
>> 2. Number of ports.
>> 3. Policies of selecting ports for rx and tx.
>>
>> Signed-off-by: Pankaj Chauhan <pankaj.chauhan@nxp.com>
>
> After this patch set, how's mapping relationship between cores and
> vswitch_dev? Old vhost example does not have the concept of switch, so
> each core is binded with some VDEVs. Now, we still keep original logic?
>
> (a) If yes, provided that phys device could has no direct relationship
> with vdevs in other switching logics, we may need to bind those physical
> devices to cores too? In that case, switch_worker() will receiving pkts
> from all devices (phys or virtual) and dispatch, which looks like:
>
> switch_worker() {
>     FOR each port binding to this core {
>          rx(port, pkts[], count);
>          vs_lookup_n_fwd( information_needed );
>     }
> }

Since we support only one switch device running at one time. We bind the 
ports of the switchdev to the core. But The switch might have it's own 
logic to bind the port to the core. For example
VMDQ only supports one Physical port, other switch can support more than 
one Physical port. In order to take care of that i have added two ops in 
swithdev_ops:

1. vs_sched_rx_port:

struct vswitch_port *vs_sched_rx_port(struct vswitch_dev *vs_dev, enum
                                       vswitch_port_type ptype, uint16_t 
core_id)

2. vs_sched_tx_port:

struct vswitch_port *vs_sched_tx_port(struct vswitch_dev *vs_dev, enum
                                       vswitch_port_type ptype, uint16_t
core_id)

The idea of providing these functions is that vhost/main requests the 
underlying switch implementation to schedule a port for rx or Tx, the 
current running core_id is also passed as parameter. So the switch can
take a decision which port to do rx or tx based on core id, and may be 
some other custom policy.

For example VMDQ always returns the one single Physical port in response 
to these functions called from any core. The other switch
can return the ports bound to the cores.

Similarly there are two port operations (vs_port->get_rxq and 
vs_port->get_txq), here also we pass core_id as parameter so that
the underlying switch implementation can schedule the rx or tx queue 
based on the core_id.

The above mentioned ops are used in drain_eth_rx() and 
do_drain_mbuf_table() (main.c) currently, and they leave binding of 
physical port and the queues to the underlying implementation. This
way we can accommodate VMDQ which uses only one physical port and 
rxqueues based on VMDQ, OR any other switch which uses multiple physical 
port and rx/tx queue scheduling based on some other logic.

Please suggest if this way of scheduling ports and tx/rx queues is fine 
or not?
>
> (b) If no, we may bind each core with n switches? If so, switch_worker()
> also need to be reworked to keep receiving pkts from all ports of the
> switch, and dispatch.
>
> switch_worker() {
>     FOR each switch binding to this core {
>          FOR each port of switch {
>              rx(port, pkts[], count);
>              vs_lookup_n_fwd( information_needed );
>         }
>     }
> }
Since we currently support only one switch_dev in a running instance of
vhost_switch() we can use binding of ports to core as you suggested in #(a).

>
> In all, (1) we'd better not use vdev to find phys dev in switch_worker
> any more;

I agree with you. I have tried to do following in drain_eth_rx ():


         core_id = rte_lcore_id();
         rx_port = vs_sched_rx_port(vswitch_dev_g, VSWITCH_PTYPE_PHYS,
                                     core_id);
         if (unlikely(!rx_port))
             goto out;

        rxq = rx_port->get_rxq(rx_port, vdev, core_id);
         rx_count = rx_port->do_rx(rx_port, rxq, NULL, pkts, 
MAX_PKT_BURST);

Here we don't assume any relation between vdev and the physical device 
or rxq.  But pass vdev to the underlying switch so that it can choose
the rxq for this vdev. In case of VMDQ it will return VMDQ queue number 
for this vdev. In case of any other switch implementation it can have 
their own logic to decide rxq.

(2) we'd better not differentiate phys device and virtual
> device in generic framework (it's just an attribute of vswitch_port.
>
> What do you think?

I agree with your thought that given the current API in this patchset we
should aim for making switch_worker agnostic of the port type. Ideally 
it should look something like this:

switch_worker() {

		rx_port mask = VSWITCH_PTYPE_PHYS | VSWITCH_PTYPE_PHYS;

		rx_port = vs_sched_rx_port(vswit_dev_g, rx_port_mask, core_id)
		rx_q = rx_port->get_rxq(vs_port, vdev, code_id);
		rx_port->do_rx(rx_port, rxq, NULL, pktss, MAX_PKT_BURST);

		vs_lookup_n_fwd(rx_port, pkts, rx_count, rxq);

}

Here i have two problems in achieving switch_worker as mentioned above:

1. while doing Rx from physical port, we need the associated vdev to 
know the VMDQ rx_q. That was the reason i kept current logic of keeping 
vdevs bound to the core in the switch_worker.  If we don't keep list of 
vdevs assigned to the core, how we' will get the rxq on phsyical device 
in case of VMDQ switch implementation.

2. drain_mbuf_table() currently assumes that there is only one physical 
device on which we have to transmit. We may have to rethink where to 
keep the tx queues (&lcore_tx_queue[core_id]), i case we have multiple 
physical ports we might have to move tx queues to each port.

Since i didn't have good solution currently for #1 and #2 mentioned 
above i kept assumption of one single physical port as it is and tried 
to integrate VMDQ in the framework with leaving these assumptions (thus
the main structure of switch_worker) as it it.

Please suggest if you have some solution to above mentioned points in 
your mind. Ideally i agree we should have switch_worker() as i mentioned 
in pseudo-code i gave above.
>
> Thanks,
> Jianfeng
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework
  2016-09-11 12:21   ` Yuanhan Liu
@ 2016-09-12 10:59     ` Pankaj Chauhan
  0 siblings, 0 replies; 22+ messages in thread
From: Pankaj Chauhan @ 2016-09-12 10:59 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: dev, hemant.agrawal, jianfeng.tan, maxime.coquelin, Thomas Monjalon

On 9/11/2016 5:51 PM, Yuanhan Liu wrote:
> On Mon, Sep 05, 2016 at 04:24:29PM +0530, Pankaj Chauhan wrote:
>> Introduce support for a generic framework for handling of switching between
>> physical and vhost devices. The vswitch framework introduces the following
>> concept:
>>
>> 1. vswitch_dev: Vswitch device is a logical switch which can have physical and
>> virtio devices. The devices are operated/used using standard rte_eth API for
>> physical devices and lib_vhost API for vhost devices, PMD API is not used
>> for vhost devices.
>>
>> 2. vswitch_port: Any physical or virtio device that is added to vswitch. The
>> port can have its own tx/rx functions for doing data transfer, which are exposed
>> to the framework using generic function pointers (vs_port->do_tx/do_rx). This way
>> the generic code can do tx/rx without understanding the type of device (Physical or
>> virtio). Similarly each port has its own functions to select tx/rx queues. The framework
>> provides default tx/rx queue selection functions to the port when port is added
>> (for both physical and vhost devices). But the framework allows the switch implementation
>> to override the queue selection functions (vs_port->get_txq/rxq) if required.
>>
>> 3. vswitch_ops: The ops is set of function pointers which are used to do operations
>> like learning, unlearning, add/delete port, lookup_and_forward. The user of vswitch
>> framework (vhost/main.[c,h])uses these function pointers to perform above mentioned
>> operations, thus it remains agnostic of the underlying implementation.
>>
>> Different switching logics can implement their vswitch_device and vswitch_ops, and
>> register with the framework. This framework makes vhost-switch application scalable
>> in terms of:
>>
>> 1. Different switching logics (one of them is vmdq, vhost/vmdq.[c,h]
>> 2. Number of ports.
>> 3. Policies of selecting ports for rx and tx.
>>
>> Signed-off-by: Pankaj Chauhan <pankaj.chauhan@nxp.com>
>
> Hi,
>
> FYI, my testrobot caught some errors when this patch is applied.
> (And sorry for the late review; hopefully I can make it next week).
>
>         --yliu

Hi YLiu,

Thanks for the review. I am sorry it was my mistake in this patchset, 
the header defining few structures and macros used in patch 0001 are 
defined in patch 0003. Because of this 0001 was not individually 
compilable but all three are compliable as set, it was mistake on my 
part , will fix it and will take care of it in further patchsets.

Do you want me to send another version for review, or i can fix it
in the next version i send after review of v2 is complete?

Thanks,
Pankaj
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework
  2016-09-12 10:55     ` Pankaj Chauhan
@ 2016-09-13  6:51       ` Tan, Jianfeng
  2016-09-15  9:00         ` Pankaj Chauhan
  2016-09-19 14:43         ` Yuanhan Liu
  0 siblings, 2 replies; 22+ messages in thread
From: Tan, Jianfeng @ 2016-09-13  6:51 UTC (permalink / raw)
  To: Pankaj Chauhan, dev; +Cc: hemant.agrawal, yuanhan.liu, maxime.coquelin

Hi Pankaj,


On 9/12/2016 6:55 PM, Pankaj Chauhan wrote:
> On 9/9/2016 2:26 PM, Tan, Jianfeng wrote:
>> Hi Pankaj,
>>
>> Thanks for the massive and great work.
>
> Hi Jianfeng,
>
> Thanks for the review.
>>
>> On 9/5/2016 6:54 PM, Pankaj Chauhan wrote:
>>> Introduce support for a generic framework for handling of switching
>>> between
>>> physical and vhost devices. The vswitch framework introduces the
>>> following
>>> concept:
>>>
>>> 1. vswitch_dev: Vswitch device is a logical switch which can have
>>> physical and
>>> virtio devices. The devices are operated/used using standard rte_eth
>>> API for
>>> physical devices and lib_vhost API for vhost devices, PMD API is not 
>>> used
>>> for vhost devices.
>>>
>>> 2. vswitch_port: Any physical or virtio device that is added to
>>> vswitch. The
>>> port can have its own tx/rx functions for doing data transfer, which
>>> are exposed
>>> to the framework using generic function pointers
>>> (vs_port->do_tx/do_rx). This way
>>> the generic code can do tx/rx without understanding the type of device
>>> (Physical or
>>> virtio). Similarly each port has its own functions to select tx/rx
>>> queues. The framework
>>> provides default tx/rx queue selection functions to the port when port
>>> is added
>>> (for both physical and vhost devices). But the framework allows the
>>> switch implementation
>>> to override the queue selection functions (vs_port->get_txq/rxq) if
>>> required.
>>>
>>> 3. vswitch_ops: The ops is set of function pointers which are used to
>>> do operations
>>> like learning, unlearning, add/delete port, lookup_and_forward. The
>>> user of vswitch
>>> framework (vhost/main.[c,h])uses these function pointers to perform
>>> above mentioned
>>> operations, thus it remains agnostic of the underlying implementation.
>>>
>>> Different switching logics can implement their vswitch_device and
>>> vswitch_ops, and
>>> register with the framework. This framework makes vhost-switch
>>> application scalable
>>> in terms of:
>>>
>>> 1. Different switching logics (one of them is vmdq, vhost/vmdq.[c,h]
>>> 2. Number of ports.
>>> 3. Policies of selecting ports for rx and tx.
>>>
>>> Signed-off-by: Pankaj Chauhan <pankaj.chauhan@nxp.com>
>>
>> After this patch set, how's mapping relationship between cores and
>> vswitch_dev? Old vhost example does not have the concept of switch, so
>> each core is binded with some VDEVs. Now, we still keep original logic?
>>
>> (a) If yes, provided that phys device could has no direct relationship
>> with vdevs in other switching logics, we may need to bind those physical
>> devices to cores too? In that case, switch_worker() will receiving pkts
>> from all devices (phys or virtual) and dispatch, which looks like:
>>
>> switch_worker() {
>>     FOR each port binding to this core {
>>          rx(port, pkts[], count);
>>          vs_lookup_n_fwd( information_needed );
>>     }
>> }
>
> Since we support only one switch device running at one time. We bind 
> the ports of the switchdev to the core. But The switch might have it's 
> own logic to bind the port to the core. For example
> VMDQ only supports one Physical port, other switch can support more 
> than one Physical port. In order to take care of that i have added two 
> ops in swithdev_ops:
>
> 1. vs_sched_rx_port:
>
> struct vswitch_port *vs_sched_rx_port(struct vswitch_dev *vs_dev, enum
>                                       vswitch_port_type ptype, 
> uint16_t core_id)
>
> 2. vs_sched_tx_port:
>
> struct vswitch_port *vs_sched_tx_port(struct vswitch_dev *vs_dev, enum
>                                       vswitch_port_type ptype, uint16_t
> core_id)
>
> The idea of providing these functions is that vhost/main requests the 
> underlying switch implementation to schedule a port for rx or Tx, the 
> current running core_id is also passed as parameter. So the switch can
> take a decision which port to do rx or tx based on core id, and may be 
> some other custom policy.
>
> For example VMDQ always returns the one single Physical port in 
> response to these functions called from any core. The other switch
> can return the ports bound to the cores.
>
> Similarly there are two port operations (vs_port->get_rxq and 
> vs_port->get_txq), here also we pass core_id as parameter so that
> the underlying switch implementation can schedule the rx or tx queue 
> based on the core_id.
>
> The above mentioned ops are used in drain_eth_rx() and 
> do_drain_mbuf_table() (main.c) currently, and they leave binding of 
> physical port and the queues to the underlying implementation. This
> way we can accommodate VMDQ which uses only one physical port and 
> rxqueues based on VMDQ, OR any other switch which uses multiple 
> physical port and rx/tx queue scheduling based on some other logic.
>
> Please suggest if this way of scheduling ports and tx/rx queues is 
> fine or not?

According to above explanation, in VMDQ switch, we cannot schedule two 
queues (belongs to the same port) on the same core, right?


>>
>> (b) If no, we may bind each core with n switches? If so, switch_worker()
>> also need to be reworked to keep receiving pkts from all ports of the
>> switch, and dispatch.
>>
>> switch_worker() {
>>     FOR each switch binding to this core {
>>          FOR each port of switch {
>>              rx(port, pkts[], count);
>>              vs_lookup_n_fwd( information_needed );
>>         }
>>     }
>> }
> Since we currently support only one switch_dev in a running instance of
> vhost_switch() we can use binding of ports to core as you suggested in 
> #(a).

OK.

>
>>
>> In all, (1) we'd better not use vdev to find phys dev in switch_worker
>> any more;
>
> I agree with you. I have tried to do following in drain_eth_rx ():
>
>
>         core_id = rte_lcore_id();
>         rx_port = vs_sched_rx_port(vswitch_dev_g, VSWITCH_PTYPE_PHYS,
>                                     core_id);
>         if (unlikely(!rx_port))
>             goto out;
>
>        rxq = rx_port->get_rxq(rx_port, vdev, core_id);
>         rx_count = rx_port->do_rx(rx_port, rxq, NULL, pkts, 
> MAX_PKT_BURST);
>
> Here we don't assume any relation between vdev and the physical device 
> or rxq.  But pass vdev to the underlying switch so that it can choose
> the rxq for this vdev. In case of VMDQ it will return VMDQ queue 
> number for this vdev. In case of any other switch implementation it 
> can have their own logic to decide rxq.

Thanks for explanation.

>
> (2) we'd better not differentiate phys device and virtual
>> device in generic framework (it's just an attribute of vswitch_port.
>>
>> What do you think?
>
> I agree with your thought that given the current API in this patchset we
> should aim for making switch_worker agnostic of the port type. Ideally 
> it should look something like this:
>
> switch_worker() {
>
>         rx_port mask = VSWITCH_PTYPE_PHYS | VSWITCH_PTYPE_PHYS;
>
>         rx_port = vs_sched_rx_port(vswit_dev_g, rx_port_mask, core_id)
>         rx_q = rx_port->get_rxq(vs_port, vdev, code_id);
>         rx_port->do_rx(rx_port, rxq, NULL, pktss, MAX_PKT_BURST);

Can we hide queues inside struct vswitch_port? I mean:
For VMDQ switch, treat (port_id, queue_id) as a vswitch_port, so far 
you've already stored "struct vhost_dev *" into vswitch_port.priv when 
it's a virtual port, how about store queue_id into vswitch_port.priv 
when it's a physical port.
For arp_learning switch, make (port_id, all_enabled_queues) as a 
vswitch_port.
Summarize above two: we treat (port_id, all_enabled_queues[]) as a 
vswitch_port.

How about it?

>
>         vs_lookup_n_fwd(rx_port, pkts, rx_count, rxq);
>
> }
>
> Here i have two problems in achieving switch_worker as mentioned above:
>
> 1. while doing Rx from physical port, we need the associated vdev to 
> know the VMDQ rx_q. That was the reason i kept current logic of 
> keeping vdevs bound to the core in the switch_worker.  If we don't 
> keep list of vdevs assigned to the core, how we' will get the rxq on 
> phsyical device in case of VMDQ switch implementation.

I think my proposal above can solve this problem.

>
> 2. drain_mbuf_table() currently assumes that there is only one 
> physical device on which we have to transmit. We may have to rethink 
> where to keep the tx queues (&lcore_tx_queue[core_id]), i case we have 
> multiple physical ports we might have to move tx queues to each port.

This function, drain_mbuf_table(), and its related data structures, are 
used for cache on tx direction. But it's not only useful for physical 
port, but also virtual port (vhost_dev). I suggest to make such cache a 
field of per vswitch_port. And each again, all vswitch_ports are 
connected to a core, each core will periodically clean those vswitch_ports.

Thanks,
Jianfeng

>
> Since i didn't have good solution currently for #1 and #2 mentioned 
> above i kept assumption of one single physical port as it is and tried 
> to integrate VMDQ in the framework with leaving these assumptions (thus
> the main structure of switch_worker) as it it.
>
> Please suggest if you have some solution to above mentioned points in 
> your mind. Ideally i agree we should have switch_worker() as i 
> mentioned in pseudo-code i gave above.
>>
>> Thanks,
>> Jianfeng
>>
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH V2 2/3] examples/vhost: Add vswitch command line options
  2016-09-04 10:23 ` [RFC][PATCH V2 2/3] examples/vhost: Add vswitch command line options Pankaj Chauhan
@ 2016-09-13 12:14   ` Yuanhan Liu
  2016-09-15  9:11     ` Pankaj Chauhan
  0 siblings, 1 reply; 22+ messages in thread
From: Yuanhan Liu @ 2016-09-13 12:14 UTC (permalink / raw)
  To: Pankaj Chauhan; +Cc: dev, hemant.agrawal, jianfeng.tan, maxime.coquelin

On Mon, Sep 05, 2016 at 04:24:30PM +0530, Pankaj Chauhan wrote:
> Add command line options for selecting switch implementation
> and maximum ports for the vswitch.following are two new command
> line options:
> 
> --switch <switch_name> [char string, Selects the switch imlementation]
> --max-ports<port_count> [int, selects maximum number of ports to support]
> 
> For example:
> 
> $ ./vhost-switch -c 3 -n 2 --socket-mem 1024 --huge-dir /hugetlbfs -- -p
> 0x1 --dev-basename sock1

That means you were basing on the master branch. You should base on
next-virtio instead: http://dpdk.org/browse/next/dpdk-next-virtio/

> --switch "vmdq" --max-ports 3

Normally, we should keep the old behaviour first. Say, making the vmdq
as the default switch mode.

However, actually, I don't quite like to make the vhost-switch to bind
to a hardare feature that tightly, that you may want to make a standalone
patch as the last one in this patchset to make the "switch" mode be the
default one.

> 
> Signed-off-by: Pankaj Chauhan <pankaj.chauhan@nxp.com>
> ---
>  examples/vhost/main.c | 43 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 43 insertions(+)
> 
> diff --git a/examples/vhost/main.c b/examples/vhost/main.c
> index c949df4..a4e51ae 100644
> --- a/examples/vhost/main.c
> +++ b/examples/vhost/main.c
> @@ -142,6 +142,10 @@ static uint32_t burst_rx_retry_num = BURST_RX_RETRIES;
>  /* Character device basename. Can be set by user. */
>  static char dev_basename[MAX_BASENAME_SZ] = "vhost-net";
>  
> +/* vswitch device name and maximum number of ports */
> +static char switch_dev[MAX_BASENAME_SZ] = "vmdq";

First of all, limiting it with MAX_BASENAME_SZ makes no sense here.

Secondly, you don't have to represent the switch mode with string,
you could simply use some numeric macros, or enum.

Besides, I just had a quick glimplse of your patches (still couldn't
make a detailed look so far), and I'd ask you to do few more things
for v3:

- I'm hoping you could split this patchset further, say **maybe**
  one patch to introduce vswitch_port, one to introduce vswitch_ops
  and another one to introduce vswitch_dev. This helps review.

- make sure each commit is buldable.


And few more generic comments to the whole set:

- use "__rte_unused" but not "__attribute__((unused))"
  And we normally use it after but not before the key word.

- follow the DPDK prefered way to define a function, the return type
  and function name takes two lines.

- run scripts/{checkpatches.sh,check-git-log.sh} and fix real warnings
  if any before sending them out.

  This would at least help you catch "line over 80 chars" issue. 

Thanks.
	--yliu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework
  2016-09-13  6:51       ` Tan, Jianfeng
@ 2016-09-15  9:00         ` Pankaj Chauhan
  2016-09-19 12:42           ` Tan, Jianfeng
  2016-09-19 14:43         ` Yuanhan Liu
  1 sibling, 1 reply; 22+ messages in thread
From: Pankaj Chauhan @ 2016-09-15  9:00 UTC (permalink / raw)
  To: Tan, Jianfeng, dev; +Cc: hemant.agrawal, yuanhan.liu, maxime.coquelin

On 9/13/2016 12:21 PM, Tan, Jianfeng wrote:
Hi Jianfeng,

>>>
>>> On 9/5/2016 6:54 PM, Pankaj Chauhan wrote:
>>>> Introduce support for a generic framework for handling of switching
>>>> between
>>>> physical and vhost devices. The vswitch framework introduces the
>>>> following
>>>> concept:
>>>>
>>>> 1. vswitch_dev: Vswitch device is a logical switch which can have
>>>> physical and
>>>> virtio devices. The devices are operated/used using standard rte_eth
>>>> API for
>>>> physical devices and lib_vhost API for vhost devices, PMD API is not
>>>> used
>>>> for vhost devices.
>>>>
>>>> 2. vswitch_port: Any physical or virtio device that is added to
>>>> vswitch. The
>>>> port can have its own tx/rx functions for doing data transfer, which
>>>> are exposed
>>>> to the framework using generic function pointers
>>>> (vs_port->do_tx/do_rx). This way
>>>> the generic code can do tx/rx without understanding the type of device
>>>> (Physical or
>>>> virtio). Similarly each port has its own functions to select tx/rx
>>>> queues. The framework
>>>> provides default tx/rx queue selection functions to the port when port
>>>> is added
>>>> (for both physical and vhost devices). But the framework allows the
>>>> switch implementation
>>>> to override the queue selection functions (vs_port->get_txq/rxq) if
>>>> required.
>>>>
>>>> 3. vswitch_ops: The ops is set of function pointers which are used to
>>>> do operations
>>>> like learning, unlearning, add/delete port, lookup_and_forward. The
>>>> user of vswitch
>>>> framework (vhost/main.[c,h])uses these function pointers to perform
>>>> above mentioned
>>>> operations, thus it remains agnostic of the underlying implementation.
>>>>
>>>> Different switching logics can implement their vswitch_device and
>>>> vswitch_ops, and
>>>> register with the framework. This framework makes vhost-switch
>>>> application scalable
>>>> in terms of:
>>>>
>>>> 1. Different switching logics (one of them is vmdq, vhost/vmdq.[c,h]
>>>> 2. Number of ports.
>>>> 3. Policies of selecting ports for rx and tx.
>>>>
>>>> Signed-off-by: Pankaj Chauhan <pankaj.chauhan@nxp.com>
>>>
>>> After this patch set, how's mapping relationship between cores and
>>> vswitch_dev? Old vhost example does not have the concept of switch, so
>>> each core is binded with some VDEVs. Now, we still keep original logic?
>>>
>>> (a) If yes, provided that phys device could has no direct relationship
>>> with vdevs in other switching logics, we may need to bind those physical
>>> devices to cores too? In that case, switch_worker() will receiving pkts
>>> from all devices (phys or virtual) and dispatch, which looks like:
>>>
>>> switch_worker() {
>>>     FOR each port binding to this core {
>>>          rx(port, pkts[], count);
>>>          vs_lookup_n_fwd( information_needed );
>>>     }
>>> }
>>
>> Since we support only one switch device running at one time. We bind
>> the ports of the switchdev to the core. But The switch might have it's
>> own logic to bind the port to the core. For example
>> VMDQ only supports one Physical port, other switch can support more
>> than one Physical port. In order to take care of that i have added two
>> ops in swithdev_ops:
>>
>> 1. vs_sched_rx_port:
>>
>> struct vswitch_port *vs_sched_rx_port(struct vswitch_dev *vs_dev, enum
>>                                       vswitch_port_type ptype,
>> uint16_t core_id)
>>
>> 2. vs_sched_tx_port:
>>
>> struct vswitch_port *vs_sched_tx_port(struct vswitch_dev *vs_dev, enum
>>                                       vswitch_port_type ptype, uint16_t
>> core_id)
>>
>> The idea of providing these functions is that vhost/main requests the
>> underlying switch implementation to schedule a port for rx or Tx, the
>> current running core_id is also passed as parameter. So the switch can
>> take a decision which port to do rx or tx based on core id, and may be
>> some other custom policy.
>>
>> For example VMDQ always returns the one single Physical port in
>> response to these functions called from any core. The other switch
>> can return the ports bound to the cores.
>>
>> Similarly there are two port operations (vs_port->get_rxq and
>> vs_port->get_txq), here also we pass core_id as parameter so that
>> the underlying switch implementation can schedule the rx or tx queue
>> based on the core_id.
>>
>> The above mentioned ops are used in drain_eth_rx() and
>> do_drain_mbuf_table() (main.c) currently, and they leave binding of
>> physical port and the queues to the underlying implementation. This
>> way we can accommodate VMDQ which uses only one physical port and
>> rxqueues based on VMDQ, OR any other switch which uses multiple
>> physical port and rx/tx queue scheduling based on some other logic.
>>
>> Please suggest if this way of scheduling ports and tx/rx queues is
>> fine or not?
>
> According to above explanation, in VMDQ switch, we cannot schedule two
> queues (belongs to the same port) on the same core, right?
>

Yes. This would be a limitation i believe which will not be acceptable.
The method that you suggested further in the email (tread port + 
queue_id as a vs_port) can solve this.

>
>>>
>>> (b) If no, we may bind each core with n switches? If so, switch_worker()
>>> also need to be reworked to keep receiving pkts from all ports of the
>>> switch, and dispatch.
>>>
>>> switch_worker() {
>>>     FOR each switch binding to this core {
>>>          FOR each port of switch {
>>>              rx(port, pkts[], count);
>>>              vs_lookup_n_fwd( information_needed );
>>>         }
>>>     }
>>> }
>> Since we currently support only one switch_dev in a running instance of
>> vhost_switch() we can use binding of ports to core as you suggested in
>> #(a).
>
> OK.
>
>>
>>>
>>> In all, (1) we'd better not use vdev to find phys dev in switch_worker
>>> any more;
>>
>> I agree with you. I have tried to do following in drain_eth_rx ():
>>
>>
>>         core_id = rte_lcore_id();
>>         rx_port = vs_sched_rx_port(vswitch_dev_g, VSWITCH_PTYPE_PHYS,
>>                                     core_id);
>>         if (unlikely(!rx_port))
>>             goto out;
>>
>>        rxq = rx_port->get_rxq(rx_port, vdev, core_id);
>>         rx_count = rx_port->do_rx(rx_port, rxq, NULL, pkts,
>> MAX_PKT_BURST);
>>
>> Here we don't assume any relation between vdev and the physical device
>> or rxq.  But pass vdev to the underlying switch so that it can choose
>> the rxq for this vdev. In case of VMDQ it will return VMDQ queue
>> number for this vdev. In case of any other switch implementation it
>> can have their own logic to decide rxq.
>
> Thanks for explanation.
>
>>
>> (2) we'd better not differentiate phys device and virtual
>>> device in generic framework (it's just an attribute of vswitch_port.
>>>
>>> What do you think?
>>
>> I agree with your thought that given the current API in this patchset we
>> should aim for making switch_worker agnostic of the port type. Ideally
>> it should look something like this:
>>
>> switch_worker() {
>>
>>         rx_port mask = VSWITCH_PTYPE_PHYS | VSWITCH_PTYPE_PHYS;
>>
>>         rx_port = vs_sched_rx_port(vswit_dev_g, rx_port_mask, core_id)
>>         rx_q = rx_port->get_rxq(vs_port, vdev, code_id);
>>         rx_port->do_rx(rx_port, rxq, NULL, pktss, MAX_PKT_BURST);
>
> Can we hide queues inside struct vswitch_port? I mean:
> For VMDQ switch, treat (port_id, queue_id) as a vswitch_port, so far
> you've already stored "struct vhost_dev *" into vswitch_port.priv when
> it's a virtual port, how about store queue_id into vswitch_port.priv
> when it's a physical port.
> For arp_learning switch, make (port_id, all_enabled_queues) as a
> vswitch_port.
> Summarize above two: we treat (port_id, all_enabled_queues[]) as a
> vswitch_port.
>
> How about it?

Thanks for wonderful suggestion! Yes it should be possible, instead of 
having one vs_port for physical device we could create one vs_port for 
each phys device + queue_id combination.

Then we can bind the vs_ports to the cores, and do away with binding 
vdevs to the cores.

In order to add extra vs_ports (port + queue_id) there are two methods:

1. vhost/main.c (or vswitch_common.c) queries the underlying switch 
about how many rx queues to support thourgh a new accessor. VMDQ will 
return the number of vmdq queues, in this case. After getting the number 
of rx queues, vhost/main.c creates the extra ports i.e one vs_port for 
each physical port and queue id combination.

2. Second method is that the switch implementation for example vmdq.c 
create the extra ports (port + queue_id) as part of vs_port->add_port 
operation for the physical port.

Which method do you think we should follow? I think #1 should be done so 
that same logic can be used for other switches also.
>
>>
>>         vs_lookup_n_fwd(rx_port, pkts, rx_count, rxq);
>>
>> }
>>
>> Here i have two problems in achieving switch_worker as mentioned above:
>>
>> 1. while doing Rx from physical port, we need the associated vdev to
>> know the VMDQ rx_q. That was the reason i kept current logic of
>> keeping vdevs bound to the core in the switch_worker.  If we don't
>> keep list of vdevs assigned to the core, how we' will get the rxq on
>> phsyical device in case of VMDQ switch implementation.
>
> I think my proposal above can solve this problem.

Yes thanks!
>
>>
>> 2. drain_mbuf_table() currently assumes that there is only one
>> physical device on which we have to transmit. We may have to rethink
>> where to keep the tx queues (&lcore_tx_queue[core_id]), i case we have
>> multiple physical ports we might have to move tx queues to each port.
>
> This function, drain_mbuf_table(), and its related data structures, are
> used for cache on tx direction. But it's not only useful for physical
> port, but also virtual port (vhost_dev). I suggest to make such cache a
> field of per vswitch_port. And each again, all vswitch_ports are
> connected to a core, each core will periodically clean those vswitch_ports.
>

okay. So we add similar tx_q structures (one per core) to each 
vswitch_port, correct?

Thanks,
Pankaj

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH V2 2/3] examples/vhost: Add vswitch command line options
  2016-09-13 12:14   ` Yuanhan Liu
@ 2016-09-15  9:11     ` Pankaj Chauhan
  0 siblings, 0 replies; 22+ messages in thread
From: Pankaj Chauhan @ 2016-09-15  9:11 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, hemant.agrawal, jianfeng.tan, maxime.coquelin

On 9/13/2016 5:44 PM, Yuanhan Liu wrote:
> On Mon, Sep 05, 2016 at 04:24:30PM +0530, Pankaj Chauhan wrote:
>> Add command line options for selecting switch implementation
>> and maximum ports for the vswitch.following are two new command
>> line options:
>>
>> --switch <switch_name> [char string, Selects the switch imlementation]
>> --max-ports<port_count> [int, selects maximum number of ports to support]
>>
>> For example:
>>
>> $ ./vhost-switch -c 3 -n 2 --socket-mem 1024 --huge-dir /hugetlbfs -- -p
>> 0x1 --dev-basename sock1
>
> That means you were basing on the master branch. You should base on
> next-virtio instead: http://dpdk.org/browse/next/dpdk-next-virtio/

Yes i were basing it on master, i will base the next version on 
dpdk-next-virtio. Sorry for this, i am bit new to dpdk upstream 
development .
>
>> --switch "vmdq" --max-ports 3
>
> Normally, we should keep the old behaviour first. Say, making the vmdq
> as the default switch mode.

Yes if we don't give the '--switch' option then default is vmdq.
>
> However, actually, I don't quite like to make the vhost-switch to bind
> to a hardare feature that tightly, that you may want to make a standalone
> patch as the last one in this patchset to make the "switch" mode be the
> default one.

Sure i will split the patch to take care of this.
>
>>
>> Signed-off-by: Pankaj Chauhan <pankaj.chauhan@nxp.com>
>> ---
>>  examples/vhost/main.c | 43 +++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 43 insertions(+)
>>
>> diff --git a/examples/vhost/main.c b/examples/vhost/main.c
>> index c949df4..a4e51ae 100644
>> --- a/examples/vhost/main.c
>> +++ b/examples/vhost/main.c
>> @@ -142,6 +142,10 @@ static uint32_t burst_rx_retry_num = BURST_RX_RETRIES;
>>  /* Character device basename. Can be set by user. */
>>  static char dev_basename[MAX_BASENAME_SZ] = "vhost-net";
>>
>> +/* vswitch device name and maximum number of ports */
>> +static char switch_dev[MAX_BASENAME_SZ] = "vmdq";
>
> First of all, limiting it with MAX_BASENAME_SZ makes no sense here.
>

I will fix it, define a different constant for switch_dev name.

> Secondly, you don't have to represent the switch mode with string,
> you could simply use some numeric macros, or enum.
>

I used the string because the registration function (vs_register_switch) 
uses the switch name as string to register the switches and find the 
matching switch_dev given in command line. If we use macro or enum then 
we will have to add the MACRO for the switch id in common code every 
time a new switch implementation is added. Keeping a string as name, 
saves us from touching the common code for adding a new switch 
implementation.

> Besides, I just had a quick glimplse of your patches (still couldn't
> make a detailed look so far), and I'd ask you to do few more things
> for v3:
>
> - I'm hoping you could split this patchset further, say **maybe**
>   one patch to introduce vswitch_port, one to introduce vswitch_ops
>   and another one to introduce vswitch_dev. This helps review.
>

Do you want me to split the vswitch_ops, vswitch_dev and vswitch_port 
introduction (mainly vswitch_common.h) in three patches with the patch 
body explaining them in bit detail, correct ?

Their usage i.e vswitch_common.c can be kept in separate patch (4th 
one), is this fine? otherwise it fill be tricky to make vswitch_common.c 
to compilable without complete definition of vswitch_port, vswitch_dev 
and vswitch_ops

> - make sure each commit is buldable.
>
I will take care of this in v3
>
> And few more generic comments to the whole set:
>
> - use "__rte_unused" but not "__attribute__((unused))"
>   And we normally use it after but not before the key word.

I will take care of this in v3
>
> - follow the DPDK prefered way to define a function, the return type
>   and function name takes two lines.
I will take care of this in v3

>
> - run scripts/{checkpatches.sh,check-git-log.sh} and fix real warnings
>   if any before sending them out.
>
>   This would at least help you catch "line over 80 chars" issue.

Sure, will take care of this in v3 and onwards.

Thanks,
Pankaj
>
> Thanks.
> 	--yliu
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework
  2016-09-15  9:00         ` Pankaj Chauhan
@ 2016-09-19 12:42           ` Tan, Jianfeng
  2016-09-27 11:26             ` Pankaj Chauhan
  0 siblings, 1 reply; 22+ messages in thread
From: Tan, Jianfeng @ 2016-09-19 12:42 UTC (permalink / raw)
  To: Pankaj Chauhan, dev; +Cc: hemant.agrawal, yuanhan.liu, maxime.coquelin

Hi Pankaj,

>> Can we hide queues inside struct vswitch_port? I mean:
>> For VMDQ switch, treat (port_id, queue_id) as a vswitch_port, so far
>> you've already stored "struct vhost_dev *" into vswitch_port.priv when
>> it's a virtual port, how about store queue_id into vswitch_port.priv
>> when it's a physical port.
>> For arp_learning switch, make (port_id, all_enabled_queues) as a
>> vswitch_port.
>> Summarize above two: we treat (port_id, all_enabled_queues[]) as a
>> vswitch_port.
>>
>> How about it?
>
> Thanks for wonderful suggestion! Yes it should be possible, instead of 
> having one vs_port for physical device we could create one vs_port for 
> each phys device + queue_id combination.
>
> Then we can bind the vs_ports to the cores, and do away with binding 
> vdevs to the cores.
>
> In order to add extra vs_ports (port + queue_id) there are two methods:
>
> 1. vhost/main.c (or vswitch_common.c) queries the underlying switch 
> about how many rx queues to support thourgh a new accessor. VMDQ will 
> return the number of vmdq queues, in this case. After getting the 
> number of rx queues, vhost/main.c creates the extra ports i.e one 
> vs_port for each physical port and queue id combination.
>
> 2. Second method is that the switch implementation for example vmdq.c 
> create the extra ports (port + queue_id) as part of vs_port->add_port 
> operation for the physical port.
>
> Which method do you think we should follow? I think #1 should be done 
> so that same logic can be used for other switches also.

Although VMDQs are created at initialization, we will not connect all of 
them to switch until a virtio device is added. So how about creating 
extra vs_ports (port + queue_id) as part of vmdq_add_port_virtio()?

Another thing, why we need the accessor, port_start()? For physical 
ports, it's initialized and started at port_init(); for virtual ports, 
no need to get started, right?

Thanks,
Jianfeng

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework
  2016-09-13  6:51       ` Tan, Jianfeng
  2016-09-15  9:00         ` Pankaj Chauhan
@ 2016-09-19 14:43         ` Yuanhan Liu
  2016-09-20  8:58           ` Pankaj Chauhan
  1 sibling, 1 reply; 22+ messages in thread
From: Yuanhan Liu @ 2016-09-19 14:43 UTC (permalink / raw)
  To: Tan, Jianfeng; +Cc: Pankaj Chauhan, dev, hemant.agrawal, maxime.coquelin

Firstly, sorry for being late on this discussion: I just got a chance
to follow what you guys were talking about.

On Tue, Sep 13, 2016 at 02:51:31PM +0800, Tan, Jianfeng wrote:
> >(2) we'd better not differentiate phys device and virtual

Agreed.

> >>device in generic framework (it's just an attribute of vswitch_port.
> >>
> >>What do you think?
> >
> >I agree with your thought that given the current API in this patchset we
> >should aim for making switch_worker agnostic of the port type. Ideally it
> >should look something like this:
> >
> >switch_worker() {
> >
> >        rx_port mask = VSWITCH_PTYPE_PHYS | VSWITCH_PTYPE_PHYS;
> >
> >        rx_port = vs_sched_rx_port(vswit_dev_g, rx_port_mask, core_id)
> >        rx_q = rx_port->get_rxq(vs_port, vdev, code_id);
> >        rx_port->do_rx(rx_port, rxq, NULL, pktss, MAX_PKT_BURST);
> 
> Can we hide queues inside struct vswitch_port? I mean:
> For VMDQ switch, treat (port_id, queue_id) as a vswitch_port, so far you've
> already stored "struct vhost_dev *" into vswitch_port.priv when it's a
> virtual port, how about store queue_id into vswitch_port.priv when it's a
> physical port.

Well, note that vhost-user also supports multiple queue; it's just
haven't been enabled yet. So, storing "vdev" for virtio port and
"queue_id" for phys port doesn't make too much sense.

> For arp_learning switch, make (port_id, all_enabled_queues) as a
> vswitch_port.
> Summarize above two: we treat (port_id, all_enabled_queues[]) as a
> vswitch_port.
> 
> How about it?

Sorry, I don't quite like the idea. It's weird to use "vswitch_port + queue_id"
combination to represent a port. A vswitch_port should be just a port: let's
keep the logic that simple.

	--yliu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework
  2016-09-19 14:43         ` Yuanhan Liu
@ 2016-09-20  8:58           ` Pankaj Chauhan
  2016-09-26  3:56             ` Yuanhan Liu
  0 siblings, 1 reply; 22+ messages in thread
From: Pankaj Chauhan @ 2016-09-20  8:58 UTC (permalink / raw)
  To: Yuanhan Liu, Tan, Jianfeng; +Cc: dev, hemant.agrawal, maxime.coquelin

On 9/19/2016 8:13 PM, Yuanhan Liu wrote:
> Firstly, sorry for being late on this discussion: I just got a chance
> to follow what you guys were talking about.
>
> On Tue, Sep 13, 2016 at 02:51:31PM +0800, Tan, Jianfeng wrote:
>>> (2) we'd better not differentiate phys device and virtual
>
> Agreed.
>
>>>> device in generic framework (it's just an attribute of vswitch_port.
>>>>
>>>> What do you think?
>>>
>>> I agree with your thought that given the current API in this patchset we
>>> should aim for making switch_worker agnostic of the port type. Ideally it
>>> should look something like this:
>>>
>>> switch_worker() {
>>>
>>>        rx_port mask = VSWITCH_PTYPE_PHYS | VSWITCH_PTYPE_PHYS;
>>>
>>>        rx_port = vs_sched_rx_port(vswit_dev_g, rx_port_mask, core_id)
>>>        rx_q = rx_port->get_rxq(vs_port, vdev, code_id);
>>>        rx_port->do_rx(rx_port, rxq, NULL, pktss, MAX_PKT_BURST);
>>
>> Can we hide queues inside struct vswitch_port? I mean:
>> For VMDQ switch, treat (port_id, queue_id) as a vswitch_port, so far you've
>> already stored "struct vhost_dev *" into vswitch_port.priv when it's a
>> virtual port, how about store queue_id into vswitch_port.priv when it's a
>> physical port.
>
> Well, note that vhost-user also supports multiple queue; it's just
> haven't been enabled yet. So, storing "vdev" for virtio port and
> "queue_id" for phys port doesn't make too much sense.
>
>> For arp_learning switch, make (port_id, all_enabled_queues) as a
>> vswitch_port.
>> Summarize above two: we treat (port_id, all_enabled_queues[]) as a
>> vswitch_port.
>>
>> How about it?
>
> Sorry, I don't quite like the idea. It's weird to use "vswitch_port + queue_id"
> combination to represent a port. A vswitch_port should be just a port: let's
> keep the logic that simple.
>

We wanted to take that approach to make vhost/main.c agnostic port type 
and have common code for rx/tx processing. The current version of 
patchset (v2) takes care of multiqueue, as it calls 
vs_port->get_txq/get_rxq to get the queue on which rx/tx has to be 
performed. This way the underlying switch can decide the queue based on 
core_id and vs_port.

But in the v2 patchset we still bind vhost_dev to the cores, and pass it 
to vs_port->get_rxq() to get the rx_queue corresponding to vhost_dev. 
Jianfeng had suggested to remove vhost_dev to core binding, and bind 
vs_port to the cores. Creating one vswitch_port for a physical port + 
queue_id was a step in that direction, thus creating very generic code 
in vhost/main.c.

YLiu/Jianfeng,

Please suggest what approach we should take here? Should we keep the 
logic of binding vhost_dev to core (as in V2 patchset), thus leaving 
some intelligence about vhost_dev in vhost/main.c.

Or What other options do you suggest if we want to achieve port type 
agnostic vhost/main.c

Thanks,
Pankaj

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework
  2016-09-20  8:58           ` Pankaj Chauhan
@ 2016-09-26  3:56             ` Yuanhan Liu
  2016-09-27 11:35               ` Pankaj Chauhan
  0 siblings, 1 reply; 22+ messages in thread
From: Yuanhan Liu @ 2016-09-26  3:56 UTC (permalink / raw)
  To: Pankaj Chauhan; +Cc: Tan, Jianfeng, dev, hemant.agrawal, maxime.coquelin

On Tue, Sep 20, 2016 at 02:28:17PM +0530, Pankaj Chauhan wrote:
> On 9/19/2016 8:13 PM, Yuanhan Liu wrote:
> >Firstly, sorry for being late on this discussion: I just got a chance
> >to follow what you guys were talking about.
> >
> >On Tue, Sep 13, 2016 at 02:51:31PM +0800, Tan, Jianfeng wrote:
> >>>(2) we'd better not differentiate phys device and virtual
> >
> >Agreed.
> >
> >>>>device in generic framework (it's just an attribute of vswitch_port.
> >>>>
> >>>>What do you think?
> >>>
> >>>I agree with your thought that given the current API in this patchset we
> >>>should aim for making switch_worker agnostic of the port type. Ideally it
> >>>should look something like this:
> >>>
> >>>switch_worker() {
> >>>
> >>>       rx_port mask = VSWITCH_PTYPE_PHYS | VSWITCH_PTYPE_PHYS;
> >>>
> >>>       rx_port = vs_sched_rx_port(vswit_dev_g, rx_port_mask, core_id)
> >>>       rx_q = rx_port->get_rxq(vs_port, vdev, code_id);
> >>>       rx_port->do_rx(rx_port, rxq, NULL, pktss, MAX_PKT_BURST);
> >>
> >>Can we hide queues inside struct vswitch_port? I mean:
> >>For VMDQ switch, treat (port_id, queue_id) as a vswitch_port, so far you've
> >>already stored "struct vhost_dev *" into vswitch_port.priv when it's a
> >>virtual port, how about store queue_id into vswitch_port.priv when it's a
> >>physical port.
> >
> >Well, note that vhost-user also supports multiple queue; it's just
> >haven't been enabled yet. So, storing "vdev" for virtio port and
> >"queue_id" for phys port doesn't make too much sense.
> >
> >>For arp_learning switch, make (port_id, all_enabled_queues) as a
> >>vswitch_port.
> >>Summarize above two: we treat (port_id, all_enabled_queues[]) as a
> >>vswitch_port.
> >>
> >>How about it?
> >
> >Sorry, I don't quite like the idea. It's weird to use "vswitch_port + queue_id"
> >combination to represent a port. A vswitch_port should be just a port: let's
> >keep the logic that simple.
> >
> 
> We wanted to take that approach to make vhost/main.c agnostic port type and
> have common code for rx/tx processing. The current version of patchset (v2)
> takes care of multiqueue, as it calls vs_port->get_txq/get_rxq to get the
> queue on which rx/tx has to be performed. This way the underlying switch can
> decide the queue based on core_id and vs_port.
> 
> But in the v2 patchset we still bind vhost_dev to the cores, and pass it to
> vs_port->get_rxq() to get the rx_queue corresponding to vhost_dev. Jianfeng
> had suggested to remove vhost_dev to core binding, and bind vs_port to the
> cores. Creating one vswitch_port for a physical port + queue_id was a step
> in that direction, thus creating very generic code in vhost/main.c.
> 
> YLiu/Jianfeng,
> 
> Please suggest what approach we should take here? Should we keep the logic
> of binding vhost_dev to core (as in V2 patchset), thus leaving some
> intelligence about vhost_dev in vhost/main.c.
> 
> Or What other options do you suggest if we want to achieve port type
> agnostic vhost/main.c

Hi Pankaj,

Again, apologize for late response: you see I was busy ;) Besides, I
need some time to think about it.

Generally, I think your ideal proposal looks good to me (well, I don't
see the need of port mask):

    switch_worker() {
           rx_port = vs_sched_rx_port(vswit_dev_g, core_id)
           rx_q = rx_port->get_rxq(vs_port, vdev, code_id);
           rx_port->do_rx(rx_port, rxq, NULL, pktss, MAX_PKT_BURST);

           vs_lookup_n_fwd(rx_port, pkts, rx_count, rxq);
    }

The issue is, as you stated, VMDq it's bit tricky to handle. How about
the following proposal then?

We don't have to register the nic queues while VMDq is used, since a
phys queue is bond to a virtio queue in this mode. That means only
virtio queues will be scheduled.

The virtio do_rx might look like below then:

    vmdq_rx() {
            rte_eth_rx_burst(port, queue_bond_to_this_virtio_queue, ...);
            rte_vhost_enqueue_burst(...) if any;
            
            rte_vhost_dequeue_burst(...);
    }

	--yliu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework
  2016-09-04 10:23 ` [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework Pankaj Chauhan
  2016-09-09  8:56   ` Tan, Jianfeng
  2016-09-11 12:21   ` Yuanhan Liu
@ 2016-09-26  4:12   ` Yuanhan Liu
  2016-09-27 11:44     ` Pankaj Chauhan
  2 siblings, 1 reply; 22+ messages in thread
From: Yuanhan Liu @ 2016-09-26  4:12 UTC (permalink / raw)
  To: Pankaj Chauhan; +Cc: dev, hemant.agrawal, jianfeng.tan, maxime.coquelin

Besides the VMDq proposal, I got few more comments for you.

On Mon, Sep 05, 2016 at 04:24:29PM +0530, Pankaj Chauhan wrote:
> Introduce support for a generic framework for handling of switching between
> physical and vhost devices. The vswitch framework introduces the following
> concept:
> 
> 1. vswitch_dev: Vswitch device

It looks a bit confusing to me, to claim it as a "device": it's neither a
physical nic device nor a virtio net device. Something like "vswitch_unit",
or even "vswitch" is better and enough.

> Signed-off-by: Pankaj Chauhan <pankaj.chauhan@nxp.com>
> ---
>  examples/vhost/Makefile         |   2 +-
>  examples/vhost/main.c           | 128 +++++++++--
>  examples/vhost/vswitch_common.c | 499 ++++++++++++++++++++++++++++++++++++++++
>  examples/vhost/vswitch_common.h | 186 +++++++++++++++
>  examples/vhost/vswitch_txrx.c   |  97 ++++++++
>  examples/vhost/vswitch_txrx.h   |  71 ++++++

Seems that you forgot to include the file to implment all those ops for
"switch" vswitch mode? I mean, I just see a vs_lookup_n_fwd implmentation
of VMDq.

> @@ -1241,7 +1296,7 @@ static int
>  new_device(int vid)
>  {
>  	int lcore, core_add = 0;
> -	uint32_t device_num_min = num_devices;
> +	uint32_t device_num_min;
>  	struct vhost_dev *vdev;
>  
>  	vdev = rte_zmalloc("vhost device", sizeof(*vdev), RTE_CACHE_LINE_SIZE);
> @@ -1252,6 +1307,16 @@ new_device(int vid)
>  		return -1;
>  	}
>  	vdev->vid = vid;
> +	device_num_min = vs_get_max_vdevs(vswitch_dev_g);
> +	RTE_LOG(INFO, VHOST_PORT, "max virtio devices %d\n", device_num_min);
> +
> +	vs_port = vs_add_port(vswitch_dev_g, vid, VSWITCH_PTYPE_VIRTIO, vdev);

Note that "vid" does not equal "port". They are two different counters
and both start from 0. That means, you will get unexpected results from
following piece of code ---->

> +struct vswitch_port *vs_add_port(struct vswitch_dev *vs_dev, int port_id,
> +		enum vswitch_port_type type, void *priv)
> +{
> +	int rc = 0;
> +	struct vswitch_port *vs_port = NULL;
> +	struct vswitch_ops *vs_ops = vs_dev->ops;
> +
> +	vs_port = vs_get_free_port(vs_dev);
> +	if (!vs_port) {
> +		RTE_LOG(DEBUG, VHOST_CONFIG, "Failed get free port in \
> +			vswitch %s\n", vs_dev->name);
> +		rc = -EBUSY;
> +		goto out;
> +	}
> +
> +	vs_port->port_id = port_id;
> +	vs_port->type = type;
> +	vs_port->priv = priv;
> +
> +	/* Initialize default port operations. It should be noted that
> +	 * The switch ops->add_port can replace them with switch specefic
> +	 * operations if required. This gives us more flexibility in switch
> +	 * implementations.
> +	 */
> +
> +	switch (type) {
> +	case VSWITCH_PTYPE_PHYS:
> +	       vs_port->do_tx = vs_do_tx_phys_port;
> +	       vs_port->do_rx = vs_do_rx_phys_port;
> +	       vs_port->get_txq = vs_get_txq_phys_port;
> +	       vs_port->get_rxq = vs_get_rxq_phys_port;
> +	       break;
> +	case VSWITCH_PTYPE_VIRTIO:
> +	       vs_port->do_tx = vs_do_tx_virtio_port;
> +	       vs_port->do_rx = vs_do_rx_virtio_port;
> +	       vs_port->get_txq = vs_get_txq_virtio_port;
> +	       vs_port->get_rxq = vs_get_rxq_virtio_port;
> +	       break;
> +	default:
> +		RTE_LOG(DEBUG, VHOST_CONFIG, "Invalid port [id %d, type %d]",
> +				port_id, type);
> +	       rc = -EINVAL;
> +	       goto out;
> +	}
> +
> +	if (vs_ops->add_port)
> +		rc = vs_ops->add_port(vs_port);
> +
> +	if (rc)
> +		goto out;
> +
> +	vs_port->state = VSWITCH_PSTATE_ADDED;
> +
> +	rte_eth_macaddr_get(vs_port->port_id, &vs_port->mac_addr);

<--- here.

	--yliu

> +	RTE_LOG(INFO, VHOST_PORT, "Port %u MAC: %02"PRIx8" %02"PRIx8" %02"PRIx8
> +			" %02"PRIx8" %02"PRIx8" %02"PRIx8"\n",
> +			(unsigned)port_id,
> +			vs_port->mac_addr.addr_bytes[0],
> +			vs_port->mac_addr.addr_bytes[1],
> +			vs_port->mac_addr.addr_bytes[2],
> +			vs_port->mac_addr.addr_bytes[3],
> +			vs_port->mac_addr.addr_bytes[4],
> +			vs_port->mac_addr.addr_bytes[5]);
> +
> +	RTE_LOG(DEBUG, VHOST_CONFIG, "Added port [%d, type %d] to \
> +			vswitch %s\n", vs_port->port_id, type, vs_dev->name);
> +out:
> +	if (rc){
> +		RTE_LOG(INFO, VHOST_CONFIG, "Failed to Add port [%d, type %d] to \
> +			vswitch %s\n", port_id, type, vs_dev->name);
> +		if (vs_port)
> +			vs_free_port(vs_port);
> +		vs_port = NULL;
> +	}
> +
> +	return vs_port;
> +}

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework
  2016-09-19 12:42           ` Tan, Jianfeng
@ 2016-09-27 11:26             ` Pankaj Chauhan
  0 siblings, 0 replies; 22+ messages in thread
From: Pankaj Chauhan @ 2016-09-27 11:26 UTC (permalink / raw)
  To: Tan, Jianfeng, dev; +Cc: hemant.agrawal, yuanhan.liu, maxime.coquelin

On 9/19/2016 6:12 PM, Tan, Jianfeng wrote:

> Hi Pankaj,
>

Hi Jianfeng,

Sorry for delayed reply.
>>> Can we hide queues inside struct vswitch_port? I mean:
>>> For VMDQ switch, treat (port_id, queue_id) as a vswitch_port, so far
>>> you've already stored "struct vhost_dev *" into vswitch_port.priv when
>>> it's a virtual port, how about store queue_id into vswitch_port.priv
>>> when it's a physical port.
>>> For arp_learning switch, make (port_id, all_enabled_queues) as a
>>> vswitch_port.
>>> Summarize above two: we treat (port_id, all_enabled_queues[]) as a
>>> vswitch_port.
>>>
>>> How about it?
>>
>> Thanks for wonderful suggestion! Yes it should be possible, instead of
>> having one vs_port for physical device we could create one vs_port for
>> each phys device + queue_id combination.
>>
>> Then we can bind the vs_ports to the cores, and do away with binding
>> vdevs to the cores.
>>
>> In order to add extra vs_ports (port + queue_id) there are two methods:
>>
>> 1. vhost/main.c (or vswitch_common.c) queries the underlying switch
>> about how many rx queues to support thourgh a new accessor. VMDQ will
>> return the number of vmdq queues, in this case. After getting the
>> number of rx queues, vhost/main.c creates the extra ports i.e one
>> vs_port for each physical port and queue id combination.
>>
>> 2. Second method is that the switch implementation for example vmdq.c
>> create the extra ports (port + queue_id) as part of vs_port->add_port
>> operation for the physical port.
>>
>> Which method do you think we should follow? I think #1 should be done
>> so that same logic can be used for other switches also.
>
> Although VMDQs are created at initialization, we will not connect all of
> them to switch until a virtio device is added. So how about creating
> extra vs_ports (port + queue_id) as part of vmdq_add_port_virtio()?
>

Yes we can do that, it is similar to #2 which is proposed above. But I 
am little bit confused regarding this approach that we were planning to 
take, the approach of considering port_id + queue_id as one vswitch port 
because Yliu mentioned in one of his comments that we should not follow 
this approach and keep the concept of port simple.

So which approach do we take now? are we fine with keeping vswitch port 
simple and keeping some intelligence regarding vhost devices in the main 
(main.c, the generic code).

> Another thing, why we need the accessor, port_start()? For physical
> ports, it's initialized and started at port_init(); for virtual ports,
> no need to get started, right?
>

Intention of port_start was to have some control when the port starts 
doing tx/rx. For physical port it translates to rte_eth_dev_start(), and 
for vhost_port it does nothing as you said. Please note that in order to 
support this i've moved rte_eth_dev_start() from port_init() to 
vs_port->port_start.


Thanks,
Pankaj

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework
  2016-09-26  3:56             ` Yuanhan Liu
@ 2016-09-27 11:35               ` Pankaj Chauhan
  2016-09-27 12:10                 ` Yuanhan Liu
  0 siblings, 1 reply; 22+ messages in thread
From: Pankaj Chauhan @ 2016-09-27 11:35 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: Tan, Jianfeng, dev, hemant.agrawal, maxime.coquelin

On 9/26/2016 9:26 AM, Yuanhan Liu wrote:
> On Tue, Sep 20, 2016 at 02:28:17PM +0530, Pankaj Chauhan wrote:
>> On 9/19/2016 8:13 PM, Yuanhan Liu wrote:
>>> Firstly, sorry for being late on this discussion: I just got a chance
>>> to follow what you guys were talking about.
>>>
>>> On Tue, Sep 13, 2016 at 02:51:31PM +0800, Tan, Jianfeng wrote:
>>>>> (2) we'd better not differentiate phys device and virtual
>>>
>>> Agreed.
>>>
>>>>>> device in generic framework (it's just an attribute of vswitch_port.
>>>>>>
>>>>>> What do you think?
>>>>>
>>>>> I agree with your thought that given the current API in this patchset we
>>>>> should aim for making switch_worker agnostic of the port type. Ideally it
>>>>> should look something like this:
>>>>>
>>>>> switch_worker() {
>>>>>
>>>>>       rx_port mask = VSWITCH_PTYPE_PHYS | VSWITCH_PTYPE_PHYS;
>>>>>
>>>>>       rx_port = vs_sched_rx_port(vswit_dev_g, rx_port_mask, core_id)
>>>>>       rx_q = rx_port->get_rxq(vs_port, vdev, code_id);
>>>>>       rx_port->do_rx(rx_port, rxq, NULL, pktss, MAX_PKT_BURST);
>>>>
>>>> Can we hide queues inside struct vswitch_port? I mean:
>>>> For VMDQ switch, treat (port_id, queue_id) as a vswitch_port, so far you've
>>>> already stored "struct vhost_dev *" into vswitch_port.priv when it's a
>>>> virtual port, how about store queue_id into vswitch_port.priv when it's a
>>>> physical port.
>>>
>>> Well, note that vhost-user also supports multiple queue; it's just
>>> haven't been enabled yet. So, storing "vdev" for virtio port and
>>> "queue_id" for phys port doesn't make too much sense.
>>>
>>>> For arp_learning switch, make (port_id, all_enabled_queues) as a
>>>> vswitch_port.
>>>> Summarize above two: we treat (port_id, all_enabled_queues[]) as a
>>>> vswitch_port.
>>>>
>>>> How about it?
>>>
>>> Sorry, I don't quite like the idea. It's weird to use "vswitch_port + queue_id"
>>> combination to represent a port. A vswitch_port should be just a port: let's
>>> keep the logic that simple.
>>>
>>
>> We wanted to take that approach to make vhost/main.c agnostic port type and
>> have common code for rx/tx processing. The current version of patchset (v2)
>> takes care of multiqueue, as it calls vs_port->get_txq/get_rxq to get the
>> queue on which rx/tx has to be performed. This way the underlying switch can
>> decide the queue based on core_id and vs_port.
>>
>> But in the v2 patchset we still bind vhost_dev to the cores, and pass it to
>> vs_port->get_rxq() to get the rx_queue corresponding to vhost_dev. Jianfeng
>> had suggested to remove vhost_dev to core binding, and bind vs_port to the
>> cores. Creating one vswitch_port for a physical port + queue_id was a step
>> in that direction, thus creating very generic code in vhost/main.c.
>>
>> YLiu/Jianfeng,
>>
>> Please suggest what approach we should take here? Should we keep the logic
>> of binding vhost_dev to core (as in V2 patchset), thus leaving some
>> intelligence about vhost_dev in vhost/main.c.
>>
>> Or What other options do you suggest if we want to achieve port type
>> agnostic vhost/main.c
>
> Hi Pankaj,
>
> Again, apologize for late response: you see I was busy ;) Besides, I
> need some time to think about it.
>

Hi YLiu,

No issues with delayed response :)

> Generally, I think your ideal proposal looks good to me (well, I don't
> see the need of port mask):

The idea of port mask was to give ability to the caller to choose which 
type of port to do rx from, Physical port or vhost port.
>
>     switch_worker() {
>            rx_port = vs_sched_rx_port(vswit_dev_g, core_id)
>            rx_q = rx_port->get_rxq(vs_port, vdev, code_id);
>            rx_port->do_rx(rx_port, rxq, NULL, pktss, MAX_PKT_BURST);
>
>            vs_lookup_n_fwd(rx_port, pkts, rx_count, rxq);
>     }
>
> The issue is, as you stated, VMDq it's bit tricky to handle. How about
> the following proposal then?
>
> We don't have to register the nic queues while VMDq is used, since a
> phys queue is bond to a virtio queue in this mode. That means only
> virtio queues will be scheduled.
>
> The virtio do_rx might look like below then:
>
>     vmdq_rx() {
>             rte_eth_rx_burst(port, queue_bond_to_this_virtio_queue, ...);
>             rte_vhost_enqueue_burst(...) if any;
>
>             rte_vhost_dequeue_burst(...);
>     }
>

Okay so in that case, we won't do any rte_eth_rx_burst() when 
physical_port->do_rx is called, Correct?. If yes then in vmdq.c we'll 
overwrite vs_port->do_rx of physical port with a vmdq_do_rx_phys() which 
does nothing. Or we can even have an option that vmdq.c doesn't return 
the physical port when vs_sched_rx_port() is called, i think this later 
option is better to save some CPU cycles.

I think it is possible but i would prefer to overwrite vs_port->do_rx() 
for vmdq (in vmdq.c) with the implementation that you suggested. The 
framework provides this option, i.e the switch implementation can 
overwrite the vs_port->do_rx/do_tx if required to handle any special 
cases for example the case of vmdq <> vdev boding.

Thanks,
Pankaj
> 	--yliu
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework
  2016-09-26  4:12   ` Yuanhan Liu
@ 2016-09-27 11:44     ` Pankaj Chauhan
  2016-09-27 12:03       ` Yuanhan Liu
  0 siblings, 1 reply; 22+ messages in thread
From: Pankaj Chauhan @ 2016-09-27 11:44 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, hemant.agrawal, jianfeng.tan, maxime.coquelin

On 9/26/2016 9:42 AM, Yuanhan Liu wrote:
> Besides the VMDq proposal, I got few more comments for you.
>
> On Mon, Sep 05, 2016 at 04:24:29PM +0530, Pankaj Chauhan wrote:
>> Introduce support for a generic framework for handling of switching between
>> physical and vhost devices. The vswitch framework introduces the following
>> concept:
>>
>> 1. vswitch_dev: Vswitch device
>
> It looks a bit confusing to me, to claim it as a "device": it's neither a
> physical nic device nor a virtio net device. Something like "vswitch_unit",
> or even "vswitch" is better and enough.
>

Yes we can change it to 'vswitch' it suites better, i'll do that in v3.

>> Signed-off-by: Pankaj Chauhan <pankaj.chauhan@nxp.com>
>> ---
>>  examples/vhost/Makefile         |   2 +-
>>  examples/vhost/main.c           | 128 +++++++++--
>>  examples/vhost/vswitch_common.c | 499 ++++++++++++++++++++++++++++++++++++++++
>>  examples/vhost/vswitch_common.h | 186 +++++++++++++++
>>  examples/vhost/vswitch_txrx.c   |  97 ++++++++
>>  examples/vhost/vswitch_txrx.h   |  71 ++++++
>
> Seems that you forgot to include the file to implment all those ops for
> "switch" vswitch mode? I mean, I just see a vs_lookup_n_fwd implmentation
> of VMDq.
>

No i didn't forget to include the file but wanted to implement, get 
reviewed and included (hopefully :)) the implementation of following first:

1. vswitch framework
2. vmdq implementation plugged into the vswitch framework.

After above two i am planning to send the 'software switch' 
implementation in a separate patch, i hope that is fine.

>> @@ -1241,7 +1296,7 @@ static int
>>  new_device(int vid)
>>  {
>>  	int lcore, core_add = 0;
>> -	uint32_t device_num_min = num_devices;
>> +	uint32_t device_num_min;
>>  	struct vhost_dev *vdev;
>>
>>  	vdev = rte_zmalloc("vhost device", sizeof(*vdev), RTE_CACHE_LINE_SIZE);
>> @@ -1252,6 +1307,16 @@ new_device(int vid)
>>  		return -1;
>>  	}
>>  	vdev->vid = vid;
>> +	device_num_min = vs_get_max_vdevs(vswitch_dev_g);
>> +	RTE_LOG(INFO, VHOST_PORT, "max virtio devices %d\n", device_num_min);
>> +
>> +	vs_port = vs_add_port(vswitch_dev_g, vid, VSWITCH_PTYPE_VIRTIO, vdev);
>
> Note that "vid" does not equal "port". They are two different counters
> and both start from 0. That means, you will get unexpected results from
> following piece of code ---->
>
Sorry i didn't get the inconsistency completely, please help me 
understand it.

I agree both port_id and vid counters start from zero. But when we add 
these as vswitch_port we'll pass different port type 
(VSWITCH_PTYPE_VIRTIO or VSWITCH_PTYPE_PHYS). And while searching for 
any vswitch port we use vs_port->port_id && vs_port->type as the key, 
thus we'll not get confused between ports even when both have same port_id.

Can you please help me understand the inconsistency that you thought we 
may have?

Thanks,
Pankaj
>> +struct vswitch_port *vs_add_port(struct vswitch_dev *vs_dev, int port_id,
>> +		enum vswitch_port_type type, void *priv)
>> +{
>> +	int rc = 0;
>> +	struct vswitch_port *vs_port = NULL;
>> +	struct vswitch_ops *vs_ops = vs_dev->ops;
>> +
>> +	vs_port = vs_get_free_port(vs_dev);
>> +	if (!vs_port) {
>> +		RTE_LOG(DEBUG, VHOST_CONFIG, "Failed get free port in \
>> +			vswitch %s\n", vs_dev->name);
>> +		rc = -EBUSY;
>> +		goto out;
>> +	}
>> +
>> +	vs_port->port_id = port_id;
>> +	vs_port->type = type;
>> +	vs_port->priv = priv;
>> +
>> +	/* Initialize default port operations. It should be noted that
>> +	 * The switch ops->add_port can replace them with switch specefic
>> +	 * operations if required. This gives us more flexibility in switch
>> +	 * implementations.
>> +	 */
>> +
>> +	switch (type) {
>> +	case VSWITCH_PTYPE_PHYS:
>> +	       vs_port->do_tx = vs_do_tx_phys_port;
>> +	       vs_port->do_rx = vs_do_rx_phys_port;
>> +	       vs_port->get_txq = vs_get_txq_phys_port;
>> +	       vs_port->get_rxq = vs_get_rxq_phys_port;
>> +	       break;
>> +	case VSWITCH_PTYPE_VIRTIO:
>> +	       vs_port->do_tx = vs_do_tx_virtio_port;
>> +	       vs_port->do_rx = vs_do_rx_virtio_port;
>> +	       vs_port->get_txq = vs_get_txq_virtio_port;
>> +	       vs_port->get_rxq = vs_get_rxq_virtio_port;
>> +	       break;
>> +	default:
>> +		RTE_LOG(DEBUG, VHOST_CONFIG, "Invalid port [id %d, type %d]",
>> +				port_id, type);
>> +	       rc = -EINVAL;
>> +	       goto out;
>> +	}
>> +
>> +	if (vs_ops->add_port)
>> +		rc = vs_ops->add_port(vs_port);
>> +
>> +	if (rc)
>> +		goto out;
>> +
>> +	vs_port->state = VSWITCH_PSTATE_ADDED;
>> +
>> +	rte_eth_macaddr_get(vs_port->port_id, &vs_port->mac_addr);
>
> <--- here.
>
> 	--yliu
>
>> +	RTE_LOG(INFO, VHOST_PORT, "Port %u MAC: %02"PRIx8" %02"PRIx8" %02"PRIx8
>> +			" %02"PRIx8" %02"PRIx8" %02"PRIx8"\n",
>> +			(unsigned)port_id,
>> +			vs_port->mac_addr.addr_bytes[0],
>> +			vs_port->mac_addr.addr_bytes[1],
>> +			vs_port->mac_addr.addr_bytes[2],
>> +			vs_port->mac_addr.addr_bytes[3],
>> +			vs_port->mac_addr.addr_bytes[4],
>> +			vs_port->mac_addr.addr_bytes[5]);
>> +
>> +	RTE_LOG(DEBUG, VHOST_CONFIG, "Added port [%d, type %d] to \
>> +			vswitch %s\n", vs_port->port_id, type, vs_dev->name);
>> +out:
>> +	if (rc){
>> +		RTE_LOG(INFO, VHOST_CONFIG, "Failed to Add port [%d, type %d] to \
>> +			vswitch %s\n", port_id, type, vs_dev->name);
>> +		if (vs_port)
>> +			vs_free_port(vs_port);
>> +		vs_port = NULL;
>> +	}
>> +
>> +	return vs_port;
>> +}
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework
  2016-09-27 11:44     ` Pankaj Chauhan
@ 2016-09-27 12:03       ` Yuanhan Liu
  0 siblings, 0 replies; 22+ messages in thread
From: Yuanhan Liu @ 2016-09-27 12:03 UTC (permalink / raw)
  To: Pankaj Chauhan; +Cc: dev, hemant.agrawal, jianfeng.tan, maxime.coquelin

On Tue, Sep 27, 2016 at 05:14:44PM +0530, Pankaj Chauhan wrote:
> On 9/26/2016 9:42 AM, Yuanhan Liu wrote:
> >Besides the VMDq proposal, I got few more comments for you.
> >
> >On Mon, Sep 05, 2016 at 04:24:29PM +0530, Pankaj Chauhan wrote:
> >>Introduce support for a generic framework for handling of switching between
> >>physical and vhost devices. The vswitch framework introduces the following
> >>concept:
> >>
> >>1. vswitch_dev: Vswitch device
> >
> >It looks a bit confusing to me, to claim it as a "device": it's neither a
> >physical nic device nor a virtio net device. Something like "vswitch_unit",
> >or even "vswitch" is better and enough.
> >
> 
> Yes we can change it to 'vswitch' it suites better, i'll do that in v3.
> 
> >>Signed-off-by: Pankaj Chauhan <pankaj.chauhan@nxp.com>
> >>---
> >> examples/vhost/Makefile         |   2 +-
> >> examples/vhost/main.c           | 128 +++++++++--
> >> examples/vhost/vswitch_common.c | 499 ++++++++++++++++++++++++++++++++++++++++
> >> examples/vhost/vswitch_common.h | 186 +++++++++++++++
> >> examples/vhost/vswitch_txrx.c   |  97 ++++++++
> >> examples/vhost/vswitch_txrx.h   |  71 ++++++
> >
> >Seems that you forgot to include the file to implment all those ops for
> >"switch" vswitch mode? I mean, I just see a vs_lookup_n_fwd implmentation
> >of VMDq.
> >
> 
> No i didn't forget to include the file but wanted to implement, get reviewed
> and included (hopefully :)) the implementation of following first:
> 
> 1. vswitch framework
> 2. vmdq implementation plugged into the vswitch framework.
> 
> After above two i am planning to send the 'software switch' implementation
> in a separate patch, i hope that is fine.

It's better to shipt them together, so that we could do review once.
Besides, it helps to understand your framework design.

> 
> >>@@ -1241,7 +1296,7 @@ static int
> >> new_device(int vid)
> >> {
> >> 	int lcore, core_add = 0;
> >>-	uint32_t device_num_min = num_devices;
> >>+	uint32_t device_num_min;
> >> 	struct vhost_dev *vdev;
> >>
> >> 	vdev = rte_zmalloc("vhost device", sizeof(*vdev), RTE_CACHE_LINE_SIZE);
> >>@@ -1252,6 +1307,16 @@ new_device(int vid)
> >> 		return -1;
> >> 	}
> >> 	vdev->vid = vid;
> >>+	device_num_min = vs_get_max_vdevs(vswitch_dev_g);
> >>+	RTE_LOG(INFO, VHOST_PORT, "max virtio devices %d\n", device_num_min);
> >>+
> >>+	vs_port = vs_add_port(vswitch_dev_g, vid, VSWITCH_PTYPE_VIRTIO, vdev);
> >
> >Note that "vid" does not equal "port". They are two different counters
> >and both start from 0. That means, you will get unexpected results from
> >following piece of code ---->
> >
> Sorry i didn't get the inconsistency completely, please help me understand
> it.
> 
> I agree both port_id and vid counters start from zero. But when we add these
> as vswitch_port we'll pass different port type (VSWITCH_PTYPE_VIRTIO or
> VSWITCH_PTYPE_PHYS). And while searching for any vswitch port we use
> vs_port->port_id && vs_port->type as the key, thus we'll not get confused
> between ports even when both have same port_id.
> 
> Can you please help me understand the inconsistency that you thought we may
> have?
...
> >>+struct vswitch_port *vs_add_port(struct vswitch_dev *vs_dev, int port_id,
...
> >>+	rte_eth_macaddr_get(vs_port->port_id, &vs_port->mac_addr);

Will a virtio port invoke above function in your logic?

	--yliu
> >
> ><--- here.
> >
> >	--yliu
> >
> >>+	RTE_LOG(INFO, VHOST_PORT, "Port %u MAC: %02"PRIx8" %02"PRIx8" %02"PRIx8
> >>+			" %02"PRIx8" %02"PRIx8" %02"PRIx8"\n",
> >>+			(unsigned)port_id,
> >>+			vs_port->mac_addr.addr_bytes[0],
> >>+			vs_port->mac_addr.addr_bytes[1],
> >>+			vs_port->mac_addr.addr_bytes[2],
> >>+			vs_port->mac_addr.addr_bytes[3],
> >>+			vs_port->mac_addr.addr_bytes[4],
> >>+			vs_port->mac_addr.addr_bytes[5]);
> >>+
> >>+	RTE_LOG(DEBUG, VHOST_CONFIG, "Added port [%d, type %d] to \
> >>+			vswitch %s\n", vs_port->port_id, type, vs_dev->name);
> >>+out:
> >>+	if (rc){
> >>+		RTE_LOG(INFO, VHOST_CONFIG, "Failed to Add port [%d, type %d] to \
> >>+			vswitch %s\n", port_id, type, vs_dev->name);
> >>+		if (vs_port)
> >>+			vs_free_port(vs_port);
> >>+		vs_port = NULL;
> >>+	}
> >>+
> >>+	return vs_port;
> >>+}
> >
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework
  2016-09-27 11:35               ` Pankaj Chauhan
@ 2016-09-27 12:10                 ` Yuanhan Liu
  0 siblings, 0 replies; 22+ messages in thread
From: Yuanhan Liu @ 2016-09-27 12:10 UTC (permalink / raw)
  To: Pankaj Chauhan; +Cc: Tan, Jianfeng, dev, hemant.agrawal, maxime.coquelin

On Tue, Sep 27, 2016 at 05:05:41PM +0530, Pankaj Chauhan wrote:
> >Hi Pankaj,
> >
> >Again, apologize for late response: you see I was busy ;) Besides, I
> >need some time to think about it.
> >
> 
> Hi YLiu,
> 
> No issues with delayed response :)

Thanks!

> 
> >Generally, I think your ideal proposal looks good to me (well, I don't
> >see the need of port mask):
> 
> The idea of port mask was to give ability to the caller to choose which type
> of port to do rx from, Physical port or vhost port.

What's the need of it? If you register a port, don't you need pull
packets from it?

And you don't have to distinguish whether it's a physical port or vhost
port or not. If a port is registered, just pull it. Simple, right?

> >
> >    switch_worker() {
> >           rx_port = vs_sched_rx_port(vswit_dev_g, core_id)
> >           rx_q = rx_port->get_rxq(vs_port, vdev, code_id);
> >           rx_port->do_rx(rx_port, rxq, NULL, pktss, MAX_PKT_BURST);
> >
> >           vs_lookup_n_fwd(rx_port, pkts, rx_count, rxq);
> >    }
> >
> >The issue is, as you stated, VMDq it's bit tricky to handle. How about
> >the following proposal then?
> >
> >We don't have to register the nic queues while VMDq is used, since a
> >phys queue is bond to a virtio queue in this mode. That means only
> >virtio queues will be scheduled.
> >
> >The virtio do_rx might look like below then:
> >
> >    vmdq_rx() {
> >            rte_eth_rx_burst(port, queue_bond_to_this_virtio_queue, ...);
> >            rte_vhost_enqueue_burst(...) if any;
> >
> >            rte_vhost_dequeue_burst(...);
> >    }
> >
> 
> Okay so in that case, we won't do any rte_eth_rx_burst() when
> physical_port->do_rx is called, Correct?.

The physical port do_rx will not be called at all, since it will
not registered. In the VMDq case, only virtio port will be registered
(by your vs_add_port function).

> If yes then in vmdq.c we'll
> overwrite vs_port->do_rx of physical port with a vmdq_do_rx_phys() which
> does nothing. Or we can even have an option that vmdq.c doesn't return the
> physical port when vs_sched_rx_port() is called,

As stated, if you don't register it, then vs_sched_rx_port will return no
physical port.

	--yliu

> i think this later option
> is better to save some CPU cycles.
> 
> I think it is possible but i would prefer to overwrite vs_port->do_rx() for
> vmdq (in vmdq.c) with the implementation that you suggested. The framework
> provides this option, i.e the switch implementation can overwrite the
> vs_port->do_rx/do_tx if required to handle any special cases for example the
> case of vmdq <> vdev boding.
> 
> Thanks,
> Pankaj
> >	--yliu
> >
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-09-27 12:10 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-04 10:23 [RFC][PATCH V2 0/3] example/vhost: Introduce Vswitch Framework Pankaj Chauhan
2016-09-04 10:23 ` [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework Pankaj Chauhan
2016-09-09  8:56   ` Tan, Jianfeng
2016-09-12 10:55     ` Pankaj Chauhan
2016-09-13  6:51       ` Tan, Jianfeng
2016-09-15  9:00         ` Pankaj Chauhan
2016-09-19 12:42           ` Tan, Jianfeng
2016-09-27 11:26             ` Pankaj Chauhan
2016-09-19 14:43         ` Yuanhan Liu
2016-09-20  8:58           ` Pankaj Chauhan
2016-09-26  3:56             ` Yuanhan Liu
2016-09-27 11:35               ` Pankaj Chauhan
2016-09-27 12:10                 ` Yuanhan Liu
2016-09-11 12:21   ` Yuanhan Liu
2016-09-12 10:59     ` Pankaj Chauhan
2016-09-26  4:12   ` Yuanhan Liu
2016-09-27 11:44     ` Pankaj Chauhan
2016-09-27 12:03       ` Yuanhan Liu
2016-09-04 10:23 ` [RFC][PATCH V2 2/3] examples/vhost: Add vswitch command line options Pankaj Chauhan
2016-09-13 12:14   ` Yuanhan Liu
2016-09-15  9:11     ` Pankaj Chauhan
2016-09-04 10:24 ` [RFC][PATCH V2 3/3] examples/vhost: Add VMDQ vswitch device Pankaj Chauhan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.